
Showing papers in "International Journal of Computer Vision in 2001"


Journal ArticleDOI
TL;DR: A taxonomy of dense two-frame stereo methods is presented, together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.
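The components this taxonomy identifies (matching cost, cost aggregation, disparity computation, refinement) can be illustrated in a few lines. The toy below is a hedged sketch, not the paper's C++ platform: absolute-difference cost, square-window aggregation, winner-take-all selection, and no refinement step.

```python
import numpy as np

def sad_block_match(left, right, max_disp, win=1):
    """Dense two-frame stereo in the taxonomy's components:
    matching cost (absolute difference), aggregation (square window),
    disparity computation (winner-take-all), and no refinement."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(win, h - win):
        for x in range(win, w - win):
            best_cost, best_d = np.inf, 0
            # only disparities whose right-image window stays in bounds
            for d in range(min(max_disp, x - win) + 1):
                patch_l = left[y - win:y + win + 1, x - win:x + win + 1]
                patch_r = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
                cost = np.abs(patch_l - patch_r).sum()  # SAD over the window
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Swapping the cost (e.g. squared differences) or the aggregation window is exactly the kind of component-wise variation the taxonomy is designed to compare.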

7,458 citations


Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.

6,882 citations


Journal ArticleDOI
TL;DR: A unified model of natural texture is provided, built on a vocabulary of prototype tiny surface patches (3D textons) with associated local geometric and photometric properties, whose appearance is represented as a set of linear Gaussian derivative filter outputs under different lighting and viewing conditions.
Abstract: We study the recognition of surfaces made from different materials such as concrete, rug, marble, or leather on the basis of their textural appearance. Such natural textures arise from spatial variation of two surface attributes: (1) reflectance and (2) surface normal. In this paper, we provide a unified model to address both these aspects of natural texture. The main idea is to construct a vocabulary of prototype tiny surface patches with associated local geometric and photometric properties. We call these 3D textons. Examples might be ridges, grooves, spots or stripes or combinations thereof. Associated with each texton is an appearance vector, which characterizes the local irradiance distribution, represented as a set of linear Gaussian derivative filter outputs, under different lighting and viewing conditions. Given a large collection of images of different materials, a clustering approach is used to acquire a small (on the order of 100) 3D texton vocabulary. Given a few (1 to 4) images of any material, it can be characterized using these textons. We demonstrate the application of this representation for recognition of the material viewed under novel lighting and viewing conditions. We also illustrate how the 3D texton model can be used to predict the appearance of materials under novel conditions.
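The texton pipeline described above (filter-bank responses clustered into a small vocabulary) can be sketched as follows. This is an illustrative toy, not the authors' code: a four-orientation Gaussian-derivative bank and a minimal k-means stand in for the full bank and the clustering over many lighting and viewing conditions.

```python
import numpy as np

def gaussian_derivative_bank(sigma=1.5, size=9):
    """Toy bank: first derivatives of a Gaussian at four orientations
    (a stand-in for the paper's full multi-condition filter set)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    bank = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        u = np.cos(theta) * xx + np.sin(theta) * yy
        f = -u * g                        # derivative of Gaussian along theta
        bank.append(f / np.abs(f).sum())
    return bank

def filter_responses(img, bank):
    """Per-pixel response vectors (valid region only, naive convolution)."""
    s = bank[0].shape[0]
    h, w = img.shape
    out = np.zeros((h - s + 1, w - s + 1, len(bank)))
    for i, f in enumerate(bank):
        for y in range(h - s + 1):
            for x in range(w - s + 1):
                out[y, x, i] = (img[y:y + s, x:x + s] * f).sum()
    return out

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: the cluster centres play the role of textons."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels
```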

1,762 citations


Journal ArticleDOI
TL;DR: A multiscale algorithm for the selection of salient regions of an image is introduced and its application to matching type problems such as tracking, object recognition and image retrieval is demonstrated.
Abstract: Many computer vision problems can be considered to consist of two main tasks: the extraction of image content descriptions and their subsequent matching. The appropriate choice of type and level of description is of course task dependent, yet it is generally accepted that the low-level or so-called early vision layers in the Human Visual System are context independent. This paper concentrates on the use of low-level approaches for solving computer vision problems and discusses three inter-related aspects of this: saliency, scale selection and content description. In contrast to many previous approaches which separate these tasks, we argue that these three aspects are intrinsically related. Based on this observation, a multiscale algorithm for the selection of salient regions of an image is introduced and its application to matching type problems such as tracking, object recognition and image retrieval is demonstrated.

1,317 citations


Journal ArticleDOI
TL;DR: This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture, and introduces a gating operator based on the texturedness of the neighborhood at a pixel to facilitate cue combination.
Abstract: This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. Natural images contain both textured and untextured regions, so the cues of contour and texture differences are exploited simultaneously. Contours are treated in the intervening contour framework, while texture is analyzed using textons. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on the texturedness of the neighborhood at a pixel. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
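The final grouping step, normalized cuts, reduces to an eigenvector computation on a pixel affinity graph. A minimal sketch over an arbitrary affinity matrix W (bipartition only, median split on the generalized Fiedler vector; the contour/texture cue combination is omitted):

```python
import numpy as np

def normalized_cut(W):
    """Bipartition a weighted graph via the second-smallest eigenvector
    of the normalized Laplacian (the spectral relaxation of the
    normalized cut criterion)."""
    d = W.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - W                      # unnormalized Laplacian
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # symmetric normalization
    vals, vecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
    fiedler = D_inv_sqrt @ vecs[:, 1]       # generalized eigenvector
    return fiedler > np.median(fiedler)     # split at the median
```

In the paper, W would be the local similarity measure combining brightness and texture cues; here it is any symmetric nonnegative matrix.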

1,253 citations


Journal ArticleDOI
TL;DR: An accurate and robust face recognition system was developed and tested that exploits the feature extraction capabilities of the discrete cosine transform and invokes certain normalization techniques that increase its robustness to variations in facial geometry and illumination.
Abstract: An accurate and robust face recognition system was developed and tested. This system exploits the feature extraction capabilities of the discrete cosine transform (DCT) and invokes certain normalization techniques that increase its robustness to variations in facial geometry and illumination. The method was tested on a variety of available face databases, including one collected at McGill University. The system was shown to perform very well when compared to other approaches.
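A hedged sketch of the DCT feature idea (not the tested system, and without its geometric and illumination normalization steps): keep a low-frequency block of 2-D DCT coefficients as a compact descriptor and match by nearest neighbour.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    return C

def dct_features(img, keep=8):
    """Low-frequency 2-D DCT coefficients as a face descriptor:
    energy compaction puts most identity information in this block."""
    C = dct_matrix(img.shape[0])
    R = dct_matrix(img.shape[1])
    coeffs = C @ img @ R.T
    return coeffs[:keep, :keep].ravel()

def nearest_face(query, gallery):
    """Nearest-neighbour matching on DCT feature vectors."""
    dists = [np.linalg.norm(query - g) for g in gallery]
    return int(np.argmin(dists))
```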

496 citations


Journal ArticleDOI
TL;DR: The current algorithms for recognizing persons by their iris patterns are reviewed and the results of 2.3 million comparisons among eye images acquired in trials in Britain, the USA, and Japan are presented, and aspects of the process still in need of improvement are discussed.
Abstract: Algorithms first described in 1993 for recognizing persons by their iris patterns have now been tested in several public field trials, producing no false matches in several million comparison tests. The underlying recognition principle is the failure of a test of statistical independence on texture phase structure as encoded by multi-scale quadrature wavelets. The combinatorial complexity of this phase information across different persons spans about 244 degrees of freedom and generates a discrimination entropy of about 3.2 bits/mm2 over the iris, enabling real-time decisions about personal identity with extremely high confidence. This paper reviews the current algorithms and presents the results of 2.3 million comparisons among eye images acquired in trials in Britain, the USA, and Japan, and it discusses aspects of the process still in need of improvement.
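The core decision rule is a masked fractional Hamming distance between binary iris codes; a match is declared when the test of statistical independence fails (distance well below the ~0.5 expected for unrelated irises). A minimal sketch; the 0.32 threshold here is illustrative, not a reported operating point:

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance between two binary iris codes,
    counting only bits valid in both masks (e.g. not occluded by
    eyelids or specular reflections)."""
    valid = mask_a & mask_b
    disagree = (code_a ^ code_b) & valid
    return disagree.sum() / valid.sum()

def same_person(hd, threshold=0.32):
    """Unrelated irises yield HD near 0.5; identity is accepted when
    the independence test *fails*, i.e. HD falls below the threshold
    (0.32 is an illustrative value)."""
    return hd < threshold
```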

463 citations


Journal ArticleDOI
TL;DR: It is shown that a central catadioptric projection is equivalent to a two-step mapping via the sphere, and it is proved that for each catadioptric projection there exists a dual catadioptric projection based on the duality between points and line images (conics).
Abstract: Catadioptric sensors are devices which utilize mirrors and lenses to form a projection onto the image plane of a camera. Central catadioptric sensors are the class of these devices having a single effective viewpoint. In this paper, we propose a unifying model for the projective geometry induced by these devices and we study its properties as well as its practical implications. We show that a central catadioptric projection is equivalent to a two-step mapping via the sphere. The second step is equivalent to a stereographic projection in the case of parabolic mirrors. Conventional lens-based perspective cameras are also central catadioptric devices with a virtual planar mirror and are, thus, covered by the unifying model. We prove that for each catadioptric projection there exists a dual catadioptric projection based on the duality between points and line images (conics). It turns out that planar and parabolic mirrors build a dual catadioptric projection pair. As a practical example we describe a procedure to estimate focal length and image center from a single view of lines in arbitrary position for a parabolic catadioptric system.

434 citations


Journal ArticleDOI
TL;DR: Left-invariant metrics are defined on the product G × I, thus allowing the generation of transformations of the background geometry as well as the image values, and structural generation in which image values are changed, supporting notions such as tissue creation in carrying one image to another.
Abstract: This paper constructs metrics on the space of images I defined as orbits under group actions G. The groups studied include the finite dimensional matrix groups and their products, as well as the infinite dimensional diffeomorphisms examined in Trouve (1999, Quarterly of Applied Math.) and Dupuis et al. (1998, Quarterly of Applied Math.). Left-invariant metrics are defined on the product G × I, thus allowing the generation of transformations of the background geometry as well as the image values. Examples of the application of such metrics are presented for rigid object matching with and without signature variation, curve and volume matching, and structural generation in which image values are changed, supporting notions such as tissue creation in carrying one image to another.

379 citations


Journal ArticleDOI
TL;DR: This taxonomy provides a unifying framework for data-driven and flow-driven, isotropic and anisotropic, as well as spatial and spatio-temporal regularizers, and proves that all these methods are well-posed.
Abstract: Many differential methods for the recovery of the optic flow field from an image sequence can be expressed in terms of a variational problem where the optic flow minimizes some energy. Typically, these energy functionals consist of two terms: a data term, which requires e.g. that a brightness constancy assumption holds, and a regularizer that encourages global or piecewise smoothness of the flow field. In this paper we present a systematic classification of rotation invariant convex regularizers by exploring their connection to diffusion filters for multichannel images. This taxonomy provides a unifying framework for data-driven and flow-driven, isotropic and anisotropic, as well as spatial and spatio-temporal regularizers. While some of these techniques are classic methods from the literature, others are derived here for the first time. We prove that all these methods are well-posed: they possess a unique solution that depends in a continuous way on the initial data. An interesting structural relation between isotropic and anisotropic flow-driven regularizers is identified, and a design criterion is proposed for constructing anisotropic flow-driven regularizers in a simple and direct way from isotropic ones. Its use is illustrated by several examples.
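The simplest member of such a taxonomy, a homogeneous regularizer that depends on neither the image nor the flow, is the classic Horn and Schunck model. A compact sketch of its Jacobi-style iterations (illustrative, not the paper's formulation):

```python
import numpy as np

def local_avg(f):
    """4-neighbour average with replicated borders."""
    p = np.pad(f, 1, mode='edge')
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I1, I2, alpha=1.0, iters=100):
    """Variational optic flow with a brightness-constancy data term and
    a homogeneous (image- and flow-independent) smoothness regularizer,
    i.e. the Horn-Schunck model, solved by Jacobi-style iterations."""
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                       # temporal derivative (data term)
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(iters):
        ub, vb = local_avg(u), local_avg(v)
        t = (Ix * ub + Iy * vb + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

Flow-driven and anisotropic members of the taxonomy replace the fixed smoothness term with one adapted to the evolving flow field.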

343 citations


Journal ArticleDOI
TL;DR: The approach is sequential testing that is coarse-to-fine both in the exploration of poses and in the representation of objects; the spatial distribution of processing is highly skewed and detection is rapid, but at the expense of (isolated) false alarms which could be eliminated with localized, more intensive processing.
Abstract: We study visual selection: Detect and roughly localize all instances of a generic object class, such as a face, in a greyscale scene, measuring performance in terms of computation and false alarms. Our approach is sequential testing which is coarse-to-fine both in the exploration of poses and in the representation of objects. All the tests are binary and indicate the presence or absence of loose spatial arrangements of oriented edge fragments. Starting from training examples, we recursively find larger and larger arrangements which are “decomposable,” which implies the probability of an arrangement appearing on an object decays slowly with its size. Detection means finding a sufficient number of arrangements of each size along a decreasing sequence of pose cells. At the beginning, the tests are simple and universal, accommodating many poses simultaneously, but the false alarm rate is relatively high. Eventually, the tests are more discriminating, but also more complex and dedicated to specific poses. As a result, the spatial distribution of processing is highly skewed and detection is rapid, but at the expense of (isolated) false alarms which, presumably, could be eliminated with localized, more intensive, processing.

Journal ArticleDOI
TL;DR: An efficient implementation of correlation based disparity calculation with high speed and reasonable quality that can be used in a wide range of applications or to provide an initial solution for more sophisticated methods is presented.
Abstract: This paper presents an efficient implementation for correlation based stereo. Research in this area can roughly be divided into two classes: improving accuracy regardless of computing time, and scene reconstruction in real-time. Algorithms achieving video frame rates must have strong limitations in image size and disparity search range, whereas high quality results often need several minutes per image pair. This paper tries to fill the gap: it provides instructions on how to implement correlation based disparity calculation with high speed and reasonable quality that can be used in a wide range of applications or to provide an initial solution for more sophisticated methods. Left-right consistency checking and uniqueness validation are used to eliminate false matches. Optionally, a fast median filter can be applied to the results to further remove outliers. Source code will be made publicly available as a contribution to the Open Source Computer Vision Library; further acceleration with SIMD instructions is planned for the near future.
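The left-right consistency check used to eliminate false matches can be sketched directly: a left-image disparity is kept only if matching from the right image maps back to (about) the same pixel. Illustrative Python, not the paper's optimized implementation:

```python
import numpy as np

def left_right_check(disp_l, disp_r, tol=1):
    """Keep a left disparity d at column x only if the right image's
    disparity at column x - d agrees to within `tol`; everything else
    is marked invalid (a likely false match or occlusion)."""
    h, w = disp_l.shape
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disp_l[y, x]
            xr = x - d                       # corresponding right column
            if 0 <= xr < w and abs(disp_r[y, xr] - d) <= tol:
                valid[y, x] = True
    return valid
```

A median filter over the surviving disparities, as the abstract suggests, would further suppress isolated outliers.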

Journal ArticleDOI
TL;DR: This work shows how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties, using an efficient projection algorithm for one popular classifier.
Abstract: Finding people in pictures presents a particularly difficult object recognition problem. We show how to find people by finding candidate body segments, and then constructing assemblies of segments that are consistent with the constraints on the appearance of a person that result from kinematic properties. Since a reasonable model of a person requires at least nine segments, it is not possible to inspect every group, due to the huge combinatorial complexity. We propose two approaches to this problem. In one, the search can be pruned by using projected versions of a classifier that accepts groups corresponding to people. We describe an efficient projection algorithm for one popular classifier, and demonstrate that our approach can be used to determine whether images of real scenes contain people. The second approach employs a probabilistic framework, so that we can draw samples of assemblies, with probabilities proportional to their likelihood, which allows human-like assemblies to be drawn more often than non-person ones. The main performance problem is in segmentation of images, but the overall results of both approaches on real images of people are encouraging.

Journal ArticleDOI
TL;DR: A scale-invariant version of Matheron's “dead leaves model” for the statistics of natural images that takes occlusions into account and resembles the image formation process by randomly adding independent elementary shapes, such as disks, in layers.
Abstract: We develop a scale-invariant version of Matheron's “dead leaves model” for the statistics of natural images. The model takes occlusions into account and resembles the image formation process by randomly adding independent elementary shapes, such as disks, in layers. We compare the empirical statistics of two large databases of natural images with the statistics of the occlusion model, and find an excellent qualitative, and good quantitative agreement. At this point, this is the only image model which comes close to duplicating the simplest, elementary statistics of natural images—such as, the scale invariance property of marginal distributions of filter responses, the full co-occurrence statistics of two pixels, and the joint statistics of pairs of Haar wavelet responses.
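Sampling from a scale-invariant dead leaves model is straightforward: draw disk radii from a power law p(r) ∝ r^-3 (by inverse-CDF sampling) and paint opaque random-grey disks in order, so that later (front) disks occlude earlier ones. A hedged toy sampler, not the authors' exact model:

```python
import numpy as np

def dead_leaves(size=64, n=4000, rmin=2, rmax=32, seed=0):
    """Sample a dead-leaves image: opaque random-grey disks with a
    scale-invariant radius law p(r) ~ 1/r^3, drawn back-to-front so
    front disks occlude what lies behind them."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size))
    yy, xx = np.mgrid[:size, :size]
    # inverse-CDF sampling of r from p(r) ~ r^-3 on [rmin, rmax]
    u = rng.random(n)
    r = 1.0 / np.sqrt((1 - u) / rmin**2 + u / rmax**2)
    for i in range(n):
        cx, cy = rng.random(2) * size
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r[i] ** 2
        img[mask] = rng.random()    # front disk paints over earlier ones
    return img
```

The r^-3 radius law is what makes marginal filter-response statistics of the synthetic images approximately scale invariant, as the paper's comparison exploits.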

Journal ArticleDOI
TL;DR: A Riemannian Newton algorithm is proposed to solve the motion and structure recovery problem, making use of the natural differential geometric structure of the unknown parameter space, the so-called “essential manifold.”
Abstract: Prevailing efforts to study the standard formulation of motion and structure recovery have recently been focused on issues of sensitivity and robustness of existing techniques. While many cogent observations have been made and verified experimentally, many statements do not hold in general settings and make a comparison of existing techniques difficult. With an ultimate goal of clarifying these issues, we study the main aspects of motion and structure recovery: the choice of objective function, optimization techniques and sensitivity and robustness issues in the presence of noise. We clearly reveal the relationship among different objective functions, such as “(normalized) epipolar constraints,” “reprojection error” or “triangulation,” all of which can be unified in a new “optimal triangulation” procedure. Regardless of various choices of the objective function, the optimization problems all inherit the same unknown parameter space, the so-called “essential manifold.” Based on recent developments of optimization techniques on Riemannian manifolds, in particular on Stiefel or Grassmann manifolds, we propose a Riemannian Newton algorithm to solve the motion and structure recovery problem, making use of the natural differential geometric structure of the essential manifold. We provide a clear account of sensitivity and robustness of the proposed linear and nonlinear optimization techniques and study the analytical and practical equivalence of different objective functions. The geometric characterization of critical points and the simulation results clarify the difference between the effect of bas-relief ambiguity, rotation and translation confounding and other types of local minima. This leads to consistent interpretations of simulation results over a large range of signal-to-noise ratio and variety of configurations.

Journal ArticleDOI
TL;DR: This paper shows that a combination of physical and statistical knowledge leads to a surprisingly simple and powerful colour constancy algorithm, one that also works well for images of natural scenes and is significantly better than previous dichromatic algorithms.
Abstract: Statistics-based colour constancy algorithms work well as long as there are many colours in a scene; they fail, however, when the scenes encountered comprise few surfaces. In contrast, physics-based algorithms, based on an understanding of physical processes such as highlights and interreflections, are theoretically able to solve for colour constancy even when there are as few as two surfaces in a scene. Unfortunately, physics-based theories rarely work outside the lab. In this paper we show that a combination of physical and statistical knowledge leads to a surprisingly simple and powerful colour constancy algorithm, one that also works well for images of natural scenes. From a physical standpoint we observe that given the dichromatic model of image formation the colour signals coming from a single uniformly-coloured surface are mapped to a line in chromaticity space. One component of the line is defined by the colour of the illuminant (i.e. specular highlights) and the other is due to its matte, or Lambertian, reflectance. We then make the statistical observation that the chromaticities of common light sources all follow closely the Planckian locus of black-body radiators. It follows that by intersecting the dichromatic line with the Planckian locus we can estimate the chromaticity of the illumination. We can solve for colour constancy even when there is a single surface in the scene. When there are many surfaces in a scene the individual estimates from each surface are averaged together to improve accuracy. In a set of experiments on real images we show our approach delivers very good colour constancy. Moreover, performance is significantly better than previous dichromatic algorithms.
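The geometric core of the algorithm, fitting a dichromatic line to one surface's chromaticities and intersecting it with the Planckian locus, can be sketched as below. The locus is assumed to be supplied as a sampled polyline (a hypothetical input here, not computed from black-body spectra):

```python
import numpy as np

def fit_dichromatic_line(chroma):
    """Total-least-squares line (point p0, unit direction d) through a
    single surface's 2-D chromaticities, via PCA of the points."""
    p0 = chroma.mean(0)
    _, _, vt = np.linalg.svd(chroma - p0)
    return p0, vt[0]

def intersect_with_locus(p0, d, locus):
    """Intersect the dichromatic line with a polyline approximation of
    the Planckian locus; `locus` is an (n, 2) array of chromaticities,
    assumed precomputed. Returns an intersection point or None."""
    n = np.array([-d[1], d[0]])                # normal to the line
    best = None
    for a, b in zip(locus[:-1], locus[1:]):
        fa, fb = n @ (a - p0), n @ (b - p0)    # signed distances to line
        if fa * fb <= 0 and fa != fb:          # segment straddles the line
            t = fa / (fa - fb)
            best = a + t * (b - a)
    return best
```

With several surfaces, the per-surface illuminant estimates would be averaged, as the abstract describes.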

Journal ArticleDOI
TL;DR: A symmetric shape-from-shading (SFS) approach is presented that recovers both shape and albedo for symmetric objects; the introduction of the self-ratio image facilitates the direct use of the symmetry cue.
Abstract: In this paper, we present a symmetric shape-from-shading (SFS) approach to recover both shape and albedo for symmetric objects. Lambertian surfaces with unknown varying albedo and orthographic projections are assumed. In our formulation of symmetric SFS, we have two image irradiance equations. One is the standard equation used in SFS, and the other is a self-ratio image irradiance equation. This new image irradiance equation relates the self-ratio image, which is defined as the ratio of the two halves of the input image, to light source and surface shape. The introduction of the self-ratio image facilitates the direct use of the symmetry cue. Based on the self-ratio image, a new model-based symmetric source-from-shading algorithm is also presented. We then propose symmetric SFS algorithms to recover both shape and albedo from a single image and present experimental results. The new symmetric SFS scheme has one important property: the existence of a unique (global) solution which consists of unique (local) solutions at each point, simultaneously obtained using the intensity information at that point and the surrounding local region and the assumption of a C^2 surface. Proofs for the existence of a unique solution in the cases of unknown constant and non-constant albedos are provided.

Journal ArticleDOI
TL;DR: In this paper, the authors describe the theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming.
Abstract: In this paper we describe the theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming. The basis of ...

Journal ArticleDOI
TL;DR: It is shown that the relative orientation of a catadioptric stereo rig is restricted to the class of planar motions, thus reducing the number of external calibration parameters from 6 to 5, and how focal length can be recovered from a single catadioptric image solely from a set of stereo correspondences.
Abstract: By using mirror reflections of a scene, stereo images can be captured with a single camera (catadioptric stereo). In addition to simplifying data acquisition single camera stereo provides both geometric and radiometric advantages over traditional two camera stereo. In this paper, we discuss the geometry and calibration of catadioptric stereo with two planar mirrors. In particular, we will show that the relative orientation of a catadioptric stereo rig is restricted to the class of planar motions thus reducing the number of external calibration parameters from 6 to 5. Next we derive the epipolar geometry for catadioptric stereo and show that it has 6 degrees of freedom rather than 7 for traditional stereo. Furthermore, we show how focal length can be recovered from a single catadioptric image solely from a set of stereo correspondences. To test the accuracy of the calibration we present a comparison to Tsai camera calibration and we measure the quality of Euclidean reconstruction. In addition, we will describe a real-time system which demonstrates the viability of stereo with mirrors as an alternative to traditional two camera stereo.

Journal ArticleDOI
TL;DR: The proposed method is very effective at recovering both sharp object edges at their correct locations and smooth object surfaces, and presents a sound analysis of the boundary overreach effect, which has not been clearly explained in the past.
Abstract: In area-based stereo matching, there is a problem called "boundary overreach", i.e. the recovered object boundary turns out to be wrongly located away from the real one. This is especially harmful to segmenting objects using depth information. A few approaches have been proposed to solve this problem. However, these techniques tend to degrade on smooth surfaces. That is, there seems to be a trade-off problem between recovering precise object edges and obtaining smooth surfaces. In this paper, we propose a new simple method to solve this problem. Using multiple stereo pairs and multiple windowing, our method detects the region where the boundary overreach is likely to occur (let us call it "BO region") and adopts appropriate methods for the BO and non-BO regions. Although the proposed method is quite simple, the experimental results have shown that it is very effective at recovering both sharp object edges at their correct locations and smooth object surfaces.

Journal ArticleDOI
TL;DR: This work introduces the oblique stereo geometry, which has non-intersecting double ruled epipolar surfaces, and introduces linear oblique cameras as those which can be generated by a linear mapping from points in space to camera rays and characterize those collineations which generate them.
Abstract: Mosaics acquired by pushbroom cameras, stereo panoramas, omnivergent mosaics, and spherical mosaics can be viewed as images taken by non-central cameras, i.e. cameras that project along rays that do not all intersect at one point. It has been shown that in order to reduce the correspondence search in mosaics to a one-parametric search along curves, the rays of the non-central cameras have to lie in double ruled epipolar surfaces. In this work, we introduce the oblique stereo geometry, which has non-intersecting double ruled epipolar surfaces. We analyze the configurations of mutually oblique rays that see every point in space. We call such configurations oblique cameras. We argue that oblique cameras are important because they are the most non-central cameras among all cameras. We show that oblique cameras, and the corresponding oblique stereo geometry, exist and give an example of a physically realizable oblique stereo geometry.

Journal ArticleDOI
TL;DR: This paper proposes a dense 3-D reconstruction method that first estimates extrinsic camera parameters of a hand-held video camera, and then reconstructs a dense3-D model of a scene.
Abstract: Three-dimensional (3-D) models of outdoor scenes are widely used for object recognition, navigation, mixed reality, and so on. Because such models are often made manually with high costs, automatic and dense 3-D reconstruction is widely investigated. In related work, a dense 3-D model is generated by using a stereo method. However, these methods cannot use several hundred images together for dense depth estimation, because it is difficult to accurately calibrate a large number of cameras. In this paper we propose a dense 3-D reconstruction method that first estimates extrinsic camera parameters of a hand-held video camera, and then reconstructs a dense 3-D model of a scene. We can acquire a model of the scene accurately by using several hundred input images.

Journal ArticleDOI
TL;DR: The method presented incorporates the available image information in a unified framework and automatically reconstructs accurate spatio-temporal representations of complex non-rigidly moving objects.
Abstract: We present a method to automatically extract spatio-temporal descriptions of moving objects from synchronized and calibrated multi-view sequences. The object is modeled by a time-varying multi-resolution subdivision surface that is fitted to the image data using spatio-temporal multi-view stereo information, as well as contour constraints. The stereo data is utilized by computing the normalized correlation between corresponding spatio-temporal image trajectories of surface patches, while the contour information is determined using incremental segmentation of the viewing volume into object and background. We globally optimize the shape of the spatio-temporal surface in a coarse-to-fine manner using the multi-resolution structure of the subdivision mesh. The method presented incorporates the available image information in a unified framework and automatically reconstructs accurate spatio-temporal representations of complex non-rigidly moving objects.

Journal ArticleDOI
TL;DR: A Bayesian approach to intensity-based object localisation is presented that employs a learned probabilistic model of image filter-bank output, applied via Monte Carlo methods, to escape the inefficiency of exhaustive search.
Abstract: A Bayesian approach to intensity-based object localisation is presented that employs a learned probabilistic model of image filter-bank output, applied via Monte Carlo methods, to escape the inefficiency of exhaustive search. An adequate probabilistic account of image data requires intensities both in the foreground (i.e. over the object), and in the background, to be modelled. Some previous approaches to object localisation by Monte Carlo methods have used models which, we claim, do not fully address the issue of the statistical independence of image intensities. It is addressed here by applying to each image a bank of filters whose outputs are approximately statistically independent. Distributions of the responses of individual filters, over foreground and background, are learned from training data. These distributions are then used to define a joint distribution for the output of the filter bank, conditioned on object configuration, and this serves as an observation likelihood for use in probabilistic inference about localisation. The effectiveness of probabilistic object localisation in image clutter, using Bayesian Localisation, is illustrated. Because it is a Monte Carlo method, it produces not simply a single estimate of object configuration, but an entire sample from the posterior distribution for the configuration. This makes sequential inference of configuration possible. Two examples are illustrated here: coarse to fine scale inference, and propagation of configuration estimates over time, in image sequences.

Journal ArticleDOI
TL;DR: To achieve the required speed and accuracy, the system uses trinocular stereo, a matching algorithm based on the sum of modified normalized cross-correlations, and subpixel disparity interpolation, together with a four-processor parallelization.
Abstract: In telepresence applications each user is immersed in a rendered 3D-world composed from representations transmitted from remote sites. The challenge is to compute dense range data at high frame rates, since participants cannot easily communicate if the processing cycle or network latencies are long. Moreover, errors in new stereoscopic views of the remote 3D-world should be hardly perceptible. To achieve the required speed and accuracy, we use trinocular stereo, a matching algorithm based on the sum of modified normalized cross-correlations, and subpixel disparity interpolation. To increase speed we use Intel IPL functions in the pre-processing steps of background subtraction and image rectification as well as a four-processor parallelization. To evaluate our system we have developed a testbed which provides a set of registered dense "ground-truth" laser data and image data from multiple views.
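Two of the ingredients named above, a normalized cross-correlation match score and parabolic subpixel disparity interpolation, can be sketched compactly. The epsilon regularization below is only one common guess at the "modification"; the abstract does not specify the authors' exact variant:

```python
import numpy as np

def mncc(a, b, eps=1e-6):
    """Normalized cross-correlation of two same-size image patches.
    The small eps keeps the score stable in low-texture regions, one
    common 'modification' of the plain NCC (hypothetical here)."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def subpixel_peak(c_minus, c_peak, c_plus):
    """Parabolic interpolation of the correlation maximum: given the
    scores at disparities d-1, d, d+1, returns the fractional offset in
    (-0.5, 0.5) to add to the integer disparity d."""
    denom = c_minus - 2.0 * c_peak + c_plus
    if denom == 0.0:
        return 0.0
    return 0.5 * (c_minus - c_plus) / denom
```

Fitting a parabola through three correlation samples is a standard way to refine integer disparities without extra matching cost, which matters when the whole cycle must run at interactive frame rates.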

Journal ArticleDOI
TL;DR: This paper describes a general method for extracting a model from, or fitting it to, data that combines the benefits of generate-and-test methods and Hough transform variations, yielding a method superior to both.
Abstract: Popular algorithms for feature matching and model extraction fall into two broad categories: generate-and-test and Hough transform variations. However, both methods suffer from problems in practical implementations. Generate-and-test methods are sensitive to noise in the data. They often fail when the generated model fit is poor due to error in the data used to generate the model position. Hough transform variations are less sensitive to noise, but implementations for complex problems suffer from large time and space requirements and from the detection of false positives. This paper describes a general method for solving problems where a model is extracted from, or fit to, data that draws benefits from both generate-and-test methods and those based on the Hough transform, yielding a method superior to both. An important component of the method is the subdivision of the problem into many subproblems. This allows efficient generate-and-test techniques to be used, including the use of randomization to limit the number of subproblems that must be examined. Each subproblem is solved using pose space analysis techniques similar to the Hough transform, which lowers the sensitivity of the method to noise. This strategy is easy to implement and results in practical algorithms that are efficient and robust. We describe case studies of the application of this method to object recognition, geometric primitive extraction, robust regression, and motion segmentation.
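The generate-and-test half of this hybrid can be illustrated with a toy randomized line extractor. This sketch only shows the "sample a minimal subset, generate a hypothesis, count support" loop; the paper's pose-space subdivision and Hough-style subproblem analysis are not reproduced here:

```python
import random

def fit_line(p, q):
    """Implicit line a*x + b*y + c = 0 through two sample points,
    normalized so that |a*x + b*y + c| is point-to-line distance."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = (a * a + b * b) ** 0.5
    return a / n, b / n, -(a * x1 + b * y1) / n

def extract_line(points, tol=1.0, trials=200, seed=0):
    """Randomized generate-and-test: repeatedly sample a minimal subset
    (two points), generate a candidate line, and keep the hypothesis
    with the largest inlier set."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(trials):
        p, q = rng.sample(points, 2)
        if p == q:  # duplicate coordinates: degenerate sample
            continue
        a, b, c = fit_line(p, q)
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b, c), inliers
    return best, best_inliers
```

As the abstract notes, such hypotheses are sensitive to error in the generating points; scoring them by pose-space analysis over an inlier neighborhood, rather than by a single exact fit, is the paper's remedy.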

Journal ArticleDOI
TL;DR: A one-parameter algorithm for the extraction of line networks from images is presented; the parameter selects the saliency level extracted from a hierarchical graph, and the method reports detection confidence, thereby supporting error-proof interpretation of the network functionality.
Abstract: The extraction and interpretation of networks of lines from images yields important organizational information about the network under consideration. In this paper, a one-parameter algorithm for the extraction of line networks from images is presented. The parameter indicates the saliency level extracted from a hierarchical graph. Input for the algorithm is the domain-specific knowledge of interconnection points. Graph morphological tools are used to extract the minimum cost graph which best segments the network. We give an extensive error analysis for the general case of line extraction. Our method is shown to be robust against gaps in lines, and against spurious vertices at lines, which we consider to be the most prominent sources of error in line detection. The method indicates detection confidence, thereby supporting error-proof interpretation of the network functionality. The method is demonstrated to be applicable to a broad variety of line networks, including dashed lines. Hence, the proposed method yields a major step towards general line tracking algorithms.
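The minimum-cost idea behind such extraction, and the claimed robustness to gaps, can be illustrated with a shortest-path toy example. This is not the paper's graph-morphological algorithm; it only shows why a summed-cost criterion bridges a gap in a line instead of breaking the track:

```python
import heapq

def min_cost_path(cost, start, goal):
    """Dijkstra over a pixel grid. `cost` holds per-pixel costs (e.g. an
    inverse line-filter response); the extracted line between two
    interconnection points is the path of minimal summed cost, so a
    short expensive gap is crossed when every detour costs more."""
    h, w = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev, pq = {}, [(dist[start], start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        r, c = u
        for v in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= v[0] < h and 0 <= v[1] < w:
                nd = d + cost[v[0]][v[1]]
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]
```

In the test below, the middle row is a line with one expensive "gap" pixel; the minimum-cost path still runs straight through it because any detour accumulates more cost.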

Journal ArticleDOI
TL;DR: The notion of affine shape, introduced by Sparr, is extended from finite point sets to curves, making it possible to reconstruct 3D-curves up to projective transformations from a number of their 2D-projections.
Abstract: In this paper, we extend the notion of affine shape, introduced by Sparr, from finite point sets to curves. The extension makes it possible to reconstruct 3D-curves up to projective transformations, from a number of their 2D-projections. We also extend the bundle adjustment technique from point features to curves. The first step of the curve reconstruction algorithm is based on affine shape. It is independent of choice of coordinates, is robust, does not rely on any preselected parameters and works for an arbitrary number of images. In particular this means that, except for a small set of curves (e.g. a moving line), a solution is given to the aperture problem of finding point correspondences between curves. The second step takes advantage of any knowledge of measurement errors in the images. This is possible by extending the bundle adjustment technique to curves. Finally, experiments are performed on both synthetic and real data to show the performance and applicability of the algorithm.
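The bundle adjustment step minimizes a reprojection cost over sampled curve points. The curve-specific parameterization and the affine-shape machinery are the paper's contribution and are not reproduced here; the sketch below only shows the generic cost that any such second step drives toward zero:

```python
import numpy as np

def project(P, X):
    """Pinhole projection of 3-D points X (n, 3) by a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

def reprojection_cost(curve_pts, cameras, observations):
    """Bundle-adjustment style cost: sum of squared distances between
    projected 3-D curve samples and the observed 2-D image curves,
    summed over all views. Knowledge of measurement errors would enter
    here as per-point weights (omitted in this sketch)."""
    return sum(float(((project(P, curve_pts) - obs) ** 2).sum())
               for P, obs in zip(cameras, observations))
```

Minimizing this cost jointly over the 3-D curve samples (and, in full bundle adjustment, the cameras) is what lets the second step exploit image measurement errors that the coordinate-free first step ignores.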

Journal ArticleDOI
TL;DR: Driver assistance functions that make explicit use of the lane structure represented by lane borders and lane markings were systematically developed and investigated; the lane keeping assistant was integrated in several experimental cars and evaluated in systematic experiments in real traffic situations.
Abstract: Image sequences recorded with cameras mounted in a moving vehicle provide information about the vehicle's environment which has to be analysed in order to really support the driver in actual traffic situations. One type of information is the lane structure surrounding a vehicle. Therefore, we systematically developed and investigated driver assistance functions which make explicit use of the lane structure represented by lane borders and lane markings. With the increasing computing power of standard PCs it became possible to realize more complex driver assistance with general purpose hardware. Investigations with a video-based lane departure warning system and a lane change assistant for highways are discussed in detail. We integrated our lane keeping assistant in several experimental cars and performed systematic experiments in real traffic situations, which let us experience video-based driver assistance at its highest level: the action of the system. This allows us to assess whether a driver assistance system really understands the actual traffic situation, which is the basis for reliable systems accepted by the user.
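A common decision rule in lane departure warning systems (a standard criterion, not necessarily the one used by these authors) is the time-to-line-crossing: warn when the vehicle will reach the lane marking within a threshold time. All parameter values below are illustrative assumptions:

```python
def time_to_line_crossing(lateral_offset_m, lateral_speed_ms,
                          lane_half_width_m=1.75):
    """Seconds until the vehicle reference point reaches the lane
    marking, given its lateral offset from the lane center (positive
    toward the marking) and its lateral speed toward it. Returns
    infinity when the vehicle is drifting away from the marking."""
    if lateral_speed_ms <= 0.0:
        return float("inf")
    return (lane_half_width_m - lateral_offset_m) / lateral_speed_ms

def should_warn(tlc_s, threshold_s=1.0):
    """Trigger a lane departure warning below the TLC threshold."""
    return tlc_s < threshold_s
```

Both inputs come from the vision module: the lateral offset and its rate of change are estimated from the tracked lane borders and markings.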

Journal ArticleDOI
TL;DR: An information-based methodology for view selection that actively exploits prior knowledge about the objects to be found in a scene to implement an active recognition strategy which effectively puts prior constraints from the object database into the gaze control (planning) loop.
Abstract: This paper introduces an information-based methodology for view selection that actively exploits prior knowledge about the objects to be found in a scene. The methodology is used to implement an active recognition strategy which effectively puts prior constraints from the object database into the gaze control (planning) loop. Theoretical results are presented and discussed along with promising experimental data.
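An information-based view selection of this kind is commonly formalized as choosing the view that maximizes expected information gain, i.e. minimizes the expected posterior entropy over object hypotheses. The sketch below is a generic, discrete version of that criterion, not the paper's specific planner:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def best_next_view(prior, view_likelihoods):
    """Pick the view whose observation minimizes the expected posterior
    entropy over object hypotheses. view_likelihoods[v][o][z] is the
    learned P(observation z | object o, view v) from the object
    database, which is how prior object knowledge enters the loop."""
    best_v, best_h = None, float("inf")
    for v, lik in enumerate(view_likelihoods):
        n_obs = len(lik[0])
        exp_h = 0.0
        for z in range(n_obs):
            joint = [prior[o] * lik[o][z] for o in range(len(prior))]
            pz = sum(joint)
            if pz > 0.0:
                exp_h += pz * entropy([j / pz for j in joint])
        if exp_h < best_h:
            best_v, best_h = v, exp_h
    return best_v
```

In the test, a view whose likelihoods discriminate between two equally likely objects is preferred over a view whose likelihoods are identical for both.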