
Showing papers in "International Journal of Computer Vision in 1997"


Journal ArticleDOI
TL;DR: In this article, a geodesic approach to object boundary detection is presented, based on active contours evolving in time according to intrinsic geometric measures of the image; the results may be extended to 3D object segmentation as well.
Abstract: A novel scheme for the detection of object boundaries is presented. The technique is based on active contours evolving in time according to intrinsic geometric measures of the image. The evolving contours naturally split and merge, allowing the simultaneous detection of several objects and both interior and exterior boundaries. The proposed approach is based on the relation between active contours and the computation of geodesics or minimal distance curves. The minimal distance curve lies in a Riemannian space whose metric is defined by the image content. This geodesic approach for object segmentation makes it possible to connect classical “snakes” based on energy minimization and geometric active contours based on the theory of curve evolution. Previous models of geometric active contours are improved, allowing stable boundary detection when their gradients suffer from large variations, including gaps. Formal results concerning existence, uniqueness, stability, and correctness of the evolution are presented as well. The scheme was implemented using an efficient algorithm for curve evolution. Experimental results of applying the scheme to real images including objects with holes and medical data imagery demonstrate its power. The results may be extended to 3D object segmentation as well.

4,967 citations
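
As a rough illustration of the evolution the abstract describes, here is a minimal numpy sketch of a level-set implementation of a geodesic active contour, using the common edge-stopping function g(I) = 1/(1 + |∇I|²). The explicit finite-difference update, step sizes, and function names are illustrative assumptions, not the authors' implementation (which uses a more efficient curve-evolution algorithm).

```python
# Minimal sketch of a geodesic active contour evolved as a level set:
#   phi_t = g(I) * |grad phi| * (curvature + c) + grad g . grad phi
# with edge-stopping function g(I) = 1 / (1 + |grad I|^2).
# The explicit update and all names are illustrative assumptions.
import numpy as np

def edge_stop(image):
    gy, gx = np.gradient(image.astype(float))
    return 1.0 / (1.0 + gx**2 + gy**2)

def evolve(phi, g, c=1.0, dt=0.1, steps=200, eps=1e-8):
    gy_g, gx_g = np.gradient(g)
    for _ in range(steps):
        py, px = np.gradient(phi)
        norm = np.sqrt(px**2 + py**2) + eps
        kyy, _ = np.gradient(py / norm)   # d/dy of unit-normal y-component
        _, kxx = np.gradient(px / norm)   # d/dx of unit-normal x-component
        curvature = kxx + kyy
        phi = phi + dt * (g * norm * (curvature + c) + gx_g * px + gy_g * py)
    return phi  # detected boundaries are the zero level set of phi
```

Here phi would typically be initialized as a signed distance function to a curve surrounding the objects of interest.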


Journal ArticleDOI
TL;DR: This paper describes a new approach to low level image processing, in particular edge and corner detection and structure preserving noise reduction; the resulting methods are accurate, noise resistant and fast.
Abstract: This paper describes a new approach to low level image processing; in particular, edge and corner detection and structure preserving noise reduction. Non-linear filtering is used to define which parts of the image are closely related to each individual pixel; each pixel has associated with it a local image region which is of similar brightness to that pixel. The new feature detectors are based on the minimization of this local image region, and the noise reduction method uses this region as the smoothing neighbourhood. The resulting methods are accurate, noise resistant and fast. Details of the new feature detectors and of the new noise reduction method are described, along with test results.

3,669 citations
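
The core idea is easy to state concretely. The hedged sketch below computes, for each pixel, the size of its similar-brightness region within a neighbourhood; small regions signal corners and intermediate ones edges. The square mask, hard threshold t, and function name are simplifications of the paper's circular mask and smooth similarity function.

```python
# Illustrative sketch of the similar-brightness-region idea behind the
# paper's detectors: count neighbours whose brightness is within t of the
# centre pixel. Borders wrap via np.roll; a real implementation would
# mask them out.
import numpy as np

def region_area(image, t=27, radius=3):
    img = image.astype(float)
    area = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            area += (np.abs(shifted - img) < t)
    return area  # corner response: g - area, for a geometric threshold g
```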


Journal ArticleDOI
TL;DR: A new information-theoretic approach is presented for finding the pose of an object in an image that works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation.
Abstract: A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and may foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation.

3,584 citations
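
The criterion itself is compact. Below is a minimal sketch of mutual information between two intensity arrays, estimated from a joint histogram; the paper instead uses Parzen-window density estimates and stochastic approximation, so treat this as a simplified stand-in. Pose search would maximize this score over candidate transformations of the model.

```python
# Histogram-based estimate of the mutual information between two
# overlapping intensity arrays (model rendering vs. image).
import numpy as np

def mutual_information(a, b, bins=32):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0                      # avoid log(0); marginals are nonzero there
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))
```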


Journal ArticleDOI
TL;DR: A variety of robust methods for the computation of the Fundamental Matrix, the calibration-free representation of camera motion, are developed from the principal categories of robust estimators, viz. case deletion diagnostics, M-estimators and random sampling, and the theory required to apply them to non-linear orthogonal regression problems is developed.
Abstract: This paper has two goals. The first is to develop a variety of robust methods for the computation of the Fundamental Matrix, the calibration-free representation of camera motion. The methods are drawn from the principal categories of robust estimators, viz. case deletion diagnostics, M-estimators and random sampling, and the paper develops the theory required to apply them to non-linear orthogonal regression problems. Although a considerable amount of interest has focussed on the application of robust estimation in computer vision, the relative merits of the many individual methods are unknown, leaving the potential practitioner to guess at their value. The second goal is therefore to compare and judge the methods. Comparative tests are carried out using correspondences generated both synthetically in a statistically controlled fashion and from feature matching in real imagery. In contrast with previously reported methods the goodness of fit to the synthetic observations is judged not in terms of the fit to the observations per se but in terms of fit to the ground truth. A variety of error measures are examined. The experiments allow a statistically satisfying and quasi-optimal method to be synthesized, which is shown to be stable with up to 50 percent outlier contamination, and may still be used if there are more than 50 percent outliers. Performance bounds are established for the method, and a variety of robust methods to estimate the standard deviation of the error and covariance matrix of the parameters are examined. The results of the comparison have broad applicability to vision algorithms where the input data are corrupted not only by noise but also by gross outliers.

844 citations
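
For concreteness, here is a hedged sketch of the random-sampling family of estimators compared in the paper: the normalized 8-point algorithm fitted to minimal samples, keeping the fundamental matrix with the most inliers. The algebraic residual, threshold, and iteration count are illustrative choices, not the paper's tuned method.

```python
import numpy as np

def eight_point(x1, x2):
    # Normalized 8-point algorithm; x1, x2 are (n, 2) matched points and
    # the returned F satisfies x2_h^T F x1_h ~ 0 in homogeneous coordinates.
    def normalize(x):
        c = x.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(x - c, axis=1).mean()
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
        return np.c_[x, np.ones(len(x))] @ T.T, T
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    A = np.array([np.kron(b, a) for a, b in zip(p1, p2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt          # enforce rank 2
    return T2.T @ F @ T1

def ransac_f(x1, x2, iters=500, thresh=1.0):
    h1, h2 = np.c_[x1, np.ones(len(x1))], np.c_[x2, np.ones(len(x2))]
    best, best_inliers = None, 0
    for _ in range(iters):
        idx = np.random.choice(len(x1), 8, replace=False)
        F = eight_point(x1[idx], x2[idx])
        resid = np.abs(np.sum(h2 * (h1 @ F.T), axis=1))   # algebraic error
        inliers = int((resid < thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = F, inliers
    return best
```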


Journal ArticleDOI
TL;DR: A new boundary detection approach for shape modeling that detects the global minimum of an active contour model’s energy between two end points and explores the relation between the maximum curvature along the resulting contour and the potential generated from the image.
Abstract: A new boundary detection approach for shape modeling is presented. It detects the global minimum of an active contour model’s energy between two end points. Initialization is made easier and the curve is not trapped at a local minimum by spurious edges. We modify the “snake” energy by including the internal regularization term in the external potential term. Our method is based on finding a path of minimal length in a Riemannian metric. We then make use of a new efficient numerical method to find this shortest path. It is shown that the proposed energy, though based only on a potential integrated along the curve, imposes a regularization effect like snakes. We explore the relation between the maximum curvature along the resulting contour and the potential generated from the image. The method is capable of closing contours, given only one point on the objects’ boundary, by using a topology-based saddle search routine. We show examples of our method applied to real aerial and medical images.

736 citations
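
A discrete stand-in conveys the idea: find the cheapest path between the two user-supplied end points, where each pixel's cost is an image-derived potential that is low on contours. The paper minimizes a continuous energy with an efficient Eikonal-type solver; the Dijkstra search below is a simplified approximation on the pixel grid.

```python
# Cheapest 4-connected path between two pixels under a potential that is
# low on image contours. A hedged, discrete stand-in for the continuous
# minimal-path formulation.
import heapq
import numpy as np

def minimal_path(potential, start, end):
    h, w = potential.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = potential[start]
    pq = [(dist[start], start)]
    while pq:
        d, (y, x) = heapq.heappop(pq)
        if (y, x) == end:
            break
        if d > dist[y, x]:
            continue                         # stale queue entry
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + potential[ny, nx]
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    prev[(ny, nx)] = (y, x)
                    heapq.heappush(pq, (nd, (ny, nx)))
    path, node = [], end
    while node != start:                     # walk back from end to start
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]
```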


Journal ArticleDOI
TL;DR: A new form of point representation for describing 3D free-form surfaces is proposed, which serves to describe the structural neighbourhood of a point in a more complete manner than just using the 3D coordinates of the point.
Abstract: Few systems capable of recognizing complex objects with free-form (sculptured) surfaces have been developed. The apparent lack of success is mainly due to the lack of a competent modelling scheme for representing such complex objects. In this paper, a new form of point representation for describing 3D free-form surfaces is proposed. This representation, which we call the point signature, serves to describe the structural neighbourhood of a point in a more complete manner than just using the 3D coordinates of the point. Being invariant to rotation and translation, the point signature can be used directly to hypothesize the correspondence to model points with similar signatures. Recognition is achieved by matching the signatures of data points representing the sensed surface to the signatures of data points representing the model surface. The use of point signatures is not restricted to the recognition of a single-object scene against a small library of models. Instead, it can be extended naturally to the recognition of scenes containing multiple partially-overlapping objects (which may also be juxtaposed with each other) against a large model library. No preliminary phase of segmenting the scene into the component objects is required. In searching for the appropriate candidate model, recognition need not proceed in a linear order which can become prohibitive for a large model library. For a given scene, signatures are extracted at arbitrarily spaced seed points. Each of these signatures is used to vote for models that contain points having similar signatures. Inappropriate models with low votes can be rejected while the remaining candidate models are ordered according to the votes they received. In this way, efficient verification of the hypothesized candidates can proceed by testing the most likely model first. Experiments using real data obtained from a range finder have shown fast recognition from a library of fifteen models whose complexities vary from that of simple piecewise quadric shapes to complicated face masks. Results from the recognition of both single-object and multiple-object scenes are presented.

653 citations
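
A hedged sketch of building one signature follows: neighbours at distance roughly r from the point are expressed as signed distances to the tangent plane, indexed by angle about the normal. The ring extraction, reference direction, and binning below are simplifications of the paper's construction and assume the ring is non-empty.

```python
# Simplified signature for one surface point: signed distance to the
# tangent plane of neighbours near radius r, sampled by angle about the
# normal. Illustrative only.
import numpy as np

def point_signature(points, p, normal, r=1.0, tol=0.1, n_bins=36):
    n = normal / np.linalg.norm(normal)
    d = np.linalg.norm(points - p, axis=1)
    ring = points[np.abs(d - r) < tol]       # approximate sphere/surface curve
    rel = ring - p
    heights = rel @ n                         # signed distance to tangent plane
    planar = rel - np.outer(heights, n)       # projection into the plane
    u = planar[np.argmax(np.linalg.norm(planar, axis=1))]
    u = u / np.linalg.norm(u)                 # arbitrary in-plane reference axis
    v = np.cross(n, u)
    ang = np.arctan2(planar @ v, planar @ u) % (2 * np.pi)
    sig = np.full(n_bins, np.nan)
    for a, hgt in zip(ang, heights):
        sig[int(a / (2 * np.pi) * n_bins) % n_bins] = hgt
    return sig  # compare signatures by distance over cyclic shifts
```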


Journal ArticleDOI
TL;DR: A new registration algorithm based on spline representations of the displacement field which can be specialized to solve all of the problems in multiframe image analysis, including the computation of optic flow, stereo correspondence, structure from motion, and feature tracking.
Abstract: The problem of image registration subsumes a number of problems and techniques in multiframe image analysis, including the computation of optic flow (general pixel-based motion), stereo correspondence, structure from motion, and feature tracking. We present a new registration algorithm based on spline representations of the displacement field which can be specialized to solve all of the above mentioned problems. In particular, we show how to compute local flow, global (parametric) flow, rigid flow resulting from camera egomotion, and multiframe versions of the above problems. Using a spline-based description of the flow removes the need for overlapping correlation windows, and produces an explicit measure of the correlation between adjacent flow estimates. We demonstrate our algorithm on multiframe image registration and the recovery of 3D projective scene geometry. We also provide results on a number of standard motion sequences.

535 citations
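
The representation is simple to sketch: a dense displacement field interpolated from a coarse grid of control vertices. The bilinear weights below stand in for the spline bases discussed in the paper, and estimation would adjust only the control vertices to minimize an intensity-matching cost. The control grid is assumed large enough to cover the image; names are illustrative.

```python
# Dense flow (u, v) interpolated from coarse control-vertex grids with
# bilinear weights; a stand-in for the paper's spline bases.
import numpy as np

def dense_flow(control_u, control_v, shape, spacing):
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = ys / spacing, xs / spacing       # position in control-grid units
    y0, x0 = gy.astype(int), gx.astype(int)
    wy, wx = gy - y0, gx - x0
    y1 = np.minimum(y0 + 1, control_u.shape[0] - 1)
    x1 = np.minimum(x0 + 1, control_u.shape[1] - 1)
    def interp(c):
        return ((1 - wy) * (1 - wx) * c[y0, x0] + (1 - wy) * wx * c[y0, x1]
                + wy * (1 - wx) * c[y1, x0] + wy * wx * c[y1, x1])
    return interp(control_u), interp(control_v)
```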


Journal ArticleDOI
TL;DR: In this article, the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces was explored, showing that within local regions in space and time, such models not only accurately model nonrigid facial motions but also provide a concise description of the motion in terms of a small number of parameters.
Abstract: This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.

532 citations
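
The parametric estimation step can be sketched directly: inside a region, each pixel contributes one brightness-constancy constraint, and the six affine parameters follow by least squares. The robust weighting and the face-specific models of the paper are omitted; names are illustrative.

```python
# Least-squares affine flow in a masked region from the brightness
# constancy constraint Ix*u + Iy*v + It = 0, with
#   u = a0 + a1*x + a2*y,  v = a3 + a4*x + a5*y.
import numpy as np

def affine_flow(I0, I1, mask):
    Iy, Ix = np.gradient(I0.astype(float))
    It = I1.astype(float) - I0.astype(float)
    ys, xs = np.nonzero(mask)
    gx, gy, gt = Ix[ys, xs], Iy[ys, xs], It[ys, xs]
    A = np.stack([gx, gx * xs, gx * ys, gy, gy * xs, gy * ys], axis=1)
    params, *_ = np.linalg.lstsq(A, -gt, rcond=None)
    return params  # expansion, deformation etc. follow from a1, a2, a4, a5
```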


Journal ArticleDOI
TL;DR: It is shown that point correspondences between three images, and the fundamental matrices computed from these point correspondences, are sufficient to recover the internal orientation of the camera, the motion parameters, and to compute coherent perspective projection matrices which enable us to reconstruct 3-D structure up to a similarity.
Abstract: We address the problem of estimating three-dimensional motion and structure from motion with an uncalibrated moving camera. We show that point correspondences between three images, and the fundamental matrices computed from these point correspondences, are sufficient to recover the internal orientation of the camera (its calibration), the motion parameters, and to compute coherent perspective projection matrices which enable us to reconstruct 3-D structure up to a similarity. In contrast with other methods, no calibration object with a known 3-D shape is needed, and no limitations are put upon the unknown motions to be performed or the parameters to be recovered, as long as they define a projective camera. The theory of the method, which is based on the constraint that the observed points are part of a static scene, thus allowing us to link the intrinsic parameters and the fundamental matrix via the absolute conic, is first detailed. Several algorithms are then presented, and their performances compared by means of extensive simulations and illustrated by several experiments with real images.

419 citations


Journal ArticleDOI
TL;DR: A technique is developed for separating the specular and diffuse components of reflection from images that can handle highlights on surfaces with substantial texture, smoothly varying diffuse reflectance, and varying material properties.
Abstract: Specular reflections and interreflections produce strong highlights in brightness images. These highlights can cause vision algorithms for segmentation, shape from shading, binocular stereo, and motion estimation to produce erroneous results. A technique is developed for separating the specular and diffuse components of reflection from images. The approach is to use color and polarization information, simultaneously, to obtain constraints on the reflection components at each image point. Polarization yields local and independent estimates of the color of specular reflection. The result is a linear subspace in color space in which the local diffuse component must lie. This subspace constraint is applied to neighboring image points to determine the diffuse component. In contrast to previous separation algorithms, the proposed method can handle highlights on surfaces with substantial texture, smoothly varying diffuse reflectance, and varying material properties. The separation algorithm is applied to several complex scenes with textured objects and strong interreflections. The separation results are then used to solve three problems pertinent to visual perception: determining illumination color, estimating illumination direction, and recovering shape.

371 citations


Journal ArticleDOI
TL;DR: A new practical method is given for the self-calibration of a camera where at least three images are taken from the same point in space with different orientations of the camera and calibration is computed from an analysis of point matches between the images.
Abstract: A new practical method is given for the self-calibration of a camera. In this method, at least three images are taken from the same point in space with different orientations of the camera and calibration is computed from an analysis of point matches between the images. The method requires no knowledge of the orientations of the camera. Calibration is based on the image correspondences only. This method differs fundamentally from previous results by Maybank and Faugeras on self-calibration using the epipolar structure of image pairs. In the method of this paper, there is no epipolar structure since all images are taken from the same point in space, and so Maybank and Faugeras’s method does not apply. Since the images are all taken from the same point in space, determination of point matches is considerably easier than for images taken with a moving camera, since problems of occlusion or change of aspect or illumination do not occur. A non-iterative calibration algorithm is given that works with any number of images. An iterative refinement method that may be used with noisy data is also described. The algorithm is implemented and validated on several sets of synthetic and real image data.
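
The linear core of such a method can be sketched as follows, assuming noise-free inter-image homographies: for a purely rotating camera H_i = K R_i K⁻¹, so the symmetric matrix C = K Kᵀ satisfies C = H_i C H_iᵀ once each H_i is scaled to unit determinant. Stacking these equations gives a linear system in the six entries of C, and K follows by a Cholesky-style factorization. The paper's iterative refinement for noisy data is not shown, and all names are illustrative.

```python
import numpy as np

def calibrate_from_rotations(homographies):
    # homographies: H_i mapping a reference image to image i (pure rotation).
    idx = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
    rows = []
    for H in homographies:
        H = H / np.cbrt(np.linalg.det(H))          # scale to unit determinant
        for a in range(3):
            for b in range(a, 3):
                row = np.zeros(6)
                for k, (i, j) in enumerate(idx):   # coefficient of C[i, j]
                    coeff = H[a, i] * H[b, j]
                    if i != j:
                        coeff += H[a, j] * H[b, i]
                    row[k] = coeff
                row[idx.index((a, b))] -= 1.0      # (H C H^T - C)[a, b] = 0
                rows.append(row)
    c = np.linalg.svd(np.asarray(rows))[2][-1]     # null vector holds C
    C = np.array([[c[0], c[1], c[2]],
                  [c[1], c[3], c[4]],
                  [c[2], c[4], c[5]]])
    if C[2, 2] < 0:
        C = -C                                     # fix overall sign
    P = np.fliplr(np.eye(3))                       # exchange matrix
    K = P @ np.linalg.cholesky(P @ C @ P) @ P      # upper-triangular factor
    return K / K[2, 2]
```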

Journal ArticleDOI
TL;DR: It is shown in this paper that the trifocal tensor is essentially identical to a set of coefficients introduced by Shashua to effect point transfer in the three view case, which means that the 13-line algorithm may be extended to allow for the computation of the trifocal tensor given any mixture of sufficiently many line and point correspondences.
Abstract: This paper discusses the basic role of the trifocal tensor in scene reconstruction from three views. This 3×3×3 tensor plays a role in the analysis of scenes from three views analogous to the role played by the fundamental matrix in the two-view case. In particular, the trifocal tensor may be computed by a linear algorithm from a set of 13 line correspondences in three views. It is further shown in this paper that the trifocal tensor is essentially identical to a set of coefficients introduced by Shashua to effect point transfer in the three view case. This observation means that the 13-line algorithm may be extended to allow for the computation of the trifocal tensor given any mixture of sufficiently many line and point correspondences. From the trifocal tensor the camera matrices of the images may be computed, and the scene may be reconstructed. For unrelated uncalibrated cameras, this reconstruction will be unique up to projectivity. Thus, projective reconstruction of a set of lines and points may be carried out linearly from three views.
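
The point-transfer property mentioned above is a one-line contraction. In the sketch below, given the tensor T, a point x in the first view, and a line l2 through the corresponding point in the second view, the corresponding point in the third view is x3^k ∝ Σ_ij x^i l2_j T[i,j,k] (homogeneous coordinates, up to scale). Index conventions vary across texts, so the layout of T here is an assumption.

```python
# Point transfer via the trifocal tensor: x3^k ~ x^i * l2_j * T[i, j, k].
import numpy as np

def transfer_point(T, x, l2):
    # T: (3, 3, 3) trifocal tensor; x, l2: homogeneous 3-vectors
    return np.einsum('i,j,ijk->k', x, l2, T)
```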

Journal ArticleDOI
TL;DR: A specialisation of the algorithm to recover structure and camera position modulo an affine transformation is described, together with a method to periodically update the affine coordinate frame to prevent drift over time.
Abstract: A structure from motion algorithm is described which recovers structure and camera position, modulo a projective ambiguity. Camera calibration is not required, and camera parameters such as focal length can be altered freely during motion. The structure is updated sequentially over an image sequence, in contrast to schemes which employ a batch process. A specialisation of the algorithm to recover structure and camera position modulo an affine transformation is described, together with a method to periodically update the affine coordinate frame to prevent drift over time. We describe the constraint used to obtain this specialisation. Structure is recovered from image corners detected and matched automatically and reliably in real image sequences. Results are shown for reference objects and indoor environments, and accuracy of recovered structure is fully evaluated and compared for a number of reconstruction schemes. A specific application of the work is demonstrated—affine structure is used to compute free space maps enabling navigation through unstructured environments and avoidance of obstacles. The path planning involves only affine constructions.

Journal ArticleDOI
TL;DR: The paper is motivated by empirical evidence inspired by Mooney images of faces that suggest a relatively high level of visual processing is involved in compensating for photometric sources of variability, and furthermore, that certain limitations on the admissible representations of image information may exist.
Abstract: We describe the problem of recognition under changing illumination conditions and changing viewing positions from a computational and human vision perspective. On the computational side we focus on the mathematical problems of creating an equivalence class for images of the same 3D object undergoing certain groups of transformations—mostly those due to changing illumination, and briefly discuss those due to changing viewing positions. The computational treatment culminates in proposing a simple scheme for recognizing, via alignment, an image of a familiar object taken from a novel viewing position and a novel illumination condition. On the human vision aspect, the paper is motivated by empirical evidence inspired by Mooney images of faces that suggest a relatively high level of visual processing is involved in compensating for photometric sources of variability, and furthermore, that certain limitations on the admissible representations of image information may exist. The psychophysical observations and the computational results that follow agree in several important respects, such as the same (apparent) limitations on image representations.

Journal ArticleDOI
William J. Rucklidge
TL;DR: This paper develops a rasterised approach to the search and a number of techniques that allow it to locate quickly all transformations of the model that satisfy two quality criteria; it can also efficiently locate only the best transformation.
Abstract: The Hausdorff distance is a measure defined between two point sets, here representing a model and an image. The Hausdorff distance is reliable even when the image contains multiple objects, noise, spurious features, and occlusions. In the past, it has been used to search images for instances of a model that has been translated, or translated and scaled, by finding transformations that bring a large number of model features close to image features, and vice versa. In this paper, we apply it to the task of locating an affine transformation of a model in an image; this corresponds to determining the pose of a planar object that has undergone weak-perspective projection. We develop a rasterised approach to the search and a number of techniques that allow us to locate quickly all transformations of the model that satisfy two quality criteria; we can also efficiently locate only the best transformation. We discuss an implementation of this approach, and present some examples of its use.
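
One ingredient is easy to make concrete: scoring a candidate transformation by the forward (model-to-image) distance, read off a precomputed distance transform of the edge map. The sketch below does only this scoring; the paper's contribution, the rasterised search over affine transformation space, is not reproduced. Taking a quantile below 1 gives the partial distance that tolerates occlusion.

```python
# Forward (model-to-image) distance of transformed model points, read
# from a Euclidean distance transform of the image edge map.
import numpy as np
from scipy.ndimage import distance_transform_edt

def forward_distance(model_pts, affine, image_edges, quantile=1.0):
    # distance from every pixel to the nearest edge pixel
    dt = distance_transform_edt(~image_edges.astype(bool))
    A, t = affine                         # A: 2x2 matrix, t: translation
    proj = np.rint(model_pts @ A.T + t).astype(int)
    h, w = image_edges.shape
    proj = proj[(proj[:, 0] >= 0) & (proj[:, 0] < h)
                & (proj[:, 1] >= 0) & (proj[:, 1] < w)]
    d = dt[proj[:, 0], proj[:, 1]]
    return np.quantile(d, quantile)       # quantile < 1: partial distance
```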

Journal ArticleDOI
TL;DR: It is shown that repetitive motion is such a strong cue that the moving actor can be segmented, normalized spatially and temporally, and recognized by matching against a spatio-temporal template of motion features.
Abstract: The recognition of nonrigid motion, particularly that arising from human movement (and by extension from the locomotory activity of animals), has typically made use of high-level parametric models representing the various body parts (legs, arms, trunk, head, etc.) and their connections to each other. Such model-based recognition has been successful in some cases; however, the methods are often difficult to apply to real-world scenes, and are severely limited in their generalizability. The first problem arises from the difficulty of acquiring and tracking the requisite model parts, usually specific joints such as knees, elbows or ankles. This generally requires some prior high-level understanding and segmentation of the scene, or initialization by a human operator. The second problem, with generalization, is due to the fact that the human model is not much good for dogs or birds, and for each new type of motion, a new model must be hand-crafted. In this paper, we show that the recognition of human or animal locomotion, and, in fact, any repetitive activity can be done using low-level, non-parametric representations. Such an approach has the advantage that the same underlying representation is used for all examples, and no individual tailoring of models or prior scene understanding is required. We show, in particular, that repetitive motion is such a strong cue that the moving actor can be segmented, normalized spatially and temporally, and recognized by matching against a spatio-temporal template of motion features. We have implemented a real-time system that can recognize and classify repetitive motion activities in normal gray-scale image sequences. Results on a number of real-world sequences are described.
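
The strength of the repetition cue can be illustrated with a toy detector: track a scalar motion-energy signal over time and look for a dominant peak in its spectrum. The frame-difference signal and FFT below are illustrative assumptions; the paper's spatio-temporal templates and normalization are considerably richer.

```python
# Toy periodicity detector: dominant period of a mean-absolute-frame-
# difference signal, found as the strongest non-DC spectral bin.
import numpy as np

def dominant_period(frames):
    # frames: sequence of equally sized gray-scale arrays
    energy = np.array([np.mean(np.abs(frames[i + 1].astype(float)
                                      - frames[i].astype(float)))
                       for i in range(len(frames) - 1)])
    energy -= energy.mean()
    spectrum = np.abs(np.fft.rfft(energy))
    k = 1 + int(np.argmax(spectrum[1:]))     # skip the DC bin
    return len(energy) / k                   # cycle length in frames
```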

Journal ArticleDOI
TL;DR: A technique for estimating affine transforms between nearby image patches which is based on solving a system of linear constraints derived from a differential analysis and yields predictions for both computer vision algorithms and human perception of shape from texture.
Abstract: Shape from texture is best analyzed in two stages, analogous to stereopsis and structure from motion: (a) Computing the “texture distortion” from the image, and (b) Interpreting the “texture distortion” to infer the orientation and shape of the surface in the scene. We model the texture distortion for a given point and direction on the image plane as an affine transformation and derive the relationship between the parameters of this transformation and the shape parameters. We have developed a technique for estimating affine transforms between nearby image patches which is based on solving a system of linear constraints derived from a differential analysis. One need not explicitly identify texels or make restrictive assumptions about the nature of the texture such as isotropy. We use non-linear minimization of a least squares error criterion to recover the surface orientation (slant and tilt) and shape (principal curvatures and directions) based on the estimated affine transforms in a number of different directions. A simple linear algorithm based on singular value decomposition of the linear parts of the affine transforms provides the initial guess for the minimization procedure. Experimental results on both planar and curved surfaces under perspective projection demonstrate good estimates for both orientation and shape. A sensitivity analysis yields predictions for both computer vision algorithms and human perception of shape from texture.

Journal ArticleDOI
TL;DR: The practical contribution of the paper is the validation of the transformation estimation method in the case of 3-D medical images, which shows that an accuracy of the registration far below the size of a voxel can be achieved, and in the case of protein substructure matching, where frame features drastically improve both selectivity and complexity.
Abstract: In this paper, we propose and analyze several methods to estimate a rigid transformation from a set of 3-D matched points or matched frames, which are important features in geometric algorithms. We also develop tools to predict and verify the accuracy of these estimations. The theoretical contributions are: an intrinsic model of noise for transformations based on composition rather than addition; a unified formalism for the estimation of both the rigid transformation and its covariance matrix for points or frames correspondences, and a statistical validation method to verify the error estimation, which applies even when no “ground truth” is available. We analyze and demonstrate on synthetic data that our scheme is well behaved. The practical contribution of the paper is the validation of our transformation estimation method in the case of 3-D medical images, which shows that an accuracy of the registration far below the size of a voxel can be achieved, and in the case of protein substructure matching, where frame features drastically improve both selectivity and complexity.
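
The point-based estimate the paper analyzes has a standard closed form, sketched below: the rotation comes from an SVD of the cross-covariance of the centred correspondences and the translation from the centroids. The paper's frame features, intrinsic noise model, and covariance prediction are not shown.

```python
# Least-squares rigid transform (R, t) from matched 3-D points via SVD
# of the cross-covariance, with a determinant guard against reflections.
import numpy as np

def rigid_from_points(src, dst):
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t                              # dst ≈ (R @ src.T).T + t
```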

Journal ArticleDOI
TL;DR: This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art, and derives necessary and sufficient conditions for an image sequence to be the projection of a3D repeating motion, accounting for changes in viewpoint and other camera parameters.
Abstract: This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art First, the assumption that a motion be perfectly even from one cycle to the next is relaxed Real repeating motions tend not to be perfectly even, ie, the length of a cycle varies through time because of physically important changes in the scene A generalization of period is defined for repeating motions that makes this temporal variation explicit This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity Second, the requirement that the observer be stationary is removed Observer motion complicates image analysis because an object that undergoes a 3D repeating motion will generally not produce a repeating sequence of images Using principles of affine invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a 3D repeating motion, accounting for changes in viewpoint and other camera parameters Unlike previous work in visual invariance, however, our approach is applicable to objects and scenes whose motion is highly non-rigid Experiments on real image sequences demonstrate how the approach may be used to detect several types of purely temporal motion features, relating to motion trends and irregularities Applications to athletic and medical motion analysis are discussed

Journal ArticleDOI
TL;DR: An effective stochastic gradient descent algorithm is introduced that automatically matches a model to a novel image and thereby performs image analysis.
Abstract: We describe a flexible model for representing images of objects of a certain class, known a priori, such as faces, and introduce a new algorithm for matching it to a novel image, thereby performing image analysis. The flexible model, known as a multidimensional morphable model, is learned from example images of objects of a class. In this paper we introduce an effective stochastic gradient descent algorithm that automatically matches a model to a novel image. Several experiments demonstrate the robustness and the broad range of applicability of morphable models. Our approach can provide novel solutions to several vision tasks, including the computation of image correspondence, object verification and image compression.

Journal ArticleDOI
TL;DR: An algebraic derivation of DeMenthon and Davis' method is given and it is shown that it belongs to a larger class of methods where the perspective camera model is approximated either at zero order (weak perspective) or first order (paraperspective).
Abstract: Recently, DeMenthon and Davis (1992, 1995) proposed a method for determining the pose of a 3-D object with respect to a camera from 3-D to 2-D point correspondences. The method consists of iteratively improving the pose computed with a weak perspective camera model to converge, at the limit, to a pose estimation computed with a perspective camera model. In this paper we give an algebraic derivation of DeMenthon and Davis’ method and we show that it belongs to a larger class of methods where the perspective camera model is approximated either at zero order (weak perspective) or first order (paraperspective). We describe in detail an iterative paraperspective pose computation method for both non coplanar and coplanar object points. We analyse the convergence of these methods and we conclude that the iterative paraperspective method (proposed in this paper) has better convergence properties than the iterative weak perspective method. We introduce a simple way of taking into account the orthogonality constraint associated with the rotation matrix. We analyse the sensitivity to camera calibration errors and we define the optimal experimental setup with respect to imprecise camera calibration. We compare the results obtained with this method and with a non-linear optimization method.
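
The iterative scheme is short enough to sketch. The code below follows the DeMenthon-Davis weak-perspective iteration that the paper analyzes (not the paraperspective variant it proposes): each pass solves a linear system under a scaled-orthographic model, then uses the resulting pose to update the perspective correction terms. Coplanar points and the orthogonality constraint discussed in the paper are not handled; names and the fixed iteration count are assumptions.

```python
import numpy as np

def posit(model_pts, image_pts, focal, iters=10):
    # model_pts: (n, 3), row 0 is the reference point (non-coplanar set);
    # image_pts: (n, 2), in pixels relative to the principal point.
    M = model_pts[1:] - model_pts[0]
    x, y = image_pts[:, 0], image_pts[:, 1]
    eps = np.zeros(len(M))                     # perspective correction terms
    pinv = np.linalg.pinv(M)
    for _ in range(iters):
        I = pinv @ (x[1:] * (1 + eps) - x[0])  # scaled first rotation row
        J = pinv @ (y[1:] * (1 + eps) - y[0])  # scaled second rotation row
        s = 0.5 * (np.linalg.norm(I) + np.linalg.norm(J))
        i, j = I / np.linalg.norm(I), J / np.linalg.norm(J)
        k = np.cross(i, j)                     # i, j only approximately orthogonal
        Z0 = focal / s                         # depth of the reference point
        eps = (M @ k) / Z0
    R = np.vstack([i, j, k])
    t = np.array([x[0], y[0], focal]) * (Z0 / focal)
    return R, t
```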

Journal ArticleDOI
TL;DR: Evidence is presented indicating that, in some domains, normal (Gaussian) distributions are more accurate than uniform distributions for modeling feature fluctuations, which motivates the development of new maximum-likelihood and MAP recognition formulations which are based on normal feature models.
Abstract: This paper examines statistical approaches to model-based object recognition. Evidence is presented indicating that, in some domains, normal (Gaussian) distributions are more accurate than uniform distributions for modeling feature fluctuations. This motivates the development of new maximum-likelihood and MAP recognition formulations which are based on normal feature models. These formulations lead to an expression for the posterior probability of the pose and correspondences given an image. Several avenues are explored for specifying a recognition hypothesis. In the first approach, correspondences are included as a part of the hypotheses. Search for solutions may be ordered as a combinatorial search in correspondence space, or as a search over pose space, where the same criterion can equivalently be viewed as a robust variant of chamfer matching. In the second approach, correspondences are not viewed as being a part of the hypotheses. This leads to a criterion that is a smooth function of pose that is amenable to local search by continuous optimization methods. The criterion is also suitable for optimization via the Expectation-Maximization (EM) algorithm, which alternates between pose refinement and re-estimation of correspondence probabilities until convergence is obtained. Recognition experiments are described using the criteria with features derived from video images and from synthetic range images.

Journal ArticleDOI
TL;DR: This paper provides a means of directly extracting 3-D data covering a very wide field of view, thus by-passing the need for numerous depth map merging, and shows the results of the approach applied to both synthetic and real scenes.
Abstract: A traditional approach to extracting geometric information from a large scene is to compute multiple 3-D depth maps from stereo pairs or direct range finders, and then to merge the 3-D data. However, the resulting merged depth maps may be subject to merging errors if the relative poses between depth maps are not known exactly. In addition, the 3-D data may also have to be resampled before merging, which adds additional complexity and potential sources of errors. This paper provides a means of directly extracting 3-D data covering a very wide field of view, thus by-passing the need for numerous depth map merging. In our work, cylindrical images are first composited from sequences of images taken while the camera is rotated 360° about a vertical axis. By taking such image panoramas at different camera locations, we can recover 3-D data of the scene using a set of simple techniques: feature tracking, an 8-point structure from motion algorithm, and multibaseline stereo. We also investigate the effect of median filtering on the recovered 3-D point distributions, and show the results of our approach applied to both synthetic and real scenes.

Journal ArticleDOI
TL;DR: In this article, a theoretical framework for the perception of specular surface geometry is introduced, based on the notion of caustics, and a feature classification algorithm is developed that distinguishes real and virtual features from their image trajectories that result from observer motion.
Abstract: A theoretical framework is introduced for the perception of specular surface geometry. When an observer moves in three-dimensional space, real scene features such as surface markings remain stationary with respect to the surfaces they belong to. In contrast, a virtual feature which is the specular reflection of a real feature, travels on the surface. Based on the notion of caustics, a feature classification algorithm is developed that distinguishes real and virtual features from their image trajectories that result from observer motion. Next, using support functions of curves, a closed-form relation is derived between the image trajectory of a virtual feature and the geometry of the specular surface it travels on. It is shown that, in the 2D case, where camera motion and the surface profile are coplanar, the profile is uniquely recovered by tracking just two unknown virtual features. Finally, these results are generalized to the case of arbitrary 3D surface profiles that are traveled by virtual features when camera motion is not confined to a plane. This generalization includes a number of mathematical results that substantially enhance the present understanding of specular surface geometry. An algorithm is developed that uniquely recovers 3D surface profiles using a single virtual feature tracked from the occluding boundary of the object. All theoretical derivations and proposed algorithms are substantiated by experiments.

Journal ArticleDOI
TL;DR: This contribution addresses the problem of pose estimation and tracking of vehicles in image sequences from traffic scenes recorded by a stationary camera by directly matching polyhedral vehicle models to image gradients without an edge segment extraction process.
Abstract: This contribution addresses the problem of pose estimation and tracking of vehicles in image sequences from traffic scenes recorded by a stationary camera. In a new algorithm, the vehicle pose is estimated by directly matching polyhedral vehicle models to image gradients without an edge segment extraction process. The new approach is significantly more robust than approaches that rely on feature extraction since the new approach exploits more information from the image data. We successfully tracked vehicles that were partially occluded by textured objects, e.g., foliage, where a previous approach based on edge segment extraction failed. Moreover, the new pose estimation approach is also used to determine the orientation and position of the road relative to the camera by matching an intersection model directly to image gradients. Results from various experiments with real world traffic scenes are presented.

Journal ArticleDOI
TL;DR: A snake-based approach is proposed that allows a user to specify only the distant end points of the curve he wishes to delineate without having to supply an almost complete polygonal approximation, which greatly simplifies the initialization process and yields excellent convergence properties.
Abstract: We propose a snake-based approach that allows a user to specify only the distant end points of the curve he wishes to delineate without having to supply an almost complete polygonal approximation. This greatly simplifies the initialization process and yields excellent convergence properties. This is achieved by using the image information around the end points to provide boundary conditions and by introducing an optimization schedule that allows a snake to take image information into account first only near its extremities and then, progressively, toward its center. In effect, the snakes are clamped onto the image contour in a manner reminiscent of a ziplock being closed. These snakes can be used to alleviate the often repetitive task practitioners face when segmenting images by eliminating the need to sketch a feature of interest in its entirety, that is, to perform a painstaking, almost complete, manual segmentation.

Journal ArticleDOI
TL;DR: In this paper, the problem of reconstructing rigid motion from a sequence of perspective images is characterized as the estimation of the state of a nonlinear dynamical system, which is defined by the rigidity constraint and the perspective measurement map.
Abstract: The 3-D motion of a camera within a static environment produces a sequence of time-varying images that can be used for reconstructing the relative motion between the scene and the viewer. The problem of reconstructing rigid motion from a sequence of perspective images may be characterized as the estimation of the state of a nonlinear dynamical system, which is defined by the rigidity constraint and the perspective measurement map. The time-derivative of the measured output of such a system, which is called the “2-D motion field” and is approximated by the “optical flow”, is bilinear in the motion parameters, and may be used to specify a subspace constraint on the direction of heading independent of rotation and depth, and a pseudo-measurement for the rotational velocity as a function of the estimated heading. The subspace constraint may be viewed as an implicit dynamical model with parameters on a differentiable manifold, and the visual motion estimation problem may be cast in a system-theoretic framework as the identification of such an implicit model. We use techniques which pertain to nonlinear estimation and identification theory to recursively estimate 3-D rigid motion from a sequence of images independent of the structure of the scene. Such independence from scene-structure allows us to deal with a variable number of visible feature-points and occlusions in a principled way. The further decoupling of the direction of heading from the rotational velocity generates a filter with a state that belongs to a two-dimensional and highly constrained state-space. As a result, the filter exhibits robustness properties which are highlighted in a series of experiments on real and noisy synthetic image sequences. While the position of feature-points is not part of the state of the model, the innovation process of the filter describes how each feature is compatible with a rigid motion interpretation, which allows us to test for outliers and makes the filter robust with respect to errors in the feature tracking/optical flow, reflections, T-junctions. Once motion has been estimated, the 3-D structure of the scene follows easily. By releasing the constraint that the visible points lie in front of the viewer, one may explain some psychophysical effects on the nonrigid percept of rigidly moving objects.

Journal ArticleDOI
TL;DR: A new grey-scale measure, Δg, is introduced, aiming to improve upon the most common grey-scale error measure, the root-mean-square error; it is an extension of the authors’ recently developed binary error measure Δb, not only in structure but also in having both a theoretical and an intuitive basis.
Abstract: Error measures can be used to numerically assess the differences between two images. Much work has been done on binary error measures, but little on objective metrics for grey-scale images. In our discussion here we introduce a new grey-scale measure, Δg, aiming to improve upon the most common grey-scale error measure, the root-mean-square error. Our new measure is an extension of the authors’ recently developed binary error measure, Δb, not only in structure, but also having both a theoretical and intuitive basis. We consider the similarities between Δb and Δg when tested in practice on binary images, and present results comparing Δg to the root-mean-square error and the Sobolev norm for various binary and grey-scale images. There are no previous examples where the last of these measures, the Sobolev norm, has been implemented for this purpose.
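
For reference, the baseline the new measure aims to improve upon is purely pointwise, as the sketch below makes plain; Δg itself is not reproduced here.

```python
# Root-mean-square error between two grey-scale images: the pointwise
# baseline that the paper's measure is designed to improve upon.
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))
```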

Journal ArticleDOI
TL;DR: A new approach is presented which deals with the 3D surface reconstruction problem directly from a discrete point of view, and a theoretical study of the epipolar correspondence between occluding contours is achieved.
Abstract: This paper addresses the problem of 3D surface reconstruction using image sequences. It has been shown that shape recovery from three or more occluding contours of the surface is possible given a known camera motion. Several algorithms, which have been recently proposed, allow such a reconstruction under the assumption of a linear camera motion. A new approach is presented which deals with the reconstruction problem directly from a discrete point of view. First, a theoretical study of the epipolar correspondence between occluding contours is achieved. A correct depth formulation is then derived from a local approximation of the surface up to order two. This allows the local shape to be estimated, given three consecutive contours, without any constraints on the camera motion. Experimental results are presented for both synthetic and real data.

Journal ArticleDOI
TL;DR: In this article, the shape-from-shading problem is formulated as an iterative, non-linear optimization problem and piecewise polynomial models of the 3D shape and albedo distribution are introduced to efficiently and stably compute the shape in practice.
Abstract: We address the problem of recovering the 3D shape of an unfolded book surface from the shading information in a scanner image. This shape-from-shading problem in a real world environment is made difficult by a proximal, moving light source, interreflections, specular reflections, and a nonuniform albedo distribution. Taking all these factors into account, we formulate the problem as an iterative, non-linear optimization problem. Piecewise polynomial models of the 3D shape and albedo distribution are introduced to efficiently and stably compute the shape in practice. Finally, we propose a method to restore the distorted scanner image based on the reconstructed 3D shape. The image restoration experiments for real book surfaces demonstrate that much of the geometric and photometric distortions are removed by our method.