
Showing papers in "International Journal of Computer Vision in 2002"


Journal ArticleDOI
TL;DR: A new multiphase level set framework for image segmentation using the Mumford and Shah model, for piecewise constant and piecewise smooth optimal approximations, validated by numerical results for signal and image denoising and segmentation.
Abstract: We propose a new multiphase level set framework for image segmentation using the Mumford and Shah model, for piecewise constant and piecewise smooth optimal approximations. The proposed method is also a generalization of the two-phase segmentation based on the active contour model without edges, developed by the authors earlier in T. Chan and L. Vese (1999. In Scale-Space'99, M. Nielsen et al. (Eds.), LNCS, vol. 1682, pp. 141–151) and T. Chan and L. Vese (2001. IEEE-IP, 10(2):266–277). The multiphase level set formulation is new and of interest on its own: by construction, it automatically avoids the problems of vacuum and overlap; it needs only log n level set functions for n phases in the piecewise constant case; it can represent boundaries with complex topologies, including triple junctions; in the piecewise smooth case, only two level set functions formally suffice to represent any partition, based on the Four-Color Theorem. Finally, we validate the proposed models by numerical results for signal and image denoising and segmentation, implemented using the Osher and Sethian level set method.
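
The piecewise constant case with two level set functions can be prototyped compactly. The sketch below is illustrative only, not the authors' implementation: the signs of two functions phi1, phi2 encode four phases, the phase means and the level sets are updated alternately on the data term, and the length (curvature) regularization is omitted for brevity.

```python
import numpy as np

def four_phase_segmentation(img, n_iter=200, dt=1.0):
    """Sketch of 4-phase piecewise-constant segmentation with 2 level sets.
    Signs of (phi1, phi2) encode the four phases; curvature term omitted."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Initialize with two offset circles (any sign pattern works).
    phi1 = np.sqrt((xx - w/3)**2 + (yy - h/2)**2) - min(h, w)/4
    phi2 = np.sqrt((xx - 2*w/3)**2 + (yy - h/2)**2) - min(h, w)/4
    for _ in range(n_iter):
        H1, H2 = (phi1 > 0).astype(float), (phi2 > 0).astype(float)
        masks = [H1*H2, H1*(1-H2), (1-H1)*H2, (1-H1)*(1-H2)]
        # Per-phase means of the image (the piecewise constant model).
        c11, c10, c01, c00 = [img[m > 0].mean() if m.any() else 0.0 for m in masks]
        # Gradient of the data term w.r.t. phi1 and phi2 (hard Heaviside
        # used in place of a smoothed one, for brevity).
        d1 = ((img-c11)**2 - (img-c01)**2)*H2 + ((img-c10)**2 - (img-c00)**2)*(1-H2)
        d2 = ((img-c11)**2 - (img-c10)**2)*H1 + ((img-c01)**2 - (img-c00)**2)*(1-H1)
        phi1 -= dt * d1
        phi2 -= dt * d2
    return (phi1 > 0).astype(int) * 2 + (phi2 > 0).astype(int)

# Toy usage: a synthetic image with four grey levels, one per quadrant.
img = np.zeros((64, 64))
img[:32, :32], img[:32, 32:], img[32:, :32] = 0.3, 0.6, 0.9
labels = four_phase_segmentation(img)
```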

2,649 citations


Journal ArticleDOI
TL;DR: This work studies the visual manifestations of different weather conditions, models the chromatic effects of atmospheric scattering and verifies them for fog and haze, and derives several geometric constraints on scene color changes caused by varying atmospheric conditions.
Abstract: Current vision systems are designed to perform in clear weather. Needless to say, in any outdoor application, there is no escape from “bad” weather. Ultimately, computer vision systems must include mechanisms that enable them to function (even if somewhat less reliably) in the presence of haze, fog, rain, hail and snow. We begin by studying the visual manifestations of different weather conditions. For this, we draw on what is already known about atmospheric optics, and identify effects caused by bad weather that can be turned to our advantage. Since the atmosphere modulates the information carried from a scene point to the observer, it can be viewed as a mechanism of visual information coding. We exploit two fundamental scattering models and develop methods for recovering pertinent scene properties, such as three-dimensional structure, from one or two images taken under poor weather conditions. Next, we model the chromatic effects of the atmospheric scattering and verify it for fog and haze. Based on this chromatic model we derive several geometric constraints on scene color changes caused by varying atmospheric conditions. Finally, using these constraints we develop algorithms for computing fog or haze color, depth segmentation, extracting three-dimensional structure, and recovering “clear day” scene colors, from two or more images taken under different but unknown weather conditions.
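
The monochrome atmospheric scattering model underlying this line of work is easy to state in code. A minimal sketch with assumed values (not the paper's data or estimation procedure): an observed intensity mixes scene radiance with airlight as E = R·exp(-βd) + A·(1 - exp(-βd)); when the airlight A and the optical depth βd are known or estimated, the "clear day" radiance R can be inverted per pixel.

```python
import numpy as np

def add_haze(radiance, beta_d, airlight):
    """Forward model: direct transmission plus airlight."""
    t = np.exp(-beta_d)                 # transmission along the line of sight
    return radiance * t + airlight * (1.0 - t)

def remove_haze(observed, beta_d, airlight):
    """Invert the model to recover 'clear day' radiance."""
    t = np.exp(-beta_d)
    return (observed - airlight * (1.0 - t)) / np.maximum(t, 1e-6)

# Toy example with a hypothetical scene and depth map.
rng = np.random.default_rng(0)
radiance = rng.uniform(0.2, 0.9, size=(64, 64))
beta_d = np.linspace(0.1, 2.0, 64)[None, :] * np.ones((64, 1))  # optical depth
hazy = add_haze(radiance, beta_d, airlight=1.0)
recovered = remove_haze(hazy, beta_d, airlight=1.0)
assert np.allclose(recovered, radiance)
```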

1,325 citations


Journal ArticleDOI
TL;DR: A novel variational framework to deal with frame partition problems in Computer Vision that exploits boundary and region-based segmentation modules under a curve-based optimization objective function is presented.
Abstract: This paper presents a novel variational framework to deal with frame partition problems in Computer Vision. This framework exploits boundary and region-based segmentation modules under a curve-based optimization objective function. The task of supervised texture segmentation is considered to demonstrate the potential of the proposed framework. The textured feature space is generated by filtering the given textured images using isotropic and anisotropic filters, and analyzing their responses as multi-component conditional probability density functions. The texture segmentation is obtained by unifying region and boundary-based information as an improved Geodesic Active Contour Model. The defined objective function is minimized using a gradient-descent method where a level set approach is used to implement the obtained PDE. According to this PDE, the curve propagation towards the final solution is guided by boundary and region-based segmentation forces, and is constrained by a regularity force. The level set implementation is performed using a fast front propagation algorithm where topological changes are naturally handled. The performance of our method is demonstrated on a variety of synthetic and real textured frames.

867 citations


Journal ArticleDOI
TL;DR: The implemented system shows that errors of simple stereo correlation, especially in object border regions, can be reduced in real-time using non-specialised computer hardware.
Abstract: This paper describes a real-time stereo vision system that is required to support high-level object based tasks in a tele-operated environment. Stereo vision is computationally expensive, due to having to find corresponding pixels. Correlation is a fast, standard way to solve the correspondence problem. This paper analyses the behaviour of correlation based stereo to find ways to improve its quality while maintaining its real-time suitability. Three methods are suggested. Two of them aim to improve the disparity image especially at depth discontinuities, while one targets the identification of possible errors in general. Results are given on real stereo images with ground truth. A comparison with five standard correlation methods is provided. All proposed algorithms are described in detail and performance issues and optimisation are discussed. Finally, performance results of individual parts of the stereo algorithm are shown, including rectification, filtering and correlation using all proposed methods. The implemented system shows that errors of simple stereo correlation, especially in object border regions, can be reduced in real-time using non-specialised computer hardware.
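
Two of the paper's themes, window-based correlation and the general identification of possible errors, are commonly prototyped as SAD matching plus a left-right consistency check. The sketch below is a generic baseline under those assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_disparity(left, right, max_disp, radius=3):
    """Winner-take-all correlation stereo using windowed mean absolute
    differences (left image is the reference; impossible columns stay inf)."""
    h, w = left.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        costs[d, :, d:] = uniform_filter(diff, size=2 * radius + 1)
    return np.argmin(costs, axis=0)

def lr_consistency(disp_l, disp_r, tol=1):
    """Flag mismatched pixels (-1): a common way to identify unreliable
    correlation matches, e.g. in occluded regions near object borders."""
    h, w = disp_l.shape
    xr = np.clip(np.arange(w)[None, :] - disp_l, 0, w - 1)
    bad = np.abs(disp_l - disp_r[np.arange(h)[:, None], xr]) > tol
    out = disp_l.copy()
    out[bad] = -1
    return out

# Usage: compute the right-referenced map by mirroring both images.
# disp_l = sad_disparity(left, right, 32)
# disp_r = sad_disparity(right[:, ::-1], left[:, ::-1], 32)[:, ::-1]
# disp = lr_consistency(disp_l, disp_r)
```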

518 citations


Journal ArticleDOI
TL;DR: This paper presents a computational representation of human action that captures dramatic changes in motion using the spatio-temporal curvature of a 2-D trajectory; the representation is compact, view-invariant, and capable of explaining an action in terms of meaningful action units called dynamic instants and intervals.
Abstract: Analysis of human perception of motion shows that information for representing the motion is obtained from the dramatic changes in the speed and direction of the trajectory. In this paper, we present a computational representation of human action to capture these dramatic changes using the spatio-temporal curvature of a 2-D trajectory. This representation is compact, view-invariant, and is capable of explaining an action in terms of meaningful action units called dynamic instants and intervals. A dynamic instant is an instantaneous entity that occurs for only one frame, and represents an important change in the motion characteristics. An interval represents the time period between two dynamic instants during which the motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. The proposed method can discover instances of the same action performed by different people from different viewpoints. Experiments on 47 actions performed by 7 individuals in an environment with no constraints show the robustness of the proposed method.
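
The core quantity admits a short sketch under the usual definition of the curvature of the 3-D curve (x(t), y(t), t); this is illustrative, not the authors' code: compute the spatio-temporal curvature of a tracked 2-D trajectory and mark frames where it peaks as candidate dynamic instants.

```python
import numpy as np

def spatiotemporal_curvature(x, y):
    """Curvature of the 3-D curve (x(t), y(t), t); peaks suggest instants."""
    t = np.arange(len(x), dtype=float)
    r = np.stack([x, y, t], axis=1)
    d1 = np.gradient(r, axis=0)        # r'
    d2 = np.gradient(d1, axis=0)       # r''
    cross = np.cross(d1, d2)
    return np.linalg.norm(cross, axis=1) / np.linalg.norm(d1, axis=1) ** 3

def dynamic_instants(kappa, thresh):
    """Local maxima of curvature above a threshold."""
    peaks = (kappa[1:-1] > kappa[:-2]) & (kappa[1:-1] > kappa[2:]) \
            & (kappa[1:-1] > thresh)
    return np.where(peaks)[0] + 1

# Toy trajectory: straight, abrupt 90-degree turn, straight again;
# a dynamic instant is expected near frame 20.
x = np.concatenate([np.linspace(0, 10, 20), np.full(20, 10.0)])
y = np.concatenate([np.zeros(20), np.linspace(0.5, 10, 20)])
kappa = spatiotemporal_curvature(x, y)
print(dynamic_instants(kappa, thresh=0.1))
```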

500 citations


Journal ArticleDOI
TL;DR: An active contour algorithm capable of using prior shapes is reported; it is able to find boundaries that are similar in shape to the prior, even when the entire boundary is not visible in the image.
Abstract: In this paper, we report an active contour algorithm that is capable of using prior shapes. The energy functional of the contour is modified so that the energy depends on the image gradient as well as the prior shape. The model provides the segmentation and the transformation that maps the segmented contour to the prior shape. The active contour is able to find boundaries that are similar in shape to the prior, even when the entire boundary is not visible in the image (i.e., when the boundary has gaps). A level set formulation of the active contour is presented. The existence of the solution to the energy minimization is also established. We also report experimental results of the use of this contour on 2D synthetic images, ultrasound images, and fMRI images. Classical active contours cannot be used in many of these images.

455 citations


Journal ArticleDOI
TL;DR: The thrust of this paper is that many of the existing methods for nonrigid monomodal registration that use simple criteria for comparing the intensities can be extended to the multimodal case where more complex intensity similarity measures are necessary.
Abstract: Matching images of different modalities can be achieved by the maximization of suitable statistical similarity measures within a given class of geometric transformations. Handling complex, nonrigid deformations in this context turns out to be particularly difficult and has attracted much attention in the last few years. The thrust of this paper is that many of the existing methods for nonrigid monomodal registration that use simple criteria for comparing the intensities (e.g. SSD) can be extended to the multimodal case where more complex intensity similarity measures are necessary. To this end, we perform a formal computation of the variational gradient of a hierarchy of statistical similarity measures, and use the results to generalize a recently proposed and very effective optical flow algorithm (L. Alvarez, J. Weickert, and J. Sanchez, 2000, Technical Report, and IJCV 39(1):41–56) to the case of multimodal image registration. Our method readily extends to the case of locally computed similarity measures, thus providing the flexibility to cope with spatial non-stationarities in the way the intensities in the two images are related. The well posedness of the resulting equations is proved in a complementary work (O.D. Faugeras and G. Hermosillo, 2001, Technical Report 4235, INRIA) using well established techniques in functional analysis. We briefly describe our numerical implementation of these equations and show results on real and synthetic data.
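
One representative member of the hierarchy of statistical similarity measures is mutual information. A minimal sketch (a histogram estimate, not the paper's variational machinery): estimate the joint intensity histogram of the two images and evaluate MI from it; registration then seeks the deformation maximizing this score.

```python
import numpy as np

def mutual_information(img1, img2, bins=32):
    """Histogram estimate of the mutual information between two images."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# MI stays high when one image is a (possibly non-linear) function of the
# other, which is what makes it suitable for multimodal registration.
a = np.random.default_rng(0).normal(size=(64, 64))
print(mutual_information(a, a**2), mutual_information(a, np.roll(a, 9, axis=0)))
```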

454 citations


Journal ArticleDOI
TL;DR: A method for describing human activities from video images based on concept hierarchies of actions built from semantic primitives; the performance of the proposed method is demonstrated by several experiments.
Abstract: We propose a method for describing human activities from video images based on concept hierarchies of actions. The major difficulty in transforming video images into textual descriptions is bridging the semantic gap between them, also known as the inverse Hollywood problem. In general, the concepts of human events or actions can be classified by semantic primitives. By associating these concepts with the semantic features extracted from video images, appropriate syntactic components such as verbs, objects, etc. are determined and then translated into natural language sentences. We also demonstrate the performance of the proposed method by several experiments.

364 citations


Journal ArticleDOI
TL;DR: A modification of the Mumford-Shah functional and its cartoon limit is presented which facilitates the incorporation of a statistical prior on the shape of the segmenting contour and a closed-form, parameter-free solution for incorporating invariance with respect to similarity transformations in the variational framework is proposed.
Abstract: We present a modification of the Mumford-Shah functional and its cartoon limit which facilitates the incorporation of a statistical prior on the shape of the segmenting contour. By minimizing a single energy functional, we obtain a segmentation process which maximizes both the grey value homogeneity in the separated regions and the similarity of the contour with respect to a set of training shapes. We propose a closed-form, parameter-free solution for incorporating invariance with respect to similarity transformations in the variational framework. We show segmentation results on artificial and real-world images with and without prior shape information. In cases of noise, occlusion, or strongly cluttered background, the shape prior significantly improves segmentation. Finally, we compare our results to those obtained by a level set implementation of geodesic active contours.
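
The closed-form handling of similarity transformations can be illustrated with the standard Procrustes solution: given a sampled contour and a training shape as corresponding point sets, the optimal translation, scale, and rotation aligning them are obtained in closed form, with no extra pose parameters to descend on. This sketch shows that generic alignment step, not the authors' exact functional.

```python
import numpy as np

def similarity_align(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    minimizing sum ||s R src_i + t - dst_i||^2 over corresponding 2-D points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_s, dst - mu_d
    u, sig, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))          # guard against reflections
    D = np.diag([1.0, d])
    R = (u @ D @ vt).T
    s = np.trace(np.diag(sig) @ D) / (a**2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Sanity check: recover a known similarity transform.
rng = np.random.default_rng(1)
src = rng.normal(size=(50, 2))
theta = 0.7
R0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
dst = 2.5 * src @ R0.T + np.array([3.0, -1.0])
s, R, t = similarity_align(src, dst)
assert np.isclose(s, 2.5) and np.allclose(R, R0) and np.allclose(t, [3.0, -1.0])
```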

332 citations


Journal ArticleDOI
TL;DR: The Hamiltonian formulation is reviewed, which offers specific advantages when it comes to the detection of singularities or shocks, and a robust and accurate algorithm for computing skeletons in 2D as well as 3D, which has low computational complexity.
Abstract: The eikonal equation and variants of it are of significant interest for problems in computer vision and image processing. It is the basis for continuous versions of mathematical morphology, stereo, shape-from-shading and for recent dynamic theories of shape. Its numerical simulation can be delicate, owing to the formation of singularities in the evolving front and is typically based on level set methods. However, there are more classical approaches rooted in Hamiltonian physics which have yet to be widely used by the computer vision community. In this paper we review the Hamiltonian formulation, which offers specific advantages when it comes to the detection of singularities or shocks. We specialize to the case of Blum's grassfire flow and measure the average outward flux of the vector field that underlies the Hamiltonian system. This measure has very different limiting behaviors depending upon whether the region over which it is computed shrinks to a singular point or a non-singular one. Hence, it is an effective way to distinguish between these two cases. We combine the flux measurement with a homotopy preserving thinning process applied in a discrete lattice. This leads to a robust and accurate algorithm for computing skeletons in 2D as well as 3D, which has low computational complexity. We illustrate the approach with several computational examples.
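
The flux measurement is straightforward to prototype. A minimal 2-D sketch (illustrative; the paper couples this with homotopy-preserving thinning, omitted here): take the gradient of the Euclidean distance transform as the vector field underlying the grassfire flow, approximate the average outward flux by its divergence, and keep strongly negative-flux points as skeleton candidates.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def skeleton_candidates(mask, flux_thresh=-0.4):
    """Points of strongly negative average outward flux of grad(dist)."""
    dist = distance_transform_edt(mask)
    gy, gx = np.gradient(dist)
    # Divergence of the gradient field approximates the average outward flux
    # of a shrinking neighborhood; strongly negative values indicate sinks,
    # i.e. shocks (skeletal points) of the grassfire flow.
    flux = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
    return (flux < flux_thresh) & mask

# Toy example: the skeleton of a rectangle concentrates along its midline.
mask = np.zeros((60, 100), dtype=bool)
mask[15:45, 10:90] = True
skel = skeleton_candidates(mask)
```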

295 citations


Journal ArticleDOI
Philip H. S. Torr1
TL;DR: This paper explores ways of automating the model selection process with specific emphasis on the least squares problem of fitting manifolds to data points, illustrated with respect to epipolar geometry.
Abstract: Computer vision often involves estimating models from visual input. Sometimes it is possible to fit several different models or hypotheses to a set of data, and a decision must be made as to which is most appropriate. This paper explores ways of automating the model selection process with specific emphasis on the least squares problem of fitting manifolds (in particular algebraic varieties, e.g. lines, algebraic curves, planes, etc.) to data points, illustrated with respect to epipolar geometry. The approach is Bayesian and the contribution is threefold. First, a new Bayesian description of the problem is laid out that supersedes the author's previous maximum likelihood formulations; this formulation reveals some hidden elements of the problem. Second, an algorithm, ‘MAPSAC’, is provided to obtain the robust MAP estimate of an arbitrary manifold. Third, a Bayesian model selection paradigm is proposed; the Bayesian formulation of the manifold fitting problem uncovers an elegant solution to this problem, for which a new method, ‘GRIC’, for approximating the posterior probability of each putative model is derived. This approximation bears some similarity to the penalized likelihoods used by AIC, BIC and MDL; however, it is far more accurate in situations involving large numbers of latent variables whose number increases with the data. This is demonstrated both empirically and theoretically.
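
GRIC scores candidate models by a robustified residual term plus penalties on the dimension of the fitted manifold and the number of model parameters. The sketch below implements the commonly quoted form of the score (with lambda1 = log r, lambda2 = log(rn) and a clipping constant of 2; treat these constants as assumptions, not a transcription of the paper) for comparing, say, a homography (d = 2, k = 8) against a fundamental matrix (d = 3, k = 7).

```python
import numpy as np

def gric(residuals, sigma, n, d, k, r=4):
    """GRIC-style score (lower is better).
    residuals: geometric error per correspondence; sigma: noise std;
    d: manifold dimension (H: 2, F: 3); k: #parameters (H: 8, F: 7);
    r: data dimension (4 for point pairs in two views)."""
    lam1, lam2 = np.log(r), np.log(r * n)
    # Robust (clipped) normalized residuals, then the two penalty terms.
    rho = np.minimum(residuals**2 / sigma**2, 2.0 * (r - d))
    return float(rho.sum() + lam1 * d * n + lam2 * k)

# Hypothetical usage: prefer the fundamental matrix when its score is lower.
# score_h = gric(res_h, sigma, n, d=2, k=8)
# score_f = gric(res_f, sigma, n, d=3, k=7)
```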

Journal ArticleDOI
TL;DR: A fast and reliable stereo matching algorithm which produces a dense disparity map by using fast cross correlation, rectangular subregioning (RSR) and 3D maximum-surface techniques in a coarse-to-fine scheme.
Abstract: This paper presents a fast and reliable stereo matching algorithm which produces a dense disparity map by using fast cross correlation, rectangular subregioning (RSR) and 3D maximum-surface techniques in a coarse-to-fine scheme. Fast correlation is achieved by using the box-filtering technique, whose speed is invariant to the size of the correlation window, and by segmenting the stereo images into rectangular subimages at different levels of the pyramid. By working with rectangular subimages, not only can the speed of the correlation be further increased, but the intermediate memory storage requirement can also be reduced. The disparity map for the stereo images is found in the 3D correlation coefficient volume by obtaining the global 3D maximum-surface rather than simply choosing the position that gives the local maximum correlation coefficient value for each pixel. The 3D maximum-surface is obtained using our new two-stage dynamic programming (TSDP) technique. There are two original contributions in this paper: (1) development of the RSR technique for fast similarity measures and (2) development of the TSDP technique for efficiently obtaining a 3D maximum-surface in a 3D volume. Typical running time of our algorithm implemented in the C language on a 512 × 512 image is on the order of a few seconds on a 500 MHz PC. A variety of synthetic and real images have been tested, and good results have been obtained.
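
The box-filtering trick that makes the correlation cost independent of window size is a two-pass cumulative sum. A minimal sketch of that standard technique (an assumed implementation detail, not the paper's code):

```python
import numpy as np

def box_filter_sum(a, radius):
    """Sum of `a` over a (2*radius+1)^2 window at every pixel, computed via
    an integral image; cost per pixel is constant in the window size."""
    h, w = a.shape
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)      # integral image
    r = radius
    y0 = np.clip(np.arange(h) - r, 0, h); y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w); x1 = np.clip(np.arange(w) + r + 1, 0, w)
    # Four-corner lookup gives every window sum at once (borders truncated).
    return ii[y1][:, x1] - ii[y0][:, x1] - ii[y1][:, x0] + ii[y0][:, x0]

# e.g. a windowed SSD cost for one disparity d of shifted image R_d:
# cost_d = box_filter_sum((L - R_d) ** 2, radius=5)
```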

Journal ArticleDOI
TL;DR: The classical epipolar geometry of perspective cameras is extended to all central catadioptric cameras and it is shown that the corresponding points lie on epipolar conics.
Abstract: Central catadioptric cameras are cameras which combine lenses and mirrors to capture a very wide field of view with a central projection. In this paper we extend the classical epipolar geometry of perspective cameras to all central catadioptric cameras. Epipolar geometry is formulated as the geometry of corresponding rays in a three-dimensional space. Using the model of image formation of central catadioptric cameras, the constraint on corresponding image points is then derived. It is shown that the corresponding points lie on epipolar conics. In addition, the shape of the conics for all types of central catadioptric cameras is classified. Finally, the theory is verified by experiments with real central catadioptric cameras.

Journal ArticleDOI
TL;DR: A level set method to segment MR cardiac images is proposed, based on a coupled propagation of two cardiac contours; it integrates visual information with anatomical constraints and uses the Additive Operator Splitting scheme.
Abstract: In this paper we propose a level set method to segment MR cardiac images. Our approach is based on a coupled propagation of two cardiac contours and integrates visual information with anatomical constraints. The visual information is expressed through a gradient vector flow-based boundary component and a region term that aims at best separating the cardiac contours/regions according to their global intensity properties. In order to deal with misleading visual support, an anatomical constraint is considered that couples the propagation of the cardiac contours according to their relative distance. The resulting motion equations are implemented using a level set approach and a fast and stable numerical approximation scheme, the Additive Operator Splitting. Encouraging experimental results are provided using real data.

Journal ArticleDOI
TL;DR: The theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming is described and some near-ambiguities that arise under rotational motions are identified.
Abstract: In this paper we describe the theory and practice of self-calibration of cameras which are fixed in location and may freely rotate while changing their internal parameters by zooming. The basis of our approach is to make use of the so-called infinite homography constraint which relates the unknown calibration matrices to the computed inter-image homographies. In order for the calibration to be possible some constraints must be placed on the internal parameters of the camera. We present various self-calibration methods. First an iterative non-linear method is described which is very versatile in terms of the constraints that may be imposed on the camera calibration: each of the camera parameters may be assumed to be known, constant throughout the sequence but unknown, or free to vary. Secondly, we describe a fast linear method which works under the minimal assumption of zero camera skew or the more restrictive conditions of square pixels (zero skew and known aspect ratio) or known principal point. We show experimental results on both synthetic and real image sequences (where ground truth data was available) to assess the accuracy and the stability of the algorithms and to compare the result of applying different constraints on the camera parameters. We also derive an optimal Maximum Likelihood estimator for the calibration and the motion parameters. Prior knowledge about the distribution of the estimated parameters (such as the location of the principal point) may also be incorporated via Maximum a Posteriori estimation. We then identify some near-ambiguities that arise under rotational motions, showing that coupled changes of certain parameters are barely observable, making them practically indistinguishable. Finally we study the negative effect of radial distortion in the self-calibration process and point out some possible solutions to it.
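
The infinite homography constraint is linear in the image of the absolute conic w = (K Kᵀ)⁻¹: for a camera rotating about its center with fixed internals, w = H⁻ᵀ w H⁻¹ for every inter-image homography H normalized to unit determinant. The following self-contained toy (a sketch of the linear idea only, not the paper's full method with varying intrinsics) stacks these equations, solves for w by SVD, and recovers K by Cholesky factorization.

```python
import numpy as np

def calibrate_from_rotations(homographies):
    """Solve Hi.T @ w @ Hi = w (Hi = inv(H), det(H)=1) for symmetric w,
    then recover the upper-triangular K from w = inv(K.T) @ inv(K)."""
    idx = [(0,0),(0,1),(0,2),(1,1),(1,2),(2,2)]   # 6 unknowns of symmetric w
    rows = []
    for H in homographies:
        Hi = np.linalg.inv(H / np.cbrt(np.linalg.det(H)))
        for a in range(3):
            for b in range(a, 3):                 # 6 equations per homography
                row = np.zeros(6)
                for m, (i, j) in enumerate(idx):
                    E = np.zeros((3, 3)); E[i, j] = E[j, i] = 1.0
                    row[m] = (Hi.T @ E @ Hi - E)[a, b]
                rows.append(row)
    _, _, vt = np.linalg.svd(np.array(rows))
    w6 = vt[-1]                                   # null vector = w up to scale
    w = np.zeros((3, 3))
    for m, (i, j) in enumerate(idx):
        w[i, j] = w[j, i] = w6[m]
    w = w / w[2, 2]                               # fix scale and sign
    C = np.linalg.cholesky(w)                     # w = C @ C.T, C lower
    K = np.linalg.inv(C).T                        # so K = inv(C).T is upper
    return K / K[2, 2]

def rot(axis, ang):
    """Rodrigues' formula for a rotation matrix."""
    k = np.asarray(axis, float); k = k / np.linalg.norm(k)
    Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(ang) * Kx + (1 - np.cos(ang)) * (Kx @ Kx)

# Toy check: homographies H = K R inv(K) from synthetic rotations.
K_true = np.array([[800., 0., 320.], [0., 780., 240.], [0., 0., 1.]])
Hs = [K_true @ rot(ax, a) @ np.linalg.inv(K_true)
      for ax, a in [([0,1,0], 0.3), ([1,0,0], -0.2), ([0,0,1], 0.25)]]
print(np.round(calibrate_from_rotations(Hs), 1))
```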

Journal ArticleDOI
TL;DR: The robust cost functions and the associated hierarchical minimization techniques that are proposed efficiently mix non-parametric (dense) representations, local interacting parametric representations, and global non-interacting parametric representations related to a partition into regions.
Abstract: In this paper we present a comprehensive energy-based framework for the estimation and the segmentation of the apparent motion in image sequences. The robust cost functions and the associated hierarchical minimization techniques that we propose mix efficiently non-parametric (dense) representations, local interacting parametric representations, and global non-interacting parametric representations related to a partition into regions. Experimental comparisons, both on synthetic and real images, demonstrate the merit of the approach on different types of photometric and kinematic contents ranging from moving rigid objects to moving fluids.

Journal ArticleDOI
Kentaro Toyama1, Andrew Blake1
TL;DR: A new, exemplar-based, probabilistic paradigm for visual tracking is presented, which provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space and uses a noise model that is learned from training data.
Abstract: A new, exemplar-based, probabilistic paradigm for visual tracking is presented. Probabilistic mechanisms are attractive because they handle fusion of information, especially temporal fusion, in a principled manner. Exemplars are selected representatives of raw training data, used here to represent probabilistic mixture distributions of object configurations. Their use avoids tedious hand-construction of object models, and problems with changes of topology. Using exemplars in place of a parameterized model poses several challenges, addressed here with what we call the “Metric Mixture” (M2) approach, which has a number of attractions. Principally, it provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space. Secondly, it uses a noise model that is learned from training data. Lastly, it eliminates any need for an assumption of probabilistic pixelwise independence. Experiments demonstrate the effectiveness of the M2 model in two domains: tracking walking people using “chamfer” distances on binary edge images, and tracking mouth movements by means of a shuffle distance.
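
The "chamfer" distance used for the walking-people experiments has a compact standard formulation; this sketch (the textbook version, not the authors' code) scores a set of template edge points against an observed edge map via a distance transform.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(template_pts, edge_map):
    """Mean distance from template edge points to the nearest observed edge.
    template_pts: (N, 2) integer (row, col) coordinates; edge_map: bool image."""
    dist = distance_transform_edt(~edge_map)      # distance to nearest edge
    r, c = template_pts[:, 0], template_pts[:, 1]
    return dist[r, c].mean()

# A tracker evaluates this score for each exemplar at each candidate pose and
# feeds the (noise-modeled) scores into the probabilistic M2 machinery.
```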

Journal ArticleDOI
TL;DR: It is shown that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time.
Abstract: In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene's shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene's trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region's dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method's ability to explain pixels and pixel variations in terms of their underlying causes—shape, reflectance, motion, illumination, and visibility.

Journal ArticleDOI
TL;DR: The approach works by extending operations like image painting, scissoring, and morphing so that they alter a scene's plenoptic function in a physically-consistent way, thereby affecting scene appearance from all viewpoints simultaneously.
Abstract: This paper presents a new class of interactive image editing operations designed to maintain consistency between multiple images of a physical 3D scene. The distinguishing feature of these operations is that edits to any one image propagate automatically to all other images as if the (unknown) 3D scene had itself been modified. The modified scene can then be viewed interactively from any other camera viewpoint and under different scene illuminations. The approach is useful first as a power-assist that enables a user to quickly modify many images by editing just a few, and second as a means for constructing and editing image-based scene representations by manipulating a set of photographs. The approach works by extending operations like image painting, scissoring, and morphing so that they alter a scene's plenoptic function in a physically-consistent way, thereby affecting scene appearance from all viewpoints simultaneously. A key element in realizing these operations is a new volumetric decomposition technique for reconstructing a scene's plenoptic function from an incomplete set of camera viewpoints.

Journal ArticleDOI
TL;DR: Applications to functional MRI data analysis for human brain mapping, dynamic contrast-enhanced perfusion MRI for the diagnosis of cerebrovascular disease, and magnetic resonance mammography for the analysis of suspicious lesions in patients with breast cancer are presented.
Abstract: In this paper, we present neural network clustering by deterministic annealing as a powerful strategy for self-organized segmentation of biomedical image time-series data identifying groups of pixels sharing common properties of local signal dynamics. After introducing the theoretical concept of minimal free energy vector quantization and related clustering techniques, we discuss its potential to serve as a multi-purpose computer vision strategy to image time-series analysis and visualization for many fields of medicine ranging from biomedical basic research to clinical assessment of patient data. In particular, we present applications to (i) functional MRI data analysis for human brain mapping, (ii) dynamic contrast-enhanced perfusion MRI for the diagnosis of cerebrovascular disease, and (iii) magnetic resonance mammography for the analysis of suspicious lesions in patients with breast cancer. This wide scope of completely different medical applications illustrates the flexibility and conceptual power of neural network vector quantization in this context. Although there are obvious methodological similarities, each application requires specific careful consideration w.r.t. data preprocessing, postprocessing and interpretation. This challenge can only be managed by close interdisciplinary cooperation of medical doctors, engineers, and computer scientists. Hence, this field of research can serve as an example for lively cross-fertilization between computer vision and related research.
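
Minimal free energy vector quantization admits a compact sketch: soft assignments at temperature T are Gibbs weights on squared distances, codebook vectors are the resulting weighted means, and T is annealed downward so clusters split progressively. The toy version below (illustrative only, not the clinical pipeline; all parameter values are assumptions) clusters pixel time-series by their signal dynamics.

```python
import numpy as np

def deterministic_annealing_vq(X, n_codes, T0=1.0, Tmin=0.01, cool=0.9, iters=20):
    """Cluster rows of X (e.g. pixel time-series) by annealed soft k-means."""
    rng = np.random.default_rng(0)
    codes = X.mean(0) + 1e-3 * rng.normal(size=(n_codes, X.shape[1]))
    T = T0
    while T > Tmin:
        for _ in range(iters):
            d2 = ((X[:, None, :] - codes[None]) ** 2).sum(-1)   # (N, K)
            logp = -d2 / T
            logp -= logp.max(1, keepdims=True)
            P = np.exp(logp); P /= P.sum(1, keepdims=True)      # Gibbs weights
            codes = (P.T @ X) / P.sum(0)[:, None]               # weighted means
        T *= cool                                               # anneal down
    return codes, P.argmax(1)

# Toy data: two latent time-series dynamics mixed with noise.
rng = np.random.default_rng(1)
base = np.stack([np.sin(np.linspace(0, 6, 40)), np.cos(np.linspace(0, 6, 40))])
X = np.repeat(base, 100, axis=0) + 0.2 * rng.normal(size=(200, 40))
codes, labels = deterministic_annealing_vq(X, n_codes=2)
```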

Journal ArticleDOI
TL;DR: In this paper, a new hierarchical stereo algorithm that matches individual pixels in corresponding scanlines by minimizing a cost function is presented, and it is shown that its complexity is independent of the disparity range.
Abstract: In this paper, a new hierarchical stereo algorithm is presented. The algorithm matches individual pixels in corresponding scanlines by minimizing a cost function. Several cost functions are compared. The algorithm achieves a tremendous gain in speed and memory requirements by implementing it hierarchically. The images are downsampled an optimal number of times and the disparity map of a lower level is used as an ‘offset’ disparity map at a higher level. An important contribution consists of the complexity analysis of the algorithm. It is shown that this complexity is independent of the disparity range. This result is also used to determine the optimal number of downsample levels. This speed gain results in the ability to use more complex (compute intensive) cost functions that deliver high quality disparity maps. Another advantage of this algorithm is that cost functions can be chosen independently of the optimisation algorithm. The algorithm in this paper is symmetric, i.e. exactly the same matches are found if the left and right images are swapped. Finally, the algorithm was carefully implemented so that a minimal amount of memory is used. It has proven both its efficiency and its quality on large images with a high disparity range. Examples are given in this paper.

Journal ArticleDOI
TL;DR: A new approach to covariance-weighted factorization, which can factor noisy feature correspondences with a high degree of directional uncertainty into structure and motion and provides a unified approach for treating corner-like points together with points along linear structures in the image.
Abstract: Factorization using Singular Value Decomposition (SVD) is often used for recovering 3D shape and motion from feature correspondences across multiple views. SVD is powerful at finding the global solution to the associated least-square-error minimization problem. However, this is the correct error to minimize only when the x and y positional errors in the features are uncorrelated and identically distributed. But this is rarely the case in real data. Uncertainty in feature position depends on the underlying spatial intensity structure in the image, which has strong directionality to it. Hence, the proper measure to minimize is covariance-weighted squared-error (or the Mahalanobis distance). In this paper, we describe a new approach to covariance-weighted factorization, which can factor noisy feature correspondences with a high degree of directional uncertainty into structure and motion. Our approach is based on transforming the raw data into a covariance-weighted data space, where the components of noise in the different directions are uncorrelated and identically distributed. Applying SVD to the transformed data now minimizes a meaningful objective function in this new data space. This is followed by a linear but suboptimal second step to recover the shape and motion in the original data space. We empirically show that our algorithm gives very good results for varying degrees of directional uncertainty. In particular, we show that unlike other SVD-based factorization algorithms, our method does not degrade with increase in directionality of uncertainty, even in the extreme when only normal-flow data is available. It thus provides a unified approach for treating corner-like points together with points along linear structures in the image.
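
The core idea — whiten each measurement by its directional covariance before applying SVD — can be sketched as below. This is a simplified per-measurement whitening under an assumed block-diagonal noise model, not the authors' full transformation; the paper additionally maps the factors back to the original data space with a second linear step, omitted here.

```python
import numpy as np

def whitened_factorization(W, covs, rank=3):
    """W: (2F, P) stacked measurements with rows 0..F-1 = x, F..2F-1 = y
    (the usual factorization layout); covs[f][p]: 2x2 covariance of point p
    in frame f. Whitens each 2-vector so noise becomes isotropic, then takes
    the best rank-`rank` SVD approximation in the whitened space."""
    F2, P = W.shape
    F = F2 // 2
    Wt = W.astype(float).copy()
    for f in range(F):
        for p in range(P):
            # Inverse square root of the 2x2 covariance (Mahalanobis whitening).
            vals, vecs = np.linalg.eigh(np.asarray(covs[f][p], float))
            S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
            Wt[[f, F + f], p] = S_inv_half @ W[[f, F + f], p]
    U, s, Vt = np.linalg.svd(Wt, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]     # motion-like, shape-like factors
```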

Journal ArticleDOI
TL;DR: In this paper, the authors propose a method of multinocular stereopsis that exploits Helmholtz reciprocity of light source and camera positions to reconstruct the geometry of objects from a collection of images.
Abstract: We present a method—termed Helmholtz stereopsis—for reconstructing the geometry of objects from a collection of images. Unlike existing methods for surface reconstruction (e.g., stereo vision, structure from motion, photometric stereopsis), Helmholtz stereopsis makes no assumptions about the nature of the bidirectional reflectance distribution functions (BRDFs) of objects. This new method of multinocular stereopsis exploits Helmholtz reciprocity by choosing pairs of light source and camera positions that guarantee that the ratio of the emitted radiance to the incident irradiance is the same for corresponding points in the two images. The method provides direct estimates of both depth and surface normals, and consequently weds the advantages of both conventional stereopsis and photometric stereopsis. Results from our implementation lend empirical support to our technique.

Journal ArticleDOI
TL;DR: This paper shows how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself, and shows that these constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models.
Abstract: When a rigid scene is imaged by a moving camera, the set of all displacements of all points across multiple frames often resides in a low-dimensional linear subspace. Linear subspace constraints have been used successfully in the past for recovering 3D structure and 3D motion information from multiple frames (e.g., by using the factorization method of Tomasi and Kanade (1992, International Journal of Computer Vision, 9:137–154)). These methods assume that the 2D correspondences have been precomputed. However, correspondence estimation is a fundamental problem in motion analysis. In this paper we show how the multi-frame subspace constraints can be used for constraining the 2D correspondence estimation process itself. We show that the multi-frame subspace constraints are valid not only for affine cameras, but also for a variety of imaging models, scene models, and motion models. The multi-frame subspace constraints are first translated from constraints on correspondences to constraints directly on image measurements (e.g., image brightness quantities). These brightness-based subspace constraints are then used for estimating the correspondences, by requiring that all corresponding points across all video frames reside in the appropriate low-dimensional linear subspace. The multi-frame subspace constraints are geometrically meaningful, and are not violated at depth discontinuities, nor when the camera motion changes abruptly. These constraints can therefore replace heuristic constraints commonly used in optical-flow estimation, such as spatial or temporal smoothness.

Journal ArticleDOI
TL;DR: A new framework to automatically group similar shots into one scene is presented, where a scene is generally referred to as a group of shots taking place at the same site.
Abstract: In this paper, we present a new framework to automatically group similar shots into one scene, where a scene is generally referred to as a group of shots taking place at the same site. Two major components in this framework are based on motion characterization and background segmentation. The former component leads to an effective video representation scheme by adaptively selecting and forming keyframes. The latter is considered novel in that background reconstruction is incorporated into the detection of scene change. These two components, combined with color histogram intersection, establish our basic concept for assessing the similarity of scenes.
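
The color histogram intersection used for assessing scene similarity is nearly a one-liner; a minimal sketch under the usual normalized-histogram convention (not the paper's exact binning):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized color histograms."""
    return float(np.minimum(h1, h2).sum())

def color_histogram(img, bins=8):
    """Joint RGB histogram of a uint8 image, L1-normalized."""
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    return h.ravel() / h.sum()

# Two keyframes become scene-grouping candidates when the intersection of
# their (background-weighted) histograms exceeds a threshold.
```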

Journal ArticleDOI
TL;DR: A geometric model and a computational method for segmentation of images with missing boundaries are presented, together with an algorithm that builds missing information on the basis of a given reference point and the available boundary data.
Abstract: We present a geometric model and a computational method for segmentation of images with missing boundaries. In many situations, the human visual system fills in missing gaps in edges and boundaries, building and completing information that is not present. Boundary completion presents a considerable challenge in computer vision, since most algorithms attempt to exploit existing data. A large body of work concerns completion models, which postulate how to construct missing data; these models are often trained and specific to particular images. In this paper, we take the following, alternative perspective: we consider a given reference point within the image, and then develop an algorithm which tries to build missing information on the basis of the given point of view and the available information as boundary data to the algorithm. Starting from this point of view, a surface is constructed. It is then evolved with the mean curvature flow in the metric induced by the image until a piecewise constant solution is reached. We test the computational model on modal completion, amodal completion, and texture segmentation. We extend the geometric model and the algorithm to 3D in order to extract shapes from low signal/noise ratio ultrasound image volumes. Results in 3D echocardiography and 3D fetal echography are also presented.

Journal ArticleDOI
TL;DR: A new set of techniques for modeling and animating realistic faces from photographs and videos that allows the interactive recovery of a textured 3D face model; the acquired models can be linearly combined to express a wide range of expressions.
Abstract: We present a new set of techniques for modeling and animating realistic faces from photographs and videos. Given a set of face photographs taken simultaneously, our modeling technique allows the interactive recovery of a textured 3D face model. By repeating this process for several facial expressions, we acquire a set of face models that can be linearly combined to express a wide range of expressions. Given a video sequence, this linear face model can be used to estimate the face position, orientation, and facial expression at each frame. We illustrate these techniques on several datasets and demonstrate robust estimations of detailed face geometry and motion.

Journal ArticleDOI
TL;DR: A new method for 3D object recognition which uses segment-based stereo vision to identify an object in a cluttered environment; the object's position and orientation are determined accurately, enabling a robot to pick up the object and manipulate it.
Abstract: We propose a new method for 3D object recognition which uses segment-based stereo vision. An object is identified in a cluttered environment and its position and orientation (6 dof) are determined accurately enabling a robot to pick up the object and manipulate it. The object can be of any shape (planar figures, polyhedra, free-form objects) and partially occluded by other objects. Segment-based stereo vision is employed for 3D sensing. Both CAD-based and sensor-based object modeling subsystems are available. Matching is performed by calculating candidates for the object position and orientation using local features, verifying each candidate, and improving the accuracy of the position and orientation by an iteration method. Several experimental results are presented to demonstrate the usefulness of the proposed method.

Journal ArticleDOI
TL;DR: In this paper, a variational framework was developed to solve the problem of regularizing fields of orthonormal vector sets, using constraint-preserving anisotropic diffusion PDE's.
Abstract: We are interested in regularizing fields of orthonormal vector sets, using constraint-preserving anisotropic diffusion PDE's. Each point of such a field is defined by multiple orthogonal and unitary vectors and can indeed represent a lot of interesting orientation features such as direction vectors or orthogonal matrices (among other examples). We first develop a general variational framework that solves this regularization problem, thanks to a constrained minimization of φ-functionals. This leads to a set of coupled vector-valued PDE's preserving the orthonormal constraints. Then, we focus on particular applications of this general framework, including the restoration of noisy direction fields, noisy chromaticity color images, estimated camera motions and DT-MRI (Diffusion Tensor MRI) datasets.

Journal ArticleDOI
TL;DR: By simultaneously reconstructing points and views, the approach exploits the numerically stabilizing effect of widely spread cameras with large mutual baselines; this is demonstrated by reconstructing the outside and inside of a building on the basis of 35 views in one single Singular Value Decomposition.
Abstract: This paper presents a linear algorithm for simultaneous computation of 3D points and camera positions from multiple perspective views based on having a reference plane visible in all views. The reconstruction and camera recovery is achieved in a single step by finding the null-space of a matrix built from image data using Singular Value Decomposition. Contrary to factorization algorithms, this approach does not need all points to be visible in all views. This paper investigates two reference plane configurations: finite reference planes defined by four coplanar points and infinite reference planes defined by vanishing points. A further contribution of this paper is the study of critical configurations for configurations with four coplanar points. By simultaneously reconstructing points and views we can exploit the numerically stabilizing effect of having widely spread cameras with large mutual baselines. This is demonstrated by reconstructing the outside and inside (courtyard) of a building on the basis of 35 views in one single Singular Value Decomposition.