
Showing papers in "International Journal of Computer Vision in 2006"


Journal ArticleDOI
TL;DR: This application epitomizes the best features of combinatorial graph cuts methods in vision: global optima, practical efficiency, numerical robustness, ability to fuse a wide range of visual cues and constraints, unrestricted topological properties of segments, and applicability to N-D problems.
Abstract: Combinatorial graph cut algorithms have been successfully applied to a wide range of problems in vision and graphics. This paper focusses on possibly the simplest application of graph-cuts: segmentation of objects in image data. Despite its simplicity, this application epitomizes the best features of combinatorial graph cuts methods in vision: global optima, practical efficiency, numerical robustness, ability to fuse a wide range of visual cues and constraints, unrestricted topological properties of segments, and applicability to N-D problems. Graph cuts based approaches to object extraction have also been shown to have interesting connections with earlier segmentation methods such as snakes, geodesic active contours, and level-sets. The segmentation energies optimized by graph cuts combine boundary regularization with region-based properties in the same fashion as Mumford-Shah style functionals. We present motivation and detailed technical description of the basic combinatorial optimization framework for image segmentation via s/t graph cuts. After the general concept of using binary graph cut algorithms for object segmentation was first proposed and tested in Boykov and Jolly (2001), this idea was widely studied in computer vision and graphics communities. We provide links to a large number of known extensions based on iterative parameter re-estimation and learning, multi-scale or hierarchical approaches, narrow bands, and other techniques for demanding photo, video, and medical applications.

2,076 citations
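The s/t graph-cut construction described above can be sketched on a toy 1-D image. The sketch below is illustrative only, not the authors' implementation: it uses a textbook Edmonds-Karp max-flow rather than a specialized vision solver, and the intensities, foreground/background means, and smoothness weight `lam` are hypothetical. Terminal links carry the data costs of the two labels, neighbour links penalize label discontinuities, and the minimum s/t cut yields the segmentation:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap is a dict-of-dicts, updated in place as the residual graph."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # collect the path, find the bottleneck, and augment
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= b
            cap[v][u] = cap[v].get(u, 0) + b
        flow += b

# Tiny hypothetical 1-D "image": three dark pixels followed by three bright ones.
pix = [10, 12, 11, 90, 88, 92]
fg_mean, bg_mean, lam = 90.0, 10.0, 50.0
S, T = 's', 't'
cap = {S: {}, T: {}}
for i in range(len(pix)):
    cap[i] = {}
for i, p in enumerate(pix):
    cap[S][i] = abs(p - bg_mean)       # cut this edge -> pixel i labelled background
    cap[i][T] = abs(p - fg_mean)       # cut this edge -> pixel i labelled foreground
    if i + 1 < len(pix):
        cap[i][i + 1] = lam            # n-links: penalty for a label change
        cap[i + 1][i] = lam

max_flow(cap, S, T)
# Pixels still reachable from the source in the residual graph are foreground.
seen, q = {S}, deque([S])
while q:
    u = q.popleft()
    for v, c in cap[u].items():
        if c > 0 and v not in seen:
            seen.add(v)
            q.append(v)
labels = [1 if i in seen else 0 for i in range(len(pix))]
print(labels)   # -> [0, 0, 0, 1, 1, 1]
```

The same construction extends unchanged to 2-D and N-D grids; only the neighbourhood structure of the n-links changes.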


Journal ArticleDOI
TL;DR: This paper proposes to endow the tensor space with an affine-invariant Riemannian metric and demonstrates that it leads to strong theoretical properties: the cone of positive definite symmetric matrices is replaced by a regular and complete manifold without boundaries, the geodesic between two tensors and the mean of a set of tensors are uniquely defined.
Abstract: Tensors are nowadays a common source of geometric information. In this paper, we propose to endow the tensor space with an affine-invariant Riemannian metric. We demonstrate that it leads to strong theoretical properties: the cone of positive definite symmetric matrices is replaced by a regular and complete manifold without boundaries (null eigenvalues are at infinity), the geodesic between two tensors and the mean of a set of tensors are uniquely defined, etc. We have previously shown that the Riemannian metric provides a powerful framework for generalizing statistics to manifolds. In this paper, we show that it is also possible to generalize to tensor fields many important geometric data processing algorithms such as interpolation, filtering, diffusion and restoration of missing data. For instance, most interpolation and Gaussian filtering schemes can be tackled efficiently through a weighted mean computation. Linear and anisotropic diffusion schemes can be adapted to our Riemannian framework, through partial differential evolution equations, provided that the metric of the tensor space is taken into account. For that purpose, we provide intrinsic numerical schemes to compute the gradient and Laplace-Beltrami operators. Finally, to enforce the fidelity to the data (either sparsely distributed tensors or complete tensor fields) we propose least-squares criteria based on our invariant Riemannian distance which are particularly simple and efficient to solve.

1,588 citations
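A minimal numerical sketch of the affine-invariant metric, assuming the standard closed forms d(A,B) = ||log(A^{-1/2} B A^{-1/2})||_F for the distance and A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2} for the geodesic; the example matrices are hypothetical:

```python
import numpy as np

def spd_pow(A, p):
    """A**p for a symmetric positive definite matrix, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.T

def spd_log(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def dist(A, B):
    """Affine-invariant distance ||log(A^-1/2 B A^-1/2)||_F."""
    Ah = spd_pow(A, -0.5)
    return np.linalg.norm(spd_log(Ah @ B @ Ah))

def geodesic(A, B, t):
    """Point at parameter t on the (unique) geodesic joining A (t=0) to B (t=1)."""
    Ah, Aih = spd_pow(A, 0.5), spd_pow(A, -0.5)
    return Ah @ spd_pow(Aih @ B @ Aih, t) @ Ah

A, B = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])   # two hypothetical tensors
M = geodesic(A, B, 0.5)                            # their Riemannian mean
print(M)                                           # -> diag(2, 2), still SPD
print(np.isclose(dist(A, M), dist(B, M)))          # midpoint is equidistant
```

Note that the Riemannian mean here is the geometric mean diag(2, 2), not the arithmetic mean diag(2.5, 2.5); interpolating along geodesics is exactly what prevents the eigenvalue "swelling" of linear tensor interpolation.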


Journal ArticleDOI
TL;DR: Algorithmic techniques are presented that substantially improve the running time of the loopy belief propagation approach and reduce the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set.
Abstract: Markov random field models provide a robust and unified framework for early vision problems such as stereo and image restoration. Inference algorithms based on graph cuts and belief propagation have been found to yield accurate results, but despite recent advances are often too slow for practical use. In this paper we present some algorithmic techniques that substantially improve the running time of the loopy belief propagation approach. One of the techniques reduces the complexity of the inference algorithm to be linear rather than quadratic in the number of possible labels for each pixel, which is important for problems such as image restoration that have a large label set. Another technique speeds up and reduces the memory requirements of belief propagation on grid graphs. A third technique is a multi-grid method that makes it possible to obtain good results with a small fixed number of message passing iterations, independent of the size of the input images. Taken together these techniques speed up the standard algorithm by several orders of magnitude. In practice we obtain results that are as accurate as those of other global methods (e.g., using the Middlebury stereo benchmark) while being nearly as fast as purely local methods.

1,560 citations
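The linear-time message update mentioned above can be illustrated for a truncated linear label cost: a forward and a backward pass over the labels replace the quadratic-time min-convolution, following the distance-transform idea of Felzenszwalb and Huttenlocher. The cost values below are hypothetical:

```python
def message_linear(h, c, d):
    """Min-convolution with a truncated linear cost:
       m[j] = min( min_i h[i] + c*|i - j|,  min_i h[i] + d )
       computed in O(K) for K labels instead of O(K^2)."""
    k = len(h)
    m = list(h)
    for j in range(1, k):             # forward pass: propagate costs rightwards
        m[j] = min(m[j], m[j - 1] + c)
    for j in range(k - 2, -1, -1):    # backward pass: propagate leftwards
        m[j] = min(m[j], m[j + 1] + c)
    cap = min(h) + d                  # truncation: cost never exceeds min + d
    return [min(v, cap) for v in m]

h = [5.0, 1.0, 4.0, 8.0]             # hypothetical per-label costs
print(message_linear(h, c=1.0, d=2.5))   # -> [2.0, 1.0, 2.0, 3.0]
```

For image restoration with hundreds of intensity labels, this reduction from quadratic to linear per-message cost is precisely what makes belief propagation practical.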


Journal ArticleDOI
TL;DR: The paper shows that the correlation graph between u and v may serve as an efficient tool to select the splitting parameter, and proposes a new fast algorithm to solve the TV-L1 minimization problem.
Abstract: This paper explores various aspects of the image decomposition problem using modern variational techniques. We aim at splitting an original image f into two components u and v, where u holds the geometrical information and v holds the textural information. The focus of this paper is to study different energy terms and functional spaces that suit various types of textures. Our modeling uses the total-variation energy for extracting the structural part and one of the following four norms for the textural part: L2, G, L1 and a new tunable norm, suggested here for the first time, based on Gabor functions. Apart from the broad perspective and our suggestions on when each model should be used, the paper contains three specific novelties: first, we show that the correlation graph between u and v may serve as an efficient tool to select the splitting parameter; second, we propose a new fast algorithm to solve the TV-L1 minimization problem; and third, we introduce the theory and design tools for the TV-Gabor model.

659 citations


Journal ArticleDOI
TL;DR: An algorithm for unsupervised learning of image manifolds by semidefinite programming that computes a low dimensional representation of each image with the property that distances between nearby images are preserved.
Abstract: Can we detect low dimensional structure in high dimensional data sets of images? In this paper, we propose an algorithm for unsupervised learning of image manifolds by semidefinite programming. Given a data set of images, our algorithm computes a low dimensional representation of each image with the property that distances between nearby images are preserved. More generally, it can be used to analyze high dimensional data that lies on or near a low dimensional manifold. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.

590 citations


Journal ArticleDOI
TL;DR: A novel representation for three-dimensional objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches is introduced, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints.
Abstract: This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.

458 citations


Journal ArticleDOI
TL;DR: This work presents Discriminative Random Fields (DRFs) to model spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. (2001).
Abstract: In this research we address the problem of classification and labeling of regions given a single static natural image. Natural images exhibit strong spatial dependencies, and modeling these dependencies in a principled manner is crucial to achieve good classification accuracy. In this work, we present Discriminative Random Fields (DRFs) to model spatial interactions in images in a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. (2001). The DRFs classify image regions by incorporating neighborhood spatial interactions in the labels as well as the observed data. The DRF framework offers several advantages over the conventional Markov Random Field (MRF) framework. First, the DRFs make it possible to relax the strong assumption of conditional independence of the observed data generally used in the MRF framework for tractability. This assumption is too restrictive for a large number of applications in computer vision. Second, the DRFs derive their classification power by exploiting the probabilistic discriminative models instead of the generative models used for modeling observations in the MRF framework. Third, the interaction in labels in DRFs is based on the idea of pairwise discrimination of the observed data, making it data-adaptive instead of being fixed a priori as in MRFs. Finally, all the parameters in the DRF model are estimated simultaneously from the training data, unlike the MRF framework where the likelihood parameters are usually learned separately from the field parameters. We present preliminary experiments with man-made structure detection and binary image restoration tasks, and compare the DRF results with the MRF results.

420 citations


Journal ArticleDOI
TL;DR: This paper proposes shape dissimilarity measures on the space of level set functions which are analytically invariant under the action of certain transformation groups, and a statistical shape prior which makes it possible to accurately encode multiple fairly distinct training shapes.
Abstract: In this paper, we make two contributions to the field of level set based image segmentation. Firstly, we propose shape dissimilarity measures on the space of level set functions which are analytically invariant under the action of certain transformation groups. The invariance is obtained by an intrinsic registration of the evolving level set function. In contrast to existing approaches to invariance in the level set framework, this closed-form solution removes the need to iteratively optimize explicit pose parameters. The resulting shape gradient is more accurate in that it takes into account the effect of boundary variation on the object's pose. Secondly, based on these invariant shape dissimilarity measures, we propose a statistical shape prior which makes it possible to accurately encode multiple fairly distinct training shapes. This prior constitutes an extension of kernel density estimators to the level set domain. In contrast to the commonly employed Gaussian distribution, such nonparametric density estimators are suited to model arbitrary distributions. We demonstrate the advantages of this multi-modal shape prior applied to the segmentation and tracking of a partially occluded walking person in a video sequence, and on the segmentation of the left ventricle in cardiac ultrasound images. We give quantitative results on segmentation accuracy and on the dependency of segmentation results on the number of training shapes.

406 citations


Journal ArticleDOI
TL;DR: A variational model for optic flow computation based on non-linearised and higher-order constancy assumptions is proposed, including the common grey value constancy assumption as well as the constancy of the gradient, the Hessian and the Laplacian.
Abstract: In this paper, we suggest a variational model for optic flow computation based on non-linearised and higher order constancy assumptions. Besides the common grey value constancy assumption, gradient constancy, as well as the constancy of the Hessian and the Laplacian, are also proposed. Since the model strictly refrains from a linearisation of these assumptions, it is also capable of dealing with large displacements. For the minimisation of the rather complex energy functional, we present an efficient numerical scheme employing two nested fixed point iterations. Following a coarse-to-fine strategy, it turns out to provide a theoretical foundation for so-called warping techniques, hitherto justified only on an experimental basis. Since our algorithm consists of the integration of various concepts, ranging from different constancy assumptions to numerical implementation issues, a detailed account of the effect of each of these concepts is included in the experimental section. The superior performance of the proposed method shows in significantly smaller estimation errors when compared to previous techniques. Further experiments also confirm excellent robustness under noise and insensitivity to parameter variations.

388 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed an algorithm to increase the resolution of multispectral satellite images knowing the panchromatic image at high resolution and the spectral channels at lower resolution.
Abstract: We propose an algorithm to increase the resolution of multispectral satellite images knowing the panchromatic image at high resolution and the spectral channels at lower resolution. Our algorithm is based on the assumption that, to a large extent, the geometry of the spectral channels is contained in the topographic map of its panchromatic image. This assumption, together with the relation of the panchromatic image to the spectral channels, and the expression of the low-resolution pixel in terms of the high-resolution pixels given by some convolution kernel followed by subsampling, constitute the elements for constructing an energy functional (with several variants) whose minima will give the reconstructed spectral images at higher resolution. We discuss the validity of the above approach and describe our numerical procedure. Finally, some experiments on a set of multispectral satellite images are displayed.

309 citations


Journal ArticleDOI
TL;DR: A noise removal technique using partial differential equations (PDEs) that combines the Total Variational filter with a fourth-order PDE filter that is able to preserve edges and at the same time avoid the staircase effect in smooth regions is proposed.
Abstract: A noise removal technique using partial differential equations (PDEs) is proposed here. It combines the Total Variational (TV) filter with a fourth-order PDE filter. The combined technique is able to preserve edges and at the same time avoid the staircase effect in smooth regions. A weighting function is used in an iterative way to combine the solutions of the TV filter and the fourth-order filter. Numerical experiments confirm that the new method is able to use a less restrictive time step than the fourth-order filter. Numerical examples using images with objects consisting of edge, flat and intermediate regions illustrate advantages of the proposed model.

Journal ArticleDOI
TL;DR: A new tensor-driven PDE is introduced, regularizing images while taking the curvatures of specific integral curves into account, and it is shown that this constraint is particularly well suited for the preservation of thin structures in an image restoration process.
Abstract: We are interested in PDEs (Partial Differential Equations) in order to smooth multi-valued images in an anisotropic manner. Starting from a review of existing anisotropic regularization schemes based on diffusion PDEs, we point out the pros and cons of the different equations proposed in the literature. Then, we introduce a new tensor-driven PDE, regularizing images while taking the curvatures of specific integral curves into account. We show that this constraint is particularly well suited for the preservation of thin structures in an image restoration process. A direct link is made between our proposed equation and a continuous formulation of the LICs (Line Integral Convolutions) of Cabral and Leedom (1993). It leads to the design of a very fast and stable algorithm that implements our regularization method, by successive integrations of pixel values along curved integral lines. Besides, the scheme numerically performs with sub-pixel accuracy and thus preserves thin image structures better than classical finite-differences discretizations. Finally, we illustrate the efficiency of our generic curvature-preserving approach, in terms of speed and visual quality, with different comparisons and various applications requiring image smoothing: color image denoising, inpainting and image resizing by nonlinear interpolation.

Journal ArticleDOI
TL;DR: A Dynamically Multi-Linked Hidden Markov Model (DML-HMM) is developed based on the discovery of salient dynamic interlinks among multiple temporal processes corresponding to multiple event classes resulting in its topology being intrinsically determined by the underlying causality and temporal order among events.
Abstract: In this work, we present a unified bottom-up and top-down automatic model selection based approach for modelling complex activities of multiple objects in cluttered scenes. An activity of multiple objects is represented based on discrete scene events, and their behaviours are modelled by reasoning about the temporal and causal correlations among different events. This is significantly different from the majority of the existing techniques that are centred on object tracking followed by trajectory matching. In our approach, object-independent events are detected and classified by unsupervised clustering using Expectation-Maximisation (EM) and automatic model selection based on Schwarz's Bayesian Information Criterion (BIC). Dynamic Probabilistic Networks (DPNs) are formulated for modelling the temporal and causal correlations among discrete events for robust and holistic scene-level behaviour interpretation. In particular, we developed a Dynamically Multi-Linked Hidden Markov Model (DML-HMM) based on the discovery of salient dynamic interlinks among multiple temporal processes corresponding to multiple event classes. A DML-HMM is built using BIC based factorisation, resulting in its topology being intrinsically determined by the underlying causality and temporal order among events. Extensive experiments are conducted on modelling activities captured in different indoor and outdoor scenes. Our experimental results demonstrate that the performance of a DML-HMM on modelling group activities in a noisy and cluttered scene is superior compared to those of other comparable dynamic probabilistic networks including a Multi-Observation Hidden Markov Model (MOHMM), a Parallel Hidden Markov Model (PaHMM) and a Coupled Hidden Markov Model (CHMM).

Journal ArticleDOI
TL;DR: An exact and parameter-free algorithm to build scale-sets image descriptions whose sections constitute a monotone sequence of upward global minima of a multi-scale energy, which is called the “scale climbing” algorithm is introduced.
Abstract: This paper introduces a multi-scale theory of piecewise image modelling, called the scale-sets theory, which can be regarded as a region-oriented scale-space theory. The first part of the paper studies the general structure of a geometrically unbiased region-oriented multi-scale image description and introduces the scale-sets representation, a representation which makes it possible to handle such a description exactly. The second part of the paper deals with the way scale-sets image analyses can be built according to an energy minimization principle. We consider a rather general formulation of the partitioning problem which involves minimizing a two-term energy of the form λC + D, where D is a goodness-of-fit term and C is a regularization term. We describe the way such energies arise from basic principles of approximate modelling and relate them to operational rate/distortion problems involved in lossy compression. We then show that an important subset of these energies constitutes a class of multi-scale energies, in that the minimal cut of a hierarchy gets coarser and coarser as the parameter λ increases. This allows us to devise a fast dynamic-programming procedure to find the complete scale-sets representation of this family of minimal cuts. Considering then the construction of the hierarchy from which the minimal cuts are extracted, we end up with an exact and parameter-free algorithm to build scale-sets image descriptions whose sections constitute a monotone sequence of upward global minima of a multi-scale energy, which is called the "scale climbing" algorithm. This algorithm can be viewed as a continuation method along the scale dimension or as a minimum pursuit along the operational rate/distortion curve. Furthermore, the solution verifies a linear scale invariance property which makes it possible to completely postpone the tuning of the scale parameter to a subsequent stage. For computational reasons, the scale climbing algorithm is approximated by a pairwise region merging scheme; however, the principal properties of the solutions are kept. Some results obtained with Mumford-Shah's piecewise constant model and a variant are provided, and different applications of the proposed multi-scale analyses are finally sketched.
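The λC + D energy and its pairwise-merging approximation can be sketched in 1-D with a piecewise-constant (Mumford-Shah style) model. Here D is the squared fitting error and C the number of regions; merging two adjacent regions of sizes n1, n2 and means m1, m2 raises D by n1*n2/(n1+n2)*(m1-m2)^2 while lowering C by one, so greedily merging the cheapest pair climbs the scale dimension. This is an illustrative sketch under those assumptions, not the paper's algorithm, and the signal is hypothetical:

```python
def scale_climb(signal):
    """Greedy pairwise merging for a 1-D piecewise-constant model with energy
    lambda*C + D (C = number of regions, D = squared fitting error): repeatedly
    merge the adjacent pair whose merge cost (increase in D) is smallest,
    recording the cost (the scale of appearance) and the region sizes."""
    regs = [(1, float(v), 0.0) for v in signal]    # (size, mean, internal SSE)
    merges = []
    while len(regs) > 1:
        costs = []
        for i in range(len(regs) - 1):
            (n1, m1, _), (n2, m2, _) = regs[i], regs[i + 1]
            costs.append(n1 * n2 / (n1 + n2) * (m1 - m2) ** 2)
        i = min(range(len(costs)), key=costs.__getitem__)
        (n1, m1, e1), (n2, m2, e2) = regs[i], regs[i + 1]
        merged = (n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2), e1 + e2 + costs[i])
        regs[i:i + 2] = [merged]
        merges.append((costs[i], [r[0] for r in regs]))
    return merges

sig = [1.0, 1.1, 0.9, 5.0, 5.2]                    # hypothetical 1-D signal
for scale, sizes in scale_climb(sig):
    print(round(scale, 3), sizes)
```

Running this, the three similar left values and the two right values coalesce at small scales, while the final merge across the large jump only becomes favourable at a much larger scale, which is exactly the monotone coarsening of cuts described above.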

Journal ArticleDOI
TL;DR: It is proved that, under the weak-perspective projection model, enforcing both the basis and the rotation constraints leads to a closed-form solution to the problem of non-rigid shape and motion recovery, which is important for applications like robot navigation and human computer interaction.
Abstract: Recovery of three dimensional (3D) shape and motion of non-static scenes from a monocular video sequence is important for applications like robot navigation and human computer interaction. If every point in the scene randomly moves, it is impossible to recover the non-rigid shapes. In practice, many non-rigid objects, e.g. the human face under various expressions, deform with certain structures. Their shapes can be regarded as a weighted combination of certain shape bases. Shape and motion recovery under such situations has attracted much interest. Previous work on this problem (Bregler, C., Hertzmann, A., and Biermann, H. 2000. In Proc. Int. Conf. Computer Vision and Pattern Recognition; Brand, M. 2001. In Proc. Int. Conf. Computer Vision and Pattern Recognition; Torresani, L., Yang, D., Alexander, G., and Bregler, C. 2001. In Proc. Int. Conf. Computer Vision and Pattern Recognition) utilized only orthonormality constraints on the camera rotations (rotation constraints). This paper proves that using only the rotation constraints results in ambiguous and invalid solutions. The ambiguity arises from the fact that the shape bases are not unique. An arbitrary linear transformation of the bases produces another set of eligible bases. To eliminate the ambiguity, we propose a set of novel constraints, basis constraints, which uniquely determine the shape bases. We prove that, under the weak-perspective projection model, enforcing both the basis and the rotation constraints leads to a closed-form solution to the problem of non-rigid shape and motion recovery. The accuracy and robustness of our closed-form solution is evaluated quantitatively on synthetic data and qualitatively on real video sequences.

Journal ArticleDOI
TL;DR: This paper evaluates the approach theoretically and shows why a straightforward application of the 2D invariance idea will not work, and describes strategies designed to overcome inherent problems in the straightforward approach and outlines the recognition algorithm.
Abstract: This paper presents an approach for viewpoint invariant human action recognition, an area that has received scant attention so far, relative to the overall body of work in human action recognition. It has been established previously that there exist no invariants for 3D to 2D projection. However, there exists a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We exploit these techniques and model actions in terms of view-invariant canonical body poses and trajectories in 2D invariance space, leading to a simple and effective way to represent and recognize human actions from a general viewpoint. We first evaluate the approach theoretically and show why a straightforward application of the 2D invariance idea will not work. We describe strategies designed to overcome inherent problems in the straightforward approach and outline the recognition algorithm. We then present results on 2D projections of publicly available human motion capture data as well as on manually segmented real image sequences. In addition to robustness to viewpoint change, the approach is robust enough to handle different people, minor variabilities in a given action, and the speed of action (and hence, frame-rate) while encoding sufficient distinction among actions.

Journal ArticleDOI
TL;DR: An ensemble learning framework based on random sampling on all three key components of a classification system: the feature space, training samples, and subspace parameters is developed, and a robust random sampling face recognition system integrating shape, texture, and Gabor responses is constructed.
Abstract: Subspace face recognition often suffers from two problems: (1) the training sample set is small compared with the high dimensional feature vector; (2) the performance is sensitive to the subspace dimension. Instead of pursuing a single optimal subspace, we develop an ensemble learning framework based on random sampling on all three key components of a classification system: the feature space, training samples, and subspace parameters. Fisherface and Null Space LDA (N-LDA) are two conventional approaches to address the small sample size problem. But in many cases, these LDA classifiers are overfitted to the training set and discard some useful discriminative information. By analyzing different overfitting problems for the two kinds of LDA classifiers, we use random subspace and bagging to improve them respectively. By random sampling on feature vectors and training samples, multiple stabilized Fisherface and N-LDA classifiers are constructed and the two groups of complementary classifiers are integrated using a fusion rule, so nearly all the discriminative information is preserved. In addition, we further apply random sampling on parameter selection in order to overcome the difficulty of selecting optimal parameters in our algorithms. Then, we use the developed random sampling framework for the integration of multiple features. A robust random sampling face recognition system integrating shape, texture, and Gabor responses is finally constructed.
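The random-sampling ensemble idea can be sketched with a synthetic stand-in for the face data: bootstrap the training samples (bagging), draw a random feature subspace for each ensemble member, and fuse the members by majority vote. The member classifier below is a simple nearest-centroid rule rather than Fisherface or N-LDA, and all data and parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for face features: two well-separated classes in 50-D.
d, n = 50, 40
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(4.0, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

def train_member(X, y, n_dims=10):
    """One ensemble member: a bootstrap sample of the rows (bagging) plus a
    random feature subspace, summarized by per-class centroids."""
    rows = rng.integers(0, len(X), len(X))                 # bootstrap sample
    dims = rng.choice(X.shape[1], n_dims, replace=False)   # random subspace
    Xs, ys = X[rows][:, dims], y[rows]
    cents = np.array([Xs[ys == c].mean(0) for c in (0, 1)])
    return dims, cents

def predict(members, X):
    """Fuse the members by majority vote over nearest-centroid decisions."""
    votes = np.zeros((len(X), 2))
    for dims, cents in members:
        dist = ((X[:, dims, None] - cents.T[None]) ** 2).sum(1)
        votes[np.arange(len(X)), dist.argmin(1)] += 1
    return votes.argmax(1)

members = [train_member(X, y) for _ in range(15)]
acc = (predict(members, X) == y).mean()
print(acc)
```

The point of the construction is stabilization: each member sees a different sample/feature view, so no single overfitted subspace dominates, and the vote aggregates the complementary discriminative information.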

Journal ArticleDOI
TL;DR: This work presents a novel feature matching algorithm for automatic pairwise registration of range images which is robust to the resolution of the range images, the number of tensors per view, the required amount of overlap, and noise.
Abstract: Automatic registration of range images is a fundamental problem in 3D modeling of free-form objects. Various feature matching algorithms have been proposed for this purpose. However, these algorithms suffer from various limitations mainly related to their applicability, efficiency, robustness to resolution, and the discriminating capability of the used feature representation. We present a novel feature matching algorithm for automatic pairwise registration of range images which overcomes these limitations. Our algorithm uses a novel tensor representation which represents semi-local 3D surface patches of a range image by third order tensors. Multiple tensors are used to represent each range image. Tensors of two range images are matched to identify correspondences between them. Correspondences are verified and then used for pairwise registration of the range images. Experimental results show that our algorithm is accurate and efficient. Moreover, it is robust to the resolution of the range images, the number of tensors per view, the required amount of overlap, and noise. Comparisons with the spin image representation revealed that our representation has more discriminating capabilities and performs better at a low resolution of the range images.

Journal ArticleDOI
TL;DR: This work provides a convergence analysis for widely used registration algorithms such as ICP, using either closest points or tangent planes at closest points and for a recently developed approach based on quadratic approximants of the squared distance function.
Abstract: The computation of a rigid body transformation which optimally aligns a set of measurement points with a surface and related registration problems are studied from the viewpoint of geometry and optimization. We provide a convergence analysis for widely used registration algorithms such as ICP, using either closest points (Besl and McKay, 1992) or tangent planes at closest points (Chen and Medioni, 1991), and for a recently developed approach based on quadratic approximants of the squared distance function (Pottmann et al., 2004). ICP based on closest points exhibits local linear convergence only. Its counterpart which minimizes squared distances to the tangent planes at closest points is a Gauss-Newton iteration; it achieves local quadratic convergence for a zero residual problem and, if enhanced by regularization and step size control, comes close to quadratic convergence in many realistic scenarios. Quadratically convergent algorithms are based on the approach in (Pottmann et al., 2004). The theoretical results are supported by a number of experiments; there, we also compare the algorithms with respect to global convergence behavior, stability and running time.
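A minimal sketch of point-to-point ICP (the Besl-McKay variant analysed above), alternating nearest-neighbour correspondence with a closed-form rigid fit via the Kabsch/SVD solution; the 2-D grid data and the true transformation are hypothetical and chosen small enough that the iteration converges to the exact alignment:

```python
import numpy as np

def best_rigid(P, Q):
    """Closed-form least-squares R, t with R @ P[i] + t ~ Q[i] (Kabsch/SVD)."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def icp(P, Q, iters=10):
    """Point-to-point ICP: alternate closest-point matching and rigid alignment."""
    for _ in range(iters):
        # closest point in Q for every point of P (brute-force distances)
        nn = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = best_rigid(P, Q[nn])
        P = P @ R.T + t
    return P

# Model points: a regular 5 x 6 grid, displaced by a small rigid motion.
Q = np.array([[x * 0.2, y * 0.2] for x in range(5) for y in range(6)])
th = 0.05
R_true = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
c = Q.mean(0)
P = (Q - c) @ R_true.T + c + np.array([0.02, -0.01])   # points to register
P_aligned = icp(P, Q)
print(np.abs(P_aligned - Q).max() < 1e-6)              # alignment recovered
```

The displacement here is well under half the grid spacing, so the very first closest-point assignment is already correct and the iteration snaps to the exact pose; with larger displacements one sees the local-only, linear convergence behaviour discussed in the abstract.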

Journal ArticleDOI
TL;DR: In this paper, the problem of matching two unsynchronized video sequences of the same dynamic scene, recorded by different stationary uncalibrated video cameras, is addressed by enforcing consistent matching of all points along corresponding space-time trajectories.
Abstract: This paper studies the problem of matching two unsynchronized video sequences of the same dynamic scene, recorded by different stationary uncalibrated video cameras. The matching is done both in time and in space, where the spatial matching can be modeled by a homography (for 2D scenarios) or by a fundamental matrix (for 3D scenarios). Our approach is based on matching space-time trajectories of moving objects, in contrast to matching interest points (e.g., corners), as done in regular feature-based image-to-image matching techniques. The sequences are matched in space and time by enforcing consistent matching of all points along corresponding space-time trajectories. By exploiting the dynamic properties of these space-time trajectories, we obtain sub-frame temporal correspondence (synchronization) between the two video sequences. Furthermore, using trajectories rather than feature-points significantly reduces the combinatorial complexity of the spatial point-matching problem when the search space is large. This benefit allows for matching information across sensors in situations which are extremely difficult when only image-to-image matching is used, including: (a) matching under large scale (zoom) differences, (b) very wide base-line matching, and (c) matching across different sensing modalities (e.g., IR and visible-light cameras). We show examples of recovering homographies and fundamental matrices under such conditions.

Journal ArticleDOI
TL;DR: In this paper, a novel object recognition approach based on affine invariant regions is presented, which actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions.
Abstract: We present a novel Object Recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the stronger power of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximative contours of the object are produced. All presented techniques can extend any view-point invariant feature extractor.

Journal ArticleDOI
TL;DR: In this paper, the "welding" problem of conformally sewing together the interior and exterior of the unit circle, glued along the circle by a given diffeomorphism, is solved in order to recover the unique 2D shape associated with that diffeomorphism.
Abstract: The study of 2D shapes and their similarities is a central problem in the field of vision. It arises in particular from the task of classifying and recognizing objects from their observed silhouette. Defining natural distances between 2D shapes creates a metric space of shapes, whose mathematical structure is inherently relevant to the classification task. One intriguing metric space comes from using conformal mappings of 2D shapes into each other, via the theory of Teichmüller spaces. In this space every simple closed curve in the plane (a "shape") is represented by a `fingerprint' which is a diffeomorphism of the unit circle to itself (a differentiable and invertible, periodic function). More precisely, every shape defines a unique equivalence class of such diffeomorphisms up to right multiplication by a Möbius map. The fingerprint does not change if the shape is varied by translations and scaling, and any such equivalence class comes from some shape. This coset space, equipped with the infinitesimal Weil-Petersson (WP) Riemannian norm, is a metric space. In this space, the shortest path between any two shapes is unique, and is given by a geodesic connecting them. Their distance from each other is given by integrating the WP-norm along that geodesic. In this paper we concentrate on solving the "welding" problem of "sewing" together conformally the interior and exterior of the unit circle, glued along the unit circle by a given diffeomorphism, to obtain the unique 2D shape associated with this diffeomorphism. This allows us to go back and forth between 2D shapes and their representing diffeomorphisms in this "space of shapes". We then present an efficient method for computing the unique shortest path, the geodesic of shape morphing between any two end-point shapes.
The group of diffeomorphisms of S1 acts as a group of isometries on the space of shapes and we show how this can be used to define shape transformations, like for instance `adding a protruding limb' to any shape.
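In the standard Teichmüller-theoretic formulation that this abstract summarizes, the fingerprint can be written down explicitly; the notation below follows the general conformal-welding literature rather than the paper itself. With Δ the open unit disk and Δ* its exterior on the Riemann sphere, a simple closed curve Γ bounds an interior Ω₋ and an exterior Ω₊, and the Riemann mapping theorem gives conformal maps

```latex
\Phi_{-}\colon \Delta \to \Omega_{-}, \qquad
\Phi_{+}\colon \Delta^{*} \to \Omega_{+}, \qquad
k \;=\; \Phi_{+}^{-1}\circ \Phi_{-}\big|_{S^{1}}
\;\in\; \operatorname{Diff}(S^{1})\,/\,\mathrm{M\ddot{o}b}(S^{1}).
```

The welding problem solved in the paper is the inverse direction: given the diffeomorphism k, reconstruct the curve Γ, up to Möbius normalization.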

Journal ArticleDOI
TL;DR: In this article, the authors propose a complete framework for 3D geometry modeling and processing that uses only fast geodesic computations, including a greedy algorithm to perform a uniform or adaptive remeshing of a triangulated surface.
Abstract: In this paper, we propose a complete framework for 3D geometry modeling and processing that uses only fast geodesic computations. The basic building block for these techniques is a novel greedy algorithm to perform a uniform or adaptive remeshing of a triangulated surface. Our other contributions include a parameterization scheme based on barycentric coordinates, an intrinsic algorithm for computing geodesic centroidal tessellations, and a fast and robust method to flatten a genus-0 surface patch. On large meshes (more than 500,000 vertices), our techniques speed up computation by over one order of magnitude in comparison to classical remeshing and parameterization methods. Our methods are easy to implement and do not need multilevel solvers to handle complex models that may contain poorly shaped triangles.
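A crude but self-contained stand-in for the fast geodesic computations this framework relies on is Dijkstra's algorithm on the mesh edge graph. Fast marching, which the paper builds on, yields more accurate surface distances; edge-graph Dijkstra only upper-bounds them. All names in this sketch are illustrative.

```python
import heapq
import math

def edge_graph_geodesics(vertices, edges, source):
    """Approximate geodesic distances from `source` to every vertex by
    running Dijkstra on the edge graph, with edge weights equal to
    Euclidean edge lengths.

    vertices: list of (x, y, z) tuples; edges: list of (i, j) index pairs.
    """
    adj = {i: [] for i in range(len(vertices))}
    for i, j in edges:
        w = math.dist(vertices[i], vertices[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = {i: math.inf for i in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist
```

Because paths are confined to mesh edges, this distance overestimates the true surface geodesic; replacing the update step with a fast-marching local solver removes that bias, which is what makes the paper's building block accurate as well as fast.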

Journal ArticleDOI
TL;DR: It is demonstrated that context may be determined from low level visual features sampled over a wide receptive field and it is shown that when the target object is unambiguously visible, context is only marginally useful.
Abstract: In this study, a discriminative detector for object context is designed and tested. The context-feature is simple to implement, feed-forward, and effective across multiple object types in a street-scenes environment. Using context alone, we demonstrate robust detection of locations likely to contain bicycles, cars, and pedestrians. Furthermore, experiments are conducted so as to address several open questions regarding visual context. Specifically, it is demonstrated that context may be determined from low level visual features (simple color and texture descriptors) sampled over a wide receptive field. At least for the framework tested, high level semantic knowledge, e.g., the nature of the surrounding objects, is superfluous. Finally, it is shown that when the target object is unambiguously visible, context is only marginally useful.

Journal ArticleDOI
TL;DR: A segmentation technique to automatically extract the myocardium in 4D cardiac MR and CT datasets using EM-based region segmentation and Dijkstra active contours using graph cuts, spline fitting, or point pattern matching is described.
Abstract: This paper describes a segmentation technique to automatically extract the myocardium in 4D cardiac MR and CT datasets. The segmentation algorithm is a two step process. The global localization step roughly localizes the left ventricle using techniques such as maximum discrimination, thresholding and connected component analysis. The local deformations step combines EM-based region segmentation and Dijkstra active contours using graph cuts, spline fitting, or point pattern matching. The technique has been tested on a large number of patients and both quantitative and qualitative results are presented.

Journal ArticleDOI
TL;DR: An algebraic geometric approach to 3-D motion estimation and segmentation of multiple rigid-body motions from noise-free point correspondences in two perspective views that exploits the algebraic and geometric properties of the so-called multibody epipolar constraint and its associatedMultibody fundamental matrix.
Abstract: We present an algebraic geometric approach to 3-D motion estimation and segmentation of multiple rigid-body motions from noise-free point correspondences in two perspective views. Our approach exploits the algebraic and geometric properties of the so-called multibody epipolar constraint and its associated multibody fundamental matrix, which are natural generalizations of the epipolar constraint and of the fundamental matrix to multiple motions. We derive a rank constraint on a polynomial embedding of the correspondences, from which one can estimate the number of independent motions as well as linearly solve for the multibody fundamental matrix. We then show how to compute the epipolar lines from the first-order derivatives of the multibody epipolar constraint and the epipoles by solving a plane clustering problem using Generalized PCA (GPCA). Given the epipoles and epipolar lines, the estimation of individual fundamental matrices becomes a linear problem. The clustering of the feature points is then automatically obtained from either the epipoles and epipolar lines or from the individual fundamental matrices. Although our approach is mostly designed for noise-free correspondences, we also test its performance on synthetic and real data with moderate levels of noise.
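For reference, the multibody epipolar constraint mentioned above has the following standard form (notation from the multibody structure-from-motion literature; n is the number of motions and ν_n is the degree-n Veronese embedding):

```latex
\prod_{i=1}^{n} \left( \mathbf{x}_2^{\top} F_i \, \mathbf{x}_1 \right) = 0
\quad\Longleftrightarrow\quad
\nu_n(\mathbf{x}_2)^{\top} \,\mathcal{F}\, \nu_n(\mathbf{x}_1) = 0,
```

where the multibody fundamental matrix collects the coefficients of the degree-(n, n) polynomial on the left. Since any correspondence satisfies the epipolar constraint of exactly one of the n motions, the product vanishes for every correspondence regardless of its segmentation, and since the constraint is linear in the multibody fundamental matrix, enough correspondences determine it by solving a linear system, as the abstract states.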

Journal ArticleDOI
TL;DR: A method for automatically obtaining object representations suitable for retrieval from generic video shots that includes associating regions within a single shot to represent a deforming object and an affine factorization method that copes with motion degeneracy.
Abstract: We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions. In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).
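The rank constraint underlying any affine factorization of this kind can be sketched as follows. This is a plain Tomasi-Kanade-style factorization of a centered measurement matrix; the paper's contribution is a variant that copes with degenerate motion, which this sketch does not handle.

```python
import numpy as np

def affine_factorization(W):
    """Rank-3 factorization of a 2F x P measurement matrix W of tracked
    image points (Tomasi-Kanade style): after translating the centroid
    to the origin, the centered matrix factors as motion (2F x 3) times
    structure (3 x P), up to an affine ambiguity.
    """
    W0 = W - W.mean(axis=1, keepdims=True)       # remove per-row centroid
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * s[:3]                         # motion
    S = Vt[:3]                                   # structure
    return M, S
```

Grouping tracks by common motion, as the paper does, amounts to asking which subsets of columns are jointly consistent with a single low-rank factorization of this form.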

Journal ArticleDOI
TL;DR: This work presents a matching algorithm that establishes many-to-many correspondences between the nodes of two noisy, vertex-labeled weighted graphs using a novel embedding technique based on a spherical encoding of graph structure.
Abstract: Object recognition can be formulated as matching image features to model features. When recognition is exemplar-based, feature correspondence is one-to-one. However, segmentation errors, articulation, scale difference, and within-class deformation can yield image and model features which don't match one-to-one but rather many-to-many. Adopting a graph-based representation of a set of features, we present a matching algorithm that establishes many-to-many correspondences between the nodes of two noisy, vertex-labeled weighted graphs. Our approach reduces the problem of many-to-many matching of weighted graphs to that of many-to-many matching of weighted point sets in a normed vector space. This is accomplished by embedding the initial weighted graphs into a normed vector space with low distortion using a novel embedding technique based on a spherical encoding of graph structure. Many-to-many vector correspondences established by the Earth Mover's Distance framework are mapped back into many-to-many correspondences between graph nodes. Empirical evaluation of the algorithm on an extensive set of recognition trials, including a comparison with two competing graph matching approaches, demonstrates both the robustness and efficacy of the overall approach.
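The Earth Mover's Distance step can be illustrated as a small transportation linear program; the optimal flow matrix is precisely what encodes many-to-many correspondences between the embedded points. This is a generic sketch using scipy, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def emd_flow(a, b, C):
    """Earth Mover's Distance between weight vectors a (m,) and b (n,),
    with sum(a) == sum(b), given a ground-distance matrix C (m, n).
    Solved as a transportation LP over flows f_ij >= 0 with
    row sums a_i and column sums b_j. Returns (cost, flow matrix).
    """
    m, n = C.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0       # row sums equal a
    for j in range(n):
        A_eq[m + j, j::n] = 1.0                # column sums equal b
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(m, n)
```

A nonzero entry f_ij in the returned flow matrix says that some of node i's mass is matched to node j; one node spreading its mass over several partners is exactly the many-to-many behaviour the abstract describes.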

Journal ArticleDOI
TL;DR: A new variational model to segment an object belonging to a given shape space using the active contour method, a geometric shape prior and the Mumford-Shah functional is proposed and it is proved the existence of this minimum in the space of functions with bounded variation.
Abstract: In this paper, we propose a new variational model to segment an object belonging to a given shape space using the active contour method, a geometric shape prior and the Mumford-Shah functional. The core of our model is an energy functional composed of three complementary terms. The first one is based on a shape model which constrains the active contour to get a shape of interest. The second term detects object boundaries from image gradients. And the third term drives globally the shape prior and the active contour towards a homogeneous intensity region. The segmentation of the object of interest is given by the minimum of our energy functional. This minimum is computed with the calculus of variations and the gradient descent method that provide a system of evolution equations solved with the well-known level set method. We also prove the existence of this minimum in the space of functions with bounded variation. Applications of the proposed model are presented on synthetic and medical images.
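Energy functionals of this three-term type typically have the following schematic structure; the weights β_i and the exact form of each term below are illustrative (e.g. a Chan-Vese-style region term), not the paper's precise functional:

```latex
E(\phi) \;=\;
\beta_1\, E_{\text{shape}}(\phi, \phi_{\text{prior}})
\;+\; \beta_2 \oint_{\{\phi = 0\}} g\!\left(|\nabla I|\right) ds
\;+\; \beta_3 \left[ \int_{\phi > 0} (I - c_1)^2 \, d\mathbf{x}
      + \int_{\phi < 0} (I - c_2)^2 \, d\mathbf{x} \right],
```

where φ is the level set function: the first term penalizes deviation from the shape prior, the second is a gradient-based boundary detector, and the third drives the contour towards homogeneous intensity regions, mirroring the three roles described in the abstract.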

Journal ArticleDOI
TL;DR: In this paper, a numerical framework based on bidirectional multigrid methods was proposed for accelerating a broad class of variational optic flow methods with different constancy and smoothness assumptions.
Abstract: Variational methods are among the most accurate techniques for estimating the optic flow. They yield dense flow fields and can be designed such that they preserve discontinuities, estimate large displacements correctly and perform well under noise and varying illumination. However, such adaptations render the minimisation of the underlying energy functional very expensive in terms of computational costs: Typically one or more large linear or nonlinear equation systems have to be solved in order to obtain the desired solution. Consequently, variational methods are considered to be too slow for real-time performance. In our paper we address this problem in two ways: (i) We present a numerical framework based on bidirectional multigrid methods for accelerating a broad class of variational optic flow methods with different constancy and smoothness assumptions. Our work focuses particularly on regularisation strategies that preserve discontinuities. (ii) We show by the examples of five classical and two recent variational techniques that real-time performance is possible in all cases--even for very complex optic flow models that offer high accuracy. Experiments show that frame rates up to 63 dense flow fields per second for image sequences of size 160 × 120 can be achieved on a standard PC. Compared to classical iterative methods this constitutes a speedup of two to four orders of magnitude.
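For context, the simplest member of the variational class discussed here is the Horn-Schunck model; its Jacobi-style relaxation is sketched below, and it is precisely the slow decay of low-frequency error in such single-grid iterations that bidirectional multigrid is designed to fix. Parameter names and the periodic boundary handling are illustrative choices, not the paper's setup.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=300):
    """Baseline Horn-Schunck optic flow: Jacobi-style iterations on the
    Euler-Lagrange equations of the energy
        |Ix*u + Iy*v + It|^2 + alpha^2 (|grad u|^2 + |grad v|^2),
    using periodic boundaries for simplicity. Returns the flow (u, v).
    """
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def local_avg(f):
        # 4-neighbour average with periodic wrap-around
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    for _ in range(n_iter):
        ub, vb = local_avg(u), local_avg(v)
        t = (Ix * ub + Iy * vb + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ub - Ix * t
        v = vb - Iy * t
    return u, v
```

Each sweep only propagates information one pixel, so smooth (low-frequency) error components need very many sweeps to vanish; a multigrid cycle eliminates them on coarser grids where they become high-frequency, which is the source of the orders-of-magnitude speedup reported above.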