Showing papers in "International Journal of Computer Vision in 2009"


Journal ArticleDOI
TL;DR: A non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. The n 3D points are expressed as a weighted sum of four virtual control points, whose camera-frame coordinates are estimated in O(n) time as a weighted sum of the eigenvectors of a 12×12 matrix.
Abstract: We propose a non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n ≥ 4 and properly handles both planar and non-planar configurations. Our central idea is to express the n 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as a weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights. Furthermore, if maximal precision is required, the output of the closed-form solution can be used to initialize a Gauss-Newton scheme, which improves accuracy with a negligible amount of additional time. The advantages of our method are demonstrated by thorough testing on both synthetic and real data.
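The central idea in the abstract above, writing each 3D point as a weighted sum of four control points, can be illustrated with a minimal sketch. It covers only that first step, not the eigenvector or quadratic-equation stages. The function names `barycentric_weights` and `combine`, and the choice of control points, are hypothetical and not taken from the paper; the weights are constrained to sum to 1, which is what makes them unchanged when points and control points are moved together.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def barycentric_weights(point, controls):
    """Weights w0..w3 with sum 1 such that point = sum_j w_j * controls[j].

    Solves (point - c0) = sum_{j=1..3} w_j * (c_j - c0) by Cramer's rule,
    then sets w0 = 1 - (w1 + w2 + w3). Requires non-coplanar control points.
    """
    c0 = controls[0]
    # A[i][j] is coordinate i of (c_{j+1} - c0); the differences are columns.
    A = [[controls[j + 1][i] - c0[i] for j in range(3)] for i in range(3)]
    rhs = [point[i] - c0[i] for i in range(3)]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("control points are (nearly) coplanar")
    tail = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = rhs[i]  # replace column j by the right-hand side
        tail.append(det3(Aj) / d)
    return [1.0 - sum(tail)] + tail


def combine(weights, controls):
    """Rebuild a 3D point from its weights and the control points."""
    return [sum(w * c[i] for w, c in zip(weights, controls)) for i in range(3)]


# Illustrative control points: the origin plus the three unit axes.
controls = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
weights = barycentric_weights((0.2, 0.3, 0.4), controls)
```

Because the weights sum to 1, translating the point and the control points by the same vector leaves the weights unchanged, so the same weights can be reused once the control points are known in another frame.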

2,598 citations


Journal ArticleDOI
TL;DR: A new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently, which gives competitive and visually pleasing results for objects that are highly textured, highly structured, and even articulated.
Abstract: This paper details a new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently. The learned model is used for automatic visual understanding and semantic segmentation of photographs. Our discriminative model exploits texture-layout filters, novel features based on textons, which jointly model patterns of texture and their spatial layout. Unary classification and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating the unary classifier in a conditional random field, which (i) captures the spatial interactions between class labels of neighboring pixels, and (ii) improves the segmentation of specific object instances. Efficient training of the model on large datasets is achieved by exploiting both random feature selection and piecewise training methods. High classification and segmentation accuracy is demonstrated on four varied databases: (i) the MSRC 21-class database containing photographs of real objects viewed under general lighting conditions, poses and viewpoints, (ii) the 7-class Corel subset and (iii) the 7-class Sowerby database used in He et al. (Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 695-702, June 2004), and (iv) a set of video sequences of television shows. The proposed algorithm gives competitive and visually pleasing results for objects that are highly textured (grass, trees, etc.), highly structured (cars, faces, bicycles, airplanes, etc.), and even articulated (body, cow, etc.).

1,193 citations


Journal ArticleDOI
TL;DR: The approach provides a practical method for learning high-order Markov random field models with potential functions that extend over large pixel neighborhoods; the clique potentials are modeled with the Product-of-Experts framework, which uses non-linear functions of many linear filter responses.
Abstract: We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach provides a practical method for learning high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. These clique potentials are modeled using the Product-of-Experts framework that uses non-linear functions of many linear filter responses. In contrast to previous MRF approaches all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field-of-Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with specialized techniques.

848 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new signal processing analysis of the bilateral filter, which complements the recent studies that analyzed it as a PDE or as a robust statistical estimator.
Abstract: The bilateral filter is a nonlinear filter that smoothes a signal while preserving strong edges. It has demonstrated great effectiveness for a variety of problems in computer vision and computer graphics, and fast versions have been proposed. Unfortunately, little is known about the accuracy of such accelerations. In this paper, we propose a new signal-processing analysis of the bilateral filter which complements the recent studies that analyzed it as a PDE or as a robust statistical estimator. The key to our analysis is to express the filter in a higher-dimensional space where the signal intensity is added to the original domain dimensions. Importantly, this signal-processing perspective allows us to develop a novel bilateral filtering acceleration using downsampling in space and intensity. This affords a principled expression of accuracy in terms of bandwidth and sampling. The bilateral filter can be expressed as linear convolutions in this augmented space followed by two simple nonlinearities. This allows us to derive criteria for downsampling the key operations and achieving important acceleration of the bilateral filter. We show that, for the same running time, our method is more accurate than previous acceleration techniques. Typically, we are able to process a 2 megapixel image using our acceleration technique in less than a second, and have the result be visually similar to the exact computation that takes several tens of minutes. The acceleration is most effective with large spatial kernels. Furthermore, this approach extends naturally to color images and cross bilateral filtering.
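For reference, the filter being accelerated can be written down directly. The sketch below is the brute-force, exact bilateral filter on a 1D signal with Gaussian spatial and range weights; it is not the downsampled acceleration described in the abstract. The function name, the parameter names `sigma_s` and `sigma_r`, and the 3-sigma window are illustrative assumptions.

```python
import math


def bilateral_filter_1d(signal, sigma_s, sigma_r, radius=None):
    """Brute-force bilateral filter on a list of floats.

    Each output sample is a normalized weighted average of its neighbours.
    The weight is the product of a spatial Gaussian (distance in index) and
    a range Gaussian (difference in value), so samples across a strong edge
    contribute almost nothing.
    """
    if radius is None:
        radius = int(math.ceil(3 * sigma_s))  # assumed truncation window
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            w = w_space * w_range
            num += w * signal[j]
            den += w
        out.append(num / den)  # den >= 1 because j == i always contributes 1
    return out


# A sharp step is preserved, while small fluctuations are averaged out.
step = [0.0] * 5 + [10.0] * 5
step_filtered = bilateral_filter_1d(step, sigma_s=2.0, sigma_r=1.0)
noisy = [1.0, 1.2, 0.8, 1.1, 0.9]
noisy_filtered = bilateral_filter_1d(noisy, sigma_s=2.0, sigma_r=1.0)
```

The double loop makes the cost proportional to the signal length times the window size, which is why large spatial kernels are expensive without an acceleration.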

789 citations


Journal ArticleDOI
TL;DR: The proposed adaptive stochastic gradient descent (ASGD) method is compared to a standard, non-adaptive Robbins-Monro (RM) algorithm; the results indicate that ASGD is robust to variations in the registration framework and is less sensitive to the settings of the user-defined parameters than RM.
Abstract: We present a stochastic gradient descent optimisation method for image registration with adaptive step size prediction. The method is based on the theoretical work by Plakhov and Cruz (J. Math. Sci. 120(1):964-973, 2004). Our main methodological contribution is the derivation of an image-driven mechanism to select proper values for the most important free parameters of the method. The selection mechanism employs general characteristics of the cost functions that commonly occur in intensity-based image registration. Also, the theoretical convergence conditions of the optimisation method are taken into account. The proposed adaptive stochastic gradient descent (ASGD) method is compared to a standard, non-adaptive Robbins-Monro (RM) algorithm. Both ASGD and RM employ a stochastic subsampling technique to accelerate the optimisation process. Registration experiments were performed on 3D CT and MR data of the head, lungs, and prostate, using various similarity measures and transformation models. The results indicate that ASGD is robust to these variations in the registration framework and is less sensitive to the settings of the user-defined parameters than RM. The main disadvantage of RM is the need for a predetermined step size function. The ASGD method provides a solution for that issue.
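The non-adaptive baseline mentioned in the abstract can be sketched on a toy problem. The code below is a minimal Robbins-Monro iteration with a predetermined decaying step size of the commonly used form a / (k + A)^alpha; that form, the parameter names, and the toy quadratic cost are assumptions for illustration, and the adaptive step-size prediction of ASGD is not reproduced here.

```python
import random


def robbins_monro(noisy_grad, x0, iterations, a=1.0, A=1.0, alpha=1.0):
    """Minimize a cost using noisy gradient evaluations.

    The step size a / (k + A) ** alpha is fixed in advance as a function of
    the iteration number k only. This is the 'predetermined step size
    function' that the user has to choose for the non-adaptive method.
    """
    x = x0
    for k in range(iterations):
        step = a / (k + A) ** alpha
        x -= step * noisy_grad(x)
    return x


# Toy cost (x - 3)^2 / 2 with a gradient corrupted by bounded uniform noise,
# standing in for the gradient estimated from a random subset of samples.
rng = random.Random(0)


def noisy_grad(x):
    return (x - 3.0) + rng.uniform(-0.5, 0.5)


estimate = robbins_monro(noisy_grad, x0=0.0, iterations=2000)
```

With a = A = alpha = 1 on this toy cost, the error after K steps equals minus the average of the K noise samples, so the estimate settles near 3. Other choices of a, A, and alpha can converge much more slowly or oscillate, which illustrates the sensitivity to user-defined parameters that the abstract refers to.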

432 citations


Journal ArticleDOI
TL;DR: An iterative sampling procedure to improve the uniform sampling strategy, an automatic scheme of inferring the tuning parameter from the data, a precise initialization procedure for K-means, as well as a simple strategy for isolating outliers are suggested.
Abstract: This paper presents novel techniques for improving the performance of a multi-way spectral clustering framework (Govindu in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 1150-1157, 2005; Chen and Lerman, 2007, preprint in the supplementary webpage) for segmenting affine subspaces. Specifically, it suggests an iterative sampling procedure to improve the uniform sampling strategy, an automatic scheme of inferring the tuning parameter from the data, a precise initialization procedure for K-means, as well as a simple strategy for isolating outliers. The resulting algorithm, Spectral Curvature Clustering (SCC), requires only linear storage and takes linear running time in the size of the data. It is supported by theory which both justifies its successful performance and guides our practical choices. We compare it with other existing methods on a few artificial instances of affine subspaces. Application of the algorithm to several real-world problems is also discussed.

428 citations


Journal ArticleDOI
TL;DR: This work finds the invariant projection direction from evidence in the colour image itself by minimizing entropy, producing an intrinsic, lighting-independent, reflectance-only image from which shadows are then removed as in earlier work; the quadratic entropy is used rather than Shannon's definition.
Abstract: Recently, a method for removing shadows from colour images was developed (Finlayson et al. in IEEE Trans. Pattern Anal. Mach. Intell. 28:59-68, 2006) that relies upon finding a special direction in a 2D chromaticity feature space. This "invariant direction" is that for which particular colour features, when projected into 1D, produce a greyscale image which is approximately invariant to intensity and colour of scene illumination. Thus shadows, which are in essence a particular type of lighting, are greatly attenuated. The main approach to finding this special angle is a camera calibration: a colour target is imaged under many different lights, and the direction that best makes colour patch images equal across illuminants is the invariant direction. Here, we take a different approach. In this work, instead of a camera calibration we aim at finding the invariant direction from evidence in the colour image itself. Specifically, we recognize that producing a 1D projection in the correct invariant direction will result in a 1D distribution of pixel values that has smaller entropy than projecting in the wrong direction. The reason is that the correct projection results in a probability distribution spike, for pixels all the same except differing by the lighting that produced their observed RGB values and therefore lying along a line with orientation equal to the invariant direction. Hence we seek that projection which produces a type of intrinsic, lighting-independent, reflectance-information-only image by minimizing entropy, and from there go on to remove shadows as previously. To be able to develop an effective description of the entropy-minimization task, we go over to the quadratic entropy, rather than Shannon's definition. Replacing the observed pixels with a kernel density probability distribution, the quadratic entropy can be written as a very simple formulation, and can be evaluated using the efficient Fast Gauss Transform. The entropy, written in this embodiment, has the advantage that it is more insensitive to quantization than is the usual definition. The resulting algorithm is quite reliable, and the shadow removal step produces good shadow-free colour image results whenever strong shadow edges are present in the image. In most cases studied, entropy has a strong minimum for the invariant direction, revealing a new property of image formation.
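The "very simple formulation" of the quadratic entropy under a kernel density estimate can be sketched as follows, assuming that "quadratic entropy" refers to Rényi's entropy of order 2 and that the kernel is a Gaussian of standard deviation `sigma`. Under those assumptions the integral of the squared density reduces to a double sum of Gaussians over sample pairs. The function name is hypothetical, and the direct O(N^2) double loop stands in for the Fast Gauss Transform mentioned in the abstract.

```python
import math


def quadratic_entropy(samples, sigma):
    """Renyi quadratic entropy -log(integral of p(x)^2 dx) for a Gaussian KDE.

    For a density p built from Gaussians of variance sigma^2, the integral of
    p^2 equals the mean over all sample pairs of a Gaussian of variance
    2 * sigma^2 evaluated at the pairwise difference.
    """
    n = len(samples)
    var = 2.0 * sigma ** 2  # variance of the pairwise (convolved) Gaussian
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    potential = 0.0
    for xi in samples:
        for xj in samples:
            potential += norm * math.exp(-((xi - xj) ** 2) / (2.0 * var))
    potential /= n * n
    return -math.log(potential)


# A tight cluster of projected values (a "spike") has lower entropy than the
# same number of values spread out along the axis.
spiky = quadratic_entropy([0.0, 0.0, 0.0, 0.0], sigma=0.5)
spread = quadratic_entropy([0.0, 1.0, 2.0, 3.0], sigma=0.5)
```

Evaluating this quantity on the 1D projection for each candidate direction and keeping the direction with the smallest value would give the entropy-minimization search that the abstract describes.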

312 citations


Journal ArticleDOI
TL;DR: An interactive framework for soft segmentation and matting of natural images and videos is presented in this article, which is based on the optimal, linear time, computation of weighted geodesic distances to user-provided scribbles, from which the whole data is automatically segmented.
Abstract: An interactive framework for soft segmentation and matting of natural images and videos is presented in this paper. The proposed technique is based on the optimal, linear time, computation of weighted geodesic distances to user-provided scribbles, from which the whole data is automatically segmented. The weights are based on spatial and/or temporal gradients, considering the statistics of the pixels scribbled by the user, without explicit optical flow or any advanced and often computationally expensive feature detectors. These could be naturally added to the proposed framework as well if desired, in the form of weights in the geodesic distances. An automatic localized refinement step follows this fast segmentation in order to further improve the results and accurately compute the corresponding matte function. Additional constraints in the distance definition make it possible to handle occlusions efficiently, such as people or objects crossing each other in a video sequence. The presentation of the framework is complemented with numerous and diverse examples, including extraction of moving foreground from dynamic background in video, natural and 3D medical images, and comparisons with the recent literature.
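The idea of labeling every pixel by its weighted geodesic distance to the user scribbles can be sketched on a small grid. Several simplifications here are assumptions and not the paper's method: the step cost is the raw intensity difference between 4-connected neighbours (the abstract bases the weights on gradients informed by scribble statistics), distances are computed with Dijkstra's algorithm using a heap rather than the linear-time computation the abstract mentions, and the output is a hard label rather than a soft matte. The function names are hypothetical.

```python
import heapq


def geodesic_distance(image, seeds):
    """Shortest weighted path length from any seed pixel to every pixel.

    image: list of rows of numbers; seeds: list of (row, col) scribble pixels.
    Moving between 4-connected neighbours costs their absolute intensity
    difference, so paths inside a homogeneous region are cheap.
    """
    rows, cols = len(image), len(image[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in seeds:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + abs(image[nr][nc] - image[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def segment(image, fg_seeds, bg_seeds):
    """Label a pixel 1 if it is geodesically closer to the foreground scribble."""
    d_fg = geodesic_distance(image, fg_seeds)
    d_bg = geodesic_distance(image, bg_seeds)
    return [[1 if f < b else 0 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(d_fg, d_bg)]


image = [[0, 0, 0, 10, 10, 10],
         [0, 0, 0, 10, 10, 10]]
labels = segment(image, fg_seeds=[(0, 0)], bg_seeds=[(1, 5)])
```

A single scribbled pixel on each side is enough in this toy image because crossing the intensity edge is the only costly step on any path.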

309 citations


Journal ArticleDOI
TL;DR: The multifractal spectrum (MFS) is introduced, a new texture signature that is invariant under the bi-Lipschitz map, which includes view-point changes and non-rigid deformations of the texture surface, as well as local affine illumination changes.
Abstract: Image texture provides a rich visual description of the surfaces in the scene. Many texture signatures based on various statistical descriptions and various local measurements have been developed. Existing signatures, in general, are not invariant to 3D geometric transformations, which is a serious limitation for many applications. In this paper we introduce a new texture signature, called the multifractal spectrum (MFS). The MFS is invariant under the bi-Lipschitz map, which includes view-point changes and non-rigid deformations of the texture surface, as well as local affine illumination changes. It provides an efficient framework combining global spatial invariance and local robust measurements. Intuitively, the MFS could be viewed as a "better histogram" with greater robustness to various environmental changes and the advantage of capturing some geometrical distribution information encoded in the texture. Experiments demonstrate that the MFS codes the essential structure of textures with very low dimension, and thus represents a useful tool for texture classification.

300 citations


Journal ArticleDOI
TL;DR: A nonparametric region-based active contour model for segmenting cluttered scenes and a variant of the model that is able to properly segment a cluttered scene with local illumination changes is proposed.
Abstract: We propose and analyze a nonparametric region-based active contour model for segmenting cluttered scenes. The proposed model is unsupervised and assumes pixel intensity is independent and identically distributed. Our proposed energy functional consists of a geometric regularization term that penalizes the length of the partition boundaries and a region-based image term that uses histograms of pixel intensity to distinguish different regions. More specifically, the region data encourages segmentation so that local histograms within each region are approximately homogeneous. An advantage of using local histograms in the data term is that histogram differentiation is not required to solve the energy minimization problem. We use Wasserstein distance with exponent 1 to determine the dissimilarity between two histograms. The Wasserstein distance is a metric and is able to faithfully measure the distance between two histograms, compared to many pointwise distances. Moreover, it is insensitive to oscillations, and therefore our model is robust to noise. A fast global minimization method based on (Chan et al. in SIAM J. Appl. Math. 66(5):1632-1648, 2006; Bresson et al. in J. Math. Imaging Vis. 28(2):151-167, 2007) is employed to solve the proposed model. The advantages of using this method are two-fold. First, the computational time is less than that of the method by gradient descent of the associated Euler-Lagrange equation (Chan et al. in Proc. of SSVM, pp. 697-708, 2007). Second, it is able to find a global minimizer. Finally, we propose a variant of our model that is able to properly segment a cluttered scene with local illumination changes.
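For 1D intensity histograms on equally spaced bins, the Wasserstein distance with exponent 1 equals the summed absolute difference of the two cumulative distributions times the bin width. The sketch below uses that identity and compares it with a pointwise L1 distance; the function names and the unit bin width are illustrative assumptions, and nothing here reproduces the segmentation model itself.

```python
from itertools import accumulate


def wasserstein1(hist_a, hist_b, bin_width=1.0):
    """Wasserstein-1 distance between two histograms over the same bins.

    Both histograms are normalized to sum to 1, then the absolute difference
    of their cumulative sums is accumulated over the bins.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_a = accumulate(x / total_a for x in hist_a)
    cdf_b = accumulate(x / total_b for x in hist_b)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def l1_distance(hist_a, hist_b):
    """Pointwise (bin-by-bin) L1 distance between normalized histograms."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    return sum(abs(a / total_a - b / total_b) for a, b in zip(hist_a, hist_b))


dark = [1, 0, 0, 0]       # all mass in the first intensity bin
slightly = [0, 1, 0, 0]   # shifted by one bin
bright = [0, 0, 0, 1]     # shifted by three bins

near = wasserstein1(dark, slightly)
far = wasserstein1(dark, bright)
```

The pointwise distance is 2 for both pairs, since the histograms simply do not overlap, whereas the Wasserstein distance grows with how far the mass has to move. This is one way to read the abstract's remark about measuring histogram distance faithfully.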

254 citations


Journal ArticleDOI
TL;DR: This paper presents a novel approach to camera calibration where top-down information from rough camera parameter estimates and the output of a multi-view-stereo system on scaled-down input images is used to effectively guide the search for additional image correspondences and significantly improve camera calibration parameters using a standard bundle adjustment algorithm.
Abstract: The advent of high-resolution digital cameras and sophisticated multi-view stereo algorithms offers the promise of unprecedented geometric fidelity in image-based modeling tasks, but it also puts unprecedented demands on camera calibration to fulfill these promises. This paper presents a novel approach to camera calibration where top-down information from rough camera parameter estimates and the output of a multi-view-stereo system on scaled-down input images is used to effectively guide the search for additional image correspondences and significantly improve camera calibration parameters using a standard bundle adjustment algorithm (Lourakis and Argyros 2008). The proposed method has been tested on six real datasets including objects without salient features for which image correspondences cannot be found in a purely bottom-up fashion, and objects with high curvature and thin structures that are lost in visual hull construction even with small errors in camera parameters. Three different methods have been used to qualitatively assess the improvements of the camera parameters. The implementation of the proposed algorithm is publicly available at Furukawa and Ponce (2008b).

Journal ArticleDOI
TL;DR: A spatio-temporal saliency model that predicts eye movement during video free viewing inspired by the biology of the first steps of the human visual system is presented.
Abstract: This paper presents a spatio-temporal saliency model that predicts eye movement during video free viewing. This model is inspired by the biology of the first steps of the human visual system. The model extracts two signals from video stream corresponding to the two main outputs of the retina: parvocellular and magnocellular. Then, both signals are split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps: a static and a dynamic one. These maps are then fused into a spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame predicted by the spatio-temporal saliency map to the eye positions of different subjects during a free video viewing experiment with a large database (17000 frames). In parallel, the static and the dynamic pathways are analyzed to understand what is more or less salient and for what type of videos our model is a good or a poor predictor of eye movement.

Journal ArticleDOI
TL;DR: A method is developed for the estimation of the essential matrix, giving the first guaranteed optimal algorithm for estimating the relative pose using a cost function based on reprojection errors.
Abstract: This paper introduces a new algorithmic technique for solving certain problems in geometric computer vision. The main novelty of the method is a branch-and-bound search over rotation space, which is used in this paper to determine camera orientation. By searching over all possible rotations, problems can be reduced to known fixed-rotation problems for which optimal solutions have been previously given. In particular, a method is developed for the estimation of the essential matrix, giving the first guaranteed optimal algorithm for estimating the relative pose using a cost function based on reprojection errors. Recently convex optimization techniques have been shown to provide optimal solutions to many of the common problems in structure from motion. However, they do not apply to problems involving rotations. The search method described in this paper allows such problems to be solved optimally. Apart from the essential matrix, the algorithm is applied to the camera pose problem, providing an optimal algorithm. The approach has been implemented and tested on a number of both synthetically generated and real data sets with good performance.

Journal ArticleDOI
TL;DR: This paper recalls how the curve and surface evolution approach of Boykov et al. relates to well-known approaches for mean curvature motion introduced by Almgren et al. and by Luckhaus and Sturzenhecker, and shows how the corresponding problems can be solved with sub-pixel accuracy using Parametric Maximum Flow techniques.
Abstract: In a recent paper Boykov et al. (LNCS, Vol. 3953, pp. 409-422, 2006) propose an approach for computing curve and surface evolution using a variational approach and the geo-cuts method of Boykov and Kolmogorov (International conference on computer vision, pp. 26-33, 2003). We recall in this paper how this is related to well-known approaches for mean curvature motion, introduced by Almgren et al. (SIAM Journal on Control and Optimization 31(2):387-438, 1993) and Luckhaus and Sturzenhecker (Calculus of Variations and Partial Differential Equations 3(2):253-271, 1995), and show how the corresponding problems can be solved with sub-pixel accuracy using Parametric Maximum Flow techniques. This provides interesting algorithms for computing crystalline curvature motion, possibly with a forcing term.

Journal ArticleDOI
TL;DR: A novel method for acquiring high-quality solid models of complex 3D shapes from multiple calibrated photographs, along with qualitative and quantitative comparisons with several state-of-the-art image-based-modeling algorithms is presented.
Abstract: This article presents a novel method for acquiring high-quality solid models of complex 3D shapes from multiple calibrated photographs. After the purely geometric constraints associated with the silhouettes found in each image have been used to construct a coarse surface approximation in the form of a visual hull, photoconsistency constraints are enforced in three consecutive steps: (1) the rims where the surface grazes the visual hull are first identified through dynamic programming; (2) with the rims now fixed, the visual hull is carved using graph cuts to globally optimize the photoconsistency of the surface and recover its main features; (3) an iterative (local) refinement step is finally used to recover fine surface details. The proposed approach has been implemented, and experiments with seven real data sets are presented, along with qualitative and quantitative comparisons with several state-of-the-art image-based-modeling algorithms.

Journal ArticleDOI
TL;DR: A thorough quantitative evaluation of four image segmentation algorithms on images from the Berkeley Segmentation Database using an efficient algorithm for computing precision and recall with regard to human ground-truth boundaries is presented.
Abstract: We present a thorough quantitative evaluation of four image segmentation algorithms on images from the Berkeley Segmentation Database. The algorithms are evaluated using an efficient algorithm for computing precision and recall with regard to human ground-truth boundaries. We test each segmentation method over a representative set of input parameters, and present tuning curves that fully characterize algorithm performance over the complete image database. We complement the evaluation on the BSD with segmentation results on synthetic images. The results reported here provide a useful benchmark for current and future research efforts in image segmentation.
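Boundary precision and recall can be sketched in a deliberately simplified form. The code below counts a predicted boundary pixel as correct only if it coincides exactly with a ground-truth pixel; this is an assumption made for brevity and is not the matching algorithm used in the paper. The function name and the inclusion of the F-measure are likewise illustrative.

```python
def precision_recall(predicted, ground_truth):
    """Precision, recall and F-measure for two sets of boundary pixels.

    predicted, ground_truth: iterables of (row, col) pixel coordinates.
    Precision is the fraction of predicted pixels that are true boundary;
    recall is the fraction of true boundary pixels that were predicted.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


predicted = [(0, 0), (0, 1), (0, 2), (5, 5)]
ground_truth = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
p, r, f = precision_recall(predicted, ground_truth)
```

Sweeping a segmentation parameter and recording (precision, recall) at each setting would trace out a curve of the kind the abstract calls a tuning curve.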

Journal ArticleDOI
TL;DR: This paper generalizes the original mean shift algorithm to data points lying on Riemannian manifolds to extend mean shift based clustering and filtering techniques to a large class of frequently occurring non-vector spaces in vision.
Abstract: The original mean shift algorithm is widely applied for nonparametric clustering in vector spaces. In this paper we generalize it to data points lying on Riemannian manifolds. This allows us to extend mean shift based clustering and filtering techniques to a large class of frequently occurring non-vector spaces in vision. We present an exact algorithm and prove its convergence properties as opposed to previous work which approximates the mean shift vector. The computational details of our algorithm are presented for frequently occurring classes of manifolds such as matrix Lie groups, Grassmann manifolds, essential matrices and symmetric positive definite matrices. Applications of the mean shift over these manifolds are shown.
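As a point of reference, the original vector-space mean shift that the paper generalizes can be sketched in a few lines. The version below works on scalars with a Gaussian kernel and repeatedly moves a point to the kernel-weighted mean of the data until it stops moving; the function name, bandwidth, and tolerance are illustrative assumptions. On a manifold the weighted mean and the update step would need the manifold's own operations, which this sketch does not attempt.

```python
import math


def mean_shift_mode(start, data, bandwidth, tol=1e-9, max_iter=500):
    """Follow mean shift steps from `start` to a nearby density mode (1D).

    Each step replaces x by the average of the data weighted by a Gaussian
    kernel centred at x, which moves x toward higher estimated density.
    """
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2))
                   for xi in data]
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated clusters; each start converges to its own cluster mode.
data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
mode_low = mean_shift_mode(1.0, data, bandwidth=1.0)
mode_high = mean_shift_mode(9.0, data, bandwidth=1.0)
```

Running the iteration from every data point and grouping points that reach the same mode gives the nonparametric clustering use mentioned in the abstract.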

Journal ArticleDOI
TL;DR: An analysis of the action of KBR on contrast is performed, showing the need to anti-symmetrize its equation in order to produce a two-sided contrast modification, able to enhance both under and over-exposed pictures.
Abstract: We present an interpretation of Land's Retinex theory that we show to be consistent with the original formulation. The proposed model relies on the computation of the expectation value of a suitable random variable weighted with a kernel function, thus the name Kernel-Based Retinex (KBR) for the corresponding algorithm. KBR shares the same intrinsic characteristics of the original Retinex: it can reduce the effect of a color cast and enhance details in low-key images but, since it can only increase pixel intensities, it is not able to enhance over-exposed pictures. Comparing the analytical structure of KBR with that of a recent variational model of color image enhancement, we are able to perform an analysis of the action of KBR on contrast, showing the need to anti-symmetrize its equation in order to produce a two-sided contrast modification, able to enhance both under- and over-exposed pictures. The anti-symmetrized KBR equations show clear correspondences with other existing color correction models, in particular ACE, whose relationship with Retinex has always been difficult to clarify. Finally, from an image processing point of view, we mention that both KBR and its antisymmetric version are free from the chromatic noise due to the use of paths in the original Retinex implementation and that they can be suitably approximated in order to reduce their computational complexity from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N\log N)$, where N is the number of input pixels.

Journal ArticleDOI
TL;DR: A method is proposed that detects and segments multiple, partially occluded objects in images; by maximizing a joint likelihood, part detection responses are grouped, merged, and assigned to multiple object hypotheses.
Abstract: We propose a method that detects and segments multiple, partially occluded objects in images. A part hierarchy is defined for the object class. Both the segmentation and detection tasks are formulated as binary classification problems. A whole-object segmentor and several part detectors are learned by boosting local shape feature based weak classifiers. Given a new image, the part detectors are applied to obtain a number of part responses. All the edge pixels in the image that positively contribute to the part responses are extracted. A joint likelihood of multiple objects is defined based on the part detection responses and the object edges. Computation of the joint likelihood includes an inter-object occlusion reasoning that is based on the object silhouettes extracted with the whole-object segmentor. By maximizing the joint likelihood, part detection responses are grouped, merged, and assigned to multiple object hypotheses. The proposed approach is demonstrated with the class of pedestrians. The experimental results show that our method outperforms the previous ones.

Journal ArticleDOI
TL;DR: A description-based approach, which enables a user to encode the structure of a high-level human activity as a formal representation, and a system which reliably recognizes sequences of complex human activities with a high recognition rate.
Abstract: This paper describes a methodology for automated recognition of complex human activities. The paper proposes a general framework which reliably recognizes high-level human actions and human-human interactions. Our approach is a description-based approach, which enables a user to encode the structure of a high-level human activity as a formal representation. Recognition of human activities is done by semantically matching constructed representations with actual observations. The methodology uses a context-free grammar (CFG) based representation scheme as a formal syntax for representing composite activities. Our CFG-based representation enables us to define complex human activities based on simpler activities or movements. Our system takes advantage of both statistical recognition techniques from computer vision and knowledge representation concepts from traditional artificial intelligence. At the low level of the system, image sequences are processed to extract poses and gestures. Based on the recognition of gestures, the high level of the system hierarchically recognizes composite actions and interactions occurring in a sequence of image frames. The concept of hallucinations and a probabilistic semantic-level recognition algorithm are introduced to cope with imperfect lower layers. As a result, the system recognizes human activities including 'fighting' and 'assault', which are high-level activities that previous systems had difficulty recognizing. The experimental results show that our system reliably recognizes sequences of complex human activities with a high recognition rate.

Journal ArticleDOI
TL;DR: A new global optimization method for multiview 3D reconstruction is introduced, which casts the problem of 3D shape reconstruction as one of minimizing a spatially continuous convex functional.
Abstract: In this article, we introduce a new global optimization method to the field of multiview 3D reconstruction. While global minimization has been proposed in a discrete formulation in form of the maxflow-mincut framework, we suggest the use of a continuous convex relaxation scheme. Specifically, we propose to cast the problem of 3D shape reconstruction as one of minimizing a spatially continuous convex functional. In qualitative and quantitative evaluation we demonstrate several advantages of the proposed continuous formulation over the discrete graph cut solution. Firstly, geometric properties such as weighted boundary length and surface area are represented in a numerically consistent manner: The continuous convex relaxation assures that the algorithm does not suffer from metrication errors in the sense that the reconstruction converges to the continuous solution as the spatial resolution is increased. Moreover, memory requirements are reduced, allowing for globally optimal reconstructions at higher resolutions. We study three different energy models for multiview reconstruction, which are based on a common variational template unifying regional volumetric terms and on-surface photoconsistency. The three models use data measurements at increasing levels of sophistication. While the first two approaches are based on a classical silhouette-based volume subdivision, the third one relies on stereo information to define regional costs. Furthermore, this scheme is exploited to compute a precise photoconsistency measure as opposed to the classical estimation. All three models are compared on standard data sets demonstrating their advantages and shortcomings. For the third one, which gives the most accurate results, a more exhaustive qualitative and quantitative evaluation is presented.

Journal ArticleDOI
TL;DR: A variational model to perform the fusion of an arbitrary number of images while preserving the salient information and enhancing the contrast for visualization through a minimization functional approach which implicitly takes into account a set of human vision characteristics.
Abstract: We present a variational model to perform the fusion of an arbitrary number of images while preserving the salient information and enhancing the contrast for visualization. We propose to use the structure tensor to simultaneously describe the geometry of all the inputs. The basic idea is that the fused image should have a structure tensor which approximates the structure tensor obtained from the multiple inputs. At the same time, the fused image should appear `natural' and `sharp' to a human interpreter. We therefore propose to combine the geometry merging of the inputs with perceptual enhancement and intensity correction. This is performed through a minimization functional approach which implicitly takes into account a set of human vision characteristics.
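The idea of describing the geometry of all inputs with one structure tensor can be illustrated with a minimal sketch. It assumes the simplest possible construction, an unweighted and unsmoothed sum of the per-image gradient outer products; the abstract does not say how the authors weight or regularize the tensor, and the function name `joint_structure_tensor` is hypothetical.

```python
def joint_structure_tensor(images):
    """Sum the per-image structure tensors at every interior pixel.

    images: list of equally sized 2D lists (rows of floats).
    Returns three 2D lists (Jxx, Jxy, Jyy); border pixels are left at 0.
    Assumption: plain unweighted sum, central differences, no smoothing.
    """
    h, w = len(images[0]), len(images[0][0])
    jxx = [[0.0] * w for _ in range(h)]
    jxy = [[0.0] * w for _ in range(h)]
    jyy = [[0.0] * w for _ in range(h)]
    for img in images:
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # Central-difference gradient of this input image.
                gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
                gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
                # Accumulate the outer product of the gradient with itself.
                jxx[y][x] += gx * gx
                jxy[y][x] += gx * gy
                jyy[y][x] += gy * gy
    return jxx, jxy, jyy


# Two toy inputs: one varies only horizontally, the other only vertically.
ramp_x = [[float(x) for x in range(4)] for _ in range(4)]
ramp_y = [[float(y) for _ in range(4)] for y in range(4)]
jxx, jxy, jyy = joint_structure_tensor([ramp_x, ramp_y])
```

In this toy case the combined tensor at an interior pixel records unit contrast in both directions, although each input alone only sees one of them. A fused image whose own tensor approximates this one would therefore have to keep both edges.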

Journal ArticleDOI
TL;DR: It is shown that this mutual reinforcement of object-level and feature-level similarity improves unsupervised image clustering, and the technique is applied to automatically discover categories and foreground regions in images from benchmark datasets.
Abstract: We present a method to automatically discover meaningful features in unlabeled image collections. Each image is decomposed into semi-local features that describe neighborhood appearance and geometry. The goal is to determine for each image which of these parts are most relevant, given the image content in the remainder of the collection. Our method first computes an initial image-level grouping based on feature correspondences, and then iteratively refines cluster assignments based on the evolving intra-cluster pattern of local matches. As a result, the significance attributed to each feature influences an image's cluster membership, while related images in a cluster affect the estimated significance of their features. We show that this mutual reinforcement of object-level and feature-level similarity improves unsupervised image clustering, and apply the technique to automatically discover categories and foreground regions in images from benchmark datasets.

Journal ArticleDOI
TL;DR: A statistical interpretation of the full (piecewise smooth) Mumford-Shah functional is derived by relating it to recent works on local region statistics and it is shown that this statistical interpretation comes along with several implications that can lead to faster implementations.
Abstract: The Mumford-Shah functional is a general and quite popular variational model for image segmentation. In particular, it provides the possibility to represent regions by smooth approximations. In this paper, we derive a statistical interpretation of the full (piecewise smooth) Mumford-Shah functional by relating it to recent works on local region statistics. Moreover, we show that this statistical interpretation comes along with several implications. Firstly, one can derive extended versions of the Mumford-Shah functional including more general distribution models. Secondly, it leads to faster implementations. Finally, thanks to the analytical expression of the smooth approximation via Gaussian convolution, the coordinate descent can be replaced by a true gradient descent.

Journal ArticleDOI
TL;DR: An automatic 3D face recognition approach is presented which can accurately differentiate between expression deformations and interpersonal disparities and hence recognize faces under any facial expression.
Abstract: The accuracy of non-rigid 3D face recognition approaches is highly influenced by their capacity to distinguish the deformations caused by facial expressions from the distinctive geometric attributes that uniquely characterize a 3D face (interpersonal disparities). We present an automatic 3D face recognition approach which can accurately differentiate between expression deformations and interpersonal disparities and hence recognize faces under any facial expression. The patterns of expression deformations are first learnt from training data in PCA eigenvectors. These patterns are then used to morph out the expression deformations. Similarity measures are extracted by matching the morphed 3D faces. PCA is performed in such a way that it models only the facial expressions, leaving out the interpersonal disparities. The approach was applied to the FRGC v2.0 dataset and superior recognition performance was achieved. The verification rates at 0.001 FAR were 98.35% and 97.73% for scans under neutral and non-neutral expressions, respectively.
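For readers unfamiliar with the reported metric, the following sketch shows one common way a "verification rate at a given FAR" is computed from match scores. It is an illustration under stated assumptions, not the authors' evaluation code: scores are taken to be similarities (higher means more alike), a pair is accepted when its score reaches the threshold, and the function name and toy scores are hypothetical.

```python
def verification_rate_at_far(genuine, impostor, target_far):
    """Best verification rate among thresholds whose false accept rate
    does not exceed target_far.

    genuine:  similarity scores of same-person comparisons.
    impostor: similarity scores of different-person comparisons.
    A comparison is accepted when score >= threshold.
    """
    best = 0.0  # a threshold above every score accepts nothing
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        if far <= target_far:
            rate = sum(s >= thr for s in genuine) / len(genuine)
            best = max(best, rate)
    return best


# Toy scores, invented for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
rate_loose = verification_rate_at_far(genuine_scores, impostor_scores, 0.25)
rate_strict = verification_rate_at_far(genuine_scores, impostor_scores, 0.0)
```

Tightening the allowed FAR forces a higher threshold, which can only lower the verification rate. That trade-off is why results are quoted at a fixed operating point such as 0.001 FAR.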

Journal ArticleDOI
TL;DR: This work presents a novel approach to quantify partial similarity using the notion of Pareto optimality, and exemplifies this approach on the problems of recognizing non-rigid geometric objects and images and of analyzing text sequences.
Abstract: Similarity is one of the most important abstract concepts in human perception of the world. In computer vision, numerous applications deal with comparing objects observed in a scene with some a priori known patterns. Often, it happens that while two objects are not similar, they have large similar parts, that is, they are partially similar. Here, we present a novel approach to quantify partial similarity using the notion of Pareto optimality. We exemplify our approach on the problems of recognizing non-rigid geometric objects and images, and of analyzing text sequences.
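The notion of Pareto optimality invoked here can be made concrete with a small sketch. It assumes, purely for illustration, that each candidate pair of parts is scored by two numbers to be minimized: how much of the objects had to be discarded (partiality) and how dissimilar the remaining parts are. The abstract does not name these criteria, and the values below are invented.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    points: list of (partiality, dissimilarity) tuples, both minimized.
    A point q dominates p when q is no worse in both criteria and
    differs from p, i.e. is strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical (partiality, dissimilarity) scores of candidate part pairs.
candidates = [(0.0, 0.9), (0.2, 0.5), (0.3, 0.6),
              (0.5, 0.1), (0.5, 0.4), (0.8, 0.1)]
front = pareto_front(candidates)
```

No single point on the front is best: cropping more yields better-matching parts, and the whole curve of trade-offs, rather than one number, characterizes how partially similar two objects are.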

Journal ArticleDOI
TL;DR: A method for pose-invariant facial expression recognition from monocular video sequences that uses a simple model, called the variable-intensity template, for describing different facial expressions, and is capable of estimating facial poses and expressions simultaneously.
Abstract: In this paper, we propose a method for pose-invariant facial expression recognition from monocular video sequences. The advantage of our method is that, unlike existing methods, our method uses a simple model, called the variable-intensity template, for describing different facial expressions. This makes it possible to prepare a model for each person with very little time and effort. Variable-intensity templates describe how the intensities of multiple points, defined in the vicinity of facial parts, vary with different facial expressions. By using this model in the framework of a particle filter, our method is capable of estimating facial poses and expressions simultaneously. Experiments demonstrate the effectiveness of our method. A recognition rate of over 90% is achieved for all facial orientations, horizontal, vertical, and in-plane, in the range of ±40 degrees, ±20 degrees, and ±40 degrees from the frontal view, respectively.

Journal ArticleDOI
TL;DR: A method for coherence-enhancing diffusion on the invertible orientation score of a 2D image is proposed, together with two explicit finite-difference schemes for applying the nonlinear diffusion in the orientation score and a stability analysis.
Abstract: Many image processing problems require the enhancement of crossing elongated structures. These problems cannot easily be solved by commonly used coherence-enhancing diffusion methods. Therefore, we propose a method for coherence-enhancing diffusion on the invertible orientation score of a 2D image. In an orientation score, the local orientation is represented by an additional third dimension, ensuring that crossing elongated structures are separated from each other. We consider orientation scores as functions on the Euclidean motion group, and use the group structure to apply left-invariant diffusion equations on orientation scores. We describe how we can calculate regularized left-invariant derivatives, and use the Hessian to estimate three descriptive local features: curvature, deviation from horizontality, and orientation confidence. These local features are used to adapt a nonlinear coherence-enhancing, crossing-preserving, diffusion equation on the orientation score. We propose two explicit finite-difference schemes to apply the nonlinear diffusion in the orientation score and provide a stability analysis. Experiments on both artificial and medical images show that preservation of crossings is the main advantage compared to standard coherence-enhancing diffusion. The use of curvature leads to improved enhancement of curves with high curvature. Furthermore, the use of deviation from horizontality makes it feasible to reduce the number of sampled orientations while still preserving crossings.
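Why an explicit finite-difference scheme needs a stability analysis can be seen in the simplest setting. The sketch below is the textbook explicit Euler scheme for linear 1D diffusion, not the paper's nonlinear left-invariant scheme on the orientation score; the function name, the zero-flux boundaries, and the test signal are assumptions made for illustration.

```python
def explicit_diffusion_1d(u, diffusivity, dt, steps, h=1.0):
    """Explicit Euler steps for u_t = D * u_xx with zero-flux boundaries.

    Raises ValueError when dt violates the classical stability bound
    dt <= h^2 / (2 * D) of this linear scheme.
    """
    limit = h * h / (2.0 * diffusivity)
    if dt > limit:
        raise ValueError("time step %g exceeds stability limit %g" % (dt, limit))
    u = list(u)
    r = diffusivity * dt / (h * h)
    for _ in range(steps):
        nxt = u[:]
        for i in range(1, len(u) - 1):
            # Standard three-point discretization of the second derivative.
            nxt[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        # Zero-flux (reflecting) boundaries keep the total mass constant.
        nxt[0] = u[0] + r * (u[1] - u[0])
        nxt[-1] = u[-1] + r * (u[-2] - u[-1])
        u = nxt
    return u


spike = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = explicit_diffusion_1d(spike, diffusivity=1.0, dt=0.25, steps=10)
```

Within the bound, every update is a convex combination of neighbouring values, so the signal is smoothed without overshoot. Beyond it the scheme can oscillate and blow up, which is the kind of behaviour a stability analysis is meant to rule out.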

Journal ArticleDOI
TL;DR: This paper focuses on the exploitation of subtle relative-motion cues present at occlusion boundaries and presents a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.
Abstract: The boundaries of objects in an image are often considered a nuisance to be "handled" due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks. While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on the exploitation of subtle relative-motion cues present at occlusion boundaries. When combined with more standard appearance information, we demonstrate these cues' utility in detecting occlusion boundaries locally. We also present a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries.

Journal ArticleDOI
TL;DR: The stationary parameterization is included for diffeomorphic registration in the LDDMM framework; the variational problem related to this registration scenario is formulated and the associated Euler-Lagrange equations are derived.
Abstract: Computational Anatomy aims to study variability in anatomical structures from images. Variability is encoded by the spatial transformations existing between anatomical images and a template selected as reference. In the absence of a more justified model for inter-subject variability, transformations are considered to belong to a convenient family of diffeomorphisms which provides a suitable mathematical setting for the analysis of anatomical variability. One of the proposed paradigms for diffeomorphic registration is the Large Deformation Diffeomorphic Metric Mapping (LDDMM). In this framework, transformations are characterized as end points of paths parameterized by time-varying flows of vector fields defined on the tangent space of a Riemannian manifold of diffeomorphisms and computed from the solution of the non-stationary transport equation associated with these flows. With this characterization, optimization in LDDMM is performed on the space of non-stationary vector field flows, resulting in a time- and memory-consuming algorithm. Recently, an alternative characterization of paths of diffeomorphisms based on constant-time flows of vector fields has been proposed in the literature. With this parameterization, diffeomorphisms constitute solutions of stationary ODEs. In this article, the stationary parameterization is included for diffeomorphic registration in the LDDMM framework. We formulate the variational problem related to this registration scenario and derive the associated Euler-Lagrange equations. Moreover, the performance of the non-stationary versus the stationary parameterization on real and simulated 3D-MRI brain datasets is evaluated. Compared to the non-stationary parameterization, our proposal provides similar results in terms of image matching and local differences between the diffeomorphic transformations while drastically reducing memory and time requirements.
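The practical appeal of a stationary ODE is that its time-1 flow can be built from a single vector field by repeated self-composition. The sketch below shows this with scaling and squaring in 1D. This is an assumption for illustration: the abstract does not state which integration scheme the authors use, and the helper names, the integer grid, and the clamped linear interpolation are all hypothetical choices.

```python
def interp(values, x):
    """Linear interpolation of samples on the grid 0..n-1, clamped at the ends."""
    n = len(values)
    if x <= 0:
        return values[0]
    if x >= n - 1:
        return values[-1]
    i = int(x)
    t = x - i
    return (1.0 - t) * values[i] + t * values[i + 1]


def exp_stationary_field(velocity, squarings=6):
    """Approximate the time-1 flow of d(phi)/dt = v(phi) by scaling and squaring.

    velocity: samples of v on the grid 0..n-1.
    Returns the displacement u, so that phi(i) = i + u[i].
    """
    scale = 2 ** squarings
    # For a tiny time step the flow is close to identity + v * step.
    u = [v / scale for v in velocity]
    for _ in range(squarings):
        # Compose the flow with itself: u_2t(x) = u_t(x) + u_t(x + u_t(x)).
        u = [u[i] + interp(u, i + u[i]) for i in range(len(u))]
    return u


# A constant field translates every point by the same amount.
shift = exp_stationary_field([0.5] * 10)
# A field pointing toward x = 5 contracts the grid around that point.
contraction = exp_stationary_field([-0.1 * (x - 5.0) for x in range(11)])
```

Only the current displacement is stored, and six compositions cover 64 small time steps. A time-varying field, by contrast, must be kept and integrated at every time sample, which matches the memory and time savings the abstract reports for the stationary parameterization.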