scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
14 Mar 2017
TL;DR: In this paper, the authors combine semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model to estimate the continuous 6-DoF pose of an object from a single RGB image.
Abstract: This paper presents a novel approach to estimating the continuous six degree of freedom (6-DoF) pose (3D translation and rotation) of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior work, we are agnostic to whether the object is textured or textureless, as the convnet learns the optimal representation from the available training image data. Furthermore, the approach can be applied to instance- and class-based pose recovery. Empirically, we show that the proposed approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios with a cluttered background. For class-based object pose estimation, state-of-the-art accuracy is shown on the large-scale PASCAL3D+ dataset.

351 citations

Journal ArticleDOI
TL;DR: It is hoped that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
Abstract: Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.

350 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary and proposes a max-margin approach to train the dictionary modulated by CRF, and meanwhile a CRF with sparse coding.
Abstract: Top-down visual saliency facilities object localization by providing a discriminative representation of target objects and a probability map for reducing the search space. In this paper, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary. The proposed model is formulated based on a CRF with latent variables. By using sparse codes as latent variables, we train the dictionary modulated by CRF, and meanwhile a CRF with sparse coding. We propose a max-margin approach to train our model via fast inference algorithms. We evaluate our model on the Graz-02 and PASCAL VOC 2007 datasets. Experimental results show that our model performs favorably against the state-of-the-art top-down saliency methods. We also observe that the dictionary update significantly improves the model performance.

350 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The SIFT descriptors [16] are extracted from each image patch to represent the object appearance....

    [...]

Journal ArticleDOI
TL;DR: This paper introduces a discriminative model for the retrieval of images from text queries that formalizes the retrieval task as a ranking problem, and introduces a learning procedure optimizing a criterion related to the ranking performance.
Abstract: This paper introduces a discriminative model for the retrieval of images from text queries. Our approach formalizes the retrieval task as a ranking problem, and introduces a learning procedure optimizing a criterion related to the ranking performance. The proposed model hence addresses the retrieval problem directly and does not rely on an intermediate image annotation task, which contrasts with previous research. Moreover, our learning procedure builds upon recent work on the online learning of kernel-based classifiers. This yields an efficient, scalable algorithm, which can benefit from recent kernels developed for image comparison. The experiments performed over stock photography data show the advantage of our discriminative ranking approach over state-of-the-art alternatives (e.g. our model yields 26.3% average precision over the Corel dataset, which should be compared to 22.0%, for the best alternative model evaluated). Further analysis of the results shows that our model is especially advantageous over difficult queries such as queries with few relevant pictures or multiple-word queries.

349 citations

Journal ArticleDOI
27 Jul 2009
TL;DR: A Mobius Voting algorithm that iteratively produces "votes" for predicted correspondences between the mutually closest points with magnitude representing their estimated deviation from isometry, which is converted to a permutation matrix with simple matrix operations and output as a discrete set of point correspondences with confidence values.
Abstract: The goal of our work is to develop an efficient, automatic algorithm for discovering point correspondences between surfaces that are approximately and/or partially isometric.Our approach is based on three observations. First, isometries are a subset of the Mobius group, which has low-dimensionality -- six degrees of freedom for topological spheres, and three for topological discs. Second, computing the Mobius transformation that interpolates any three points can be computed in closed-form after a mid-edge flattening to the complex plane. Third, deviations from isometry can be modeled by a transportation-type distance between corresponding points in that plane.Motivated by these observations, we have developed a Mobius Voting algorithm that iteratively: 1) samples a triplet of three random points from each of two point sets, 2) uses the Mobius transformations defined by those triplets to map both point sets into a canonical coordinate frame on the complex plane, and 3) produces "votes" for predicted correspondences between the mutually closest points with magnitude representing their estimated deviation from isometry. The result of this process is a fuzzy correspondence matrix, which is converted to a permutation matrix with simple matrix operations and output as a discrete set of point correspondences with confidence values.The main advantage of this algorithm is that it can find intrinsic point correspondences in cases of extreme deformation. During experiments with a variety of data sets, we find that it is able to find dozens of point correspondences between different object types in different poses fully automatically.

348 citations

References
More filters
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

    [...]

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

    [...]

  • ...More details on applications of these features to recognition are available in other pape rs (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....

    [...]

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scalespace extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....

    [...]

Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
TL;DR: This book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts and it will show the best book collections and completed collections.
Abstract: Downloading the book in this website lists can give you more advantages. It will show you the best book collections and completed collections. So many books can be found in this website. So, this is not only this multiple view geometry in computer vision. However, this book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts. This is simple, read the soft file of the book and you get it.

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

    [...]

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations

Trending Questions (1)
How can distinctive features theory be applied to elision?

The provided information does not mention anything about the application of distinctive features theory to elision.