
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Proceedings ArticleDOI
20 Jun 2011
TL;DR: Hierarchical kernel descriptors are proposed that apply kernel descriptors recursively to form image-level features and thus provide a conceptually simple and consistent way to generate image-level features from pixel attributes.
Abstract: Kernel descriptors [1] provide a unified way to generate rich visual feature sets by turning pixel attributes into patch-level features, and yield impressive results on many object recognition tasks. However, best results with kernel descriptors are achieved using efficient match kernels in conjunction with nonlinear SVMs, which makes it impractical for large-scale problems. In this paper, we propose hierarchical kernel descriptors that apply kernel descriptors recursively to form image-level features and thus provide a conceptually simple and consistent way to generate image-level features from pixel attributes. More importantly, hierarchical kernel descriptors allow linear SVMs to yield state-of-the-art accuracy while being scalable to large datasets. They can also be naturally extended to extract features over depth images. We evaluate hierarchical kernel descriptors both on the CIFAR10 dataset and the new RGB-D Object Dataset consisting of segmented RGB and depth images of 300 everyday objects.

261 citations


Cites background from "Distinctive Image Features from Sca..."

  • The most popular and successful local descriptors are orientation histograms including SIFT [19] and HOG [5], which are robust to minor transformations of images.

  • Kernel descriptors include SIFT and HOG as special cases, and provide a principled way to generate rich patch-level features from various pixel attributes.

  • Experiments on both CIFAR10 and the RGB-D Object Dataset (available at http://www.cs.washington.edu/rgbd-dataset) show that hierarchical kernel descriptors outperform kernel descriptors and many state-of-the-art algorithms including deep belief nets, convolutional neural networks, and local coordinate coding with carefully tuned SIFT features.

  • The combination of three hierarchical kernel descriptors has an accuracy of 80.0%, higher than all other competing techniques; its accuracy is 14.4 percent higher than SIFT, 9.0 percent higher than mcRBM combined with DBNs, and 5.5 percent higher than the improved LCC. Hierarchical kernel descriptors slightly outperform very recent work: the convolutional RBM and triangle K-means with 4000 centers [4].

  • Hua et al. [10] learned a linear transformation for SIFT using linear discriminant analysis and showed better results with lower dimensionality than SIFT on local feature matching problems.

Posted Content
TL;DR: This paper compares the performance of three different image matching techniques, i.e., SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish-eye distortion, and shearing, and shows which algorithm is most robust against each kind of distortion.
Abstract: Fast and robust image matching is a very important task with various applications in computer vision and robotics. In this paper, we compare the performance of three different image matching techniques, i.e., SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish-eye distortion, and shearing. For this purpose, we manually apply different types of transformations to the original images and compute matching evaluation parameters such as the number of key points in the images, the matching rate, and the execution time required for each algorithm, and we show which algorithm is most robust against each kind of distortion. Index Terms: image matching, scale-invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB).
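The evaluation parameters mentioned above (key-point counts, matching rate, execution time) are detector-agnostic. As a rough illustration, the sketch below implements brute-force nearest-neighbor matching with Lowe's ratio test and a matching-rate measure in plain NumPy; the 0.8 ratio and the tiny 2-D "descriptors" are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbor matching with Lowe's ratio test.

    Returns (i, j) index pairs where descriptor i in desc_a matches
    descriptor j in desc_b, accepted only when the nearest neighbor
    is sufficiently closer than the second nearest.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

def matching_rate(matches, num_keypoints):
    """Fraction of keypoints that found an accepted match."""
    return len(matches) / num_keypoints

# Toy 2-D "descriptors": the first two in each set correspond; the
# third pair is ambiguous and is rejected by the ratio test.
a = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [10.0, 0.1], [50.0, 50.0]])
m = ratio_test_matches(a, b)
print(m, matching_rate(m, len(a)))
```

Measuring execution time per algorithm, as the paper does, would simply wrap calls like these in a timer.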

261 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Thirdly, a key point orientation assignment based on the local image gradient, and lastly a descriptor generator to compute the local image descriptor for each key point based on image gradient magnitude and orientation [3].

  • Although SIFT has proven to be very efficient in object recognition applications, it incurs a high computational cost, which is a major drawback, especially for real-time applications [3, 4].

  • Scale Invariant Feature Transform (SIFT) is a feature detector developed by Lowe in 2004 [3].
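The orientation-assignment step quoted above can be illustrated with a minimal NumPy sketch: each pixel in a patch votes for one of 36 orientation bins, weighted by gradient magnitude. This is a simplification of Lowe's procedure, which additionally applies Gaussian weighting, interpolates the histogram peak, and keeps strong secondary peaks; the patch and bin count here are illustrative assumptions.

```python
import numpy as np

def orientation_histogram(patch, num_bins=36):
    """Gradient-orientation histogram for an image patch.

    Each pixel votes for an orientation bin (10 degrees wide when
    num_bins=36), weighted by its gradient magnitude.
    """
    # Finite-difference gradients; np.gradient returns d/drow, d/dcol.
    dy, dx = np.gradient(patch.astype(float))
    magnitude = np.sqrt(dx**2 + dy**2)
    # Orientation in [0, 360) degrees.
    orientation = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (orientation / (360.0 / num_bins)).astype(int) % num_bins
    hist = np.zeros(num_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# A horizontal intensity ramp has gradients pointing along +x
# (0 degrees), so the dominant orientation falls in bin 0.
ramp = np.tile(np.arange(8.0), (8, 1))
hist = orientation_histogram(ramp)
print(int(np.argmax(hist)))
```

In SIFT proper, the peak of this histogram becomes the keypoint's canonical orientation, and the descriptor is computed relative to it to obtain rotation invariance.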

Proceedings Article
08 Dec 2014
TL;DR: This paper proposes a novel deep neural net, named the multi-view perceptron (MVP), which can untangle identity and view features and, at the same time, infer a full spectrum of multi-view images given a single 2D face image.
Abstract: Various factors, such as identity, view, and illumination, are coupled in face images. Disentangling the identity and view representations is a major challenge in face recognition. Existing face recognition systems either use handcrafted features or learn features discriminatively to improve recognition accuracy. This is different from the behavior of the primate brain. Recent studies [5, 19] discovered that the primate brain has a face-processing network in which view and identity are processed by different neurons. Motivated by this, this paper proposes a novel deep neural net, named the multi-view perceptron (MVP), which can untangle identity and view features and, at the same time, infer a full spectrum of multi-view images given a single 2D face image. The identity features of MVP achieve superior performance on the MultiPIE dataset. MVP is also capable of interpolating and predicting images under viewpoints that are unobserved in the training data.

258 citations

Journal ArticleDOI
28 Jul 2014
TL;DR: This paper presents the first publicly available face database based on the Kinect sensor, and conducts benchmark evaluations on the proposed database using standard face recognition methods, and demonstrates the gain in performance when integrating the depth data with the RGB data via score-level fusion.
Abstract: The recent success of emerging RGB-D cameras such as the Kinect sensor depicts a broad prospect of 3-D data-based computer applications. However, due to the lack of a standard testing database, it is difficult to evaluate how face recognition technology can benefit from this up-to-date imaging sensor. In order to establish the connection between the Kinect and face recognition research, in this paper, we present the first publicly available face database based on the Kinect sensor (i.e., KinectFaceDB, online at http://rgb-d.eurecom.fr). The database consists of different data modalities (well-aligned and processed 2-D, 2.5-D, 3-D, and video-based face data) and multiple facial variations. We conducted benchmark evaluations on the proposed database using standard face recognition methods, and demonstrated the gain in performance when integrating the depth data with the RGB data via score-level fusion. We also compared the 3-D images of the Kinect (from the KinectFaceDB) with traditional high-quality 3-D scans (from the FRGC database) in the context of face biometrics, which reveals the imperative need for the proposed database in face recognition research.

257 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • Tables VIII and IX illustrate the fusion results from both the RGB and depth data using PCA, LBP, and LGBP (SIFT is not used because it cannot capture the correct information from depth images, as shown in Section IV-D2).

  • Because the depth map is highly smooth, the SIFT-based method is inappropriate for 2.5-...

  • PCA [19] (i.e., the Eigenface method), LBP [21], SIFT [81], and LGBP [34]-based methods are selected as the baseline techniques for the 2-...

  • The SIFT-based method extracts the key points from all training and testing images, where the similarity measure is achieved by key-point matching.

  • On the contrary, since LBP, SIFT, and LGBP are local-based methods, they are more robust to such local distortions.

Journal ArticleDOI
TL;DR: Comprehensive evaluation of efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.
Abstract: Extracting well-distributed, reliable, and precisely aligned point pairs for accurate image registration is a difficult task, particularly for multisource remote sensing images that have significant illumination, rotation, and scene differences. The scale-invariant feature transform (SIFT) approach, as a well-known feature-based image matching algorithm, has been successfully applied to the automatic registration of remote sensing images. Regardless of its distinctiveness and robustness, the SIFT algorithm suffers from some problems in the quality, quantity, and distribution of extracted features, particularly in multisource remote sensing imagery. In this paper, an improved SIFT algorithm is introduced that is fully automated and applicable to various kinds of optical remote sensing images, even those with up to a five-fold difference in scale. The key to the proposed approach is a selection strategy for SIFT features over the full distribution of location and scale, where feature quality is guaranteed by stability and distinctiveness constraints. The extracted features are then passed to an initial cross-matching process followed by a consistency check under a projective transformation model. Comprehensive evaluation of the efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.

255 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...In our research, regarding [21], the feature contrast (i....

  • Therefore, extrema in which the ratio of the eigenvalues of H is above a threshold, for example, r = 10 (proposed in [21]), are considered points corresponding to edges and discarded (for more explanation, see [21]).

  • ...03 (a threshold proposed by Lowe [21]).

  • SIFT features are scale invariant and accurate, and they are robust against illumination differences, changes in 3-D viewpoint, and image noise [21].

  • The famous SIFT algorithm proposed by Lowe [21] consists of three main modules: feature extraction, feature description,...
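The two rejection tests referenced in the snippets above, a contrast threshold (0.03 in Lowe's paper, for image values normalized to [0, 1]) and an edge test on the ratio of the Hessian's eigenvalues with r = 10, can be combined into a single stability check. The sketch below is an illustrative NumPy version, not Lowe's implementation; the example Hessians are made up to show one accepted blob-like point and one rejected edge-like point.

```python
import numpy as np

def is_stable_keypoint(contrast, hessian_2x2, contrast_thresh=0.03, r=10.0):
    """Lowe-style stability tests for a candidate extremum.

    - Contrast test: |D| at the extremum must exceed a small threshold
      (0.03 for images normalized to [0, 1] in Lowe's paper).
    - Edge test: with H the 2x2 spatial Hessian of D, require
      Tr(H)^2 / Det(H) < (r + 1)^2 / r   (r = 10 in the paper),
      which bounds the ratio of principal curvatures.
    """
    if abs(contrast) < contrast_thresh:
        return False
    tr = hessian_2x2[0, 0] + hessian_2x2[1, 1]
    det = np.linalg.det(hessian_2x2)
    if det <= 0:  # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r

# A blob-like point (similar curvatures in both directions) passes,
# while an edge-like point (one curvature much larger) is rejected.
blob = np.array([[1.0, 0.0], [0.0, 1.0]])
edge = np.array([[20.0, 0.0], [0.0, 0.5]])
print(is_stable_keypoint(0.05, blob), is_stable_keypoint(0.05, edge))
```

Note the edge test mirrors the Harris corner criterion: it needs only the trace and determinant of H, never the eigenvalues themselves.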

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
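The final verification step described above, a least-squares solution for consistent pose parameters, can be illustrated by fitting a 2-D similarity transform (rotation, uniform scale, translation) to matched point pairs. Lowe's paper solves for a more general affine pose, so this narrower model is an assumption made here for brevity.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src -> dst.

    Solves for (a, b, tx, ty) in
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    which models rotation + uniform scale + translation, giving one
    linear equation pair per matched keypoint.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved x'0, y'0, x'1, y'1, ...
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params  # a, b, tx, ty

# Points rotated 90 degrees (a=0, b=1) and shifted by (1, 2).
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
dst = np.array([[1, 2], [1, 3], [0, 2], [0, 3]])
print(np.round(fit_similarity(src, dst), 6))
```

In the recognition pipeline, points whose residual under the fitted transform is too large would be discarded as outliers and the fit repeated, which is the "verification" role this step plays.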

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
