
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
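The matching step described here compares descriptors by Euclidean distance and keeps only unambiguous nearest neighbors. A minimal NumPy sketch of this ratio-test matching (the 0.8 threshold follows the SIFT paper's suggestion; the descriptor arrays are assumed to come from any SIFT implementation, and `match_descriptors` is a hypothetical helper name):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match two descriptor sets with Lowe's ratio test.

    A match (i, j) is kept only when the nearest neighbor of desc_a[i]
    in desc_b is significantly closer than the second-nearest one,
    which discards ambiguous correspondences.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor in desc_b
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]  # nearest and second-nearest indices
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches
```

In practice a k-d tree or approximate nearest-neighbor index replaces the brute-force distance computation when the database of descriptors is large.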
Citations
Journal ArticleDOI
TL;DR: By proposing an additional technique that makes the feature descriptor robust to rotation, the efficiency of the algorithm is validated, and it is shown to be about 30 times faster than those based on Gabor filters.
Abstract: A good feature descriptor should be discriminative, robust, and computationally inexpensive in terms of both time and storage. In the domain of face recognition, these properties allow a system to quickly deliver high recognition results to the end user. Motivated by the recent feature descriptor called Patterns of Oriented Edge Magnitudes (POEM), which balances these three concerns, this paper aims at enhancing its performance with respect to all these criteria. To this end, we first optimize the parameters of POEM and then apply the whitened principal-component-analysis dimensionality-reduction technique to obtain a more compact, robust, and discriminative descriptor. For face recognition, the efficiency of our algorithm is demonstrated by strong results obtained on both constrained (Face Recognition Technology, FERET) and unconstrained (Labeled Faces in the Wild, LFW) data sets, in addition to its low complexity. Impressively, our algorithm is about 30 times faster than those based on Gabor filters. Furthermore, by proposing an additional technique that makes our descriptor robust to rotation, we validate its efficiency for the task of image matching.

200 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The invariance to rotation is often obtained by normalizing the region with respect to rotation and then computing the description [27], [30], [31]....


  • ...2, as suggested in [27], are set to the threshold....


  • ...A wide variety of detectors and descriptors have already been proposed in the literature [27], [29]–[31]....


  • ...Finding correspondences between two images of the same scene or object is part of many computer vision applications, such as object recognition [27] and wide baseline matching [28]....


  • ...Lowe [27] proposed a SIFT, which combines a scale-invariant region detector and a descriptor based on gradient distribution in the detected regions....


Journal ArticleDOI
TL;DR: The recent progress in visual feature detection is presented and future trends as well as challenges are identified and the relations among different kinds of features are covered.

199 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...The classical gradient based corner detection leads to gradient based features such as SIFT [39], SURF [40]....


  • ...Contour based DoG-curve [35], ANDD [36], Hyperbola Fitting [37], ACJ [38] Blob detection Interest point PDE based LoG, DoG, DoH, Hessian-Laplacian [7], SIFT [39], SURF [40], Cer-SURF [41], DART [42], Rank-SIFT [43], RLOG [44], MO-GP [45], KAZE [46], A-KAZE [47], WADE [48] Template based ORB [49], BRISK [50], FREAK [51]...


  • ...According to the experiments, the computational efficiency is lifted up meanwhile the repeatability of SURF [40], CerSURF [41], DART [42] is comparable of classical SIFT [39] in viewpoint, scale, illumination changes....


  • ...SIFT (Scale Invariant Feature Transform) [39] locates interest points with DoG pyramid and Hessian matrix....


  • ...NMX [20] Pb edge responses, SIFT [39] F-measure AnyBoost...


Journal ArticleDOI
TL;DR: This paper proposes to combine the human pose estimation module, the MRF-based color and category inference module and the (super)pixel-level category classifier learning module to generate multiple well-performing category classifiers, which can be directly applied to parse the fashion items in the images.
Abstract: In this paper we address the problem of automatically parsing the fashion images with weak supervision from the user-generated color-category tags such as “red jeans” and “white T-shirt”. This problem is very challenging due to the large diversity of fashion items and the absence of pixel-level tags, which make the traditional fully supervised algorithms inapplicable. To solve the problem, we propose to combine the human pose estimation module, the MRF-based color and category inference module and the (super)pixel-level category classifier learning module to generate multiple well-performing category classifiers, which can be directly applied to parse the fashion items in the images. Besides, all the training images are parsed with color-category labels and the human poses of the images are estimated during the model learning phase in this work. We also construct a new fashion image dataset called Colorful-Fashion, in which all 2,682 images are labeled with pixel-level color-category labels. Extensive experiments on this dataset clearly show the effectiveness of the proposed method for the weakly supervised fashion parsing task.

199 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...For SIFT and HOG features, we apply dense sampling strategy with 4 × 4 step size and then generate the BoWs features....


  • ...We extract four types of features for each patch, including color, SIFT [24], HOG [8] and location features....


  • ...LabelMe subset (also known as SIFT Flow dataset) [19] includes 2688 images....


  • ...The dictionary sizes for SIFT and HOG are both set as 300 [5]....


Proceedings ArticleDOI
16 Jul 2011
TL;DR: A new nearest neighbor search algorithm that builds a nearest neighbor graph in an offline phase and, when queried with a new point, performs hill-climbing starting from a randomly sampled node of the graph.
Abstract: We introduce a new nearest neighbor search algorithm. The algorithm builds a nearest neighbor graph in an offline phase and when queried with a new point, performs hill-climbing starting from a randomly sampled node of the graph. We provide theoretical guarantees for the accuracy and the computational complexity and empirically show the effectiveness of this algorithm.
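The two phases described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy version only (brute-force graph construction, a single greedy climb from a caller-chosen start node, no restarts), not the paper's algorithm with its theoretical guarantees; all names are hypothetical:

```python
import numpy as np

def build_knn_graph(points, k=3):
    """Offline phase: for each point, store the indices of its k nearest neighbors."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def hill_climb_query(points, graph, query, start):
    """Online phase: greedily move to whichever graph neighbor is closest
    to the query, stopping when no neighbor improves on the current node."""
    current = start
    while True:
        best = current
        best_dist = np.linalg.norm(points[current] - query)
        for nb in graph[current]:
            dist = np.linalg.norm(points[nb] - query)
            if dist < best_dist:
                best, best_dist = nb, dist
        if best == current:  # local minimum of distance to the query
            return current
        current = best
```

The greedy climb can get stuck in a local minimum, which is why schemes of this kind typically restart from several random nodes.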

198 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We extracted 5 datasets of 17000, 50000, 118000 and 204000 (128-dimensional) SIFT descriptors2 [Lowe, 2004]....


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work presents an approach for markerless motion capture (MoCap) of articulated objects, which are recorded with multiple unsynchronized moving cameras, which allows us to track people with off-the-shelf handheld video cameras.
Abstract: In this work we present an approach for markerless motion capture (MoCap) of articulated objects, which are recorded with multiple unsynchronized moving cameras. Instead of using fixed (and expensive) hardware synchronized cameras, this approach allows us to track people with off-the-shelf handheld video cameras. To prepare a sequence for motion capture, we first reconstruct the static background and the position of each camera using Structure-from-Motion (SfM). Then the cameras are registered to each other using the reconstructed static background geometry. Camera synchronization is achieved via the audio streams recorded by the cameras in parallel. Finally, a markerless MoCap approach is applied to recover positions and joint configurations of subjects. Feature tracks and dense background geometry are further used to stabilize the MoCap. The experiments show examples with highly challenging indoor and outdoor scenes.

198 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...To estimate the parameters of a single moving camera we apply a feature-based approach, where corresponding feature points are determined in consecutive frames with the KLT-Tracker [21] or SIFT matching [10]....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
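The final verification step, a least-squares solution for consistent pose parameters, can be illustrated with a 2-D affine model fitted to matched keypoint pairs. This is a generic sketch of that idea under a pure affine assumption, not the paper's exact formulation; `fit_affine` is a hypothetical name:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares estimate of a 2-D affine transform mapping src -> dst.

    Each correspondence (x, y) -> (u, v) contributes two rows to an
    over-determined linear system A @ p = b, which is solved for the
    six affine parameters p.
    """
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        A[2 * i]     = [x, y, 0, 0, 1, 0]
        A[2 * i + 1] = [0, 0, x, y, 0, 1]
        b[2 * i], b[2 * i + 1] = u, v
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    m = np.array([[p[0], p[1]], [p[2], p[3]]])  # linear part
    t = p[4:6]                                  # translation
    return m, t
```

In a full recognition pipeline, the residual of this fit is what decides whether a Hough cluster of matches is accepted as a verified object instance.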

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
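The staged filtering that identifies stable points in scale space is commonly realized as a difference-of-Gaussian (DoG) pyramid: extrema that persist across the stack are candidate keypoints. A simplified NumPy sketch (independent rather than cascaded blurs, single octave, parameters chosen for illustration; all names are hypothetical):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur in pure NumPy with reflective padding."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Blur rows, then columns (a 2-D Gaussian kernel is separable).
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, rows)

def dog_pyramid(img, sigma=1.6, k=2 ** 0.5, levels=4):
    """Stack of difference-of-Gaussian images at geometrically spaced scales."""
    blurred = [gaussian_blur(img, sigma * k ** i) for i in range(levels + 1)]
    return [b1 - b0 for b0, b1 in zip(blurred, blurred[1:])]
```

A real implementation additionally downsamples between octaves and localizes each extremum to sub-pixel accuracy before building a descriptor.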

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
