
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Journal ArticleDOI
TL;DR: An improved method for tentative correspondence selection is introduced, applicable both with and without view synthesis; a modification of the standard first-to-second nearest distance rule increases the number of correct matches by 5–20% at no additional computational cost.

158 citations

Book
19 Apr 2011
TL;DR: This lecture summarizes what is and isn't possible to do reliably today, and overviews key concepts that could be employed in systems requiring visual categorization, with an emphasis on recent advances in the field.
Abstract: The visual recognition problem is central to computer vision research. From robotics to information retrieval, many desired applications demand the ability to identify and localize categories, places, and objects. This tutorial overviews computer vision algorithms for visual object recognition and image classification. We introduce primary representations and learning approaches, with an emphasis on recent advances in the field. The target audience consists of researchers or students working in AI, robotics, or vision who would like to understand what methods and representations are available for these problems. This lecture summarizes what is and isn't possible to do reliably today, and overviews key concepts that could be employed in systems requiring visual categorization. Table of Contents: Introduction / Overview: Recognition of Specific Objects / Local Features: Detection and Description / Matching Local Features / Geometric Verification of Matched Features / Example Systems: Specific-Object Recognition / Overview: Recognition of Generic Object Categories / Representations for Object Categories / Generic Object Detection: Finding and Scoring Candidates / Learning Generic Object Category Models / Example Systems: Generic Object Recognition / Other Considerations and Current Challenges / Conclusions

158 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...The kd-tree (Freidman et al. 1977) is one such approach that has often been employed to match local descriptors, in several variants (e.g., (Lowe 2004, Beis & Lowe 1997, Muja & Lowe 2009, Silpa-Anan & Hartley 2008))....


  • ...Due to the specificity of state-of-the-art feature descriptors such as SIFT (Lowe 2004) or SURF (Bay et al....


  • ...The most popular choice for this step is the SIFT descriptor (Lowe 2004), which we present in detail in the following....


  • ...An often-used strategy (initially proposed by Lowe (2004)) is to consider the ratio of the distance to the closest neighbor to that of the second-closest one as a decision criterion....


  • ...Local features (such as SIFT (Lowe 2004)) are independently extracted from both images, and their descriptors are matched to establish putative correspondences....

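The decision criterion in the fourth bullet above (Lowe's ratio test) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation; the function name `ratio_test_match` and the 0.8 threshold are illustrative defaults.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping a match only when the nearest distance is clearly smaller than
    the second-nearest one (the first-to-second nearest distance rule)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:
            matches.append((i, int(order[0])))
    return matches
```

Ambiguous matches, whose nearest and second-nearest distances are similar, are rejected; this is the step that makes descriptor matching robust in clutter.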

Posted Content
TL;DR: This survey gives an overview of different techniques used for pixel-level semantic segmentation, such as unsupervised methods, Decision Forests and SVMs, as well as recently published approaches with convolutional neural networks.
Abstract: This survey gives an overview of different techniques used for pixel-level semantic segmentation. Metrics and datasets for the evaluation of segmentation algorithms and traditional approaches for segmentation such as unsupervised methods, Decision Forests and SVMs are described, and pointers to the relevant papers are given. Recently published approaches with convolutional neural networks are mentioned and typical problematic situations for segmentation algorithms are examined. A taxonomy of segmentation algorithms is given.

158 citations

Journal ArticleDOI
TL;DR: A novel method based on bilateral-filter (BF) scale-invariant feature transform (BFSIFT) finds feature matches for synthetic aperture radar (SAR) image registration; more accurately located matches can be found in its anisotropic scale space than in the Gaussian one.
Abstract: In this letter, we propose a novel method based on bilateral-filter (BF) scale-invariant feature transform (SIFT), called BFSIFT, to find feature matches for synthetic aperture radar (SAR) image registration. First, the anisotropic scale space of the image is constructed using BFs. The construction is noniterative and fast. Compared with the Gaussian scale space used in SIFT, more accurately located matches can be found in the anisotropic one. Then, keypoints are detected and described in the coarser scales using SIFT. Finally, a dual-matching strategy and random sample consensus are used to establish matches. The probability of correct matching is significantly increased by skipping the finest scale and by the dual-matching strategy. Experiments on various slant range images demonstrate the applicability of BFSIFT to find feature matches for SAR image registration.
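The bilateral filter at the core of this scale-space construction smooths each pixel with weights that fall off both with spatial distance and with intensity difference, so edges are preserved while flat regions are blurred. A minimal single-channel NumPy sketch, not the authors' implementation; the function name and parameter defaults are illustrative.

```python
import numpy as np

def bilateral_filter(img, sigma_s=1.0, sigma_r=0.1, radius=2):
    """Edge-preserving smoothing: weight neighbours by a spatial Gaussian
    (sigma_s) multiplied by a range Gaussian on intensity difference
    (sigma_r), then normalise."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    padded = np.pad(img.astype(float), radius, mode="edge")
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            range_w = np.exp(-((patch - img[y, x])**2) / (2 * sigma_r**2))
            weights = spatial * range_w
            out[y, x] = (weights * patch).sum() / weights.sum()
    return out
```

With a small `sigma_r`, pixels across a strong edge get near-zero weight, which is why a scale space built from such filters keeps keypoints better localized than a Gaussian one.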

157 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Recently, scale-invariant feature transform (SIFT) [5] has been widely applied to many matching applications....


Proceedings ArticleDOI
01 Sep 2009
TL;DR: A semi-automatic system that converts conventional video shots to stereoscopic video pairs by combining a diffusion scheme with a classification scheme that assigns depth to image patches; the system tolerates both scene motion and camera motion.
Abstract: We present a semi-automatic system that converts conventional video shots to stereoscopic video pairs. The system requires just a few user-scribbles in a sparse set of frames. The system combines a diffusion scheme, which takes into account the local saliency and the local motion at each video location, coupled with a classification scheme that assigns depth to image patches. The system tolerates both scene motion and camera motion. In typical shots, containing hundreds of frames, even in the face of significant motion, it is enough to mark scribbles on the first and last frames of the shot. Once marked, plausible stereo results are obtained in a matter of seconds, leading to a scalable video conversion system. Finally, we validate our results with ground truth stereo video.

157 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We have experimented with using the gray values themselves, color histograms, SIFT [15], and SIFT+gray descriptors....


  • ...As Figure 5 shows, it seems that SIFT+gray values is the most suitable for our problem....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
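The final verification step described above, a least-squares solution for consistent pose parameters, can be illustrated with a linear fit of an affine transform to matched keypoint locations. A sketch using NumPy; the function name `fit_affine` is illustrative, and a full pipeline would first cluster matches with a Hough transform as the abstract describes.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: find A (2x2) and t (2,) minimising
    sum_i || A @ src_i + t - dst_i ||^2 over the correspondences."""
    n = len(src)
    # Each correspondence contributes two linear equations in the
    # six unknowns (a11, a12, a21, a22, t1, t2).
    M = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        M[2 * i]     = [x, y, 0, 0, 1, 0]
        M[2 * i + 1] = [0, 0, x, y, 0, 1]
        b[2 * i], b[2 * i + 1] = u, v
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

Because the model is linear in its parameters, three or more non-degenerate matches suffice, and the residual of the fit can be used to accept or reject a candidate object hypothesis.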

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
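The staged filtering approach that identifies stable points in scale space is, in essence, a search for extrema of a difference-of-Gaussians stack in both space and scale. A simplified single-octave sketch using NumPy and SciPy; the sigma values, threshold, and function name are illustrative, and the real detector adds octaves, sub-pixel refinement, and edge-response rejection.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(img, sigmas=(1.0, 1.6, 2.6, 4.1), thresh=0.01):
    """Detect candidate keypoints as pixels whose difference-of-Gaussians
    value is a maximum or minimum over a 3x3x3 space-and-scale
    neighbourhood."""
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    keypoints = []
    for k in range(1, len(dogs) - 1):
        d = dogs[k]
        for y in range(1, d.shape[0] - 1):
            for x in range(1, d.shape[1] - 1):
                cube = np.stack([dogs[k - 1][y - 1:y + 2, x - 1:x + 2],
                                 d[y - 1:y + 2, x - 1:x + 2],
                                 dogs[k + 1][y - 1:y + 2, x - 1:x + 2]])
                v = d[y, x]
                if abs(v) > thresh and (v == cube.max() or v == cube.min()):
                    keypoints.append((y, x, sigmas[k]))
    return keypoints
```

A blob-like structure fires at the scale whose difference-of-Gaussians best matches its size, which is what makes the resulting points stable under rescaling.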

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
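The feature extraction this reference is known for, the Harris corner and edge detector, scores each pixel by R = det(M) − k·tr(M)², where M is the locally averaged outer product of image gradients. A compact NumPy/SciPy sketch; the value k = 0.05 and the function name are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.05):
    """Harris corner response: large positive where the gradient structure
    tensor has two strong eigenvalues (a corner), negative along edges,
    near zero in flat regions."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)
    # Smooth the products of gradients to form the structure tensor M.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2
```

Thresholding R and taking local maxima yields the trackable features the abstract refers to.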

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
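The evaluation criterion used above, recall plotted against 1 − precision as the match-acceptance threshold varies, can be reproduced in a few lines. A sketch assuming a list of per-match distances and ground-truth correctness labels; the function name is illustrative.

```python
def recall_precision_curve(distances, is_correct):
    """Sweep an acceptance threshold over matches ranked by distance and
    report (1 - precision, recall) at each rank."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    total_correct = sum(is_correct)
    curve, tp = [], 0
    for rank, i in enumerate(order, start=1):
        tp += is_correct[i]
        curve.append((1 - tp / rank, tp / total_correct))
    return curve
```

A better descriptor reaches high recall while 1 − precision stays low, which is the sense in which the paper ranks SIFT-based descriptors first.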

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
