Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting, and subsequently matching, distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
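
As a concrete illustration of the extract-and-match pipeline described above, here is a minimal sketch using OpenCV's SIFT implementation. The image paths are placeholders, and the 0.75 ratio threshold is a common choice (Lowe's 2004 paper uses 0.8):

```python
# Minimal sketch: extract SIFT keypoints in two images and match them
# with Lowe's ratio test. Requires opencv-python; the image paths are
# placeholders for illustration.
import cv2

img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k=2 nearest neighbours per descriptor, so the ratio test can compare
# the best match against the runner-up.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(des1, des2, k=2)

# Keep a match only if it is clearly better than the second-best match.
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable matches")
```
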
Citations
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Global SDM (GSDM), an extension of the Supervised Descent Method that divides the search space into regions of similar gradient directions, is proposed, providing a better and more efficient strategy for minimizing non-linear least squares functions in computer vision problems.
Abstract: Mathematical optimization plays a fundamental role in solving many problems in computer vision (e.g., camera calibration, image alignment, structure from motion). It is generally accepted that second order descent methods are the most robust, fast, and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second order descent methods have two main drawbacks: 1) the function might not be analytically differentiable and numerical approximations are impractical, and 2) the Hessian may be large and not positive definite. Recently, Supervised Descent Method (SDM), a method that learns the “weighted averaged gradients” in a supervised manner has been proposed to solve these issues. However, SDM is a local algorithm and it is likely to average conflicting gradient directions. This paper proposes Global SDM (GSDM), an extension of SDM that divides the search space into regions of similar gradient directions. GSDM provides a better and more efficient strategy to minimize non-linear least squares functions in computer vision problems. We illustrate the effectiveness of GSDM in two problems: non-rigid image alignment and extrinsic camera calibration.
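
The core SDM idea, learning a cascade of descent maps from data instead of computing Jacobians or Hessians, fits in a few lines. The sketch below is a toy illustration, not the authors' code: the feature function h and the training data are stand-ins, and the setting is alignment to a single fixed template. Each stage regresses the ideal update x* − x onto the feature residuals by least squares, which plays the role of the learned descent direction:

```python
# Toy sketch of the Supervised Descent idea (not the authors' code):
# learn linear maps so that x <- x + [phi, 1] @ W walks toward the
# minimizer of f(x) = ||h(x) - phi_star||^2 without any derivatives.
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Stand-in for a non-differentiable feature map (e.g. SIFT values
    # sampled at landmark configuration x).
    return np.tanh(x) + 0.1 * np.sin(5.0 * x)

x_star = np.array([0.3, -0.2, 0.5, 0.1])   # "true" configuration
phi_star = h(x_star)                       # template features

# Training: random initializations scattered around the truth.
X = x_star + rng.normal(scale=0.5, size=(500, 4))
cascade = []
for k in range(4):                         # a short cascade of stages
    Phi = h(X) - phi_star                  # feature residuals
    A = np.hstack([Phi, np.ones((len(X), 1))])
    target = x_star - X                    # ideal update per sample
    W, *_ = np.linalg.lstsq(A, target, rcond=None)
    cascade.append(W)
    X = X + A @ W                          # apply the learned step

# Test: refine a fresh initialization with the learned cascade.
x = x_star + rng.normal(scale=0.5, size=4)
for W in cascade:
    a = np.concatenate([h(x) - phi_star, [1.0]])
    x = x + a @ W
print("error:", np.abs(x - x_star).max())  # typically shrinks per stage
```
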

227 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Given an image d ∈ R^(m×1) of m pixels, d(x) ∈ R^(p×1) indexes p landmarks in the image. h is a non-linear feature extraction function (e.g., SIFT [27] or HoG [12]) and h(d(x)) ∈ R^(128p×1) in the case of extracting SIFT features....

  • ...In this setting, SDM frames facial feature tracking as minimizing the following function over Δx: f(x₀ + Δx) = ‖h(d(x₀ + Δx)) − φ*‖₂², (11) where x₀ is the initial configuration of the landmarks, which corresponds to an average shape, and φ* = h(d(x*)) represents the SIFT values at the manually labeled landmarks....

Journal ArticleDOI
TL;DR: A robot localization system using biologically inspired vision that models two extensively studied human visual capabilities: extracting the “gist” of a scene to produce a coarse localization hypothesis, and refining that hypothesis by locating salient landmark points in the scene.
Abstract: We present a robot localization system using biologically inspired vision. Our system models two extensively studied human visual capabilities: (1) extracting the “gist” of a scene to produce a coarse localization hypothesis and (2) refining it by locating salient landmark points in the scene. Gist is computed here as a holistic statistical signature of the image, thereby yielding abstract scene classification and layout. Saliency is computed as a measure of interest at every image location, which efficiently directs the time-consuming landmark-identification process toward the most likely candidate locations in the image. The gist features and salient regions are then further processed using a Monte Carlo localization algorithm to allow the robot to estimate its position. We test the system in three different outdoor environments, each with its own challenges: a building complex (38.4 m × 54.86 m area, 13,966 testing images), a vegetation-filled park (82.3 m × 109.73 m area, 26,397 testing images), and an open-field park (137.16 m × 178.31 m area, 34,711 testing images). The system is able to localize, on average, to within 0.98, 2.63, and 3.46 m, respectively, even with multiple kidnapped-robot instances.
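
The localization back-end here is a Monte Carlo (particle filter) algorithm. Below is a generic single-step sketch, not the authors' implementation: the motion and observation models are illustrative stand-ins for their gist/landmark cues:

```python
# Generic Monte Carlo localization sketch (not the authors' system):
# particles are pose hypotheses; each observation reweights them,
# followed by resampling. Models here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
N = 1000
particles = rng.uniform(0.0, 50.0, size=(N, 2))   # (x, y) hypotheses
weights = np.full(N, 1.0 / N)

def motion_update(p, delta, noise=0.2):
    # Shift every hypothesis by odometry plus noise.
    return p + delta + rng.normal(scale=noise, size=p.shape)

def observation_likelihood(p, landmark_xy, measured_dist, sigma=1.0):
    # Gaussian likelihood of a range measurement to a known landmark.
    d = np.linalg.norm(p - landmark_xy, axis=1)
    return np.exp(-0.5 * ((d - measured_dist) / sigma) ** 2)

# One filter step: move, weight by a simulated observation, resample.
particles = motion_update(particles, delta=np.array([1.0, 0.5]))
weights *= observation_likelihood(particles, np.array([20.0, 30.0]), 12.0)
weights /= weights.sum()
idx = rng.choice(N, size=N, p=weights)            # multinomial resampling
particles, weights = particles[idx], np.full(N, 1.0 / N)

print("pose estimate:", particles.mean(axis=0))
```
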

226 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...planar (translational and rotational) transformation matrix [12] that characterizes the alignment....

  • ...We employ a straightforward SIFT-recognition system [12] (using all the suggested parameters and thresholds) but consider only regions that have more than five keypoints to...

  • ...We use SIFT keypoints [12] because they are the current...

  • ...We use two sets of signatures: SIFT keypoints [12] and salient...

  • ...A popular starting point for local features are scale-invariant feature transform (SIFT) keypoints [12]....

Journal ArticleDOI
01 Feb 2013
TL;DR: A contact-less remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced, giving a robotic inspection system the ability to analyze images captured from any distance and using any focal length or resolution.
Abstract: Visual inspection of structures is a highly qualitative method in which inspectors visually assess a structure’s condition. If a region is inaccessible, binoculars must be used to detect and characterize defects. Although several Non-Destructive Testing methods have been proposed for inspection purposes, they are nonadaptive and cannot quantify crack thickness reliably. In this paper, a contact-less remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced. The proposed approach utilizes depth perception to detect cracks and quantify their thickness, thereby giving a robotic inspection system the ability to analyze images captured from any distance and using any focal length or resolution. This unique adaptive feature is especially useful for incorporating mobile systems, such as unmanned aerial vehicles, into structural inspection methods since it would allow inaccessible regions to be properly inspected for cracks. Guidelines are presented for optimizing the acquisition and processing of images, thereby enhancing the quality and reliability of the damage detection approach and allowing the capture of even the slightest cracks (e.g., detection of 0.1 mm cracks from a distance of 20 m), which are routinely encountered in realistic field applications where the camera-object distance and image contrast are not controllable.
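
The adaptive pixel-to-millimetre conversion at the heart of the quantification step can be illustrated with a simple pinhole-camera calculation. The numbers below are illustrative, not the paper's calibration:

```python
# Back-of-envelope pinhole-camera conversion from crack width in pixels
# to millimetres: ground sampling distance GSD = Z * pixel_pitch / f.
def crack_width_mm(width_px, distance_mm, focal_mm, pixel_pitch_mm):
    gsd = distance_mm * pixel_pitch_mm / focal_mm  # mm per pixel on object
    return width_px * gsd

# e.g. 20 m away, 400 mm lens, 4.4 um pixels:
# GSD = 20000 * 0.0044 / 400 = 0.22 mm/px, so a 2 px crack is ~0.44 mm.
print(crack_width_mm(2, 20_000, 400.0, 0.0044))
```

By this relation, resolving a 0.1 mm crack at 20 m demands a sub-0.1 mm ground sampling distance, i.e., a long focal length and/or a high-resolution sensor, which is why the paper's guidelines for image acquisition matter.
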

226 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In this system, SIFT keypoints [15] are detected in each image and then matched...

Journal ArticleDOI
TL;DR: This paper presents a computer vision-based approach, complemented by proven photogrammetric principles, for generating orthophotos from a range of uncalibrated oblique and vertical aerial frame images, and shows that the approach moves beyond current restrictions because it applies to datasets previously thought unsuited for convenient georeferencing.

225 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The approach is similar to the well-known SIFT (Scale Invariant Feature Transform) algorithm developed by David Lowe (Lowe, 2004), since the features are also stable under viewpoint, scale and lighting variations....

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This work proposes natural language processing methods for extracting salient visual attributes from natural language descriptions to use as ‘templates’ for the object categories, and applies vision methods to extract corresponding attributes from test images.
Abstract: We investigate the task of learning models for visual object recognition from natural language descriptions alone. The approach contributes to the recognition of fine-grain object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions of visual attributes are readily available. As an example we tackle recognition of butterfly species, learning models from descriptions in an online nature guide. We propose natural language processing methods for extracting salient visual attributes from these descriptions to use as ‘templates’ for the object categories, and apply vision methods to extract corresponding attributes from test images. A generative model is used to connect textual terms in the learnt templates to visual attributes. We report experiments comparing the performance of humans and the proposed method on a dataset of ten butterfly categories.
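
As the citation snippets below indicate, the vision side of this approach relies on the Difference-of-Gaussians (DoG) interest-point operator and SIFT descriptors [18]. Here is a minimal DoG extrema-detection sketch, not the authors' implementation; the image and threshold are stand-ins:

```python
# Minimal Difference-of-Gaussians (DoG) sketch: blob-like interest
# points appear as local extrema across space and scale in the DoG
# stack. Uses scipy; the input image is a random stand-in.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

rng = np.random.default_rng(2)
image = gaussian_filter(rng.random((128, 128)), 2.0)  # stand-in image

k = 2 ** 0.5
sigmas = [1.6 * k ** i for i in range(5)]             # scale ladder
blurred = [gaussian_filter(image, s) for s in sigmas]
dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])

# Keep points that are the maximum in a 3x3x3 scale-space neighbourhood
# and exceed a small contrast threshold (value chosen for illustration).
is_max = dog == maximum_filter(dog, size=3)
keypoints = np.argwhere(is_max & (dog > 0.005))  # (scale, row, col)
print(len(keypoints), "candidate interest points")
```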

225 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...First, candidate image regions likely to be spots are extracted by applying the Difference-of-Gaussians (DoG) interest point operator [18] to the image at multiple scales....

  • ...As descriptors we use the SIFT descriptor [18] computed at three consecutive octave scales around the interest point....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
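
The final verification step named in the abstract is a least-squares fit of pose parameters to clustered matches. As a simplified stand-in (omitting the Hough-clustering stage), the sketch below fits a 2D similarity transform to matched keypoint coordinates:

```python
# Simplified stand-in for the verification stage (not Lowe's full
# pipeline): given putatively matched point pairs, fit a similarity
# transform x' = a*x - b*y + tx, y' = b*x + a*y + ty by linear least
# squares and check the residual.
import numpy as np

def fit_similarity(src, dst):
    # src, dst: (N, 2) arrays of matched keypoint coordinates.
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    residual = np.abs(A @ params - dst.reshape(-1)).max()
    return params, residual   # (a, b, tx, ty), worst-case fit error

# Synthetic check: rotate+scale+translate points, then recover the pose.
rng = np.random.default_rng(3)
src = rng.random((6, 2)) * 100
a, b, tx, ty = 0.8, 0.6, 5.0, -3.0   # scale*cos, scale*sin, translation
dst = np.column_stack([a * src[:, 0] - b * src[:, 1] + tx,
                       b * src[:, 0] + a * src[:, 1] + ty])
params, res = fit_similarity(src, dst)
print(params, res)   # ~ (0.8, 0.6, 5.0, -3.0), residual ~ 0
```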

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
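
The "blurred image gradients in multiple orientation planes" can be illustrated with a single orientation histogram over a patch; the full SIFT descriptor, as later standardized in the 2004 paper, tiles the patch into a 4×4 grid of such 8-bin histograms, giving 128 dimensions. A toy sketch, not Lowe's exact layout:

```python
# Sketch of the core descriptor idea (illustrative, not Lowe's exact
# layout): accumulate gradient magnitudes into an orientation histogram
# over a small patch, giving robustness to local geometric deformation.
import numpy as np

def orientation_histogram(patch, n_bins=8):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-12)  # contrast-normalized

patch = np.random.default_rng(4).random((16, 16))  # stand-in patch
print(orientation_histogram(patch))
```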

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
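
Although the abstract focuses on the project's motivation, this is the paper that introduced the Harris corner detector used for the feature extraction and tracking it describes. A minimal sketch of the standard corner response, not the original implementation:

```python
# Minimal Harris corner response sketch (standard formulation):
# R = det(M) - k * trace(M)^2 over a Gaussian-smoothed structure
# tensor M built from image gradients.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.04):
    gy, gx = np.gradient(image.astype(float))
    Ixx = gaussian_filter(gx * gx, sigma)
    Iyy = gaussian_filter(gy * gy, sigma)
    Ixy = gaussian_filter(gx * gy, sigma)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2   # large positive values = corners

img = np.zeros((64, 64)); img[20:40, 20:40] = 1.0   # a white square
R = harris_response(img)
print(np.unravel_index(R.argmax(), R.shape))        # near a square corner
```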

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
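
The evaluation criterion, recall against (1 − precision) as a matching threshold is swept, can be computed as follows. The formulas are the standard ones for this kind of evaluation; the data here is synthetic:

```python
# Recall / 1-precision curve for descriptor matching: sweep a match
# distance threshold and count correct vs false matches against ground
# truth. Synthetic stand-in data for illustration.
import numpy as np

def recall_precision_curve(dists, is_correct, n_correspondences):
    order = np.argsort(dists)              # ascending match distance
    correct = np.cumsum(is_correct[order]) # correct matches accepted
    total = np.arange(1, len(dists) + 1)   # all matches accepted
    recall = correct / n_correspondences
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision

rng = np.random.default_rng(5)
dists = rng.random(200)
is_correct = dists + 0.3 * rng.random(200) < 0.6   # synthetic labels
# Using the number of correct pairs as the ground-truth count here.
r, omp = recall_precision_curve(dists, is_correct, is_correct.sum())
print(r[-1], omp[-1])
```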

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
