Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting, and subsequently matching, distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
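
As a concrete illustration of the extract-and-match pipeline described above, here is a minimal sketch using OpenCV's SIFT implementation. The image paths are placeholders, and the 0.75 ratio threshold is a common choice (Lowe's 2004 paper uses 0.8):

```python
# Minimal sketch: extract SIFT keypoints in two images and match them
# with Lowe's ratio test. Requires opencv-python; the image paths are
# placeholders for illustration.
import cv2

img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k=2 nearest neighbours per descriptor, so the ratio test can compare
# the best match against the runner-up.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(des1, des2, k=2)

# Keep a match only if it is clearly better than the second-best match.
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable matches")
```
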
Citations
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Global SDM (GSDM), an extension of the Supervised Descent Method that divides the search space into regions of similar gradient directions, is proposed, providing a better and more efficient strategy for minimizing non-linear least squares functions in computer vision problems.
Abstract: Mathematical optimization plays a fundamental role in solving many problems in computer vision (e.g., camera calibration, image alignment, structure from motion). It is generally accepted that second order descent methods are the most robust, fast, and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, second order descent methods have two main drawbacks: 1) the function might not be analytically differentiable and numerical approximations are impractical, and 2) the Hessian may be large and not positive definite. Recently, Supervised Descent Method (SDM), a method that learns the “weighted averaged gradients” in a supervised manner has been proposed to solve these issues. However, SDM is a local algorithm and it is likely to average conflicting gradient directions. This paper proposes Global SDM (GSDM), an extension of SDM that divides the search space into regions of similar gradient directions. GSDM provides a better and more efficient strategy to minimize non-linear least squares functions in computer vision problems. We illustrate the effectiveness of GSDM in two problems: non-rigid image alignment and extrinsic camera calibration.
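
The core SDM idea, learning a cascade of descent maps from data instead of computing Jacobians or Hessians, fits in a few lines. The sketch below is a toy illustration, not the authors' code: the feature function h and the training data are stand-ins, and the setting is alignment to a single fixed template. Each stage regresses the ideal update x* − x onto the feature residuals by least squares, which plays the role of the learned descent direction:

```python
# Toy sketch of the Supervised Descent idea (not the authors' code):
# learn linear maps so that x <- x + [phi, 1] @ W walks toward the
# minimizer of f(x) = ||h(x) - phi_star||^2 without any derivatives.
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Stand-in for a non-differentiable feature map (e.g. SIFT values
    # sampled at landmark configuration x).
    return np.tanh(x) + 0.1 * np.sin(5.0 * x)

x_star = np.array([0.3, -0.2, 0.5, 0.1])   # "true" configuration
phi_star = h(x_star)                       # template features

# Training: random initializations scattered around the truth.
X = x_star + rng.normal(scale=0.5, size=(500, 4))
cascade = []
for k in range(4):                         # a short cascade of stages
    Phi = h(X) - phi_star                  # feature residuals
    A = np.hstack([Phi, np.ones((len(X), 1))])
    target = x_star - X                    # ideal update per sample
    W, *_ = np.linalg.lstsq(A, target, rcond=None)
    cascade.append(W)
    X = X + A @ W                          # apply the learned step

# Test: refine a fresh initialization with the learned cascade.
x = x_star + rng.normal(scale=0.5, size=4)
for W in cascade:
    a = np.concatenate([h(x) - phi_star, [1.0]])
    x = x + a @ W
print("error:", np.abs(x - x_star).max())  # typically shrinks per stage
```
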

227 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Given an image d ∈ R^(m×1) of m pixels, d(x) ∈ R^(p×1) indexes p landmarks in the image. h is a non-linear feature extraction function (e.g., SIFT [27] or HoG [12]) and h(d(x)) ∈ R^(128p×1) in the case of extracting SIFT features....

  • ...In this setting, SDM frames facial feature tracking as minimizing the following function over Δx: f(x₀ + Δx) = ‖h(d(x₀ + Δx)) − φ*‖₂², (11) where x₀ is the initial configuration of the landmarks, which corresponds to an average shape, and φ* = h(d(x*)) represents the SIFT values at the manually labeled landmarks....

Journal ArticleDOI
TL;DR: A robot localization system using biologically inspired vision that models two extensively studied human visual capabilities: extracting the “gist” of a scene to produce a coarse localization hypothesis, and refining that hypothesis by locating salient landmark points in the scene.
Abstract: We present a robot localization system using biologically inspired vision. Our system models two extensively studied human visual capabilities: (1) extracting the “gist” of a scene to produce a coarse localization hypothesis and (2) refining it by locating salient landmark points in the scene. Gist is computed here as a holistic statistical signature of the image, thereby yielding abstract scene classification and layout. Saliency is computed as a measure of interest at every image location, which efficiently directs the time-consuming landmark-identification process toward the most likely candidate locations in the image. The gist features and salient regions are then further processed using a Monte Carlo localization algorithm to allow the robot to estimate its position. We test the system in three different outdoor environments, each with its own challenges: a building complex (38.4 m × 54.86 m area, 13,966 testing images), a vegetation-filled park (82.3 m × 109.73 m area, 26,397 testing images), and an open-field park (137.16 m × 178.31 m area, 34,711 testing images). The system is able to localize, on average, to within 0.98, 2.63, and 3.46 m, respectively, even with multiple kidnapped-robot instances.
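
The localization back-end here is a Monte Carlo (particle filter) algorithm. Below is a generic single-step sketch, not the authors' implementation: the motion and observation models are illustrative stand-ins for their gist/landmark cues:

```python
# Generic Monte Carlo localization sketch (not the authors' system):
# particles are pose hypotheses; each observation reweights them,
# followed by resampling. Models here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
N = 1000
particles = rng.uniform(0.0, 50.0, size=(N, 2))   # (x, y) hypotheses
weights = np.full(N, 1.0 / N)

def motion_update(p, delta, noise=0.2):
    # Shift every hypothesis by odometry plus noise.
    return p + delta + rng.normal(scale=noise, size=p.shape)

def observation_likelihood(p, landmark_xy, measured_dist, sigma=1.0):
    # Gaussian likelihood of a range measurement to a known landmark.
    d = np.linalg.norm(p - landmark_xy, axis=1)
    return np.exp(-0.5 * ((d - measured_dist) / sigma) ** 2)

# One filter step: move, weight by a simulated observation, resample.
particles = motion_update(particles, delta=np.array([1.0, 0.5]))
weights *= observation_likelihood(particles, np.array([20.0, 30.0]), 12.0)
weights /= weights.sum()
idx = rng.choice(N, size=N, p=weights)            # multinomial resampling
particles, weights = particles[idx], np.full(N, 1.0 / N)

print("pose estimate:", particles.mean(axis=0))
```
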

226 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...planar (translational and rotational) transformation matrix [12] that characterizes the alignment....

  • ...We employ a straightforward SIFT-recognition system [12] (using all the suggested parameters and thresholds) but consider only regions that have more than five keypoints to...

  • ...We use SIFT keypoints [12] because they are the current...

  • ...We use two sets of signatures: SIFT keypoints [12] and salient...

  • ...A popular starting point for local features are scale-invariant feature transform (SIFT) keypoints [12]....

Journal ArticleDOI
01 Feb 2013
TL;DR: A contact-less remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced, giving a robotic inspection system the ability to analyze images captured from any distance and using any focal length or resolution.
Abstract: Visual inspection of structures is a highly qualitative method in which inspectors visually assess a structure’s condition. If a region is inaccessible, binoculars must be used to detect and characterize defects. Although several Non-Destructive Testing methods have been proposed for inspection purposes, they are nonadaptive and cannot quantify crack thickness reliably. In this paper, a contact-less remote-sensing crack detection and quantification methodology based on 3D scene reconstruction (computer vision), image processing, and pattern recognition concepts is introduced. The proposed approach utilizes depth perception to detect cracks and quantify their thickness, thereby giving a robotic inspection system the ability to analyze images captured from any distance and using any focal length or resolution. This unique adaptive feature is especially useful for incorporating mobile systems, such as unmanned aerial vehicles, into structural inspection methods since it would allow inaccessible regions to be properly inspected for cracks. Guidelines are presented for optimizing the acquisition and processing of images, thereby enhancing the quality and reliability of the damage detection approach and allowing the capture of even the slightest cracks (e.g., detection of 0.1 mm cracks from a distance of 20 m), which are routinely encountered in realistic field applications where the camera-object distance and image contrast are not controllable.
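
The adaptive pixel-to-millimetre conversion at the heart of the quantification step can be illustrated with a simple pinhole-camera calculation. The numbers below are illustrative, not the paper's calibration:

```python
# Back-of-envelope pinhole-camera conversion from crack width in pixels
# to millimetres: ground sampling distance GSD = Z * pixel_pitch / f.
def crack_width_mm(width_px, distance_mm, focal_mm, pixel_pitch_mm):
    gsd = distance_mm * pixel_pitch_mm / focal_mm  # mm per pixel on object
    return width_px * gsd

# e.g. 20 m away, 400 mm lens, 4.4 um pixels:
# GSD = 20000 * 0.0044 / 400 = 0.22 mm/px, so a 2 px crack is ~0.44 mm.
print(crack_width_mm(2, 20_000, 400.0, 0.0044))
```

By this relation, resolving a 0.1 mm crack at 20 m demands a sub-0.1 mm ground sampling distance, i.e., a long focal length and/or a high-resolution sensor, which is why the paper's guidelines for image acquisition matter.
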

226 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In this system, SIFT keypoints [15] are detected in each image and then matched...

Journal ArticleDOI
TL;DR: This paper presents a computer vision-based approach, complemented by proven photogrammetric principles, for generating orthophotos from a range of uncalibrated oblique and vertical aerial frame images, and shows that the approach moves beyond current restrictions because it applies to datasets previously thought unsuited for convenient georeferencing.

225 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The approach is similar to the well-known SIFT (Scale Invariant Feature Transform) algorithm developed by David Lowe (Lowe, 2004), since the features are also stable under viewpoint, scale and lighting variations....

Proceedings ArticleDOI
01 Jan 2009
TL;DR: This work proposes natural language processing methods for extracting salient visual attributes from natural language descriptions to use as ‘templates’ for the object categories, and applies vision methods to extract corresponding attributes from test images.
Abstract: We investigate the task of learning models for visual object recognition from natural language descriptions alone. The approach contributes to the recognition of fine-grain object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions of visual attributes are readily available. As an example we tackle recognition of butterfly species, learning models from descriptions in an online nature guide. We propose natural language processing methods for extracting salient visual attributes from these descriptions to use as ‘templates’ for the object categories, and apply vision methods to extract corresponding attributes from test images. A generative model is used to connect textual terms in the learnt templates to visual attributes. We report experiments comparing the performance of humans and the proposed method on a dataset of ten butterfly categories.
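
As the citation snippets below indicate, the vision side of this approach relies on the Difference-of-Gaussians (DoG) interest-point operator and SIFT descriptors [18]. Here is a minimal DoG extrema-detection sketch, not the authors' implementation; the image and threshold are stand-ins:

```python
# Minimal Difference-of-Gaussians (DoG) sketch: blob-like interest
# points appear as local extrema across space and scale in the DoG
# stack. Uses scipy; the input image is a random stand-in.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

rng = np.random.default_rng(2)
image = gaussian_filter(rng.random((128, 128)), 2.0)  # stand-in image

k = 2 ** 0.5
sigmas = [1.6 * k ** i for i in range(5)]             # scale ladder
blurred = [gaussian_filter(image, s) for s in sigmas]
dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])

# Keep points that are the maximum in a 3x3x3 scale-space neighbourhood
# and exceed a small contrast threshold (value chosen for illustration).
is_max = dog == maximum_filter(dog, size=3)
keypoints = np.argwhere(is_max & (dog > 0.005))  # (scale, row, col)
print(len(keypoints), "candidate interest points")
```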

225 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...First, candidate image regions likely to be spots are extracted by applying the Difference-of-Gaussians (DoG) interest point operator [18] to the image at multiple scales....

  • ...As descriptors we use the SIFT descriptor [18] computed at three consecutive octave scales around the interest point....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
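
The final verification step named in the abstract is a least-squares fit of pose parameters to clustered matches. As a simplified stand-in (omitting the Hough-clustering stage), the sketch below fits a 2D similarity transform to matched keypoint coordinates:

```python
# Simplified stand-in for the verification stage (not Lowe's full
# pipeline): given putatively matched point pairs, fit a similarity
# transform x' = a*x - b*y + tx, y' = b*x + a*y + ty by linear least
# squares and check the residual.
import numpy as np

def fit_similarity(src, dst):
    # src, dst: (N, 2) arrays of matched keypoint coordinates.
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    residual = np.abs(A @ params - dst.reshape(-1)).max()
    return params, residual   # (a, b, tx, ty), worst-case fit error

# Synthetic check: rotate+scale+translate points, then recover the pose.
rng = np.random.default_rng(3)
src = rng.random((6, 2)) * 100
a, b, tx, ty = 0.8, 0.6, 5.0, -3.0   # scale*cos, scale*sin, translation
dst = np.column_stack([a * src[:, 0] - b * src[:, 1] + tx,
                       b * src[:, 0] + a * src[:, 1] + ty])
params, res = fit_similarity(src, dst)
print(params, res)   # ~ (0.8, 0.6, 5.0, -3.0), residual ~ 0
```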

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
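
The "blurred image gradients in multiple orientation planes" can be illustrated with a single orientation histogram over a patch; the full SIFT descriptor, as later standardized in the 2004 paper, tiles the patch into a 4×4 grid of such 8-bin histograms, giving 128 dimensions. A toy sketch, not Lowe's exact layout:

```python
# Sketch of the core descriptor idea (illustrative, not Lowe's exact
# layout): accumulate gradient magnitudes into an orientation histogram
# over a small patch, giving robustness to local geometric deformation.
import numpy as np

def orientation_histogram(patch, n_bins=8):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-12)  # contrast-normalized

patch = np.random.default_rng(4).random((16, 16))  # stand-in patch
print(orientation_histogram(patch))
```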

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
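
Although the abstract focuses on the project's motivation, this is the paper that introduced the Harris corner detector used for the feature extraction and tracking it describes. A minimal sketch of the standard corner response, not the original implementation:

```python
# Minimal Harris corner response sketch (standard formulation):
# R = det(M) - k * trace(M)^2 over a Gaussian-smoothed structure
# tensor M built from image gradients.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.04):
    gy, gx = np.gradient(image.astype(float))
    Ixx = gaussian_filter(gx * gx, sigma)
    Iyy = gaussian_filter(gy * gy, sigma)
    Ixy = gaussian_filter(gx * gy, sigma)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2   # large positive values = corners

img = np.zeros((64, 64)); img[20:40, 20:40] = 1.0   # a white square
R = harris_response(img)
print(np.unravel_index(R.argmax(), R.shape))        # near a square corner
```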

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
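
The evaluation criterion, recall against (1 − precision) as a matching threshold is swept, can be computed as follows. The formulas are the standard ones for this kind of evaluation; the data here is synthetic:

```python
# Recall / 1-precision curve for descriptor matching: sweep a match
# distance threshold and count correct vs false matches against ground
# truth. Synthetic stand-in data for illustration.
import numpy as np

def recall_precision_curve(dists, is_correct, n_correspondences):
    order = np.argsort(dists)              # ascending match distance
    correct = np.cumsum(is_correct[order]) # correct matches accepted
    total = np.arange(1, len(dists) + 1)   # all matches accepted
    recall = correct / n_correspondences
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision

rng = np.random.default_rng(5)
dists = rng.random(200)
is_correct = dists + 0.3 * rng.random(200) < 0.6   # synthetic labels
# Using the number of correct pairs as the ground-truth count here.
r, omp = recall_precision_curve(dists, is_correct, is_correct.sum())
print(r[-1], omp[-1])
```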

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
