
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Journal ArticleDOI
TL;DR: The proposed registration scheme has been tested using data from the Compact High Resolution Imaging Spectrometer (CHRIS) onboard the Project for On-Board Autonomy (Proba) satellite and demonstrates that the proposed method works well in areas with little variation in topography.
Abstract: Subpixel image registration is the key to successful image fusion and superresolution enhancement of multiangle satellite data. Multiangle image registration poses two main challenges: 1) Images captured at large view angles are susceptible to resolution change and blurring, and 2) local geometric distortion caused by topographic effects and/or platform instability may be important. In this paper, we propose a two-step nonrigid automatic registration scheme for multiangle satellite images. In the first step, control points (CPs) are selected in a preregistration process based on the scale-invariant feature transform (SIFT). However, the number of CPs obtained in this first step may be too few and/or CPs may be unevenly distributed. To remediate these problems, in a second step, the preliminary registered image is subdivided into chips of 64 × 64 pixels, and each chip is matched with a corresponding chip in the reference image using normalized cross correlation (NCC). By doing so, more CPs with better spatial distribution are obtained. Two criteria are applied during the generation of CPs to identify outliers. Selected SIFT and NCC CPs are used for defining a nonrigid thin-plate-spline model. The proposed registration scheme has been tested using data from the Compact High Resolution Imaging Spectrometer (CHRIS) onboard the Project for On-Board Autonomy (Proba) satellite. Experimental results demonstrate that the proposed method works well in areas with little variation in topography. Application in areas with more pronounced relief would require the use of orthorectified image data in order to achieve subpixel registration accuracy.
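A minimal sketch of the second-step chip matching described above, assuming OpenCV (`cv2.matchTemplate` with normalized cross correlation). Each 64 × 64 chip of the preregistered image is searched for in a slightly larger window of the reference image, and the NCC peak yields an additional control point. Function names, the search margin, and the correlation threshold are illustrative, not the authors' implementation:

```python
# Sketch of NCC chip matching for control-point (CP) refinement. Inputs are
# grayscale arrays (uint8 or float32, as cv2.matchTemplate requires).
import cv2
import numpy as np

def ncc_control_points(preregistered, reference, chip=64, margin=16, min_ncc=0.8):
    cps = []
    h, w = preregistered.shape
    for y in range(0, h - chip, chip):
        for x in range(0, w - chip, chip):
            patch = preregistered[y:y + chip, x:x + chip]
            # Search window: the chip's neighborhood in the reference image.
            y0, y1 = max(0, y - margin), min(h, y + chip + margin)
            x0, x1 = max(0, x - margin), min(w, x + chip + margin)
            ncc = cv2.matchTemplate(reference[y0:y1, x0:x1], patch,
                                    cv2.TM_CCOEFF_NORMED)
            _, peak, _, loc = cv2.minMaxLoc(ncc)
            if peak >= min_ncc:  # reject weak correlations as likely outliers
                cps.append(((x + chip / 2, y + chip / 2),
                            (x0 + loc[0] + chip / 2, y0 + loc[1] + chip / 2)))
    return cps  # list of (preregistered CP, reference CP) coordinate pairs
```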

94 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The arrow illustrates the increasing space scales [15]....

  • ...Following the suggestion in [15], k is set to √2, which leads to a significant difference in successive scales....

  • ...Scale-space extrema in the Difference of Gaussians are regarded as the most stable scale-invariant features [15]....

  • ...the scale-invariant feature transform (SIFT) [11], [13]–[15]....
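The second excerpt above fixes the scale-space factor k at √2. A minimal sketch of a difference-of-Gaussians stack built with that factor, using SciPy (function and parameter names are illustrative):

```python
# Minimal DoG scale-space sketch with k = sqrt(2), as in the excerpt above.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=np.sqrt(2.0), levels=5):
    """Return D_i = G(k^(i+1)*sigma0)*I - G(k^i*sigma0)*I for i = 0..levels-1."""
    blurred = [gaussian_filter(image.astype(np.float64), sigma0 * k ** i)
               for i in range(levels + 1)]
    # Scale-space extrema are then sought in 3x3x3 neighborhoods of this stack.
    return [blurred[i + 1] - blurred[i] for i in range(levels)]
```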

Proceedings ArticleDOI
03 Dec 2010
TL;DR: The technique produces a highly accurate sparse 3D reconstruction of underwater structures such as corals, built from synchronized high-definition videos collected using a wide-baseline stereo rig.
Abstract: Environmental change is a growing international concern, calling for the regular monitoring, studying and preserving of detailed information about the evolution of underwater ecosystems. For example, fragile coral reefs are exposed to various sources of hazards and potential destruction, and need close observation. Computer vision offers promising technologies to build 3D models of an environment from two-dimensional images. The state of the art techniques have enabled high-quality digital reconstruction of large-scale structures, e.g., buildings and urban environments, but only sparse representations or dense reconstruction of small objects have been obtained from underwater video and still imagery. The application of standard 3D reconstruction methods to challenging underwater environments typically produces unsatisfactory results. Accurate, full camera trajectories are needed to serve as the basis for dense 3D reconstruction. A highly accurate sparse 3D reconstruction is the ideal foundation on which to base subsequent dense reconstruction algorithms. In our application the models are constructed from synchronized high definition videos collected using a wide baseline stereo rig. The rig can be hand-held, attached to a boat, or even to an autonomous underwater vehicle. We solve this problem by employing a smoothing and mapping toolkit developed in our lab specifically for this type of application. The result of our technique is a highly accurate sparse 3D reconstruction of underwater structures such as corals.

94 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...SIFT and SURF descriptor matching are quite reliable in many situations, yet RANSAC is needed to eliminate outliers due to erroneous stereo and temporal matching, as outliers are capable of introducing large error into the solution....

  • ...Given their scale and local affine invariance properties, we opt to use SIFT [15] or SURF [16] instead, as they constitute a better option for matching visual features from varying poses....

  • ...To deal with scale and affine distortions in SIFT, for example, keypoint patches are selected from difference-of-Gaussian images at various scales, for which the dominant gradient orientation and scale are stored....

  • ...Our technique produces similar results whether we use SIFT or SURF, with SURF running significantly faster....
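A minimal sketch of the match-then-RANSAC pattern the excerpts describe, assuming OpenCV's SIFT and fundamental-matrix RANSAC. The ratio and thresholds are illustrative; the authors' full pipeline additionally relies on their own smoothing-and-mapping toolkit, which is not reproduced here:

```python
# Sketch: SIFT matching with a ratio test, then RANSAC on the epipolar geometry
# to discard outliers from erroneous stereo/temporal matches.
import cv2
import numpy as np

def robust_matches(img1, img2, ratio=0.8):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    good = [m for m, n in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]        # Lowe's ratio test
    if len(good) < 8:                                  # 8-point minimum for F
        return []
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    if mask is None:
        return []
    return [m for m, keep in zip(good, mask.ravel()) if keep]  # inliers only
```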

Journal ArticleDOI
TL;DR: Modifications to the SIFT algorithm are proposed that substantially improve the repeatability of detection and effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation.
Abstract: Keypoint detection and matching is of fundamental importance for many applications in computer and robot vision. The association of points across different views is problematic because image features can undergo significant changes in appearance. Unfortunately, state-of-the-art methods, like the scale-invariant feature transform (SIFT), are not resilient to the radial distortion that often arises in images acquired by cameras with microlenses and/or wide fields of view. This paper proposes modifications to the SIFT algorithm that substantially improve the repeatability of detection and the effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation. The scale-space representation of the image is obtained using adaptive filtering that compensates for the local distortion, and the keypoint description is carried out after implicit image gradient correction. Unlike competing methods, our approach avoids image resampling (the processing is carried out in the original image plane), it does not require accurate camera calibration (an approximate modeling of the distortion is sufficient), and it adds minimal computational overhead. Extensive experiments show the advantages of our method in establishing point correspondences across images with radial distortion.

94 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...The reasons for new detections are explained in [3]....

  • ...2) Measuring Matching Performance: Two keypoints are considered to be a match iff the Euclidean distance between their SIFT descriptors is below a certain threshold λ [3], [7]....

  • ...The original SIFT algorithm that is proposed by Lowe [3], despite being invariant to rotations on the plane P2, is unable to handle the projective transformations due to camera rotation [14]....

  • ...The scale-invariant feature transform (SIFT) [3] is arguably one of the most popular matching algorithms, being broadly used in robotics because of its invariance to common image transformations such as scale, rotation, and moderate viewpoint change [4], [5]....
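The second excerpt above uses the simplest matching rule: two keypoints match iff the Euclidean distance between their SIFT descriptors falls below a threshold λ. A NumPy sketch (the λ value is illustrative and dataset-dependent):

```python
# Sketch of threshold-based descriptor matching: match iff ||d1 - d2|| < lambda.
import numpy as np

def threshold_matches(des1, des2, lam=250.0):          # lambda is illustrative
    # Pairwise Euclidean distances between (N1,128) and (N2,128) descriptor sets.
    dists = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    return np.argwhere(dists < lam)                    # matched index pairs (i, j)
```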

Journal ArticleDOI
TL;DR: It is demonstrated that the performance of specific object retrieval increases with the size of the vocabulary and that large vocabularies increase the speed of the tf-idf scoring step.
Abstract: A novel similarity measure for bag-of-words type large scale image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. The novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 5k, Oxford 105k and Paris datasets/protocols. We study the effect of a fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation is in contradiction with previously published results. We further demonstrate that the large vocabularies increase the speed of the tf-idf scoring step.
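A minimal sketch of the baseline tf-idf scoring step whose speed the abstract discusses, over bag-of-visual-words histograms. Variable names and the cosine scoring are illustrative; the paper's contribution is a learned similarity, not this baseline:

```python
# Baseline tf-idf scoring over bag-of-visual-words histograms.
import numpy as np

def tfidf_rank(query_hist, db_hists):
    """query_hist: (V,) visual-word counts; db_hists: (N, V). Returns scores (N,)."""
    df = np.count_nonzero(db_hists, axis=0)                  # document frequency
    idf = np.log(len(db_hists) / np.maximum(df, 1.0))        # inverse doc. frequency
    q = query_hist * idf
    d = db_hists * idf
    q /= np.linalg.norm(q) + 1e-12                           # cosine normalization
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    return d @ q                                             # higher = more similar
```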

93 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...However, even in the original work on SIFT descriptor matching (Lowe 2004) it is shown that the similarity of the descriptors is not only dependent on the distance of the descriptors, but also on the location of the features in the feature space....

  • ...Most, if not all, recent state-of-the-art methods extend the bag-of-words representation introduced by Sivic and Zisserman (Sivic and Zisserman 2003), who represented the image by a histogram of “visual words”, i.e., discretized SIFT descriptors (Lowe 2004)....

  • ...To avoid bias (by quantization errors, for example), instead of using the vector-quantized form of the descriptors, the conventional image matching (based on the full SIFT (Lowe 2004)) has to be used....

Proceedings ArticleDOI
01 Jan 2011
TL;DR: The descriptor MiC is shown to encode image microscopic configuration by a linear configuration model; because the framework is unsupervised, it avoids the generalization problem suffered by other statistical learning methods.
Abstract: Texture classification is the problem of classifying images according to textural cues, that is, categorizing a texture image obtained under certain illumination and viewpoint conditions as belonging to one of the pre-learned texture classes. It therefore involves two main steps: image representation or description, and classification. In this paper, we focus on the feature extraction part, which aims to extract effective patterns to distinguish different textures. Among various feature extraction methods, local features have performed well in real-world applications, such as LBP [4], SIFT [2] and Histogram of Oriented Gradients (HOG) [1]. Representative methods also include grey level difference or co-occurrence statistics [10], and methods based on multi-channel filtering or wavelet decomposition [3, 5, 7]. To learn representative structural configurations from texture images, Varma et al. proposed texton methods based on the filter response space and the local image patch space [8, 9]. In this paper we present the descriptor MiC, which encodes image microscopic configuration by a linear configuration model. The final local configuration pattern (LCP) feature integrates both the microscopic features, represented by optimal model parameters, and local features, represented by pattern occurrences. To be specific, microscopic features capture the image microscopic configuration, which embodies image configuration and pixel-wise interaction relationships, by a linear model. The optimal model parameters are estimated by an efficient least squares estimator. To achieve rotation invariance, which is a desired property for texture features, the Fourier transform is applied to the estimated parameter vectors. Finally, the transformed vectors are concatenated with local pattern occurrences to construct LCPs. As this framework is unsupervised, it avoids the generalization problem suffered by other statistical learning methods. To model the image configuration with respect to each pattern, we estimate optimal weights, associated with the intensities of neighboring pixels, to linearly reconstruct the central pixel intensity. This can be expressed by:
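The equation itself did not survive extraction. Based on the surrounding description, in which optimal weights $a_i$ linearly reconstruct the central pixel intensity $g_c$ from its $P$ neighboring intensities $g_i$ and are estimated by least squares, a plausible form (our reconstruction, not necessarily the authors' exact notation) is

$$E(a_0, \ldots, a_{P-1}) = \left| g_c - \sum_{i=0}^{P-1} a_i \, g_i \right|^2,$$

minimized over the occurrences of each pattern to obtain the model parameters.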

93 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
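A minimal sketch of the matching step this abstract describes, with SciPy's k-d tree standing in for the paper's best-bin-first approximate search, plus the distance-ratio test (the 0.8 ratio follows the paper; everything else is illustrative):

```python
# Sketch: nearest-neighbor descriptor matching against a database with a k-d tree
# (a stand-in for best-bin-first search) and the distance-ratio test.
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(query_des, db_des, ratio=0.8):
    tree = cKDTree(db_des)
    dist, idx = tree.query(query_des, k=2)     # two nearest database descriptors
    keep = dist[:, 0] < ratio * dist[:, 1]     # reject ambiguous matches
    return np.flatnonzero(keep), idx[keep, 0]  # query indices, matched db indices
```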

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
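A minimal sketch of the evaluation criterion used above, tracing recall against 1 − precision as the descriptor-distance threshold sweeps over the candidate matches. Inputs and names are illustrative; ground-truth correctness comes from the known image transformations:

```python
# Sketch: recall vs. 1-precision curve for descriptor matching evaluation.
import numpy as np

def recall_vs_one_minus_precision(match_dists, is_correct, n_correspondences):
    order = np.argsort(match_dists)             # sweep the distance threshold
    correct = np.cumsum(np.asarray(is_correct)[order])
    total = np.arange(1, len(order) + 1)
    recall = correct / n_correspondences        # fraction of true pairs recovered
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision
```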

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
