
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
Citations
Book ChapterDOI
05 Sep 2010
TL;DR: The problem of large-scale place-of-interest recognition in cell phone images of urban scenarios is addressed by exploiting the nowadays often available 3D building data and massive street-view-like image data for database creation.
Abstract: We address the problem of large scale place-of-interest recognition in cell phone images of urban scenarios. Here, we go beyond what has been shown in earlier approaches by exploiting the nowadays often available 3D building information (e.g. from extruded floor plans) and massive street-view like image data for database creation. Exploiting vanishing points in query images and thus fully removing 3D rotation from the recognition problem allows then to simplify the feature invariance to a pure homothetic problem, which we show leaves more discriminative power in feature descriptors than classical SIFT. We rerank visual word based document queries using a fast stratified homothetic verification that is tailored for repetitive patterns like window grids on facades and in most cases boosts the correct document to top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real world coordinates ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches on city scale experiments for different sources of street-view like image data and a challenging set of cell phone images.

93 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: The success of the approach shows that the new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus, outperforming conventional visual place-recognition approaches.
Abstract: We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus, outperforming conventional visual place-recognition approaches.

93 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Point feature detectors and descriptors—such as SIFT [17], BRISK [22], etc....

  • ...A comparison between two images is done through the following pipeline: (i) SIFT [17] image features are extracted in both images; (ii) their descriptors are matched; (iii) outliers are rejected through verification of their geometric consistency via fundamental-matrix estimation (e....

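Step (ii) of the pipeline quoted above, descriptor matching, is commonly implemented with Lowe's ratio test: a match is accepted only if the nearest neighbour is clearly closer than the second nearest. A minimal NumPy sketch, where the 128-dimensional descriptors are synthetic stand-ins for real SIFT output:

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match descriptors with Lowe's ratio test: keep a match only if the
    nearest neighbour is significantly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nn = np.argsort(dists)[:2]
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Toy descriptors: row 0 of each set is near-identical, the rest are random,
# so only the (0, 0) pair should survive the ratio test.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(4, 128))
desc2 = np.vstack([desc1[0] + 0.01 * rng.normal(size=128),
                   rng.normal(size=(3, 128))])
print(ratio_test_match(desc1, desc2))
```

Step (iii), the geometric verification via fundamental-matrix estimation, would then run RANSAC over the surviving pairs; that part is omitted here.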
Journal ArticleDOI
Wenping Ma, Jun Zhang, Yue Wu, Licheng Jiao, Hao Zhu, Wei Zhao
TL;DR: An effective coarse-to-fine strategy is introduced and a new two-step registration method based on deep and local features is developed; the first step uses a convolutional neural network to estimate an approximate spatial relationship, and experiments show the method markedly increases the number and ratio of correct correspondences while remaining highly robust and accurate.
Abstract: Automatic remote sensing image registration has achieved great accomplishment. However, it is still a vital challenging problem to develop a robust and accurate registration method due to the negative effects of noise and imaging differences between images. For these images, it is difficult to guarantee the accuracy and robustness at the same time for one-step registration methods. To address this issue, we introduce an effective coarse-to-fine strategy and develop a new two-step registration method based on deep and local features in this paper. The first step is to calculate the approximate spatial relationship, which is obtained by a convolutional neural network. This step makes full use of the deep features to match and can generate stable results. For the second step, a matching strategy considering spatial relationship is applied to the local feature-based method. In addition, this step adopts more accurate features in location to adjust the results of the previous step. A variety of homologous and multimodal remote sensing images, including optical, synthetic aperture radar, and general map images, are used to evaluate the proposed method. The comparison experiments demonstrate that our method can apparently increase the correct correspondences, can improve the ratio of correct correspondences, and is highly robust and accurate.

93 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The registration process of the classic SIFT method contains five steps: scale-space extrema detection, keypoint localization, orientation assignment, keypoint descriptor, and keypoint matching [10]....

  • ...In order to improve the stability of keypoints, Lowe [25] used the Taylor expansion of the scale-space function D(x, y, σ):

    D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x    (3)

    where x = (x, y, σ)ᵀ is the offset....

  • ...Lowe recommends that the descriptors are computed by using gradient information of eight directions in a 4 × 4 window within the keypoint scale space....

  • ...SIFT was first introduced by Lowe [25] in 1999 and then improved in 2004 [10]....

  • ...Scale-invariant feature transform (SIFT) [10] is one of the most commonly used methods among point feature-based methods, and various improved SIFT-based methods are also widely used....

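Setting the derivative of that Taylor expansion to zero gives the sub-pixel/sub-scale offset of the extremum, x̂ = −(∂²D/∂x²)⁻¹ ∂D/∂x. A small NumPy sketch using a synthetic quadratic D whose true extremum is known, so the recovered offset can be checked:

```python
import numpy as np

def refine_extremum(grad, hess):
    """Sub-pixel offset of a DoG extremum from the quadratic fit:
    x_hat = -H^{-1} g, obtained by zeroing the derivative of the
    Taylor expansion of D around the sampled lattice point."""
    return -np.linalg.solve(hess, grad)

# Toy check: D(x) = D0 - 0.5 (x - c)^T A (x - c) peaks exactly at c.
A = np.diag([2.0, 1.0, 0.5])      # curvature; Hessian of D is -A
c = np.array([0.3, -0.2, 0.1])    # true offset in (x, y, sigma)
x0 = np.zeros(3)                  # sampled lattice point
grad = A @ (c - x0)               # dD/dx evaluated at x0
hess = -A
print(refine_extremum(grad, hess))  # ≈ [0.3, -0.2, 0.1]
```

In the actual detector, the gradient and Hessian are approximated by finite differences of neighbouring DoG samples, and keypoints whose offset exceeds 0.5 in any dimension are re-localized to the neighbouring sample.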
Journal ArticleDOI
TL;DR: It is believed that an important contribution of this paper is to show that even a simple decoupled system can provide state-of-the-art performance on the PASCAL VOC 2007, PASCAL VOC 2008 and MSRC 21 datasets.
Abstract: We consider the problem of semantic segmentation, i.e. assigning each pixel in an image to a set of pre-defined semantic object categories. State-of-the-art semantic segmentation algorithms typically consist of three components: a local appearance model, a local consistency model and a global consistency model. These three components are generally integrated into a unified probabilistic framework. While it enables at training time a joint estimation of the model parameters and while it ensures at test time a globally consistent labeling of the pixels, it also comes at a high computational cost. We propose a simple approach to semantic segmentation where the three components are decoupled (this journal submission is an extended version of the following conference paper: G. Csurka and F. Perronnin, "A simple high performance approach to semantic segmentation", BMVC, 2008). For the local appearance model, we make use of the Fisher kernel. While this framework was shown to lead to high accuracy for image classification, to our best knowledge this is its first application to the segmentation problem. The semantic segmentation process is then guided by a low-level segmentation which enforces local consistency. Finally, to enforce image-level consistency we use global image classifiers: if an image as a whole is unlikely to contain an object class, then the corresponding class is not considered in the segmentation pipeline. The decoupling of the components makes our system very efficient both at training and test time. An efficient training enables to estimate the model parameters on large quantities of data. Especially, we explain how our system can leverage weakly labeled data, i.e. images for which we do not have pixel-level labels but either object bounding boxes or even only image-level labels. 
We believe that an important contribution of this paper is to show that even a simple decoupled system can provide state-of-the-art performance on the PASCAL VOC 2007, PASCAL VOC 2008 and MSRC 21 datasets.

92 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...the output of filter banks), color statistics (histogram or moments) and SIFT (Lowe 2004)....

  • ...The SIFT and color maps are then simply averaged for each category....

  • ...We make use of two types of low-level descriptors: 128-dimensional SIFT features (Lowe 2004) and 96-dimensional color descriptors....

  • ...The most popular descriptors include texture (i.e. the output of filter banks), color statistics (histogram or moments) and SIFT (Lowe 2004)....

  • ...We thus obtain one pixel-level probability map per class per feature type, i.e. one for SIFT and one for color in our case....

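The late fusion described in these excerpts (one probability map per class per feature type, then a simple per-category average) can be sketched in a few lines. The map shapes and the Dirichlet sampling below are illustrative assumptions, not the authors' code:

```python
import numpy as np

# Hypothetical per-pixel class probability maps (H x W x num_classes),
# one from SIFT-based appearance and one from colour descriptors.
rng = np.random.default_rng(1)
p_sift = rng.dirichlet(np.ones(3), size=(4, 4))   # each pixel sums to 1
p_color = rng.dirichlet(np.ones(3), size=(4, 4))

# Late fusion as described: average the two maps per category,
# then label each pixel with the arg-max class.
p_fused = 0.5 * (p_sift + p_color)
labels = p_fused.argmax(axis=-1)
print(labels.shape)  # (4, 4)
```

In the full system these pixel labels are further smoothed by the low-level segmentation (local consistency) and gated by global image classifiers (image-level consistency).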
Proceedings ArticleDOI
07 Dec 2015
TL;DR: A fast scalable solution based on the Kernelized Correlation Filter (KCF) framework is presented that integrates the fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates.
Abstract: Correlation filters for long-term visual object tracking have recently seen great interest. Although they present competitive performance results, there is still a need for improving their tracking capabilities. In this paper, we present a fast scalable solution based on the Kernalized Correlation Filter (KCF) framework. We introduce an adjustable Gaussian window function and a keypoint-based model for scale estimation to deal with the fixed size limitation in the Kernelized Correlation Filter. Furthermore, we integrate the fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates. We test our solution using the Visual Tracker Benchmark and the VOT Challenge datasets. We evaluate our tracker in terms of precision and success rate, accuracy, robustness and speed. The empirical evaluations demonstrate clear improvements by the proposed tracker over the KCF algorithm while ranking among the top state-of-the-art trackers.

92 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...It adopts SIFT [18] features and descriptors to match keypoints and uses singular value decomposition to estimate position, scale and orientation of the matches....

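Estimating position, scale and orientation from keypoint matches via singular value decomposition is typically a Procrustes/Umeyama-style least-squares fit. A hedged sketch of that technique (not the tracker's actual implementation), verified on a synthetic similarity transform:

```python
import numpy as np

def similarity_from_matches(src, dst):
    """Least-squares 2D similarity (scale s, rotation R, translation t)
    mapping src points onto dst, via SVD of the cross-covariance."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy check: rotate by 30 degrees, scale by 1.5, translate by (2, -1).
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = np.random.default_rng(2).normal(size=(10, 2))
dst = 1.5 * src @ R_true.T + np.array([2.0, -1.0])
s, R, t = similarity_from_matches(src, dst)
print(round(s, 3))  # ≈ 1.5
```

With noisy real matches this fit would normally be wrapped in an outlier-rejection loop such as RANSAC.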
References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
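The Hough-transform clustering step described in this abstract can be illustrated with coarse pose-bin voting: each match votes for a quantized pose, and the largest bin collects the matches that agree on a single object pose. This simplified sketch bins only orientation difference and scale ratio (the full method also bins image location), and the match tuples are made-up values:

```python
import numpy as np
from collections import Counter

def hough_pose_bins(matches, ori_bin=30.0, scale_bin=2.0):
    """Cluster (orientation difference in degrees, scale ratio) match
    parameters into coarse bins; return the most-voted bin and its count."""
    votes = Counter()
    for d_ori, scale_ratio in matches:
        b = (int(d_ori // ori_bin),
             int(np.log2(scale_ratio) // np.log2(scale_bin)))
        votes[b] += 1
    return votes.most_common(1)[0]

# Toy matches: four agree on roughly (45 deg, 1.8x); two are outliers.
matches = [(44.0, 1.9), (46.0, 1.8), (43.0, 1.7), (47.0, 1.9),
           (120.0, 0.5), (10.0, 4.0)]
print(hough_pose_bins(matches))  # the 4-vote consensus bin
```

In the paper's pipeline, the matches in the winning bin then go to a least-squares pose verification, which rejects any remaining outliers.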

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
