Distinctive Image Features from Scale-Invariant Keypoints

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.

read less

Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Reconfigurable models for scene recognition

[...]

Sobhan Naderi Parizi¹, John Oberlin¹, Pedro F. Felzenszwalb¹•Institutions (1)

Brown University¹

16 Jun 2012

TL;DR: A new latent variable model for scene recognition that represents a scene as a collection of region models arranged in a reconfigurable pattern and uses a latent variable to specify which region model is assigned to each image region.

...read moreread less

Abstract: We propose a new latent variable model for scene recognition. Our approach represents a scene as a collection of region models (“parts”) arranged in a reconfigurable pattern. We partition an image into a predefined set of regions and use a latent variable to specify which region model is assigned to each image region. In our current implementation we use a bag of words representation to capture the appearance of an image region. The resulting method generalizes a spatial bag of words approach that relies on a fixed model for the bag of words in each image region. Our models can be trained using both generative and discriminative methods. In the generative setting we use the Expectation-Maximization (EM) algorithm to estimate model parameters from a collection of images with category labels. In the discriminative setting we use a latent structural SVM (LSSVM). We note that LSSVMs can be very sensitive to initialization and demonstrate that generative training with EM provides a good initialization for discriminative training with LSSVM.

...read moreread less

177 citations

Cites methods from "Distinctive Image Features from Sca..."

...We used densely sampled SIFT features [7] to define visual words....
[...]

Journal Article•DOI•

Remote Sensing Image Matching Based on Adaptive Binning SIFT Descriptor

[...]

Amin Sedaghat¹, Hamid Ebadi¹•Institutions (1)

K.N.Toosi University of Technology¹

27 Apr 2015-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: Experimental results show that the proposed AB-SIFT matching method is more robust and accurate than state-of-the-art methods, including the SIFT, DAISY, the gradient location and orientation histogram, the local intensity order pattern, and the binary robust invariant scale keypoint.

...read moreread less

Abstract: Image matching based on local invariant features is crucial for many photogrammetric and remote sensing applications such as image registration and image mosaicking. In this paper, a novel local feature descriptor named adaptive binning scale-invariant feature transform (AB-SIFT) for fully automatic remote sensing image matching that is robust to local geometric distortions is proposed. The main idea of the proposed method is an adaptive binning strategy to compute the local feature descriptor. The proposed descriptor is computed on a normalized region defined by an improved version of the prominent Hessian affine feature extraction algorithm called the uniform robust Hessian affine algorithm. Unlike common distribution-based descriptors, the proposed descriptor uses an adaptive histogram quantization strategy for both location and gradient orientations, which is robust and actually resistant to a local viewpoint distortion and extremely increases the discriminability and robustness of the final AB-SIFT descriptor. In addition to the SIFT descriptor, the proposed adaptive quantization strategy can be easily extended for other distribution-based descriptors. Experimental results on both synthetic and real image pairs show that the proposed AB-SIFT matching method is more robust and accurate than state-of-the-art methods, including the SIFT, DAISY, the gradient location and orientation histogram, the local intensity order pattern, and the binary robust invariant scale keypoint.

...read moreread less

174 citations

Cites background or methods from "Distinctive Image Features from Sca..."

...are most affected by local distortion errors [19]....
[...]
...The most popular distribution-based local descriptor is the scale-invariant feature transform (SIFT) [19], which is a 3-D histogram of gradient location and orientation on Cartesian 4 × 4 grids....
[...]
...In order to achieve orientation invariance, before descriptor generation, one orientation is assigned to each feature point based on local image gradient directions based on Lowe’s method [19]....
[...]

Journal Article•DOI•

Utilizing Local Phase Information to Remove Rain from Video

[...]

Varun Santhaseelan¹, Vijayan K. Asari¹•Institutions (1)

University of Dayton¹

01 Mar 2015-International Journal of Computer Vision

TL;DR: From a variety of experiments that are performed on output videos, it is shown that the proposed technique performs better than state-of-the-art techniques.

...read moreread less

Abstract: In the context of extracting information from video, bad weather conditions like rain can have a detrimental effect. In this paper, a novel framework to detect and remove rain streaks from video is proposed. The first part of the proposed framework for rain removal is a technique to detect rain streaks based on phase congruency features. The variation of features from frame to frame is used to estimate the candidate rain pixels in a frame. In order to reduce the number of false candidates due to global motion, frames are registered using phase correlation. The second part of the proposed framework is a novel reconstruction technique that utilizes information from three different sources, which are intensities of the rain affected pixel, spatial neighbors, and temporal neighbors. An optimal estimate for the actual intensity of the rain affected pixel is made based on the minimization of registration error between frames. An optical flow technique using local phase information is adopted for registration. This part of the proposed framework for removing rain is modeled such that the presence of local motion will not distort the features in the reconstructed video. The proposed framework is evaluated quantitatively and qualitatively on a variety of videos with varying complexities. The effectiveness of the algorithm is quantitatively verified by computing a no-reference image quality measure on individual frames of the reconstructed video. From a variety of experiments that are performed on output videos, it is shown that the proposed technique performs better than state-of-the-art techniques.

...read moreread less

174 citations

Cites methods from "Distinctive Image Features from Sca..."

...When feature based techniques like Scale Invariant Feature Transform (SIFT) (Lowe (2004)) are used for stabilization, the feature points could be on the rain streaks....
[...]

Journal Article•DOI•

Vehicle Detection in Very High Resolution Satellite Images of City Areas

[...]

Jens Leitloff, Stefan Hinz, Uwe Stilla

01 Apr 2010-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: An approach for automatic vehicle detection from optical satellite images using implicit modeling and the use of a priori knowledge of typical vehicle constellation leads to an enhanced overall completeness compared to approaches which are only based on statistical classification techniques.

...read moreread less

Abstract: Current traffic research is mostly based on data from fixed-installed sensors like induction loops, bridge sensors, and cameras. Thereby, the traffic flow on main roads can partially be acquired, while data from the major part of the entire road network are not available. Today's optical sensor systems on satellites provide large-area images with 1-m resolution and better, which can deliver complement information to traditional acquired data. In this paper, we present an approach for automatic vehicle detection from optical satellite images. Therefore, hypotheses for single vehicles are generated using adaptive boosting in combination with Haar-like features. Additionally, vehicle queues are detected using a line extraction technique since grouped vehicles are merged to either dark or bright ribbons. Utilizing robust parameter estimation, single vehicles are determined within those vehicle queues. The combination of implicit modeling and the use of a priori knowledge of typical vehicle constellation leads to an enhanced overall completeness compared to approaches which are only based on statistical classification techniques. Thus, a detection rate of over 80% is possible with very high reliability. Furthermore, an approach for movement estimation of the detected vehicle is described, which allows the distinction of moving and stationary traffic. Thus, even an estimate for vehicles' speed is possible, which gives additional information about the traffic condition at image acquisition time.

...read moreread less

173 citations

Cites background from "Distinctive Image Features from Sca..."

...Furthermore, SIFT and SURF require key-point detection which will not yield to a pixelwise classification....
[...]
...More sophisticated features such as scale-invariant feature transform (SIFT) [53], speeded up robust features (SURF) [54], or similar descriptors could also be included....
[...]
...(SIFT) [53], speeded up robust features (SURF) [54], or similar descriptors could also be included....
[...]

Book Chapter•DOI•

Learning to Hash with Binary Deep Neural Network

[...]

Thanh-Toan Do¹, Anh-Dzung Doan¹, Ngai-Man Cheung¹•Institutions (1)

Singapore University of Technology and Design¹

08 Oct 2016

TL;DR: This work proposes deep network models and learning algorithms for unsupervised and supervised binary hashing that incorporate independence and balance properties in the direct and strict forms in the learning and includes similarity preserving property in the objective function.

...read moreread less

Abstract: This work proposes deep network models and learning algorithms for unsupervised and supervised binary hashing. Our novel network design constrains one hidden layer to directly output the binary codes. This addresses a challenging issue in some previous works: optimizing non-smooth objective functions due to binarization. Moreover, we incorporate independence and balance properties in the direct and strict forms in the learning. Furthermore, we include similarity preserving property in our objective function. Our resulting optimization with these binary, independence, and balance constraints is difficult to solve. We propose to attack it with alternating optimization and careful relaxation. Experimental results on three benchmark datasets show that our proposed methods compare favorably with the state of the art.

...read moreread less

173 citations

Cites methods from "Distinctive Image Features from Sca..."

...SIFT1M [28] dataset contains 128 dimensional SIFT vectors [29]....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
…
65
66
67
68
69
70
71
…
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

A performance evaluation of local descriptors

[...]

Krystian Mikolajczyk¹, Cordelia Schmid²•Institutions (2)

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

7,057 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations