
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
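
To make the extract-and-match pipeline concrete, below is a minimal sketch using OpenCV's SIFT implementation. OpenCV, the filenames, and the 0.75 ratio threshold are illustrative assumptions, not details from the source.

    # Minimal SIFT extraction and matching sketch (assumes OpenCV with SIFT available).
    import cv2

    img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input images
    img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-D descriptors
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe-style ratio test: keep a match only if its nearest neighbour is
    # clearly better than the second-nearest one.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
    print(f"{len(good)} putative matches survive the ratio test")
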
Citations
More filters
Journal ArticleDOI
TL;DR: A taxonomy of deghosting algorithms is proposed which can be used to group existing and future algorithms into meaningful classes, and the results of a subjective experiment aimed at evaluating various state-of-the-art deghosting algorithms are shared.
Abstract: Obtaining a high-quality high dynamic range (HDR) image in the presence of camera and object movement has been a long-standing challenge. Many methods, known as HDR deghosting algorithms, have been developed over the past ten years to undertake this challenge. Each of these algorithms approaches the deghosting problem from a different perspective, providing solutions with different degrees of complexity, solutions that range from rudimentary heuristics to advanced computer vision techniques. The proposed solutions generally differ in two ways: (1) how to detect ghost regions and (2) what to do to eliminate ghosts. Some algorithms choose to completely discard moving objects, giving rise to HDR images which only contain the static regions. Some other algorithms try to find the best image to use for each dynamic region. Yet others try to register moving objects from different images in the spirit of maximizing dynamic range in dynamic regions. Furthermore, each algorithm may introduce different types of artifacts as it aims to eliminate ghosts. These artifacts may come in the form of noise, broken objects, under- and over-exposed regions, and residual ghosting. Given the high volume of studies conducted in this field over the recent years, a comprehensive survey of the state of the art is required. Thus, the first goal of this paper is to provide this survey. Secondly, the large number of algorithms brings about the need to classify them. Thus the second goal of this paper is to propose a taxonomy of deghosting algorithms which can be used to group existing and future algorithms into meaningful classes. Thirdly, the existence of a large number of algorithms brings about the need to evaluate their effectiveness, as each new algorithm claims to outperform its predecessors. Therefore, the last goal of this paper is to share the results of a subjective experiment which aims to evaluate various state-of-the-art deghosting algorithms.
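
To illustrate the ghost-detection step, below is a minimal sketch of one rudimentary heuristic of the kind the survey classifies, not any specific algorithm from it: exposures are brought to a common radiance scale (assuming a linear camera response and known exposure times, both assumptions of this illustration) and pixels that disagree are flagged as ghost candidates.

    # Minimal ghost-map heuristic for two exposures (illustrative only).
    import numpy as np

    def ghost_map(img_short, img_long, t_short, t_long, threshold=0.1):
        """Flag pixels that disagree after exposure normalisation.

        img_short, img_long : float arrays in [0, 1], same size
        t_short, t_long     : exposure times in seconds
        """
        # Assume a linear response: radiance is roughly pixel value / exposure time.
        rad_short = img_short / t_short
        rad_long = img_long / t_long
        # Ignore pixels that are saturated or nearly black in either exposure.
        valid = (img_short > 0.05) & (img_short < 0.95) & (img_long > 0.05) & (img_long < 0.95)
        diff = np.abs(rad_short - rad_long) / np.maximum(rad_long, 1e-6)
        return valid & (diff > threshold)   # True where motion (ghosting) is likely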

115 citations

Proceedings ArticleDOI
08 May 2017
TL;DR: The experimental results show that the CNN-based washed-away detection system achieves 94–96% classification accuracy across all conditions, indicating the promising applicability of CNNs for washed-away building detection.
Abstract: This paper explores the effective use of Convolutional Neural Networks (CNNs) in the context of washed-away building detection from pre- and post-tsunami aerial images. To this end, we compile a dedicated, labeled aerial image dataset to construct models that classify whether a building is washed-away. Each datum in the set is a pair of pre- and post-tsunami image patches and encompasses a target building at the center of the patch. Using this dataset, we comprehensively evaluate CNNs from a practical-application viewpoint, e.g., input scenarios (pre-tsunami images are not always available), input scales (building size varies) and different configurations for CNNs. The experimental results show that our CNN-based washed-away detection system achieves 94–96% classification accuracy across all conditions, indicating the promising applicability of CNNs for washed-away building detection.
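
The patch-pair classification described above (including the two-branch, or Siamese, setup mentioned in the excerpt further below) can be sketched roughly as follows; the layer widths, patch size, and overall architecture are illustrative guesses, not the configuration used in the paper.

    # Rough sketch of a two-branch (Siamese) classifier for pre/post patch pairs.
    import torch
    import torch.nn as nn

    class SiameseWashedAwayNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Shared convolutional branch applied to both the pre- and post-tsunami patch.
            self.branch = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            )
            # Classifier on the concatenated branch outputs: washed away or not.
            self.head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))

        def forward(self, pre_patch, post_patch):
            f_pre = self.branch(pre_patch).flatten(1)    # shared weights -> comparable features
            f_post = self.branch(post_patch).flatten(1)
            return self.head(torch.cat([f_pre, f_post], dim=1))

    # Example: a batch of four 64x64 RGB patch pairs (sizes are placeholders).
    model = SiameseWashedAwayNet()
    logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))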

115 citations


Additional excerpts

  • ...Such pair comparisons have been addressed using one-branch [4] or two-branch (also called Siamese) [3, 4, 5] CNNs, and they achieved the best performance by a significant margin compared to hand-crafted features such as SIFT [6]....


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work proposes a new technique to jointly recover cosegmentation and dense per-pixel correspondence in two images by parameterizing the correspondence field using piecewise similarity transformations and recovering a mapping between the estimated common "foreground" regions in the two images.
Abstract: We propose a new technique to jointly recover cosegmentation and dense per-pixel correspondence in two images. Our method parameterizes the correspondence field using piecewise similarity transformations and recovers a mapping between the estimated common "foreground" regions in the two images allowing them to be precisely aligned. Our formulation is based on a hierarchical Markov random field model with segmentation and transformation labels. The hierarchical structure uses nested image regions to constrain inference across multiple scales. Unlike prior hierarchical methods which assume that the structure is given, our proposed iterative technique dynamically recovers the structure along with the labeling. This joint inference is performed in an energy minimization framework using iterated graph cuts. We evaluate our method on a new dataset of 400 image pairs with manually obtained ground truth, where it outperforms state-of-the-art methods designed specifically for either cosegmentation or correspondence estimation.
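
For readers unfamiliar with the parameterization, a 2D similarity transformation has four degrees of freedom (uniform scale, rotation, and translation); the minimal sketch below applies one to a set of points. The parameter values are arbitrary, and this is only an illustration of the transformation class, not of the paper's inference method.

    # A 2D similarity transform: uniform scale s, rotation theta, translation (tx, ty).
    import numpy as np

    def similarity_transform(points, s, theta, tx, ty):
        """Apply x' = s * R(theta) @ x + t to an (N, 2) array of points."""
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        return s * points @ R.T + np.array([tx, ty])

    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(similarity_transform(pts, s=1.5, theta=np.pi / 6, tx=2.0, ty=-1.0))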

115 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We use this heuristic to predict the probability of a true match, motivated by the ratio test [37] (see Figure 4)....


Book ChapterDOI
TL;DR: A new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT), which fuses RGB-D data, is proposed; combined with a sparse-coding step based on simulation orthogonal matching pursuit (SOMP), it yields a much lower reconstruction error than vector quantization and achieves better performance.
Abstract: For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation, and has more compact and richer visual representations. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike traditional bag-of-features (BoF) models that use vector quantization (VQ) to map each feature to a single visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied, so each feature can be represented by a linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on the ChaLearn gesture database, and the result ranks among the best-performing techniques in the ChaLearn gesture challenge (round 2).
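
The contrast between VQ and sparse coding can be illustrated roughly as follows. This uses scikit-learn's generic OMP solver rather than the paper's SOMP variant, and the descriptor dimension, codebook size, and sparsity level are arbitrary.

    # VQ vs. OMP-based sparse coding of a descriptor against a learned codebook (illustrative).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import sparse_encode

    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(500, 64))          # stand-in for training features
    codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(descriptors).cluster_centers_

    d = descriptors[0]

    # Vector quantization: each feature maps to its single nearest codeword.
    vq_index = np.argmin(np.linalg.norm(codebook - d, axis=1))

    # Sparse coding: each feature becomes a combination of a few codewords,
    # which generally reconstructs it better than the single-codeword assignment.
    codes = sparse_encode(d.reshape(1, -1), codebook, algorithm="omp", n_nonzero_coefs=4)
    reconstruction = codes @ codebook
    print(np.linalg.norm(d - codebook[vq_index]), np.linalg.norm(d - reconstruction))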

115 citations

Journal ArticleDOI
TL;DR: A novel region-location algorithm is proposed that exploits the clustering information from matched SIFT keypoints, as well as the region information extracted through image segmentation, and it outperforms existing algorithms in terms of detection accuracy.
Abstract: This letter presents a new method for airport detection from large high-spatial-resolution IKONOS images. To this end, we describe the airport by a set of scale-invariant feature transform (SIFT) keypoints and detect it using an improved SIFT matching strategy. After obtaining the SIFT-matched keypoints, a novel region-location algorithm is proposed to both discard redundant matched points and locate the candidate regions that may contain the target; it exploits the clustering information from matched SIFT keypoints, as well as the region information extracted through image segmentation. Finally, airport recognition is achieved by applying prior knowledge to the candidate regions. Experimental results show that the proposed approach outperforms existing algorithms in terms of detection accuracy.
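
A rough sketch of the match-then-cluster idea (not the paper's exact region-location algorithm) is shown below: SIFT keypoints matched against a template are clustered spatially, and each cluster's bounding box becomes a candidate region. The filenames, ratio threshold, and DBSCAN parameters are placeholders.

    # Locate candidate regions by clustering SIFT matches to a template (illustrative).
    import cv2
    import numpy as np
    from sklearn.cluster import DBSCAN

    template = cv2.imread("airport_template.png", cv2.IMREAD_GRAYSCALE)   # hypothetical inputs
    scene = cv2.imread("ikonos_scene.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)
    kp_s, des_s = sift.detectAndCompute(scene, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des_t, des_s, k=2) if m.distance < 0.75 * n.distance]

    # Cluster the scene coordinates of matched keypoints; each dense cluster is a candidate region.
    pts = np.array([kp_s[m.trainIdx].pt for m in good])
    labels = DBSCAN(eps=100, min_samples=5).fit_predict(pts)
    for lbl in set(labels) - {-1}:
        cluster = pts[labels == lbl]
        x0, y0 = cluster.min(axis=0)
        x1, y1 = cluster.max(axis=0)
        print(f"candidate region: ({x0:.0f}, {y0:.0f}) - ({x1:.0f}, {y1:.0f})")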

114 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...SIFT is one of the most successful local feature descriptors and has been widely used to match objects represented by template images in a given image [12]....


  • ...Like [12], we first describe the airport by a template image, as shown in Fig....


References
More filters
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
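
The final verification step described above fits pose parameters by least squares; the sketch below solves for a 2D affine pose from putative model-to-image matches in plain NumPy. It is only a minimal illustration of that step, not Lowe's full recognition pipeline, and the point values are synthetic.

    # Least-squares affine pose from matched model/image points (illustrative).
    import numpy as np

    def fit_affine(model_pts, image_pts):
        """Solve image ~ A @ model + t for a 2x2 matrix A and translation t."""
        n = len(model_pts)
        M = np.zeros((2 * n, 6))
        b = image_pts.reshape(-1)
        M[0::2, 0:2] = model_pts   # x' = a11*X + a12*Y + tx
        M[0::2, 4] = 1.0
        M[1::2, 2:4] = model_pts   # y' = a21*X + a22*Y + ty
        M[1::2, 5] = 1.0
        params, *_ = np.linalg.lstsq(M, b, rcond=None)
        return params[:4].reshape(2, 2), params[4:]

    model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    image = model @ np.array([[1.2, 0.1], [-0.1, 1.2]]).T + np.array([3.0, 4.0])
    A, t = fit_affine(model, image)
    residual = np.linalg.norm(model @ A.T + t - image)   # small residual => consistent pose
    print(A, t, residual)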

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
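
The staged filtering that identifies stable points in scale space can be sketched as a difference-of-Gaussians stack; the base sigma, scale factor, and number of scales below are illustrative choices rather than values from the paper.

    # Build a difference-of-Gaussians (DoG) stack, the structure searched for stable keypoints.
    import cv2
    import numpy as np

    def dog_stack(gray, sigma0=1.6, k=2 ** 0.5, num_scales=5):
        gray = gray.astype(np.float32) / 255.0
        blurred = [cv2.GaussianBlur(gray, (0, 0), sigma0 * k ** i) for i in range(num_scales)]
        # Each DoG layer is the difference of two adjacent Gaussian scales;
        # extrema across space and scale are candidate keypoints.
        return np.stack([blurred[i + 1] - blurred[i] for i in range(num_scales - 1)])

    img = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
    dog = dog_stack(img)
    print(dog.shape)   # (num_scales - 1, height, width)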

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
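
This reference is best known for the Harris corner detector used for the feature extraction it describes; a minimal sketch of computing the corner response with OpenCV is below. The block size, aperture, k, and threshold are common illustrative defaults, not values from the paper.

    # Harris corner response and a simple threshold (illustrative parameters).
    import cv2
    import numpy as np

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)   # hypothetical input
    response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
    corners = np.argwhere(response > 0.01 * response.max())   # (row, col) corner candidates
    print(len(corners), "corner candidates")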

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
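
The recall-with-respect-to-precision criterion reduces to simple counts once match correctness is known, e.g., against ground-truth correspondences derived from a known image transformation; the numbers below are placeholders for illustration only.

    # Recall and precision of descriptor matching given ground-truth correspondences.
    def recall_precision(num_correct_matches, num_matches, num_correspondences):
        """recall    = correct matches / ground-truth correspondences
        precision = correct matches / all matches returned (plots often use 1 - precision)."""
        recall = num_correct_matches / num_correspondences
        precision = num_correct_matches / num_matches if num_matches else 0.0
        return recall, precision

    # Hypothetical numbers: 320 correct out of 400 returned matches, 800 true correspondences.
    print(recall_precision(320, 400, 800))   # -> (0.4, 0.8)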

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
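
For reference, maximally stable extremal regions (MSERs) can be extracted with OpenCV as sketched below; OpenCV and the input filename are illustrative assumptions, not part of the source.

    # Detect maximally stable extremal regions (MSERs) with OpenCV (illustrative).
    import cv2

    gray = cv2.imread("wide_baseline_view.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray)
    print(len(regions), "MSERs detected")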

3,422 citations
