Distinctive Image Features from Scale-Invariant Keypoints

doi:10.1023/B:VISI.0000029664.99615.94

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

read less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Mapping landslide displacements using Structure from Motion (SfM) and image correlation of multi-temporal UAV photography:

[...]

Arko Lucieer¹, Steven M. de Jong², Darren Turner¹•Institutions (2)

University of Tasmania¹, Utrecht University²

01 Feb 2014-Progress in Physical Geography

TL;DR: In this article, a Structure from Motion (SfM) workflow was applied to derive a 3D model of a landslide in southeast Tasmania from multi-view UAV photography and the geometric accuracy of the model and resulting DEMs and orthophoto mosaics was tested with ground control points coordinated with geodetic GPS receivers.

...read moreread less

Abstract: In this study, we present a flexible, cost-effective, and accurate method to monitor landslides using a small unmanned aerial vehicle (UAV) to collect aerial photography. In the first part, we apply a Structure from Motion (SfM) workflow to derive a 3D model of a landslide in southeast Tasmania from multi-view UAV photography. The geometric accuracy of the 3D model and resulting DEMs and orthophoto mosaics was tested with ground control points coordinated with geodetic GPS receivers. A horizontal accuracy of 7 cm and vertical accuracy of 6 cm was achieved. In the second part, two DEMs and orthophoto mosaics acquired on 16 July 2011 and 10 November 2011 were compared to study landslide dynamics. The COSI-Corr image correlation technique was evaluated to quantify and map terrain displacements. The magnitude and direction of the displacement vectors derived from correlating two hillshaded DEM layers corresponded to a visual interpretation of landslide change. Results show that the algorithm can accurately map displacements of the toes, chunks of soil, and vegetation patches on top of the landslide, but is not capable of mapping the retreat of the main scarp. The conclusion is that UAV-based imagery in combination with 3D scene reconstruction and image correlation algorithms provide flexible and effective tools to map and monitor landslide dynamics.

...read moreread less

606 citations

Cites background from "Distinctive Image Features from Sca..."

...Based on advances in image feature recognition, such as the scale invariant feature transform (SIFT) (Lowe, 2004), characteristic image objects can be automatically at ARIZONA STATE UNIV on October 24, 2014ppg.sagepub.comDownloaded from detected, described, and matched between photographs....
[...]

Proceedings Article•DOI•

Robust Scene Text Recognition with Automatic Rectification

[...]

Baoguang Shi¹, Xinggang Wang¹, Pengyuan Lyu¹, Cong Yao¹, Xiang Bai¹ - Show less +1 more•Institutions (1)

Huazhong University of Science and Technology¹

12 Mar 2016

TL;DR: This article proposed a robust text recognizer with automatic rectification (RARE), which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN).

...read moreread less

Abstract: Recognizing text in natural images is a challenging task with many unsolved problems. Different from those in documents, words in natural images often possess irregular shapes, which are caused by perspective distortion, curved character placement, etc. We propose RARE (Robust text recognizer with Automatic REctification), a recognition model that is robust to irregular text. RARE is a speciallydesigned deep neural network, which consists of a Spatial Transformer Network (STN) and a Sequence Recognition Network (SRN). In testing, an image is firstly rectified via a predicted Thin-Plate-Spline (TPS) transformation, into a more "readable" image for the following SRN, which recognizes text through a sequence recognition approach. We show that the model is able to recognize several types of irregular text, including perspective text and curved text. RARE is end-to-end trainable, requiring only images and associated text labels, making it convenient to train and deploy the model in practical systems. State-of-the-art or highly-competitive performance achieved on several benchmarks well demonstrates the effectiveness of the proposed model.

...read moreread less

606 citations

Proceedings Article•DOI•

Deep learning of binary hash codes for fast image retrieval

[...]

Kevin Lin¹, Huei-Fang Yang¹, Jen-Hao Hsiao², Chu-Song Chen¹•Institutions (2)

Academia Sinica¹, Yahoo!²

07 Jun 2015

TL;DR: This work proposes an effective deep learning framework to generate binary hash codes for fast image retrieval by employing a hidden layer for representing the latent concepts that dominate the class labels in convolutional neural networks.

...read moreread less

Abstract: Approximate nearest neighbor search is an efficient strategy for large-scale image retrieval. Encouraged by the recent advances in convolutional neural networks (CNNs), we propose an effective deep learning framework to generate binary hash codes for fast image retrieval. Our idea is that when the data labels are available, binary codes can be learned by employing a hidden layer for representing the latent concepts that dominate the class labels. The utilization of the CNN also allows for learning image representations. Unlike other supervised methods that require pair-wised inputs for binary code learning, our method learns hash codes and image representations in a point-wised manner, making it suitable for large-scale datasets. Experimental results show that our method outperforms several state-of-the-art hashing algorithms on the CIFAR-10 and MNIST datasets. We further demonstrate its scalability and efficacy on a large-scale dataset of 1 million clothing images.

...read moreread less

605 citations

Proceedings Article•DOI•

CSIFT: A SIFT Descriptor with Color Invariant Characteristics

[...]

Alaa E. Abdel-Hakim¹, Aly A. Farag¹•Institutions (1)

University of Louisville¹

17 Jun 2006

TL;DR: The built Colored SIFT (CSIFT) is more robust than the conventional SIFT with respect to color and photometrical variations and the evaluation results support the potential of the proposed approach.

...read moreread less

Abstract: SIFT has been proven to be the most robust local invariant feature descriptor. SIFT is designed mainly for gray images. However, color provides valuable information in object description and matching tasks. Many objects can be misclassified if their color contents are ignored. This paper addresses this problem and proposes a novel colored local invariant feature descriptor. Instead of using the gray space to represent the input image, the proposed approach builds the SIFT descriptors in a color invariant space. The built Colored SIFT (CSIFT) is more robust than the conventional SIFT with respect to color and photometrical variations. The evaluation results support the potential of the proposed approach.

...read moreread less

603 citations

Journal Article•DOI•

SHOT: Unique signatures of histograms for surface and texture description

[...]

Samuele Salti¹, Federico Tombari¹, Luigi Di Stefano¹•Institutions (1)

University of Bologna¹

01 Aug 2014-Computer Vision and Image Understanding

TL;DR: A thorough experimental evaluation vouches that SHOT outperforms state-of-the-art local descriptors in experiments addressing descriptor matching for object recognition, 3D reconstruction and shape retrieval.

...read moreread less

602 citations

Cites background from "Distinctive Image Features from Sca..."

...In the comparison presented in Proença et al. [50], SHOT outperforms Spin Images and dense SIFT for the task of object category recognition on the RGB-D Object dataset [51]....
[...]
...More specifically, as proposed in [35] we compute the ratio between the nearest neighbor and the second best: if the ratio is below a threshold a correspondence is established between the scene feature and its closest model feature....
[...]
...This scheme has been recently extended to a hybrid scheme, by computing a SIFT descriptor out of the resulting depth image [23]....
[...]
...By analyzing SIFT [35], arguably the most successful and widespread proposal among 2D descriptors, we have singled out the...
[...]
...By analyzing SIFT [35], arguably the most successful and widespread proposal among 2D descriptors, we have singled out the major reasons behind its effectiveness....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
…
59
60
61
62
63
64
65
…
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

"Distinctive Image Features from Sca..." refers background or methods in this paper

...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....
[...]
...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....
[...]
...More details on applications of these features to recognition are available in other pape rs (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....
[...]
...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scalespace extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...
[...]
...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....
[...]

Book•

Multiple view geometry in computer vision

[...]

Richard Hartley¹, Andrew Zisserman²•Institutions (2)

Australian National University¹, University of Oxford²

01 Jan 2000

TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.

...read moreread less

Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

...read moreread less

15,558 citations

Multiple View Geometry in Computer Vision.

[...]

Bernhard P. Wrobel

01 Jan 2001

TL;DR: This book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts and it will show the best book collections and completed collections.

...read moreread less

Abstract: Downloading the book in this website lists can give you more advantages. It will show you the best book collections and completed collections. So many books can be found in this website. So, this is not only this multiple view geometry in computer vision. However, this book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts. This is simple, read the soft file of the book and you get it.

...read moreread less

14,282 citations

"Distinctive Image Features from Sca..." refers background in this paper

...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....
[...]

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations