Distinctive Image Features from Scale-Invariant Keypoints

doi:10.1023/B:VISI.0000029664.99615.94

Journal Article•DOI•

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
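
The first recognition stage named in the abstract, matching each feature against a database of features from known objects, can be illustrated with a minimal sketch. It assumes descriptors are plain tuples of floats and uses an exhaustive Euclidean search in place of the fast nearest-neighbor algorithm the abstract refers to; the function name and the toy 3-dimensional descriptors are hypothetical, and the Hough-transform clustering and least-squares verification stages are not shown.

```python
import math

def nearest_neighbor_matches(query, database):
    """For each query descriptor, return (index of closest database descriptor, distance).

    Brute-force search, used here only for illustration; it is not the
    fast nearest-neighbor algorithm mentioned in the abstract.
    """
    matches = []
    for q in query:
        best_index, best_dist = -1, math.inf
        for i, d in enumerate(database):
            dist = math.dist(q, d)  # Euclidean distance between descriptors
            if dist < best_dist:
                best_index, best_dist = i, dist
        matches.append((best_index, best_dist))
    return matches

# Hypothetical toy descriptors (real descriptors have many more dimensions).
database = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
query = [(0.9, 0.1, 0.0), (0.0, 0.2, 0.9)]
matches = nearest_neighbor_matches(query, database)
```

Each query descriptor is paired with the index of its closest database entry; a real system would pass these candidate matches on to the clustering and verification stages.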

Citations

Journal Article•DOI•

Objective comparison of particle tracking methods

Nicolas Chenouard¹, Ihor Smal¹, Fabrice de Chaumont², Martin Maška³,⁴, Ivo F. Sbalzarini¹, Yuanhao Gong¹, Janick Cardinale¹, Craig Carthel¹, Stefano Coraluppi¹, Mark R. Winter¹, Andrew R. Cohen⁵, William J. Godinez¹, Karl Rohr¹, Yannis Kalaidzidis¹, Liang Liang¹, James S. Duncan¹, Hongying Shen¹, Yingke Xu⁶, Klas E. G. Magnusson⁷, Joakim Jalden⁷, Helen M. Blau¹, Perrine Paul-Gilloteaux⁸, Philippe Roudot⁹, Charles Kervrann⁹, François Waharte⁸, Jean-Yves Tinevez¹⁰, Spencer L. Shorte¹⁰, Joost Willemse¹¹, Katherine Celler¹¹, Gilles P. van Wezel¹¹, Han-Wei Dan¹², Yuh-Show Tsai¹², Carlos Ortiz de Solórzano³, Jean-Christophe Olivo-Marin¹, Erik Meijering¹•Institutions (12)

Max Planck Society¹, Centre national de la recherche scientifique², University of Navarra³, Masaryk University⁴, Drexel University⁵, Zhejiang University⁶, Royal Institute of Technology⁷, Curie Institute⁸, French Institute for Research in Computer Science and Automation⁹, Pasteur Institute¹⁰, Leiden University¹¹, Chung Yuan Christian University¹²

01 Mar 2014-Nature Methods

TL;DR: Although no single method performed best across all scenarios, the results revealed clear differences between the various approaches, leading to notable practical conclusions for users and developers.

Abstract: Particle tracking is of key importance for quantitative analysis of intracellular dynamic processes from time-lapse microscopy image data. Because manually detecting and following large numbers of individual particles is not feasible, automated computational methods have been developed for these tasks by many groups. Aiming to perform an objective comparison of methods, we gathered the community and organized an open competition in which participating teams applied their own methods independently to a commonly defined data set including diverse scenarios. Performance was assessed using commonly defined measures. Although no single method performed best across all scenarios, the results revealed clear differences between the various approaches, leading to notable practical conclusions for users and developers.

819 citations

Proceedings Article•DOI•

Towards Internet-scale multi-view stereo

Yasutaka Furukawa¹, Brian Curless², Steven M. Seitz¹, Richard Szeliski³•Institutions (3)

Google¹, University of Washington², Microsoft³

13 Jun 2010

TL;DR: An approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections by decomposing the collection into a set of overlapping sets of photos that can be processed in parallel and then merging the resulting reconstructions.

Abstract: This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.

817 citations

Proceedings Article•DOI•

Creating efficient codebooks for visual recognition

Frédéric Jurie¹, Bill Triggs¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

17 Oct 2005

TL;DR: It is shown that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or mutual information based feature selection starting from a dense codebook further improves the performance.

Abstract: Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for texture analysis and scene classification. Codebooks are usually constructed by using a method such as k-means to cluster the descriptor vectors of patches sampled either densely ('textons') or sparsely ('bags of features' based on key-points or salience measures) from a set of training images. This works well for texture analysis in homogeneous images, but the images that arise in natural object recognition tasks have far less uniform statistics. We show that for dense sampling, k-means over-adapts to this, clustering centres almost exclusively around the densest few regions in descriptor space and thus failing to code other informative regions. This gives suboptimal codes that are no better than using randomly selected centres. We describe a scalable acceptance-radius based clusterer that generates better codebooks and study its performance on several image classification tasks. We also show that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or mutual information based feature selection starting from a dense codebook further improves the performance.

817 citations

Proceedings Article•DOI•

Segmentation as selective search for object recognition

Koen E. A. van de Sande¹, Jasper Uijlings², Theo Gevers¹, Arnold W. M. Smeulders¹•Institutions (2)

University of Amsterdam¹, University of Trento²

06 Nov 2011

TL;DR: This work adapts segmentation as a selective search, generating many approximate locations rather than a few precise object delineations, because an object whose location is never generated cannot be recognised and because appearance and immediate nearby context are most effective for object recognition.

Abstract: For object recognition, the current state-of-the-art is based on exhaustive search. However, to enable the use of more expensive features and classifiers and thereby progress beyond the state-of-the-art, a selective search strategy is needed. Therefore, we adapt segmentation as a selective search by reconsidering segmentation: We propose to generate many approximate locations over few and precise object delineations because (1) an object whose location is never generated can not be recognised and (2) appearance and immediate nearby context are most effective for object recognition. Our method is class-independent and is shown to cover 96.7% of all objects in the Pascal VOC 2007 test set using only 1,536 locations per image. Our selective search enables the use of the more expensive bag-of-words method which we use to substantially improve the state-of-the-art by up to 8.5% for 8 out of 20 classes on the Pascal VOC 2010 detection challenge.

815 citations

Cites background or methods from "Distinctive Image Features from Sca..."

...For these measurements, we aggregate the gradient magnitude in 8 directions over a region, just like in a single subregion of SIFT with no Gaussian weighting....
[...]
...We extract SIFT [18] and two recommended colour SIFTs from [26], OpponentSIFT and RGB-SIFT....
[...]
...Stexture(a, b) is defined as the histogram intersection between SIFT-like texture measurements [18]....
[...]
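
The excerpts above describe aggregating gradient magnitude in 8 directions over a region and comparing regions by histogram intersection. A minimal sketch of that idea follows, assuming gradients are given as (gx, gy) pairs; the function names, the L1 normalisation step, and the sample values are assumptions made for illustration and are not taken from the citing paper.

```python
import math

def orientation_histogram(gradients, bins=8):
    """Accumulate gradient magnitude into `bins` equal-width direction bins."""
    hist = [0.0] * bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.atan2(gy, gx) % (2 * math.pi)  # map direction to [0, 2*pi)
        index = int(angle / (2 * math.pi) * bins) % bins
        hist[index] += magnitude
    return hist

def l1_normalize(hist):
    """Scale a histogram so its entries sum to 1 (assumed normalisation)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else list(hist)

def histogram_intersection(a, b):
    """Sum of bin-wise minima; equals 1.0 for identical L1-normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))

# Hypothetical gradient samples for two regions.
region_a = [(1.0, 0.5), (-1.0, 0.5)]
region_b = [(1.0, 0.5), (0.5, -1.0)]
hist_a = l1_normalize(orientation_histogram(region_a))
hist_b = l1_normalize(orientation_histogram(region_b))
similarity = histogram_intersection(hist_a, hist_b)
```

The two regions share one dominant direction out of two, so their intersection score falls between 0 (no shared directions) and 1 (identical histograms).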

Proceedings Article•DOI•

Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model

Spyros Gidaris¹, Nikos Komodakis¹•Institutions (1)

École des ponts ParisTech¹

07 Dec 2015

TL;DR: An object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features; the resulting representation aims at capturing a diverse set of discriminative appearance factors and exhibits the localization sensitivity that is essential for accurate object localization.

Abstract: We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our recognition module by integrating it on an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model. Thanks to the efficient use of our modules, we detect objects with very high localization accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we achieve mAP of 78.2% and 73.9% correspondingly, surpassing any other published work by a significant margin.

810 citations

References

Proceedings Article•DOI•

Object recognition from local scale-invariant features

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

"Distinctive Image Features from Sca..." refers background or methods in this paper

...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....
[...]
...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....
[...]
...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...
[...]
...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....
[...]
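
The difference-of-Gaussian function D(x, y, σ) mentioned in the excerpts above can be sketched as the difference of two Gaussian-blurred copies of the image, D(x, y, σ) = L(x, y, kσ) − L(x, y, σ), where L is the image convolved with a Gaussian. The sketch below is illustrative only: the image is a plain list of rows, the kernel is truncated at 3σ with clamped borders, and the value k = √2 is an assumed default rather than something stated in the excerpt. All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve a sequence with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + j - radius, 0), last)]
            for j, w in enumerate(kernel))
        for i in range(len(values))
    ]

def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: blur rows, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma); k is an assumed default."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]

# Toy 9x9 image: a single bright pixel on a dark background.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussian(image, sigma=1.0)
```

For the single bright pixel, the response at the centre is negative because the finer blur keeps a higher peak than the coarser one. Keypoint detection would then search such responses for extrema across both position and scale, which this sketch does not attempt.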

Book•

Multiple view geometry in computer vision

Richard Hartley¹, Andrew Zisserman²•Institutions (2)

Australian National University¹, University of Oxford²

01 Jan 2000

TL;DR: In this book, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly within a unified framework, covering the geometric principles and how to represent objects algebraically so they can be computed and applied.

Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

Multiple View Geometry in Computer Vision.

Bernhard P. Wrobel

01 Jan 2001

14,282 citations

"Distinctive Image Features from Sca..." refers background in this paper

...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....
[...]

Proceedings Article•DOI•

A Combined Corner and Edge Detector

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations