Distinctive Image Features from Scale-Invariant Keypoints

doi:10.1023/B:VISI.0000029664.99615.94

Journal Article

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
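
The feature-matching step named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: a brute-force search stands in for the fast nearest-neighbor algorithm, toy 2-D tuples stand in for real descriptors, and the function name `match_descriptors` and the `max_ratio` acceptance threshold are assumptions introduced here for illustration.

```python
import math


def match_descriptors(query, database, max_ratio=0.8):
    """Match each query descriptor to its nearest database descriptor.

    A match is kept only when the nearest neighbor is clearly closer than
    the second nearest (max_ratio is an illustrative assumption).
    Returns a list of (query_index, database_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query):
        # Brute-force Euclidean distances to every database descriptor.
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        if len(dists) < 2:
            continue  # need two neighbors to judge ambiguity
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((qi, best_idx))
    return matches


if __name__ == "__main__":
    query = [(0.0, 0.0), (5.0, 5.0)]
    database = [(0.1, 0.0), (10.0, 10.0), (5.0, 5.2), (4.8, 5.0)]
    print(match_descriptors(query, database))
```

In this toy run the second query point has two almost equally close candidates, so it is rejected as ambiguous, while the first is accepted. The accepted pairs would then feed the clustering and verification stages the abstract describes.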

Citations

Proceedings Article

Learning to Find Good Correspondences

Kwang Moo Yi¹, Eduard Trulls², Yuki Ono³, Vincent Lepetit⁴, Mathieu Salzmann², Pascal Fua²•Institutions (4)

University of Victoria¹, École Polytechnique Fédérale de Lausanne², Sony Broadcast & Professional Research Laboratories³, Graz University of Technology⁴

01 Jan 2018

TL;DR: In this paper, a multi-layer perceptron operating on pixel coordinates rather than directly on the image is proposed to learn to find good correspondences for wide-baseline stereo.

Abstract: We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context Normalization, which allows us to process each data point separately while embedding global information in it, and also makes the network invariant to the order of the correspondences. Our experiments on multiple challenging datasets demonstrate that our method is able to drastically improve the state of the art with little training data.
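
One way to picture the normalization described above is the following sketch. It assumes that Context Normalization standardizes each feature channel using the mean and standard deviation taken over all correspondences of one image pair; the function name `context_normalize`, the `eps` constant, and the list-of-lists layout are hypothetical choices for illustration, not the authors' code.

```python
import statistics


def context_normalize(features, eps=1e-8):
    """Standardize each channel across all correspondences.

    features: list of per-correspondence feature vectors (equal length).
    Each output row depends on its own input row plus global statistics,
    so reordering the rows only reorders the output.
    """
    n_channels = len(features[0])
    columns = [[row[c] for row in features] for c in range(n_channels)]
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(row[c] - means[c]) / (stds[c] + eps) for c in range(n_channels)]
        for row in features
    ]


if __name__ == "__main__":
    feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for row in context_normalize(feats):
        print(row)
```

Because the statistics are computed over the whole set, each point carries global context while still being processed on its own, which matches the order-invariance property stated in the abstract.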

456 citations

Proceedings Article

Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme

Bernd Kitt¹, Andreas Geiger¹, Henning Lategahn¹•Institutions (1)

Karlsruhe Institute of Technology¹

21 Jun 2010

TL;DR: This paper proposes a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images which is directly based on the trifocal geometry between image triples, thus no time-expensive recovery of the 3-dimensional scene structure is needed.

Abstract: A common prerequisite for many vision-based driver assistance systems is the knowledge of the vehicle's own movement. In this paper we propose a novel approach for estimating the egomotion of the vehicle from a sequence of stereo images. Our method is directly based on the trifocal geometry between image triples, thus no time-expensive recovery of the 3-dimensional scene structure is needed. The only assumption we make is a known camera geometry, where the calibration may also vary over time. We employ an Iterated Sigma Point Kalman Filter in combination with a RANSAC-based outlier rejection scheme which yields robust frame-to-frame motion estimation even in dynamic environments. A high-accuracy inertial navigation system is used to evaluate our results on challenging real-world video sequences. Experiments show that our approach is clearly superior compared to other filtering techniques in terms of both accuracy and run-time.

456 citations

Proceedings Article

PlayAnywhere: a compact interactive tabletop projection-vision system

Andrew D. Wilson¹•Institutions (1)

Microsoft¹

23 Oct 2005

TL;DR: PlayAnywhere is introduced, a front-projected computer vision-based interactive table system which uses a new commercially available projection technology to obtain a compact, self-contained form factor and makes a number of contributions related to image processing techniques for front-projection-vision table systems.

Abstract: We introduce PlayAnywhere, a front-projected computer vision-based interactive table system which uses a new commercially available projection technology to obtain a compact, self-contained form factor. PlayAnywhere's configuration addresses installation, calibration, and portability issues that are typical of most vision-based table systems, and thereby is particularly motivated in consumer applications. PlayAnywhere also makes a number of contributions related to image processing techniques for front-projected vision-based table systems, including a shadow-based touch detection algorithm, a fast, simple visual bar code scheme tailored to projection-vision table systems, the ability to continuously track sheets of paper, and an optical flow-based algorithm for the manipulation of onscreen objects that does not rely on fragile tracking algorithms.

456 citations

Cites methods from "Distinctive Image Features from Sca..."

...For example, with the page tracking algorithm described above and generic object recognition algorithm such as SIFT [18] it is a straightforward task to implement a system which recognizes which of several known audio CDs is placed on the table surface....
[...]

Journal Article

An Efficient Earth Mover's Distance Algorithm for Robust Histogram Comparison

Haibin Ling¹, Kazunori Okada•Institutions (1)

University of Maryland, College Park¹

01 May 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The proposed EMD-L1 significantly simplifies the original linear programming formulation of EMD, and empirically shows that this new algorithm has an average time complexity of O(N^2), which significantly improves the best reported supercubic complexity of the original EMD.

Abstract: We propose EMD-L1: a fast and exact algorithm for computing the earth mover's distance (EMD) between a pair of histograms. The efficiency of the new algorithm enables its application to problems that were previously prohibitive due to high time complexities. The proposed EMD-L1 significantly simplifies the original linear programming formulation of EMD. Exploiting the L1 metric structure, the number of unknown variables in EMD-L1 is reduced to O(N) from O(N^2) of the original EMD for a histogram with N bins. In addition, the number of constraints is reduced by half and the objective function of the linear program is simplified. Formally, without any approximation, we prove that the EMD-L1 formulation is equivalent to the original EMD with a L1 ground distance. To perform the EMD-L1 computation, we propose an efficient tree-based algorithm, Tree-EMD. Tree-EMD exploits the fact that a basic feasible solution of the simplex algorithm-based solver forms a spanning tree when we interpret EMD-L1 as a network flow optimization problem. We empirically show that this new algorithm has an average time complexity of O(N^2), which significantly improves the best reported supercubic complexity of the original EMD. The accuracy of the proposed methods is evaluated by experiments for two computation-intensive problems: shape recognition and interest point matching using multidimensional histogram-based local features. For shape recognition, EMD-L1 is applied to compare shape contexts on the widely tested MPEG7 shape data set, as well as an articulated shape data set. For interest point matching, SIFT, shape context and spin image are tested on both synthetic and real image pairs with large geometrical deformation, illumination change, and heavy intensity noise. The results demonstrate that our EMD-L1-based solutions outperform previously reported state-of-the-art features and distance measures in solving the two tasks.
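
To make the quantity being computed concrete, here is a minimal sketch of the earth mover's distance with an L1 ground distance in the one-dimensional special case, where it reduces to the summed absolute difference of the cumulative histograms. This is not the Tree-EMD algorithm from the abstract, which targets multidimensional histograms; the function name `emd_l1_1d` and the unit bin spacing are assumptions made for illustration.

```python
from itertools import accumulate


def emd_l1_1d(hist_a, hist_b):
    """EMD between two 1-D histograms with unit-spaced bins.

    Both histograms must have the same number of bins and the same
    total mass. The cost of moving mass m by d bins is m * d.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Mass that must cross each bin boundary is the gap between the CDFs.
    cdf_a = accumulate(hist_a)
    cdf_b = accumulate(hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    print(emd_l1_1d([1, 0, 0], [0, 0, 1]))
    print(emd_l1_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

Unlike a bin-by-bin comparison, this distance grows with how far mass has shifted, which is the cross-bin robustness that makes EMD attractive for histogram-based descriptors.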

456 citations

Book Chapter

Unsupervised Discovery of Mid-Level Discriminative Patches

Saurabh Singh¹, Abhinav Gupta¹, Alexei A. Efros¹•Institutions (1)

Carnegie Mellon University¹

07 Oct 2012

TL;DR: The paper experimentally demonstrates the effectiveness of discriminative patches as an unsupervised mid-level visual representation, suggesting that it could be used in place of visual words for many tasks.

Abstract: The goal of this paper is to discover a set of discriminative patches which can serve as a fully unsupervised mid-level visual representation. The desired patches need to satisfy two requirements: 1) to be representative, they need to occur frequently enough in the visual world; 2) to be discriminative, they need to be different enough from the rest of the visual world. The patches could correspond to parts, objects, "visual phrases", etc. but are not restricted to be any one of them. We pose this as an unsupervised discriminative clustering problem on a huge dataset of image patches. We use an iterative procedure which alternates between clustering and training discriminative classifiers, while applying careful cross-validation at each step to prevent overfitting. The paper experimentally demonstrates the effectiveness of discriminative patches as an unsupervised mid-level visual representation, suggesting that it could be used in place of visual words for many tasks. Furthermore, discriminative patches can also be used in a supervised regime, such as scene classification, where they demonstrate state-of-the-art performance on the MIT Indoor-67 dataset.

455 citations

References

Proceedings Article

Object recognition from local scale-invariant features

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

"Distinctive Image Features from Sca..." refers background or methods in this paper

...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....
[...]
...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....
[...]
...More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....
[...]
...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...
[...]
...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....
[...]
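
The difference-of-Gaussian idea quoted in the excerpts above can be sketched in one dimension. This is a hedged illustration rather than the paper's implementation: a 1-D signal stands in for the image, the scale ratio is called `k` here with a default of sqrt(2) chosen only for the example, borders are handled by clamping, and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized, truncated 1-D Gaussian (radius of about 3 sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, clamping at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def difference_of_gaussian(signal, sigma, k=math.sqrt(2)):
    """Smoothed signal at scale k*sigma minus smoothed signal at sigma."""
    coarse = smooth(signal, k * sigma)
    fine = smooth(signal, sigma)
    return [c - f for c, f in zip(coarse, fine)]


if __name__ == "__main__":
    blob = [0.0] * 10 + [1.0] + [0.0] * 10
    response = difference_of_gaussian(blob, sigma=1.0)
    print(response.index(min(response)))
```

A flat signal gives a response of essentially zero everywhere, while an isolated bump produces an extremum at its location; searching for such extrema over both position and scale is the detection principle the excerpt refers to.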

Book

Multiple view geometry in computer vision

Richard Hartley¹, Andrew Zisserman²•Institutions (2)

Australian National University¹, University of Oxford²

01 Jan 2000

TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.

Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

Multiple View Geometry in Computer Vision.

Bernhard P. Wrobel

01 Jan 2001

TL;DR: This book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts and it will show the best book collections and completed collections.

Abstract: Downloading the book in this website lists can give you more advantages. It will show you the best book collections and completed collections. So many books can be found in this website. So, this is not only this multiple view geometry in computer vision. However, this book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts. This is simple, read the soft file of the book and you get it.

14,282 citations

"Distinctive Image Features from Sca..." refers background in this paper

...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....
[...]

Proceedings Article

A Combined Corner and Edge Detector

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal Article

Robust wide-baseline stereo from maximally stable extremal regions

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations