Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
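As a concrete illustration of the matching stage described above, here is a minimal sketch using OpenCV's SIFT implementation together with the paper's distance-ratio test; the image paths and FLANN parameters are placeholders, and the Hough-transform clustering and least-squares verification stages are omitted.

```python
# Minimal sketch: SIFT keypoint matching with the distance-ratio test.
# Image paths are placeholders; clustering/verification are omitted.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Approximate nearest-neighbour search (FLANN); k=2 retrieves the two
# closest descriptors so the distance-ratio test can be applied.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)

# Accept a match only when the best neighbour is clearly closer than
# the second best; the paper suggests a ratio threshold of 0.8.
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} candidate matches")
```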


Citations
Proceedings ArticleDOI
13 Jun 2010
TL;DR: It is shown that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC09 segmentation dataset and achieves the same average best segmentation covering as the best performing technique to date.
Abstract: We present a novel framework for generating and ranking plausible object hypotheses in an image using bottom-up processes and mid-level cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge about properties of individual object classes, by solving a sequence of constrained parametric min-cut problems (CPMC) on a regular image grid. We then learn to rank the object hypotheses by training a continuous model to predict how plausible the segments are, given their mid-level region properties. We show that this algorithm significantly outperforms the state of the art for low-level segmentation on the VOC09 segmentation dataset. It achieves the same average best segmentation covering as the best performing technique to date [2], 0.61, when using just the top 7 ranked segments, instead of the full hierarchy in [2]. Our method achieves 0.78 average best covering using 154 segments. In a companion paper [18], we also show that the algorithm achieves state-of-the-art results when used in a segmentation-based recognition pipeline.
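The excerpt describes a two-stage pipeline: generate figure-ground hypotheses via constrained parametric min-cut, then rank them with a learned continuous model over mid-level region properties. Below is a hedged sketch of the ranking stage only; the random stand-in features, the overlap-style quality targets, and the choice of a random-forest regressor are assumptions, and segment generation is assumed to happen elsewhere.

```python
# Sketch of the ranking stage: regress a plausibility score from
# per-segment region features, then sort hypotheses by that score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 34))   # stand-in region features
train_quality = rng.uniform(size=500)      # stand-in quality targets

ranker = RandomForestRegressor(n_estimators=100, random_state=0)
ranker.fit(train_feats, train_quality)

test_feats = rng.normal(size=(200, 34))    # hypotheses for one image
scores = ranker.predict(test_feats)
top7 = np.argsort(scores)[::-1][:7]        # e.g. keep the top 7 segments
```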

512 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...On VOC2009 we have also run experiments where we have complemented the initial feature set with additional appearance and shape features — a bag of dense SIFT [68] features computed on the foreground mask, a bag of Local Shape Contexts [69] computed on its boundary, and a HOG pyramid [70] with 3 levels computed on the bounding box fitted on the boundary of the segment, for a total of 1,054 features....

  • ...On VOC2009 we have also run experiments where we have complemented the feature set with additional appearance and shape features — a bag of dense SIFT [19] features on the foreground mask, a bag of Local Shape Contexts [4] on its boundary, and a Pyramid HOG [5] with 3 levels — for a total of 1054 features....
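As an illustration of the first feature in those excerpts, here is a hedged sketch of a bag of dense SIFT words restricted to a foreground mask; the codebook (assumed to be precomputed, e.g. by k-means over training descriptors), its size, and the grid step are illustrative assumptions rather than values from the paper.

```python
# Sketch: bag-of-words over dense SIFT descriptors inside a mask.
import cv2
import numpy as np

def dense_sift_bow(gray, mask, codebook, step=8):
    # Place keypoints on a regular grid, keeping foreground cells only.
    h, w = gray.shape
    kps = [cv2.KeyPoint(float(x), float(y), float(step))
           for y in range(0, h, step) for x in range(0, w, step)
           if mask[y, x]]
    sift = cv2.SIFT_create()
    _, desc = sift.compute(gray, kps)
    if desc is None:
        return np.zeros(len(codebook))
    # Assign each descriptor to its nearest codeword and histogram.
    d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook))
    return hist / max(hist.sum(), 1)
```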

Book ChapterDOI
05 Sep 2010
TL;DR: It is shown how the accelerated segment test, which underlies FAST, can be significantly improved by making it more generic while increasing its performance, by finding the optimal decision tree in an extended configuration space, and by demonstrating how specialized trees can be combined to yield an adaptive and generic accelerated segment test.
Abstract: The efficient detection of interesting features is a crucial step for various tasks in computer vision. Corners are favored cues due to their two-dimensional constraint and the fast algorithms available to detect them. Recently, a novel corner detection approach, FAST, has been presented which outperforms previous algorithms in both computational performance and repeatability. We show how the accelerated segment test, which underlies FAST, can be significantly improved by making it more generic while increasing its performance. We do so by finding the optimal decision tree in an extended configuration space, and by demonstrating how specialized trees can be combined to yield an adaptive and generic accelerated segment test. The resulting method provides high performance for arbitrary environments and, unlike FAST, does not have to be adapted to a specific scene structure. We also discuss how different test patterns affect the corner response of the accelerated segment test.
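For reference, the plain accelerated segment test that this paper generalizes declares a pixel a corner when a sufficiently long contiguous arc on its 16-pixel Bresenham circle is uniformly brighter or darker than the center. A minimal sketch follows, with the common arc length of 9 and an illustrative threshold; neither value is fixed by the paper.

```python
# Segment test at one pixel: corner if >= `arc` contiguous circle
# pixels are all brighter than center+t or all darker than center-t.
import numpy as np

CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2),
          (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0),
          (-3, 1), (-2, 2), (-1, 3)]

def segment_test(img, y, x, t=20, arc=9):
    c = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    for sign in (1, -1):                       # brighter, then darker
        ok = sign * (ring - c) > t
        ok = np.concatenate([ok, ok])          # wrap around the circle
        run = best = 0
        for v in ok:
            run = run + 1 if v else 0
            best = max(best, run)
        if best >= arc:
            return True
    return False
```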

512 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work shows that contour detection accuracy can be improved by instead making use of deep features learned from convolutional neural networks (CNNs); rather than using the networks as a black-box feature extractor, it customizes the training strategy by partitioning contour (positive) data into subclasses and fitting each subclass with different model parameters.
Abstract: Contour detection serves as the basis of a variety of computer vision tasks such as image segmentation and object recognition. The mainstream works addressing this problem focus on designing engineered gradient features. In this work, we show that contour detection accuracy can be improved by instead making use of deep features learned from convolutional neural networks (CNNs). Rather than using the networks as a black-box feature extractor, we customize the training strategy by partitioning contour (positive) data into subclasses and fitting each subclass with different model parameters. A new loss function, named positive-sharing loss, in which each subclass shares the loss for the whole positive class, is proposed to learn the parameters. Compared to the softmax loss function, the proposed one introduces an extra regularizer that emphasizes the losses for the positive and negative classes, which facilitates exploring more discriminative features. Our experimental results demonstrate that the learned deep features achieve top performance on the Berkeley Segmentation Dataset and Benchmark (BSDS500) and obtain competitive cross-dataset generalization results on the NYUD dataset.
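The positive-sharing idea can be made concrete: with K shape subclasses plus one negative class, a positive sample is scored by the total probability mass assigned to all K subclasses, so the subclasses share a single loss for the whole positive class. The sketch below is one plausible reading of that description, not the paper's exact formulation; the extra regularizer term is omitted.

```python
# Hedged sketch of a positive-sharing loss over K subclasses + 1
# negative class. Column K of `logits` is the negative class.
import numpy as np

def positive_sharing_loss(logits, is_positive, eps=1e-12):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)   # softmax probabilities
    p_pos = p[:, :-1].sum(axis=1)          # shared positive mass
    p_neg = p[:, -1]
    return -np.mean(np.where(is_positive,
                             np.log(p_pos + eps),
                             np.log(p_neg + eps)))
```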

512 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A regularized, auto-context regression framework is developed which iteratively reduces uncertainty in object coordinate and object label predictions and an efficient way to marginalize object coordinate distributions over depth is introduced to deal with missing depth information.
Abstract: In recent years, the task of estimating the 6D pose of object instances and complete scenes, i.e. camera localization, from a single input image has received considerable attention. Consumer RGB-D cameras have made this feasible, even for difficult, texture-less objects and scenes. In this work, we show that a single RGB image is sufficient to achieve visually convincing results. Our key concept is to model and exploit the uncertainty of the system at all stages of the processing pipeline. The uncertainty comes in the form of continuous distributions over 3D object coordinates and discrete distributions over object labels. We give three technical contributions. Firstly, we develop a regularized, auto-context regression framework which iteratively reduces uncertainty in object coordinate and object label predictions. Secondly, we introduce an efficient way to marginalize object coordinate distributions over depth. This is necessary to deal with missing depth information. Thirdly, we utilize the distributions over object labels to detect multiple objects simultaneously with a fixed budget of RANSAC hypotheses. We tested our system for object pose estimation and camera localization on commonly used data sets. We see a major improvement over competing systems.
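The third contribution scores a fixed budget of RANSAC hypotheses. For orientation, a generic fixed-budget RANSAC loop is sketched below; the `fit` and `residual` callables stand in for the paper's pose-specific solver and scoring, and are assumptions.

```python
# Generic RANSAC with a fixed hypothesis budget: draw minimal samples,
# fit a candidate model, keep the hypothesis with the most inliers.
import numpy as np

def ransac(data, fit, residual, min_samples, budget=256, thresh=1.0):
    rng = np.random.default_rng(0)
    best_model, best_inliers = None, -1
    for _ in range(budget):                # fixed number of hypotheses
        idx = rng.choice(len(data), size=min_samples, replace=False)
        model = fit(data[idx])
        inliers = int((residual(model, data) < thresh).sum())
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```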

511 citations

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A CUDA implementation of the "brute force" kNN search is presented, showing a speed increase on synthetic and real data of up to one or two orders of magnitude depending on the data, with quasi-linear behavior with respect to the data size in a given, practical range.
Abstract: Statistical measures coming from information theory represent interesting bases for image and video processing tasks such as image retrieval and video object tracking. For example, let us mention the entropy and the Kullback-Leibler divergence. Accurate estimation of these measures requires adapting to the local sample density, especially if the data are high-dimensional. The k nearest neighbor (kNN) framework has been used to define efficient variable-bandwidth kernel-based estimators with such a locally adaptive property. Unfortunately, these estimators are computationally intensive since they rely on searching neighbors among large sets of d-dimensional vectors. This computational burden can be reduced by pre-structuring the data, e.g. using binary trees as proposed by the approximated nearest neighbor (ANN) library. Yet, the recent opening of graphics processing units (GPUs) to general-purpose computation by means of the NVIDIA CUDA API offers the image and video processing community a powerful platform with parallel calculation capabilities. In this paper, we propose a CUDA implementation of the "brute force" kNN search and we compare its performance to several CPU-based implementations, including an equivalent brute force algorithm and ANN. We show a speed increase on synthetic and real data by up to one or two orders of magnitude depending on the data, with a quasi-linear behavior with respect to the data size in a given, practical range.
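For orientation, the "brute force" kNN computation being accelerated is simply a dense distance matrix followed by a per-query partial sort. A NumPy sketch of the CPU analogue (not the paper's CUDA kernel) follows.

```python
# Vectorised brute-force kNN: all pairwise squared distances at once,
# then a partial sort to pull out the k nearest references per query.
import numpy as np

def knn_brute_force(refs, queries, k):
    # |q - r|^2 = |q|^2 + |r|^2 - 2 q.r, for all pairs simultaneously.
    d2 = ((queries ** 2).sum(1)[:, None] + (refs ** 2).sum(1)[None, :]
          - 2.0 * queries @ refs.T)
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]   # unordered top-k
    rows = np.arange(len(queries))[:, None]
    order = np.argsort(d2[rows, idx], axis=1)         # order those k
    return idx[rows, order]
```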

509 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...2. Compute the description vector for each extracted keypoint [Low03, MS05]....

  • ...Content-based image retrieval (CBIR) [LSDJ06, Low03] is the application of computer vision to the image retrieval problem, that is, the problem of searching for digital images in large databases....

References
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
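The staged filtering mentioned above begins by locating extrema in a difference-of-Gaussian scale space. A minimal sketch of that first stage is given below; the sigma, level count, and contrast threshold are illustrative rather than Lowe's exact settings, and the input image is assumed to be scaled to [0, 1].

```python
# Sketch: difference-of-Gaussian stack + 26-neighbour extrema test.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(img, sigma=1.6, k=2 ** 0.5, levels=5, contrast=0.03):
    blurred = [gaussian_filter(img, sigma * k ** i) for i in range(levels)]
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
    maxf = maximum_filter(dog, size=3)     # 3x3x3 neighbourhood in
    minf = minimum_filter(dog, size=3)     # space and scale
    is_ext = ((dog == maxf) | (dog == minf)) & (np.abs(dog) > contrast)
    return np.argwhere(is_ext[1:-1]) + [1, 0, 0]   # skip boundary scales
```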

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

    [...]

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

    [...]

  • ...More details on applications of these features to recognition are available in other pape rs (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....

    [...]

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scalespace extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....

    [...]
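The last excerpt truncates mid-definition; for reference, the difference-of-Gaussian function in the paper is

```latex
D(x, y, \sigma) = \big( G(x, y, k\sigma) - G(x, y, \sigma) \big) * I(x, y)
```

where G is a variable-scale Gaussian, I is the input image, * denotes convolution, and k is the constant multiplicative factor separating nearby scales.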

Book
01 Jan 2000
TL;DR: In this book, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, covering the geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

    [...]
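A minimal sketch of that more general solution using OpenCV's robust estimator follows; the point arrays are illustrative stand-ins for matched keypoint coordinates from the two views.

```python
# Sketch: fundamental-matrix estimation from point correspondences.
import cv2
import numpy as np

# Stand-in correspondences (in practice, matched SIFT keypoints).
pts1 = np.float32([[100, 120], [210, 80], [330, 240], [400, 310],
                   [150, 300], [260, 200], [50, 60], [370, 100]])
pts2 = np.float32([[102, 118], [215, 82], [328, 245], [404, 305],
                   [148, 297], [262, 203], [52, 61], [374, 98]])

# RANSAC-based estimate; returns F and an inlier mask.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
```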

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
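The feature extraction this abstract alludes to is the Harris corner response R = det(M) − k·trace(M)², computed from the smoothed second-moment matrix M of the image gradients; k = 0.04 below is the conventional choice, not a value fixed by the abstract.

```python
# Sketch: Harris corner response map for a grayscale image.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.0, k=0.04):
    ix = sobel(img.astype(float), axis=1)      # image gradients
    iy = sobel(img.astype(float), axis=0)
    sxx = gaussian_filter(ix * ix, sigma)      # smoothed second-moment
    syy = gaussian_filter(iy * iy, sigma)      # matrix entries
    sxy = gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2                # large values => corners
```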

13,993 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
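A minimal usage sketch with OpenCV's built-in MSER detector follows; the image path is a placeholder.

```python
# Sketch: detecting maximally stable extremal regions with OpenCV.
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)   # point lists + boxes
print(f"{len(regions)} MSERs detected")
```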

3,422 citations
