Distinctive Image Features from Scale-Invariant Keypoints

doi:10.1023/B:VISI.0000029664.99615.94

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

read less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles

[...]

Bastian Leibe¹, Konrad Schindler¹, Nico Cornelis², L. Van Gool¹•Institutions (2)

ETH Zurich¹, Katholieke Universiteit Leuven²

01 Oct 2008-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem is presented, formulated in a minimum description length hypothesis selection framework, which allows the system to recover from mismatches and temporarily lost tracks.

...read moreread less

Abstract: We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in a minimum description length hypothesis selection framework, which allows our system to recover from mismatches and temporarily lost tracks. Building upon a state-of-the-art object detector, it performs multiview/multicategory object recognition to detect cars and pedestrians in the input images. The 2D object detections are checked for their consistency with (automatically estimated) scene geometry and are converted to 3D observations which are accumulated in a world coordinate frame. A subsequent trajectory estimation module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far while satisfying the constraints that no two objects may occupy the same physical space nor explain the same image pixels at any point in time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. We evaluate our approach on several challenging video sequences and demonstrate its performance on both a surveillance-type scenario and a scenario where the input videos are taken from inside a moving vehicle passing through crowded city areas.

...read moreread less

385 citations

Proceedings Article•DOI•

Latent semantic sparse hashing for cross-modal similarity search

[...]

Jile Zhou¹, Guiguang Ding¹, Yuchen Guo¹•Institutions (1)

Tsinghua University¹

03 Jul 2014

TL;DR: A novel Latent Semantic Sparse Hashing (LSSH) is proposed to perform cross-modal similarity search by employing Sparse Coding and Matrix Factorization to capture the salient structures of images and learn the latent concepts from text.

...read moreread less

Abstract: Similarity search methods based on hashing for effective and efficient cross-modal retrieval on large-scale multimedia databases with massive text and images have attracted considerable attention. The core problem of cross-modal hashing is how to effectively construct correlation between multi-modal representations which are heterogeneous intrinsically in the process of hash function learning. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hash methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap more effectively, and capture high-level latent semantic information which has been proved that it can lead to better performance for image retrieval. To address these challenges, in this paper, we propose a novel Latent Semantic Sparse Hashing (LSSH) to perform cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images, and Matrix Factorization to learn the latent concepts from text. Then the learned latent semantic features are mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions efficiently, and it helps LSSH to explore the correlation between multi-modal representations efficiently and automatically. Finally, the unified hashcodes are generated through the high level abstraction space by quantization. Extensive experiments on three different datasets highlight the advantage of our method under cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.

...read moreread less

384 citations

Journal Article•

The Pyramid Match Kernel: Efficient Learning with Sets of Features

[...]

Kristen Grauman, Trevor Darrell¹•Institutions (1)

Massachusetts Institute of Technology¹

01 May 2007-Journal of Machine Learning Research

TL;DR: The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears.

...read moreread less

Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences---generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.

...read moreread less

383 citations

Book Chapter•DOI•

The 2005 PASCAL visual object classes challenge

[...]

Mark Everingham¹, Andrew Zisserman¹, Christopher Williams², Luc Van Gool, Moray Allan², Christopher M. Bishop³, Olivier Chapelle⁴, Navneet Dalal⁵, Thomas Deselaers⁶, Gyuri Dorkó⁵, Stefan Duffner⁷, J Eichhorn⁴, Jason Farquhar⁸, Mario Fritz⁹, Christophe Garcia⁷, Tom Griffiths², Frédéric Jurie⁵, Daniel Keysers⁶, Markus Koskela¹⁰, Jorma Laaksonen¹⁰, Diane Larlus⁵, Bastian Leibe⁹, Hongying Meng⁸, Hermann Ney⁶, Bernt Schiele⁹, Cordelia Schmid⁵, Edgar Seemann⁹, John Shawe-Taylor⁸, Amos Storkey², Sandor Szedmak⁸, Bill Triggs⁵, Ilkay Ulusoy¹¹, Ville Viitaniemi¹⁰, Jianguo Zhang⁵ - Show less +30 more•Institutions (11)

University of Oxford¹, University of Edinburgh², Microsoft³, Max Planck Society⁴, French Institute for Research in Computer Science and Automation⁵, RWTH Aachen University⁶, Orange S.A.⁷, University of Southampton⁸, Technische Universität Darmstadt⁹, Helsinki University of Technology¹⁰, Middle East Technical University¹¹

11 Apr 2005

TL;DR: The PASCAL Visual Object Classes Challenge (PASCALVOC) as mentioned in this paper was held from February to March 2005 to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects).

...read moreread less

Abstract: The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.

...read moreread less

381 citations

Journal Article•DOI•

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

[...]

Yu-Gang Jiang¹, Zuxuan Wu¹, Jun Wang², Xiangyang Xue¹, Shih-Fu Chang³ - Show less +1 more•Institutions (3)

Fudan University¹, East China Normal University², Columbia University³

01 Feb 2018-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics.

...read moreread less

Abstract: In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN produces better performance over several state-of-the-art approaches. Competitive results are reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.

...read moreread less

380 citations

Cites methods from "Distinctive Image Features from Sca..."

...be downloaded after submitting an agreement form. Released resources include the videos, labels, a standard train/test split, a category hierarchy and several pre-computed descriptors (CNN [56], SIFT [79], Improved Dense Trajectories [2], and two audio features). We have also released the textual meta-data (e.g., tags) of the videos to support related research on Internet video analysis. See FCVID web...
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
…
110
111
112
113
114
115
116
…
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

"Distinctive Image Features from Sca..." refers background or methods in this paper

...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....
[...]
...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....
[...]
...More details on applications of these features to recognition are available in other pape rs (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....
[...]
...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scalespace extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...
[...]
...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....
[...]

Book•

Multiple view geometry in computer vision

[...]

Richard Hartley¹, Andrew Zisserman²•Institutions (2)

Australian National University¹, University of Oxford²

01 Jan 2000

TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.

...read moreread less

Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

...read moreread less

15,558 citations

Multiple View Geometry in Computer Vision.

[...]

Bernhard P. Wrobel

01 Jan 2001

TL;DR: This book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts and it will show the best book collections and completed collections.

...read moreread less

Abstract: Downloading the book in this website lists can give you more advantages. It will show you the best book collections and completed collections. So many books can be found in this website. So, this is not only this multiple view geometry in computer vision. However, this book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts. This is simple, read the soft file of the book and you get it.

...read moreread less

14,282 citations

"Distinctive Image Features from Sca..." refers background in this paper

...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....
[...]

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations