Journal Article • DOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004 • International Journal of Computer Vision (Kluwer Academic Publishers) • Vol. 60, Iss. 2, pp. 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
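As a concrete illustration of the matching stage described in the abstract, the sketch below uses OpenCV's SIFT implementation together with Lowe's distance-ratio test; the file names and the 0.75 threshold are illustrative assumptions, not values taken from this page.

```python
# Minimal sketch: SIFT keypoint matching with Lowe's distance-ratio test.
# Assumes OpenCV >= 4.4 (SIFT is in the main package) and two local images.
import cv2

img1 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
img2 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor, find its two nearest neighbors in the other image.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)

# Keep a match only if it is clearly better than the runner-up
# (the 0.75 ratio is a commonly used value, not taken from this page).
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative matches survive the ratio test")
```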


Citations
Journal Article • DOI
26 Jul 2010
TL;DR: The most important perceptual aspects of stereo vision are discussed, and their implications for stereoscopic content creation are formalized into a set of basic disparity mapping operators that enable the depth of a stereoscopic scene to be controlled and retargeted in a nonlinear and locally adaptive fashion.
Abstract: This paper addresses the problem of remapping the disparity range of stereoscopic images and video. Such operations are highly important for a variety of issues arising from the production, live broadcast, and consumption of 3D content. Our work is motivated by the observation that the displayed depth and the resulting 3D viewing experience are dictated by a complex combination of perceptual, technological, and artistic constraints. We first discuss the most important perceptual aspects of stereo vision and their implications for stereoscopic content creation. We then formalize these insights into a set of basic disparity mapping operators. These operators enable us to control and retarget the depth of a stereoscopic scene in a nonlinear and locally adaptive fashion. To implement our operators, we propose a new strategy based on stereoscopic warping of the input video streams. From a sparse set of stereo correspondences, our algorithm computes disparity and image-based saliency estimates, and uses them to compute a deformation of the input views so as to meet the target disparities. Our approach represents a practical solution for actual stereo production and display that does not require camera calibration, accurate dense depth maps, occlusion handling, or inpainting. We demonstrate the performance and versatility of our method using examples from live action post-production, 3D display size adaptation, and live broadcast. An additional user study and ground truth comparison further provide evidence for the quality and practical relevance of the presented work.
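To make the notion of a disparity mapping operator concrete, here is a minimal sketch of one plausible global nonlinear operator, a logarithmic compression of the disparity range into a display-dependent target range; it is an illustrative assumption, not one of the paper's actual operators.

```python
import numpy as np

def remap_disparity(d, d_min, d_max, t_min, t_max):
    """Nonlinearly compress disparities in [d_min, d_max] into [t_min, t_max].

    Illustrative global operator: large disparities are compressed more
    strongly than small ones (logarithmic mapping).
    """
    # Normalize to [0, 1], apply a log curve through (0, 0) and (1, 1),
    # then rescale to the target range.
    x = (d - d_min) / (d_max - d_min)
    x = np.log1p(9.0 * x) / np.log(10.0)
    return t_min + x * (t_max - t_min)

# Example: squeeze a [-80, 120] pixel disparity range into [-20, 30].
d = np.array([-80.0, 0.0, 120.0])
print(remap_disparity(d, -80, 120, -20, 30))
```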

418 citations

Proceedings Article • DOI
20 Jun 2009
TL;DR: This paper models spatio-temporal context information hierarchically, exploiting three levels of context in ascending order of abstraction, and employs the Multiple Kernel Learning (MKL) technique to prune kernels and speed up algorithm evaluation.
Abstract: The problem of recognizing actions in realistic videos is challenging yet absorbing owing to its great potentials in many practical applications. Most previous research is limited due to the use of simplified action databases under controlled environments or focus on excessively localized features without sufficiently encapsulating the spatio-temporal context. In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context (SIFT average descriptor), 2) intra-trajectory context (trajectory transition descriptor), and 3) inter-trajectory context (trajectory proximity descriptor). To obtain efficient and compact representations for the latter two levels, we encode the spatiotemporal context information into the transition matrix of a Markov process, and then extract its stationary distribution as the final context descriptor. Building on the multichannel nonlinear SVMs, we validate this proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, and achieve 27% and 66% relative performance improvements over the state-of-the-art results, respectively. We further propose to employ the Multiple Kernel Learning (MKL) technique to prune the kernels towards speedup in algorithm evaluation.
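The abstract's device of summarizing spatio-temporal context as the stationary distribution of a Markov transition matrix can be sketched in a few lines of numpy; the 3-state matrix below is a made-up toy, not data from the paper.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of a row-stochastic transition matrix P,
    i.e. the left eigenvector satisfying pi @ P = pi, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)                   # left eigenvectors of P
    pi = np.real(vecs[:, np.argmax(np.real(vals))])   # eigenvalue closest to 1
    pi = np.abs(pi)
    return pi / pi.sum()

# Toy 3-state transition matrix (rows sum to 1); values are illustrative only.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
print(stationary_distribution(P))   # fixed-length descriptor of the process
```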

418 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In this work, we model the spatio-temporal context information encoded in unconstrained videos based on the SIFT-based [10] trajectory, in a hierarchy of three abstraction levels (see Fig....

    [...]

  • ...For spatially salient point detection and representation, we adopt the well established SIFT [10] technique, the effectiveness of which has been validated in numerous visual tasks, such as object recognition [9] and robot navigation [15]....

    [...]

Proceedings Articleā€¢DOIā€¢
Lyndon Kennedy, Mor Naaman, Shane Ahern, Rahul Nair, Tye Rattenbury
29 Sep 2007
TL;DR: A location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset is demonstrated, suggesting that community-contributed media and annotation can enhance and improve access to multimedia resources - and the understanding of the world.
Abstract: The advent of media-sharing sites like Flickr and YouTube has drastically increased the volume of community-contributed multimedia resources available on the web. These collections have a previously unimagined depth and breadth, and have generated new opportunities - and new challenges - to multimedia research. How do we analyze, understand and extract patterns from these new collections? How can we use these unstructured, unrestricted community contributions of media (and annotation) to generate "knowledge"? As a test case, we study Flickr - a popular photo sharing website. Flickr supports photo, time and location metadata, as well as a light-weight annotation model. We extract information from this dataset using two different approaches. First, we employ a location-driven approach to generate aggregate knowledge in the form of "representative tags" for arbitrary areas in the world. Second, we use a tag-driven approach to automatically extract place and event semantics for Flickr tags, based on each tag's metadata patterns. With the patterns we extract from tags and metadata, vision algorithms can be employed with greater precision. In particular, we demonstrate a location-tag-vision-based approach to retrieving images of geography-related landmarks and features from the Flickr dataset. The results suggest that community-contributed media and annotation can enhance and improve our access to multimedia resources - and our understanding of the world.
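A minimal sketch of the "representative tags for an area" idea: score each tag by its frequency inside an area, discounted by its global frequency, a tf-idf-style heuristic. The scoring function and the toy data are assumptions, not the paper's exact method.

```python
import math
from collections import Counter

def representative_tags(area_tags, all_tags, top_k=3):
    """Rank tags that are frequent in an area but rare globally (tf-idf style)."""
    area_counts = Counter(area_tags)
    global_counts = Counter(all_tags)
    total = sum(global_counts.values())
    scores = {t: c * math.log(total / global_counts[t])
              for t, c in area_counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data: photo tags inside one map cell vs. the whole collection.
area = ["goldengatebridge", "bridge", "sanfrancisco", "goldengatebridge"]
world = area + ["bridge", "party", "party", "sanfrancisco", "dog"] * 100
print(representative_tags(area, world))
```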

417 citations

Proceedings Article • DOI
16 Jun 2012
TL;DR: It is shown that a recognition system using only representations obtained from deep learning can achieve accuracy comparable to a system using a combination of hand-crafted image descriptors, and it is empirically shown that learning weights is not only necessary for obtaining good multilayer representations but also provides robustness to the choice of network architecture parameters.
Abstract: Most modern face recognition systems rely on a feature representation given by a hand-crafted image descriptor, such as Local Binary Patterns (LBP), and achieve improved performance by combining several such representations. In this paper, we propose deep learning as a natural source for obtaining additional, complementary representations. To learn features in high-resolution images, we make use of convolutional deep belief networks. Moreover, to take advantage of global structure in an object class, we develop local convolutional restricted Boltzmann machines, a novel convolutional learning model that exploits the global structure by not assuming stationarity of features across the image, while maintaining scalability and robustness to small misalignments. We also present a novel application of deep learning to descriptors other than pixel intensity values, such as LBP. In addition, we compare performance of networks trained using unsupervised learning against networks with random filters, and empirically show that learning weights not only is necessary for obtaining good multilayer representations, but also provides robustness to the choice of the network architecture parameters. Finally, we show that a recognition system using only representations obtained from deep learning can achieve comparable accuracy with a system using a combination of hand-crafted image descriptors. Moreover, by combining these representations, we achieve state-of-the-art results on a real-world face verification database.
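The architectural idea highlighted in the abstract, dropping the stationarity assumption, amounts to giving each image region its own filters rather than sharing one filter bank everywhere. The toy numpy sketch below contrasts the two; it illustrates the weight-untying idea only and is not the paper's local convolutional restricted Boltzmann machine.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))

# Standard convolution: one 5x5 filter shared across the whole image
# (this assumes feature statistics are stationary everywhere).
shared_filter = rng.standard_normal((5, 5))
shared_out = convolve2d(img, shared_filter, mode="valid")

# Locally untied variant: split the image into 2x2 blocks and give each
# block its own filter, so different regions (e.g. eyes vs. mouth in a
# face image) can specialize.
block_outs = []
for rows in (slice(0, 16), slice(16, 32)):
    for cols in (slice(0, 16), slice(16, 32)):
        local_filter = rng.standard_normal((5, 5))  # private to this block
        block_outs.append(convolve2d(img[rows, cols], local_filter, mode="valid"))

print(shared_out.shape, [b.shape for b in block_outs])
```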

417 citations

Journal Article • DOI
TL;DR: The authors' method can remove mismatches from thousands of putative correspondences in only a few milliseconds, achieving better or favorably competitive accuracy compared with state-of-the-art alternatives while cutting time cost by more than two orders of magnitude.
Abstract: Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To achieve the goal, an efficient approach, termed as locality preserving matching (LPM), is designed, the principle of which is to maintain the local neighborhood structures of those potential true matches. We formulate the problem into a mathematical model, and derive a closed-form solution with linearithmic time and linear space complexities. Our method can accomplish the mismatch removal from thousands of putative correspondences in only a few milliseconds. To demonstrate the generality of our strategy for handling image matching problems, extensive experiments on various real image pairs for general feature matching, as well as for point set registration, visual homing and near-duplicate image retrieval are conducted. Compared with other state-of-the-art alternatives, our LPM achieves better or favorably competitive performance in accuracy while intensively cutting time cost by more than two orders of magnitude.
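The locality preserving principle in the abstract, that true matches keep their local neighborhood structure while mismatches do not, can be sketched as a simple filter: keep a putative match only if enough of its k nearest neighbors in one image are matched to neighbors in the other. This is a simplified sketch of the idea, not the paper's closed-form linearithmic solution; k and the threshold are arbitrary choices.

```python
import numpy as np

def lpm_filter(pts1, pts2, k=6, min_common=3):
    """Keep putative matches (pts1[i] <-> pts2[i]) whose k-nearest-neighbor
    sets are largely preserved across the two images (simplified LPM idea)."""
    n = len(pts1)

    def knn(pts):
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]   # indices of k nearest neighbors

    nn1, nn2 = knn(pts1), knn(pts2)
    keep = [len(set(nn1[i]) & set(nn2[i])) >= min_common for i in range(n)]
    return np.array(keep)

# Toy data: 50 consistent matches plus 10 scrambled outliers.
rng = np.random.default_rng(1)
a = rng.uniform(0, 100, (60, 2))
b = a + rng.normal(0, 0.5, a.shape)       # roughly the same layout
b[50:] = rng.uniform(0, 100, (10, 2))     # mismatched tail
print(lpm_filter(a, b).sum(), "of 60 matches kept")
```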

416 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Guo and Cao (2012) proposed a triangle constraint, which can produce better putative correspondences in terms of quantity and accuracy compared with the distance ratio in Lowe (2004)....

    [...]

  • ...In this paper, the similarity function s is defined as follows: we first establish SIFT putative feature correspondences and subsequently use our LPM to remove false matches; the similarity s(Ii, Ij) is then assigned by the number of preserved matches on the two given images Ii and Ij....

    [...]

  • ...Fortunately, there are several well-designed feature descriptors (e.g., SIFT; Lowe 2004) that can efficiently establish putative correspondences between feature sets; therefore, we consider this component an easy mission....

    [...]

  • ...In this paper, we treat the target task as a matching problem between two sets of discrete points, where each point is an image feature extracted by a feature detector and has a local image descriptor, e.g. the scale invariant feature transform (SIFT) (Lowe 2004)....

    [...]

  • ...Therefore, they are not associated with local image descriptors such as SIFT....

    [...]

References
Proceedings Article • DOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
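The "staged filtering approach that identifies stable points in scale space" can be roughly approximated in a few lines: build a stack of Gaussian-blurred images, take differences of adjacent scales, and keep pixels that are extrema of their 26 scale-space neighbors. The constants below are conventional choices, not values from this page.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(img, num_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Return a boolean map of difference-of-Gaussian scale-space extrema."""
    blurred = [gaussian_filter(img, sigma0 * k ** i) for i in range(num_scales)]
    dog = np.stack([blurred[i + 1] - blurred[i]
                    for i in range(num_scales - 1)])   # (scales, H, W)
    # A point is a keypoint candidate if it is the max or min of its
    # 3x3x3 neighborhood across space and scale.
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    return (is_max | is_min) & (np.abs(dog) > 0.01)    # cull weak responses

img = np.random.default_rng(2).random((64, 64))
print(dog_extrema(img).sum(), "candidate extrema")
```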

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

    [...]

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....

    [...]

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative....

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....

    [...]
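For reference, the difference-of-Gaussian function D(x, y, σ) mentioned in the excerpts above is defined in the paper as the difference of two Gaussian-smoothed images L at nearby scales separated by a constant multiplicative factor k:

```latex
D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma)
```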

Book
01 Jan 2000
TL;DR: In this book, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
This entry corresponds to Multiple View Geometry in Computer Vision by Hartley and Zisserman.

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

    [...]
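The fundamental matrix referred to in this excerpt encodes the epipolar constraint between corresponding homogeneous points x and x′ in two views; in the standard notation of Hartley and Zisserman:

```latex
\mathbf{x}'^{\top} F \, \mathbf{x} = 0,
\qquad F \in \mathbb{R}^{3 \times 3},\ \operatorname{rank}(F) = 2
```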

Proceedings Article • DOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
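This reference appears to be Harris and Stephens' combined corner and edge detector from the 1988 Alvey Vision Conference; that identification, and the usual choice k ≈ 0.04-0.06, are background knowledge rather than statements from this page. The corner response the paper introduced scores each pixel from the local gradient autocorrelation matrix:

```latex
M = \sum_{(x,y) \in W} w(x,y)
\begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},
\qquad R = \det(M) - k \,\operatorname{tr}(M)^2
```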

13,993 citations

Journal Article • DOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
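A minimal sketch of detecting MSERs with OpenCV's built-in detector; the input file name is an assumption, and OpenCV's implementation differs in detail from the paper's original.

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Maximally Stable Extremal Regions: connected components whose area stays
# nearly constant over a range of intensity thresholds.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)
print(len(regions), "MSER regions detected")
```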

3,422 citations
