
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
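
As a concrete illustration of the extract-and-match pipeline summarized above, here is a minimal sketch using OpenCV's SIFT implementation (assumes OpenCV 4.4+, where SIFT ships in the main module; the image file names are placeholders, not from the paper):

```python
# Minimal SIFT extract-and-match sketch (OpenCV 4.4+; paths are placeholders).
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# k-NN matching with Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative matches")
```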
Citations
Proceedings Article
03 Dec 2012
TL;DR: Extensive experiments on real data validate that, given the same length of binary code, SBLSH achieves significant mean squared error reduction in estimating pairwise angular similarity, and SBLSH outperforms SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.
Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement: it orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, with a smaller variance when the angle to estimate is within (0, π/2]. Extensive experiments on real data validate that, given the same length of binary code, SBLSH achieves significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH outperforms SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.
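
To make the contrast between the two schemes concrete, here is a hedged NumPy sketch (illustrative parameters, not the authors' implementation): plain SRP-LSH draws K independent Gaussian directions, while SBLSH orthogonalizes them in batches of the Super-Bit depth N, which requires N to divide K and to be no larger than the data dimension d.

```python
# Hedged sketch of SRP-LSH vs. Super-Bit LSH (illustrative, not the
# authors' code). K = code length, N = Super-Bit depth, d = dimension.
import numpy as np

def srp_projections(d, K, rng):
    # Plain SRP-LSH: K independent Gaussian directions.
    return rng.standard_normal((K, d))

def superbit_projections(d, K, N, rng):
    # SBLSH: orthogonalize the Gaussian directions in batches of N.
    assert K % N == 0 and N <= d
    batches = []
    for _ in range(K // N):
        G = rng.standard_normal((d, N))
        Q, _ = np.linalg.qr(G)          # batch-wise Gram-Schmidt
        batches.append(Q.T)
    return np.vstack(batches)           # shape (K, d)

def code(W, x):
    return (W @ x) >= 0                 # K-bit binary code

def estimate_angle(c1, c2):
    # A single SRP bit differs with probability theta/pi, so pi times the
    # normalized Hamming distance is an unbiased angle estimate; SBLSH
    # keeps the estimator unbiased while shrinking its variance.
    return np.pi * np.mean(c1 != c2)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(128), rng.standard_normal(128)
W_srp = srp_projections(d=128, K=120, rng=rng)
W_sb = superbit_projections(d=128, K=120, N=30, rng=rng)
print(estimate_angle(code(W_srp, x), code(W_srp, y)))
print(estimate_angle(code(W_sb, x), code(W_sb, y)))
```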

112 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...To test the effect of Super-Bit depth N , we fix K = 120 for Photo Tourism SIFT and K = 3000 for MIR-Flickr respectively, and to test the effect of code length K, we fix N = 120 for Photo Tourism SIFT and N = 3000 for MIR-Flickr....


  • ..., bag-of-words representation of documents and images, and histogram-based representations like SIFT local descriptor [20]....


  • ...We conduct the experiment on the following datasets: 1) Photo Tourism patch dataset [26], Notre Dame, which contains 104,106 patches, each of which is represented by a 128D SIFT descriptor (Photo Tourism SIFT); and 2) MIR-Flickr, which contains 25,000 images, each of which is represented by a 3125D bag-of-SIFT-feature histogram. For each dataset, we further conduct a simple preprocessing step as in [12], i.e. mean-centering each data sample, so as to obtain additional mean-centered versions of the above datasets, Photo Tourism SIFT (mean) and MIR-Flickr (mean)....


  • ..., the normalized bag-of-words representation for documents, images, and videos, and the normalized histogram-based local features like SIFT [20]....


  • ...In real applications, many vector representations are limited in non-negative orthant with all vector entries being non-negative, e.g., bag-of-words representation of documents and images, and histogram-based representations like SIFT local descriptor [20]....


Proceedings ArticleDOI
01 May 2017
TL;DR: This paper proposes a novel approach for learning a discriminative holistic image representation which exploits the image content to create a dense and salient scene description and shows that the learnt image representation outperforms off-the-shelf features from the deep networks and hand-crafted features.
Abstract: Visual place recognition under difficult perceptual conditions remains a challenging problem due to changing weather conditions, illumination and seasons. Long-term visual navigation approaches for robot localization should be robust to these dynamics of the environment. Existing methods typically leverage feature descriptions of whole images or image regions from Deep Convolutional Neural Networks. Some approaches also exploit sequential information to alleviate the problem of spatially inconsistent and non-perfect image matches. In this paper, we propose a novel approach for learning a discriminative holistic image representation which exploits the image content to create a dense and salient scene description. These salient descriptions are learnt over a variety of datasets under large perceptual changes. Such an approach enables us to precisely segment the regions of an image which are geometrically stable over large time lags. We combine features from these salient regions and an off-the-shelf holistic representation to form a more robust scene descriptor. We also introduce a semantically labeled dataset which captures extreme perceptual and structural scene dynamics over the course of 3 years. We evaluated our approach with extensive experiments on data collected over several kilometers in Freiburg and show that our learnt image representation outperforms off-the-shelf features from the deep networks and hand-crafted features.
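
The fusion step at the end of this pipeline can be pictured with a small sketch. Everything below is an assumption for illustration (mean pooling plus concatenation and L2 normalization); the paper's exact fusion rule and the feature extractors themselves are elided:

```python
# Illustrative fusion sketch, not the paper's method.
# salient_feats: (n_regions, d1) features from stable regions;
# holistic_feat: (d2,) off-the-shelf whole-image descriptor.
import numpy as np

def scene_descriptor(salient_feats, holistic_feat):
    pooled = salient_feats.mean(axis=0)            # pool salient-region features
    d = np.concatenate([pooled, holistic_feat])    # fuse with holistic descriptor
    return d / (np.linalg.norm(d) + 1e-12)         # L2-normalize for matching
```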

112 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Traditionally, approaches relied on point descriptors such as SIFT [15] for place recognition....


Journal ArticleDOI
TL;DR: The main conclusion of the paper is that with such a frugal approach it is possible to obtain results which are competitive with standard bottom-up approaches, thus establishing label embedding as an interesting and simple-to-compute baseline for text recognition.
Abstract: The standard approach to recognizing text in images consists in first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields. This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents several advantages: it does not require ad-hoc or costly pre-/post-processing operations, it can build on top of any state-of-the-art image descriptor (Fisher vectors in our case), it allows for the recognition of never-seen-before words (zero-shot recognition) and the recognition process is simple and efficient, as it amounts to a nearest neighbor search. Experiments are performed on challenging datasets of license plates and scene text. The main conclusion of the paper is that with such a frugal approach it is possible to obtain results which are competitive with standard bottom-up approaches, thus establishing label embedding as an interesting and simple to compute baseline for text recognition.
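
A hedged sketch of the recognition-as-retrieval step described above, with training of the common space elided: assume projection matrices U (for image descriptors) and V (for label embeddings) were already learned, e.g. with a structured hinge loss, and that a lexicon of candidate words and their label embeddings is available. All names and shapes below are illustrative.

```python
# Recognition as nearest-neighbor search in a learned common space.
# U, V, and label_embeddings are assumed pre-trained (illustrative names).
import numpy as np

def recognize(x, U, V, label_embeddings, lexicon):
    img = U @ x                                # image descriptor -> common space
    lab = label_embeddings @ V.T               # all lexicon words -> common space
    img = img / np.linalg.norm(img)
    lab = lab / np.linalg.norm(lab, axis=1, keepdims=True)
    return lexicon[int(np.argmax(lab @ img))]  # closest word label wins
```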

111 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...In our implementation, the low-level features are SIFT descriptors (Lowe 2004) whose dimensionality is reduced from 128 down to 32 dimensions....


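The excerpt above reduces 128-D SIFT descriptors to 32-D before further encoding. PCA is the customary choice for this step in Fisher-vector pipelines, so the sketch below uses it; whether it matches the paper's exact projection is an assumption, and the file name is a placeholder.

```python
# Reducing 128-D SIFT descriptors to 32-D with PCA (assumed technique).
import numpy as np
from sklearn.decomposition import PCA

descriptors = np.load("sift_descriptors.npy")   # placeholder: (n, 128) array
pca = PCA(n_components=32).fit(descriptors)
reduced = pca.transform(descriptors)            # (n, 32)
```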
Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed new super-resolution (SR) scheme achieves significant improvement compared with four state-of-the-art schemes in terms of both subjective and objective qualities.
Abstract: This paper proposes a new super-resolution (SR) scheme for landmark images by retrieving correlated web images. Using correlated web images significantly improves the exemplar-based SR. Given a low-resolution (LR) image, we extract local descriptors from its up-sampled version and bundle the descriptors according to their spatial relationship to retrieve correlated high-resolution (HR) images from the web. Though similar in content, the retrieved images are usually taken with different illumination, focal lengths, and shot perspectives, resulting in uncertainty for the HR detail approximation. To solve this problem, we first propose aligning these images to the up-sampled LR image through a global registration, which identifies the corresponding regions in these images and reduces the mismatching. Second, we propose a structure-aware matching criterion and adaptive block sizes to improve the mapping accuracy between LR and HR patches. Finally, these matched HR patches are blended together by solving an energy minimization problem to recover the desired HR image. Experimental results demonstrate that our SR scheme achieves significant improvement compared with four state-of-the-art schemes in terms of both subjective and objective qualities.
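
The core exemplar-based step can be sketched in a few lines: for every patch of the up-sampled LR image, retrieve the most similar patch from a registered HR web image. This toy version uses brute-force L2 matching with a fixed block size, whereas the paper's scheme adds global registration, a structure-aware matching criterion, and adaptive block sizes.

```python
# Toy exemplar-matching sketch, not the paper's full pipeline.
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.neighbors import NearestNeighbors

def match_patches(lr_up, hr, size=8):
    lr_p = extract_patches_2d(lr_up, (size, size))   # patches of up-sampled LR
    hr_p = extract_patches_2d(hr, (size, size))      # candidate HR patches
    nn = NearestNeighbors(n_neighbors=1).fit(hr_p.reshape(len(hr_p), -1))
    _, idx = nn.kneighbors(lr_p.reshape(len(lr_p), -1))
    return hr_p[idx.ravel()]                         # one HR candidate per LR patch
```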

111 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Please refer to [38] for further details on SIFT extraction....


  • ...SIFT descriptors [38] characterize image regions invariant to image scale, rotation, and 3D camera viewpoint....


  • ...5, similar to the criterion proposed in [38]....


Posted Content
TL;DR: The fit models provide a new platform for exploring the functional principles of human vision, and they show that modern methods of computer vision and machine learning provide important tools for characterizing brain function.
Abstract: The human brain is adept at solving difficult high-level visual processing problems such as image interpretation and object recognition in natural scenes. Over the past few years neuroscientists have made remarkable progress in understanding how the human brain represents categories of objects and actions in natural scenes. However, all current models of high-level human vision operate on hand annotated images in which the objects and actions have been assigned semantic tags by a human operator. No current models can account for high-level visual function directly in terms of low-level visual input (i.e., pixels). To overcome this fundamental limitation we sought to develop a new class of models that can predict human brain activity directly from low-level visual input (i.e., pixels). We explored two classes of models drawn from computer vision and machine learning. The first class of models was based on Fisher Vectors (FV) and the second was based on Convolutional Neural Networks (ConvNets). We find that both classes of models accurately predict brain activity in high-level visual areas, directly from pixels and without the need for any semantic tags or hand annotation of images. This is the first time that such a mapping has been obtained. The fit models provide a new platform for exploring the functional principles of human vision, and they show that modern methods of computer vision and machine learning provide important tools for characterizing brain function.
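
The modeling setup lends itself to a short sketch: extract image features (FV or ConvNet activations, elided here) and fit a regularized linear mapping to measured voxel responses. Ridge regression and the file names below are assumptions for illustration, not the paper's exact estimator.

```python
# Hedged sketch: linear mapping from image features to voxel responses.
import numpy as np
from sklearn.linear_model import Ridge

X_train = np.load("features_train.npy")   # placeholder: (n_images, n_features)
Y_train = np.load("bold_train.npy")       # placeholder: (n_images, n_voxels)
X_test = np.load("features_test.npy")

model = Ridge(alpha=1.0).fit(X_train, Y_train)   # one linear model per voxel
Y_pred = model.predict(X_test)                   # predicted activity from pixels-derived features
```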

111 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...SIFT features capture the distribution of edge orientation energy in each patch [20]....


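The excerpt above characterizes SIFT as a distribution of edge-orientation energy. A minimal sketch of that building block, a magnitude-weighted gradient-orientation histogram over a patch, follows; it is illustrative and omits Lowe's full 4x4 spatial grid of 8-bin histograms.

```python
# Magnitude-weighted gradient-orientation histogram for one patch.
import numpy as np

def orientation_histogram(patch, n_bins=8):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                           weights=mag)              # weight bins by magnitude
    return hist / (hist.sum() + 1e-12)               # normalized distribution
```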
References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
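
The final verification stage mentioned above solves a least-squares problem for consistent pose parameters. A minimal sketch for the affine case, with function and variable names that are ours rather than the paper's:

```python
# Least-squares affine pose from matched keypoint coordinates.
import numpy as np

def fit_affine(src, dst):
    # src, dst: (n, 2) arrays of matched model/image keypoint locations.
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2], A[0::2, 4] = src, 1.0   # rows for dst_x = m11*x + m12*y + tx
    A[1::2, 2:4], A[1::2, 5] = src, 1.0   # rows for dst_y = m21*x + m22*y + ty
    b = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                          # [m11, m12, m21, m22, tx, ty]
```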

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
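
The "staged filtering approach that identifies stable points in scale space" can be approximated compactly with difference-of-Gaussian extrema. The sketch below assumes SciPy, and the sigmas and threshold are illustrative choices; it skips the paper's orientation planes and indexing stages.

```python
# Stable points as local extrema of a difference-of-Gaussian scale space.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.0), thresh=0.03):
    img = img.astype(float) / 255.0
    blurred = [gaussian_filter(img, s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # Keep points that are extrema over a 3x3x3 scale-space neighborhood
    # and whose response is strong enough.
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -thresh)
    return np.argwhere(maxima | minima)   # (scale, row, col) triples
```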

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
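
The feature extraction underlying this abstract is the Harris corner detector. A minimal sketch of its corner response, assuming SciPy and the customary constant k = 0.04:

```python
# Harris corner response from the smoothed structure tensor.
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    img = img.astype(float)
    gy, gx = np.gradient(img)
    Sxx = gaussian_filter(gx * gx, sigma)   # smoothed structure tensor terms
    Syy = gaussian_filter(gy * gy, sigma)
    Sxy = gaussian_filter(gx * gy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2             # large positive values mark corners
```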

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K. and Schmid, C., 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S., et al., April 2002], steerable filters [Freeman, W. and Adelson, E., Sept. 1991], PCA-SIFT [Ke, Y. and Sukthankar, R., 2004], differential invariants [Koenderink, J. and van Doorn, A., 1987], spin images [Lazebnik, S., et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F. and Zisserman, A., 2002], moment invariants [Van Gool, L., et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
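
The evaluation criterion, recall with respect to precision, reduces to a short computation once ground-truth correspondences are known. A hedged sketch with array names of our choosing:

```python
# Recall vs. 1-precision curve as a descriptor-distance threshold varies.
import numpy as np

def recall_precision_curve(dists, is_correct, n_correspondences):
    # dists: distance of each putative match; is_correct: ground-truth flags.
    order = np.argsort(dists)
    correct = np.cumsum(is_correct[order])        # correct matches so far
    total = np.arange(1, len(dists) + 1)          # all matches so far
    recall = correct / n_correspondences
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision
```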

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
