
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images. These features can then be used to reliably match objects across differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
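
As a concrete illustration of the extract-and-match pipeline the abstract describes, the sketch below detects SIFT keypoints in two images and filters putative matches with Lowe's ratio test. It assumes opencv-python 4.4 or later (where SIFT lives in the main module); the file names are placeholders.

```python
# Minimal sketch: extract SIFT keypoints from two images and keep matches
# that pass Lowe's ratio test. Assumes opencv-python >= 4.4; the image
# paths are placeholders.
import cv2

img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Two nearest neighbours per descriptor in the 128-D descriptor space.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)

# Ratio test: accept a match only if the best candidate is clearly closer
# than the second best (0.8 is the threshold suggested in the paper).
good = [m for m, n in (p for p in knn if len(p) == 2)
        if m.distance < 0.8 * n.distance]
print(f"{len(good)} putative matches after the ratio test")
```
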
Citations
Posted Content
TL;DR: This document will review the most prominent proposals using multilayer convolutional architectures and shed light on the role of each layer of processing involved in a ConvNet architecture, distill what the authors currently understand about ConvNets and highlight critical open problems.
Abstract: This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies will be reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets and highlight critical open problems.

82 citations

Proceedings ArticleDOI
29 Dec 2011
TL;DR: This paper proposes a novel approach to computing vector representations that extends state-of-the-art methods in the field and can be viewed as a linearization of efficient, well-known kernel methods.
Abstract: Within the Content Based Image Retrieval (CBIR) framework, three main points can be highlighted: visual descriptor extraction, image signatures and their associated similarity measures, and machine-learning-based relevance functions. While the first and the last points have vastly improved in recent years, this paper addresses the second point. We propose a novel approach to computing vector representations that extends state-of-the-art methods in the field. Furthermore, our method can be viewed as a linearization of efficient, well-known kernel methods. The evaluation shows that our representation improves state-of-the-art results on the difficult VOC2007 database by a fair margin.
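
The abstract does not spell out the construction, but the general idea of linearizing a kernel — replacing an expensive kernel machine with an explicit feature map followed by a linear model — can be sketched with scikit-learn's additive chi-squared approximation. This is an illustration of the concept, not the paper's method; the histograms and labels below are random stand-ins.

```python
# Illustration of kernel linearization (not the paper's construction):
# approximate an additive chi-squared kernel, common for BoW histograms,
# with an explicit feature map so a fast linear SVM can be trained.
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((100, 512))           # stand-in for 100 BoW histograms
y = rng.integers(0, 2, size=100)     # stand-in binary labels

feature_map = AdditiveChi2Sampler(sample_steps=2)
X_lin = feature_map.fit_transform(X)  # linearized representation

clf = LinearSVC(C=1.0).fit(X_lin, y)
print(clf.score(X_lin, y))
```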

82 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...All recent approaches in image retrieval and classification use a set of local descriptors to encode the information contained in the image, such as SIFT [1]....


  • ...However clustering high-dimensional feature spaces (typically 128 dimensions for SIFT) into that many clusters is not an easy task.... (a vocabulary-building sketch follows this list)


  • ...We extracted SIFT features on a dense grid at 3 different scales in order to obtain bags of about 15000 SIFT descriptors for each image (about 150 million descriptors for the whole dataset)....


  • ...The first step is the introduction of powerful descriptors with high discriminative capacities like SIFT [1]....


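A minimal sketch of the vocabulary-building step the clustering excerpt above refers to: cluster a large set of 128-D SIFT descriptors into visual words with mini-batch k-means, then quantize one image into a bag-of-words signature. The descriptors here are random stand-ins, and the cluster and batch sizes are illustrative.

```python
# Sketch: build a visual vocabulary by clustering SIFT descriptors, the
# step the excerpt above calls difficult at large vocabulary sizes. In
# practice the descriptors would come from a dense-grid SIFT extractor.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
descriptors = rng.random((50_000, 128), dtype=np.float32)  # stand-in SIFTs

# Mini-batch k-means trades a little accuracy for tractability, which is
# why it (or approximate/hierarchical k-means) is preferred at this scale.
vocab = MiniBatchKMeans(n_clusters=1000, batch_size=4096, random_state=0)
vocab.fit(descriptors)

# Quantize one image's descriptors into an L1-normalized BoW histogram.
words = vocab.predict(descriptors[:15_000])
bow = np.bincount(words, minlength=1000).astype(np.float32)
bow /= bow.sum()
```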

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A Bayes merging approach is proposed to down-weight the indexed features in the intersection set, addressing the over-counting caused by overlap among different vocabularies while preserving the benefit of high recall.
Abstract: In the Bag-of-Words (BoW) model, the vocabulary is of key importance. Typically, multiple vocabularies are generated to correct quantization artifacts and improve recall. However, this routine is corrupted by vocabulary correlation, i.e., overlap among different vocabularies. Vocabulary correlation leads to an over-counting of the indexed features in the overlapped area, or the intersection set, thus compromising retrieval accuracy. In order to address the correlation problem while preserving the benefit of high recall, this paper proposes a Bayes merging approach to down-weight the indexed features in the intersection set. By explicitly modeling the correlation problem in a probabilistic view, a joint similarity on both the image and feature levels is estimated for the indexed features in the intersection set. We evaluate our method on three benchmark datasets. Albeit simple, Bayes merging can be applied to various merging tasks and consistently improves the baselines on multi-vocabulary merging. Moreover, Bayes merging is efficient in terms of both time and memory cost, and yields competitive performance with state-of-the-art methods.
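
A toy sketch of the merging idea (not the paper's probabilistic estimator): when candidate scores come from two vocabularies, contributions that appear under both — the intersection set — are down-weighted instead of being summed twice. The fixed overlap_weight below stands in for the paper's image- and feature-level joint similarity.

```python
# Toy illustration of down-weighting the intersection set when merging
# retrieval scores from two vocabularies (a fixed weight replaces the
# paper's Bayes-estimated joint similarity).
def merge_scores(scores_a, scores_b, overlap_weight=0.5):
    """scores_*: dict mapping image_id -> similarity under one vocabulary."""
    merged = {}
    for img in set(scores_a) | set(scores_b):
        a, b = scores_a.get(img, 0.0), scores_b.get(img, 0.0)
        if img in scores_a and img in scores_b:
            # Intersection set: scale down instead of double counting.
            merged[img] = overlap_weight * (a + b)
        else:
            merged[img] = a + b
    return merged

print(merge_scores({"img1": 0.9, "img2": 0.4}, {"img1": 0.8, "img3": 0.5}))
```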

82 citations

Proceedings ArticleDOI
12 May 2009
TL;DR: A new feature detection and tracking scheme that is robust even to non-uniform motion blur is proposed, together with a framework for visual odometry based on features extracted from, and matched across, monocular image sequences.
Abstract: Motion blur is a severe problem in images grabbed by legged robots and, in particular, by small humanoid robots. Standard feature extraction and tracking approaches typically fail when applied to sequences of images strongly affected by motion blur. In this paper, we propose a new feature detection and tracking scheme that is robust even to non-uniform motion blur. Furthermore, we developed a framework for visual odometry based on features extracted from, and matched across, monocular image sequences. To reliably extract and track the features, we estimate the point spread function (PSF) of the motion blur individually for image patches obtained via a clustering technique, and we only consider highly distinctive features during matching. We present experiments performed on standard datasets corrupted with motion blur and on images taken by a camera mounted on small, walking humanoid robots to show the effectiveness of our approach. The experiments demonstrate that our technique reliably extracts and matches features and generates a correct visual odometry, even in the presence of strong motion blur and without the aid of any inertial measurement sensor.

82 citations


Cites background, methods, or results from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The best-known and widely used feature detector and descriptor scheme was introduced by Lowe [13] and is called SIFT (Scale-invariant feature transform)....


  • ...As in [13], the scale-space is divided into octaves (i....


  • ...Detection of the interest points is then performed using the description method from SIFT [13]: we implement the SIFT descriptor, tuning the parameters of the algorithm to improve reliability....


  • ...According to our experience, the determinant of the Hessian seems to be more stable with motion-blurred images than the Difference-of-Gaussians (DoG) used in [13].... (a determinant-of-Hessian sketch follows this list)


  • ...Indeed, the classical feature detectors and descriptors [13], [1] that proved to work well for wheeled robots do not perform reliably in the presence of motion blur....

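The determinant-of-Hessian response one of the excerpts above prefers over DoG under motion blur can be sketched with Gaussian-derivative filters. The image, scale, and thresholds below are illustrative stand-ins.

```python
# Sketch of a determinant-of-Hessian (DoH) interest-point response, the
# measure the excerpt above reports to be more stable than DoG under
# motion blur.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = ndimage.gaussian_filter(rng.random((256, 256)), 3.0)  # stand-in frame

sigma = 2.0
Ixx = ndimage.gaussian_filter(img, sigma, order=(0, 2))  # d2/dx2
Iyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))  # d2/dy2
Ixy = ndimage.gaussian_filter(img, sigma, order=(1, 1))  # d2/dxdy

# Scale-normalized determinant of the Hessian; local maxima of this map
# are blob-like interest points.
doh = (sigma ** 4) * (Ixx * Iyy - Ixy ** 2)
peaks = (doh == ndimage.maximum_filter(doh, size=9)) & (doh > 0.1 * doh.max())
print(f"{int(peaks.sum())} candidate interest points")
```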

Posted Content
TL;DR: This work proposes a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model that favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture.
Abstract: We consider the problem of person search in unconstrained scene images. Existing methods usually focus on improving person detection accuracy to mitigate the negative effects of misalignment, mis-detections, and false alarms resulting from noisy automatic person detection. In contrast to previous studies, we show that sufficiently reliable person instance cropping is achievable with slightly improved state-of-the-art deep learning object detectors (e.g. Faster-RCNN), and that the under-studied multi-scale matching problem in person search is a more severe barrier. In this work, we address this multi-scale person search challenge by proposing a Cross-Level Semantic Alignment (CLSA) deep learning approach capable of learning more discriminative identity feature representations in a unified end-to-end model. This is realised by exploiting the in-network feature pyramid structure of a deep neural network enhanced by a novel cross pyramid-level semantic alignment loss function. This favourably eliminates the need for constructing a computationally expensive image pyramid and a complex multi-branch network architecture. Extensive experiments show the modelling advantages and performance superiority of CLSA over the state-of-the-art person search and multi-scale matching methods on two large person search benchmarking datasets: CUHK-SYSU and PRW.
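
The paper defines its own cross pyramid-level semantic alignment loss; the sketch below only illustrates the general shape of such a term, under the assumption that it aligns class posteriors across pyramid levels via a KL divergence pushing a lower level's prediction toward the top level's.

```python
# Rough illustration (assumed form, not the paper's exact loss): align a
# lower pyramid level's identity posterior with the top level's via a
# KL-divergence term, so low-level features absorb high-level semantics.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_level_alignment(logits_low, logits_top, eps=1e-12):
    """KL(p_top || p_low) summed over the batch."""
    p_top = softmax(logits_top)  # most semantic level, used as the target
    p_low = softmax(logits_low)  # lower pyramid level being aligned
    return float(np.sum(p_top * (np.log(p_top + eps) - np.log(p_low + eps))))

rng = np.random.default_rng(0)
print(cross_level_alignment(rng.normal(size=(4, 10)), rng.normal(size=(4, 10))))
```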

82 citations


Additional excerpts

  • ...To this end, we explore the seminal image/feature pyramid concept [1,21,31,8]....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
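
The final verification step the abstract mentions — a least-squares solution for consistent pose parameters — can be sketched as a linear system for the six parameters of a 2D affine map between matched keypoint locations; the example points are synthetic.

```python
# Sketch of the abstract's final verification step: a least-squares
# solution for consistent pose parameters, here the six parameters of a
# 2D affine map between matched keypoint locations.
import numpy as np

def fit_affine(model_pts, image_pts):
    """model_pts, image_pts: (N, 2) matched locations, N >= 3."""
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_pts  # rows for u = m11*x + m12*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts  # rows for v = m21*x + m22*y + ty
    A[1::2, 5] = 1.0
    b = image_pts.reshape(-1)
    params, residuals, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params, residuals

pts_m = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
pts_i = pts_m @ np.array([[1.1, 0.1], [-0.1, 0.9]]) + np.array([5.0, 3.0])
params, res = fit_affine(pts_m, pts_i)
print(params)  # [m11, m12, m21, m22, tx, ty]; small residuals -> accept pose
```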

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
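
A minimal sketch of the staged filtering idea: build a Difference-of-Gaussians stack and keep points that are extrema across both space and scale. The base sigma and contrast threshold follow common SIFT practice; the image is a synthetic stand-in.

```python
# Sketch of staged filtering: build a Difference-of-Gaussians (DoG) stack
# and keep points that are extrema over space and scale.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = ndimage.gaussian_filter(rng.random((128, 128)), 2.0)

k = 2 ** 0.5
sigmas = [1.6 * k ** i for i in range(5)]
gaussians = [ndimage.gaussian_filter(img, s) for s in sigmas]
dog = np.stack([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])

# A candidate keypoint is a 3x3x3 extremum across (scale, y, x), with
# weak-contrast responses rejected.
is_max = dog == ndimage.maximum_filter(dog, size=3)
is_min = dog == ndimage.minimum_filter(dog, size=3)
strong = np.abs(dog) > 0.03 * np.abs(dog).max()
candidates = np.argwhere((is_max | is_min) & strong)
print(f"{len(candidates)} (scale, y, x) keypoint candidates")
```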

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
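
A compact sketch of the detector this paper introduces: the Harris-Stephens corner response R = det(M) - k·trace(M)^2, computed from the smoothed second-moment matrix M, with k in the commonly used empirical range and a synthetic stand-in image.

```python
# Sketch of the Harris-Stephens corner response on the smoothed
# second-moment (autocorrelation) matrix M.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = ndimage.gaussian_filter(rng.random((128, 128)), 1.0)

Ix = ndimage.sobel(img, axis=1)  # horizontal gradient
Iy = ndimage.sobel(img, axis=0)  # vertical gradient

# Entries of M, smoothed over a Gaussian window.
Sxx = ndimage.gaussian_filter(Ix * Ix, 1.5)
Syy = ndimage.gaussian_filter(Iy * Iy, 1.5)
Sxy = ndimage.gaussian_filter(Ix * Iy, 1.5)

k = 0.04  # within the usual empirical range for the Harris constant
R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
corners = np.argwhere(R > 0.01 * R.max())
print(f"{len(corners)} corner candidates")
```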

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
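
The evaluation criterion described here — recall with respect to precision as a match-acceptance threshold varies — can be sketched as follows; match distances and ground-truth labels are simulated stand-ins.

```python
# Sketch of the evaluation criterion: sweep a match-acceptance threshold
# on descriptor distance and trace recall against 1 - precision.
import numpy as np

rng = np.random.default_rng(0)
distances = rng.random(1000)                         # one distance per match
correct = distances + 0.3 * rng.random(1000) < 0.6   # stand-in ground truth
n_pos = correct.sum()                                # ground-truth matches

curve = []
for t in np.linspace(0.05, 1.0, 20):
    accepted = distances < t
    if accepted.sum() == 0:
        continue
    tp = (accepted & correct).sum()
    curve.append((1 - tp / accepted.sum(), tp / n_pos))  # (1-prec., recall)

print(curve[:3])  # each pair is one point on the descriptor's curve
```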

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
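
As a pointer for readers, MSERs of the kind this TL;DR refers to can be detected with OpenCV's built-in implementation; the image below is a synthetic stand-in for a real indoor or outdoor scene.

```python
# Pointer sketch: detect maximally stable extremal regions (MSERs) with
# OpenCV's built-in detector on a synthetic stand-in image.
import cv2
import numpy as np

rng = np.random.default_rng(0)
img = (rng.random((256, 256)) * 255).astype(np.uint8)
img = cv2.GaussianBlur(img, (0, 0), 3)  # smooth so stable regions exist

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)  # point lists and bounding boxes
print(f"{len(regions)} maximally stable extremal regions")
```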

3,422 citations
