
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Journal ArticleDOI
TL;DR: A nonintrusive video-based system for vehicle speed measurement in urban roadways uses an optimized motion detector and a novel text detector to efficiently locate vehicle license plates in image regions containing motion; its license plate detector outperforms two other published state-of-the-art text detectors, as well as a well-known license plate detector.
Abstract: In this paper, we propose a nonintrusive video-based system for vehicle speed measurement in urban roadways. Our system uses an optimized motion detector and a novel text detector to efficiently locate vehicle license plates in image regions containing motion. Distinctive features are then selected on the license plate regions, tracked across multiple frames, and rectified for perspective distortion. Vehicle speed is measured by comparing the trajectories of the tracked features to known real-world measures. The proposed system was tested on a data set containing approximately 5 h of videos recorded in different weather conditions by a single low-cost camera, with associated ground truth speeds obtained by an inductive loop detector. Our data set is freely available for research purposes. The measured speeds have an average error of −0.5 km/h, staying inside the [−3, +2] km/h limit determined by regulatory authorities in several countries in over 96.0% of the cases. To the authors' knowledge, there are no other video-based systems able to achieve results comparable to those produced by an inductive loop detector. We also show that our license plate detector outperforms two other published state-of-the-art text detectors, as well as a well-known license plate detector, achieving a precision of 0.93 and a recall of 0.87.
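The speed computation described above ultimately maps a tracked pixel displacement to real-world distance over time. Below is a simplified sketch of that final step, assuming, purely for illustration, that the known real-world measure is the license plate width; the paper's actual pipeline also rectifies for perspective distortion, which this sketch omits:

```python
def speed_kmh(track_px, plate_width_px, plate_width_m, fps):
    """Estimate vehicle speed from a tracked feature trajectory.

    track_px:       list of (x, y) pixel positions, one per frame
    plate_width_px: apparent license plate width in pixels
    plate_width_m:  real plate width in meters (the known measure)
    fps:            camera frame rate
    """
    m_per_px = plate_width_m / plate_width_px   # pixel-to-meter scale
    (x0, y0), (x1, y1) = track_px[0], track_px[-1]
    dist_m = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 * m_per_px
    dt = (len(track_px) - 1) / fps              # elapsed time in seconds
    return dist_m / dt * 3.6                    # m/s to km/h
```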

106 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Using the standard parameters proposed by Lowe [5], SIFT features are extracted from an expanded window around the license plate region from image I, and matched to other features extracted from image J, using the nearest neighbor distance ratio matching strategy described by Mikolajczyk [34]....

  • ...We have used the SIFT (Scale-Invariant Feature Transform) features proposed by Lowe [5]....

  • ...To cope with large displacements, from vehicles moving at high speeds, an initial motion estimation is performed by matching features extracted by the Scale-Invariant Feature Transform [5]....
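The nearest neighbor distance ratio matching quoted above can be reproduced with off-the-shelf tools. A minimal sketch using OpenCV (assuming a recent opencv-python build where cv2.SIFT_create is available; the 0.8 ratio threshold is the value suggested in Lowe's paper, and the function name is illustrative):

```python
import cv2

def ratio_test_matches(img_i, img_j, ratio=0.8):
    """Match SIFT features between two grayscale images using the
    nearest neighbor distance ratio test."""
    sift = cv2.SIFT_create()
    kp_i, des_i = sift.detectAndCompute(img_i, None)
    kp_j, des_j = sift.detectAndCompute(img_j, None)

    # For each descriptor in I, find its two nearest neighbors in J.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(des_i, des_j, k=2)

    # Keep a match only if the best neighbor is significantly closer
    # than the second best, i.e. the match is distinctive.
    good = [pair[0] for pair in candidates
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return kp_i, kp_j, good
```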

Journal ArticleDOI
TL;DR: This paper proposes a unified scalable deep hash learning framework that explores the weak but free supervision of discriminative user tags commonly accompanying social images, together with a discrete hash optimization method based on the Augmented Lagrangian Multiplier that directly solves the hash codes and avoids binary quantization information loss.
Abstract: Recent years have witnessed the wide application of hashing for large-scale image retrieval, because of its high computation efficiency and low storage cost. Particularly, benefiting from current advances in deep learning, supervised deep hashing methods have greatly boosted retrieval performance under the strong supervision of large amounts of manually annotated semantic labels. However, their performance is highly dependent upon the supervised labels, which significantly limits scalability. In contrast, unsupervised deep hashing without label dependence enjoys the advantage of good scalability. Nevertheless, due to the relaxed hash optimization and, more importantly, the lack of semantic guidance, existing methods suffer from limited retrieval performance. In this paper, we propose SCAlable Deep Hashing (SCADH) to learn enhanced hash codes for social image retrieval. We formulate a unified scalable deep hash learning framework which explores the weak but free supervision of discriminative user tags that commonly accompany social images. It jointly learns image representations and hash functions with deep neural networks, and simultaneously enhances the discriminative capability of image hash codes with the refined semantics from the accompanying social tags. Further, instead of simple relaxed hash optimization, we propose a discrete hash optimization method based on the Augmented Lagrangian Multiplier to directly solve the hash codes and avoid binary quantization information loss. Experiments on two standard social image datasets demonstrate the superiority of the proposed approach compared with state-of-the-art shallow and deep hashing techniques.
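The efficiency claim behind hashing is concrete: with codes packed into bytes, ranking a database reduces to XOR plus bit counting. A minimal sketch of generic Hamming-distance retrieval (this illustrates the mechanism only, not SCADH itself; the arrays are random stand-ins for learned codes):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a query code.

    query_code: uint8 array of packed bits, shape (n_bytes,)
    db_codes:   uint8 array, shape (n_items, n_bytes)
    """
    diff = np.bitwise_xor(db_codes, query_code)      # differing bits
    dists = np.unpackbits(diff, axis=1).sum(axis=1)  # count them per item
    return np.argsort(dists, kind="stable")

# Example: 64-bit codes for five items.
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(5, 8), dtype=np.uint8)
print(hamming_rank(db[2], db))  # item 2 ranks first, at distance 0
```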

106 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...All of the shallow methods are tested with hand-crafted features (SIFT) [49] and deep features (VGGNet)....


  • ...For shallow methods, we use both hand-crafted (SIFT) and deep features (VGGNet) as inputs for comparisons....


Journal ArticleDOI
TL;DR: This paper constructs a novel joint semantic-visual space by leveraging visual descriptors and semantic attributes, narrowing the semantic gap by combining both attributes and indexing into a single framework, and designs an online cloud service to provide a more efficient online multimedia service.
Abstract: The rapidly increasing number of images on the internet has further increased the need for efficient indexing for digital image searching of large databases. The design of a cloud service that provides highly efficient yet compact image indexing remains challenging, partly due to the well-known semantic gap between user queries and the rich semantics of large-scale data sets. In this paper, we construct a novel joint semantic-visual space by leveraging visual descriptors and semantic attributes, which narrows the semantic gap by combining both attributes and indexing into a single framework. Such a joint space embraces the flexibility of coherent semantic-visual indexing, which employs binary codes to boost retrieval speed while maintaining accuracy. To solve the proposed model, we make the following contributions. First, we propose an interactive optimization method to find the joint semantic and visual descriptor space. Second, we prove convergence of our optimization algorithm, which guarantees a good solution after a certain number of iterations. Third, we integrate the semantic-visual joint space system with spectral hashing, which finds an efficient solution to search up to billion-scale data sets. Finally, we design an online cloud service to provide a more efficient online multimedia service. Experiments on two standard retrieval datasets (i.e., Holidays1M, Oxford5K) show that the proposed method is promising compared with the current state-of-the-art and that the cloud system significantly improves performance.

106 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...2) The visual and semantic spaces are completely different: image descriptors are generally in a continuous space (e.g., SIFT [7] is typically represented as a 128-D real-valued vector), whereas semantic attributes are denoted in a discrete space....

  • ...In this method, the tree is obtained by hierarchical K-means clustering of 1 million SIFT descriptors extracted from images randomly crawled from the internet....

  • ...Although there are similarities between image and text information retrieval, with low-level features quantized into visual words and inverted file indexing applied to index images via visual words [7], two major challenges still exist for image retrieval: algorithm and system design....
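The hierarchical K-means vocabulary tree mentioned in the first quote can be sketched in a few lines. The branch factor and depth below are illustrative choices, not values from the cited work, and scikit-learn's KMeans stands in for a production clusterer:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, branch=4, depth=3):
    """Recursively cluster descriptors into a hierarchical k-means tree.

    Returns a nested dict with a 'center' and a list of 'children' per node.
    """
    node = {"center": descriptors.mean(axis=0), "children": []}
    if depth == 0 or len(descriptors) < branch:
        return node
    km = KMeans(n_clusters=branch, n_init=10).fit(descriptors)
    for k in range(branch):
        subset = descriptors[km.labels_ == k]
        if len(subset):
            node["children"].append(build_vocab_tree(subset, branch, depth - 1))
    return node

# Example with random stand-ins for 128-D SIFT descriptors.
rng = np.random.default_rng(0)
tree = build_vocab_tree(rng.normal(size=(1000, 128)).astype(np.float32))
```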

Journal ArticleDOI
TL;DR: A non-linear regression technique is proposed that can discover a coherence-based separability constraint from highly noisy matches and embed it into a correspondence likelihood model; the model is integrated into a full feature correspondence system that reliably generates large numbers of good-quality correspondences over wide baselines.
Abstract: A key challenge in feature correspondence is the difficulty of differentiating true and false matches at the local descriptor level. This forces the adoption of strict similarity thresholds that discard many true matches. However, when analyzed at a global level, false matches are usually randomly scattered while true matches tend to be coherent (clustered around a few dominant motions), creating a coherence-based separability constraint. This paper proposes a non-linear regression technique that can discover such a coherence-based separability constraint from highly noisy matches and embed it into a correspondence likelihood model. Once computed, the model can filter the entire set of nearest neighbor matches (which typically contains over 90 percent false matches) for true matches. We integrate our technique into a full feature correspondence system which reliably generates large numbers of good quality correspondences over wide baselines where previous techniques provide few or no matches.
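The paper's regression model is more involved, but the coherence intuition (true matches cluster around a few dominant motions, false ones scatter) can be illustrated with a simple neighborhood filter. Everything below is a hedged stand-in for that idea, not the authors' method:

```python
import numpy as np

def coherence_filter(pts_i, pts_j, k=10, tol=20.0):
    """Keep putative matches whose motion agrees with their neighbors.

    pts_i, pts_j: (N, 2) arrays of matched keypoint coordinates, N > k.
    A match survives if its displacement vector lies within `tol` pixels
    of the median displacement of its k nearest matches in image I.
    """
    motion = pts_j - pts_i                       # per-match displacement
    keep = np.zeros(len(pts_i), dtype=bool)
    for idx in range(len(pts_i)):
        d = np.linalg.norm(pts_i - pts_i[idx], axis=1)
        nbrs = np.argsort(d)[1:k + 1]            # k nearest other matches
        dominant = np.median(motion[nbrs], axis=0)
        keep[idx] = np.linalg.norm(motion[idx] - dominant) < tol
    return keep
```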

105 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...To date, feature matching [4], [5], [6] is the correspondence solution of choice for many computer vision systems....


Journal ArticleDOI
TL;DR: The results demonstrate that material classification is far from being solved in scenarios of practical interest, and raise the question of generalisation capability: does training on the CUReT database enable recognition of another piece of sandpaper?

105 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...One contribution of that paper was to demonstrate the suitability of local detectors (scale, rotation, and affine invariant Harris and Laplace) and descriptors (SIFT [27], spin images [28] and RIFT [29]) for these tasks....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
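The final verification step described above is a small linear least-squares problem. A minimal sketch of fitting an affine pose from matched model/image keypoint locations, in the spirit of the paper's pose solution (NumPy only; the input arrays are hypothetical):

```python
import numpy as np

def fit_affine_pose(model_pts, image_pts):
    """Least-squares affine transform mapping model points to image points.

    model_pts, image_pts: (N, 2) arrays of matched 2-D locations, N >= 3.
    Returns the 2x3 matrix [A | t] minimizing ||A x + t - y||^2.
    """
    n = len(model_pts)
    # Each correspondence contributes two rows of the linear system.
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = model_pts   # rows for the x-equations
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = model_pts   # rows for the y-equations
    M[1::2, 5] = 1.0
    b = image_pts.reshape(-1)  # interleaved (u0, v0, u1, v1, ...)
    params, *_ = np.linalg.lstsq(M, b, rcond=None)
    a11, a12, a21, a22, tx, ty = params
    return np.array([[a11, a12, tx], [a21, a22, ty]])
```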

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
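The "staged filtering approach that identifies stable points in scale space" became, in the later SIFT formulation, an extremum search over a difference-of-Gaussians stack. A minimal single-octave sketch (no subpixel refinement or edge-response rejection; the sigma values and threshold are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.03):
    """Find local extrema in a single-octave difference-of-Gaussians stack."""
    img = img.astype(np.float32) / 255.0
    gauss = np.stack([gaussian_filter(img, s) for s in sigmas])
    dog = gauss[1:] - gauss[:-1]                 # DoG levels
    # A candidate keypoint is the maximum or minimum of its 3x3x3
    # neighborhood across space and scale, and sufficiently strong.
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    cand = (is_max | is_min) & (np.abs(dog) > thresh)
    return np.argwhere(cand)                     # (level, row, col) triples
```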

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector, that the SIFT-based descriptors perform best, and that moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K. and Schmid, C., 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S., et al., April 2002], steerable filters [Freeman, W. and Adelson, E., Sept. 1991], PCA-SIFT [Ke, Y. and Sukthankar, R., 2004], differential invariants [Koenderink, J. and van Doorn, A., 1987], spin images [Lazebnik, S., et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F. and Zisserman, A., 2002], moment invariants [Van Gool, L., et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
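The recall-with-respect-to-precision criterion used in this evaluation is straightforward to compute once putative matches are labeled against ground truth. A hedged sketch (the inputs are hypothetical):

```python
import numpy as np

def recall_vs_precision(distances, is_correct, n_correspondences):
    """Sweep a descriptor-distance threshold; return (recall, 1 - precision).

    distances:         (N,) match distances, thresholds swept in ascending order
    is_correct:        (N,) bool, whether each match is a true correspondence
    n_correspondences: total number of ground-truth correspondences
    """
    order = np.argsort(distances)
    correct = np.cumsum(is_correct[order])       # true matches accepted so far
    total = np.arange(1, len(order) + 1)         # all matches accepted so far
    recall = correct / n_correspondences
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision
```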

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
