
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
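The extract-then-match pipeline the abstract describes is usually paired with Lowe's nearest-neighbour ratio test, which discards ambiguous matches. A minimal sketch in plain NumPy — toy 4-dimensional descriptors stand in for real 128-dimensional SIFT descriptors, and the 0.8 ratio follows the value reported in [13]:

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    For each descriptor in desc_a, find its two nearest neighbours in
    desc_b and accept the match only if the closest one is clearly
    better than the runner-up (distance ratio below `ratio`).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# Toy 4-D "descriptors": rows 0 and 1 of `a` should match rows 0 and 1 of `b`.
a = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.9, 0.0, 0.0],
              [0.5, 0.5, 0.5, 0.5]])
print(ratio_test_match(a, b))  # -> [(0, 0), (1, 1)]
```

Real implementations replace the brute-force loop with an approximate nearest-neighbour index (Lowe uses a best-bin-first k-d tree), but the acceptance criterion is exactly this ratio comparison.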
Citations
Journal ArticleDOI
TL;DR: This work proposes a keypoint-based framework to address the keyframe selection problem so that local features can be employed in selecting keyframes, and introduces two criteria, coverage and redundancy, based on keypoint matching in the selection process.
Abstract: Keyframe selection has been crucial for effective and efficient video content analysis. While most existing approaches represent individual frames with global features, we, for the first time, propose a keypoint-based framework to address the keyframe selection problem so that local features can be employed in selecting keyframes. In general, the selected keyframes should both be representative of the video content and contain minimal redundancy. Therefore, we introduce two criteria, coverage and redundancy, based on keypoint matching in the selection process. Comprehensive experiments demonstrate that our approach outperforms the state of the art.

134 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Lowe proposed to improve matching robustness by imposing the ratio test criterion [6] (i....


  • ...Lowe’s SIFT descriptor [6] is utilized for keypoint extraction and representation, though many other local features [8]...


  • ...Recently, local features, such as the scale-invariant feature transform (SIFT) descriptor [6], have played a significant role in many application domains of visual content analysis, such as object recognition and image classification, due to their distinctive representation capacity....


Proceedings ArticleDOI
01 Oct 2014
TL;DR: A real-time non-intrusive monitoring system is developed, which detects the emotional states of the driver by analyzing facial expressions, and which operates very well on simulated data even with generic models.
Abstract: Monitoring the attentive and emotional status of the driver is critical for the safety and comfort of driving. In this work a real-time non-intrusive monitoring system is developed, which detects the emotional states of the driver by analyzing facial expressions. The system considers two negative basic emotions, anger and disgust, as stress related emotions. We detect an individual emotion in each video frame and the decision on the stress level is made on sequence level. Experimental results show that the developed system operates very well on simulated data even with generic models. An additional pose normalization step reduces the impact of pose mismatch due to camera setup and pose variation, and hence improves the detection accuracy further.

134 citations

Journal ArticleDOI
TL;DR: The nature of texts and inherent challenges addressed by word spotting methods are thoroughly examined and the use of retrieval enhancement techniques based on relevance feedback which improve the retrieved results are investigated.

134 citations

Journal ArticleDOI
TL;DR: This study highlights the utility of several off-the-shelf photogrammetric tools for the measurement of structural complexity across a range of scales relevant to ecologists and managers, and provides important information on the accuracy and precision of these systems that should allow for their targeted use by non-experts in computer vision within these contexts.
Abstract: In tropical reef ecosystems, corals are the key habitat builders, providing most of the ecosystem structure that influences coral reef biodiversity and resilience. Remote sensing applications have progressed significantly, and photogrammetry combined with structure-from-motion software is emerging as a leading technique for creating three-dimensional (3D) models of corals and reefs from which biophysical properties of structural complexity can be quantified. This enables a range of important marine research questions to be addressed, such as the role of habitat complexity in driving key ecological processes (e.g., foraging). Yet it is essential to assess the accuracy and precision of photogrammetric measurements to support their application in mapping, monitoring and quantifying coral reef form and structure. This study evaluated the precision (by repeated modeling) and accuracy (by comparison with laser reference models) of geometry and structural complexity metrics derived from photogrammetric 3D models of marine benthic habitat at two ecologically relevant spatial extents: individual coral colonies spanning a range of common morphologies, and patches of reef area of hundreds of square metres. Surface rugosity measurements were generally precise across all morphologies and spatial extents, with average differences in the geometry of replicate models of 1–6 mm for coral colonies and 25 mm for the reef area. Precision decreased with the complexity of the coral morphology: metrics for small massive corals were the most precise (1% coefficient of variation (CV) in surface rugosity) and metrics for bottlebrush corals the least precise (10% CV in surface rugosity). There was no indication, however, that precision was related to complexity for the patch-scale modeling. The 3D geometry of coral models differed by only 1–3 mm from laser reference models. However, high spatial variation in these differences around the model led to a consistent underestimation of surface rugosity values, for all morphs, of between 8% and 37%. This study highlights the utility of several off-the-shelf photogrammetry tools for the measurement of structural complexity across a range of scales relevant to ecologists and managers. It also provides important information on the accuracy and precision of these systems, which should allow for their targeted use by non-experts in computer vision within these contexts.

133 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Feature detection in VisualSFM used a scale invariant feature transform (SIFT) algorithm [34] and generated a network of camera poses from which the sparse 3D point cloud of the model surface was generated....


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A powerful pipeline named Hierarchical Part Matching (HPM) is proposed to cope with fine-grained classification tasks and achieves state-of-the-art classification accuracy on the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Unlike traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating truly discriminative features; it therefore becomes more important for the algorithm to make full use of part information in order to train a robust model. In this paper, we propose a powerful pipeline named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules into the image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.

133 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We use the VLFeat [18] library to extract OppSIFT descriptors [17]....


  • ...We extract SIFT descriptors [13] on the image and obtain a set of local descriptors D:...


  • ...The description vector dm is a D-dimensional vector, where D = 3× 128 = 384 using OpponentSIFT (OppSIFT) [17] on RGB-images....


  • ...SIFT[13] GCut[15] LLC[20] UCM-SG GPP[21] HSL Max-Pooling...


  • ...Starting from raw image data, we first extract SIFT [13] descriptors as local features....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
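The final verification stage described here — a least-squares solution for consistent pose parameters over the matches in a Hough cluster — can be illustrated with a small sketch. This fits a 2D affine transform to putative point matches via `np.linalg.lstsq`; it is an illustration of the idea, not Lowe's implementation:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst.

    Each match (x, y) -> (u, v) contributes two rows to the linear
    system A @ [m11, m12, m21, m22, tx, ty] = b, where
    u = m11*x + m12*y + tx and v = m21*x + m22*y + ty.
    """
    A = np.zeros((2 * len(src), 6))
    b = np.zeros(2 * len(src))
    for k, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        A[2 * k]     = [x, y, 0, 0, 1, 0]
        A[2 * k + 1] = [0, 0, x, y, 0, 1]
        b[2 * k], b[2 * k + 1] = u, v
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params  # m11, m12, m21, m22, tx, ty

# Points rotated 90 degrees and shifted by (1, 2): (x, y) -> (-y + 1, x + 2)
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(1, 2), (1, 3), (0, 2), (0, 3)]
print(np.round(fit_affine(src, dst), 3))  # ~ [0, -1, 1, 0, 1, 2]
```

In the full pipeline, matches whose residual under the fitted transform is too large would be rejected and the fit repeated, so that only geometrically consistent matches survive verification.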

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
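The staged filtering that identifies stable points in scale space became, in the later SIFT formulation, a search for extrema of a difference-of-Gaussians (DoG) stack. A simplified sketch in plain NumPy — the sigma schedule and threshold are illustrative choices, and the full algorithm's octave handling, sub-pixel interpolation, and edge-response rejection are omitted:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a truncated, normalised kernel."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, out)

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.003):
    """Keypoint candidates: pixels that are extrema of the DoG stack
    over their 3x3x3 (scale, y, x) neighbourhood."""
    blurs = [gaussian_blur(img, s) for s in sigmas]
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurs, blurs[1:])])
    peaks = []
    for s in range(1, dogs.shape[0] - 1):
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                v = dogs[s, y, x]
                patch = dogs[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > thresh and (v == patch.max() or v == patch.min()):
                    peaks.append((s, y, x))
    return peaks

# A soft blob of scale ~2 centred at (15, 15) should register as an
# extremum in the middle DoG layer (s = 1).
img = np.zeros((31, 31))
img[15, 15] = 1.0
img = gaussian_blur(img, 2.0)
print(dog_extrema(img))
```

The 3x3x3 comparison is the "staged" part: a candidate must beat its 26 neighbours in both space and scale, which selects points whose characteristic scale the blob actually matches.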

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
