
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Proceedings ArticleDOI
20 Jun 2011
TL;DR: Hierarchical kernel descriptors are proposed that apply kernel descriptors recursively to form image-level features and thus provide a conceptually simple and consistent way to generate image-level features from pixel attributes.
Abstract: Kernel descriptors [1] provide a unified way to generate rich visual feature sets by turning pixel attributes into patch-level features, and yield impressive results on many object recognition tasks. However, best results with kernel descriptors are achieved using efficient match kernels in conjunction with nonlinear SVMs, which makes it impractical for large-scale problems. In this paper, we propose hierarchical kernel descriptors that apply kernel descriptors recursively to form image-level features and thus provide a conceptually simple and consistent way to generate image-level features from pixel attributes. More importantly, hierarchical kernel descriptors allow linear SVMs to yield state-of-the-art accuracy while being scalable to large datasets. They can also be naturally extended to extract features over depth images. We evaluate hierarchical kernel descriptors both on the CIFAR10 dataset and the new RGB-D Object Dataset consisting of segmented RGB and depth images of 300 everyday objects.

261 citations


Cites background from "Distinctive Image Features from Sca..."

  • The most popular and successful local descriptors are orientation histograms including SIFT [19] and HOG [5], which are robust to minor transformations of images.

  • Kernel descriptors include SIFT and HOG as special cases, and provide a principled way to generate rich patch-level features from various pixel attributes.

  • Experiments on both CIFAR10 and the RGB-D Object Dataset (available at http://www.cs.washington.edu/rgbd-dataset) show that hierarchical kernel descriptors outperform kernel descriptors and many state-of-the-art algorithms including deep belief nets, convolutional neural networks, and local coordinate coding with carefully tuned SIFT features.

  • The combination of three hierarchical kernel descriptors has an accuracy of 80.0%, higher than all other competing techniques; its accuracy is 14.4 percent higher than SIFT, 9.0 percent higher than mcRBM combined with DBNs, and 5.5 percent higher than the improved LCC. Hierarchical kernel descriptors slightly outperform very recent work: the convolutional RBM and triangle K-means with 4000 centers [4].

  • Hua et al. [10] learned a linear transformation for SIFT using linear discriminant analysis and showed better results with lower dimensionality than SIFT on local feature matching problems.

Posted Content
TL;DR: This paper compares the performance of three different image matching techniques, i.e., SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish-eye distortion, and shearing, and shows which algorithm is most robust against each kind of distortion.
Abstract: Fast and robust image matching is a very important task with various applications in computer vision and robotics. In this paper, we compare the performance of three different image matching techniques, i.e., SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish-eye distortion, and shearing. For this purpose, we manually apply different types of transformations to the original images and compute matching evaluation parameters such as the number of key points in the images, the matching rate, and the execution time required for each algorithm, and we show which algorithm is most robust against each kind of distortion. Index Terms: image matching, scale-invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB).
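The evaluation parameters mentioned above (key-point counts, matching rate, execution time) are detector-agnostic. As a rough illustration, the sketch below implements brute-force nearest-neighbor matching with Lowe's ratio test and a matching-rate measure in plain NumPy; the 0.8 ratio and the tiny 2-D "descriptors" are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbor matching with Lowe's ratio test.

    Returns (i, j) index pairs where descriptor i in desc_a matches
    descriptor j in desc_b, accepted only when the nearest neighbor
    is sufficiently closer than the second nearest.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

def matching_rate(matches, num_keypoints):
    """Fraction of keypoints that found an accepted match."""
    return len(matches) / num_keypoints

# Toy 2-D "descriptors": the first two in each set correspond; the
# third pair is ambiguous and is rejected by the ratio test.
a = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [10.0, 0.1], [50.0, 50.0]])
m = ratio_test_matches(a, b)
print(m, matching_rate(m, len(a)))
```

Measuring execution time per algorithm, as the paper does, would simply wrap calls like these in a timer.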

261 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Thirdly, a key point orientation assignment based on the local image gradient, and lastly a descriptor generator to compute the local image descriptor for each key point based on image gradient magnitude and orientation [3].

  • Although SIFT has proven to be very efficient in object recognition applications, it incurs a high computational cost, which is a major drawback, especially for real-time applications [3, 4].

  • Scale Invariant Feature Transform (SIFT) is a feature detector developed by Lowe in 2004 [3].
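The orientation-assignment step quoted above can be illustrated with a minimal NumPy sketch: each pixel in a patch votes for one of 36 orientation bins, weighted by gradient magnitude. This is a simplification of Lowe's procedure, which additionally applies Gaussian weighting, interpolates the histogram peak, and keeps strong secondary peaks; the patch and bin count here are illustrative assumptions.

```python
import numpy as np

def orientation_histogram(patch, num_bins=36):
    """Gradient-orientation histogram for an image patch.

    Each pixel votes for an orientation bin (10 degrees wide when
    num_bins=36), weighted by its gradient magnitude.
    """
    # Finite-difference gradients; np.gradient returns d/drow, d/dcol.
    dy, dx = np.gradient(patch.astype(float))
    magnitude = np.sqrt(dx**2 + dy**2)
    # Orientation in [0, 360) degrees.
    orientation = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (orientation / (360.0 / num_bins)).astype(int) % num_bins
    hist = np.zeros(num_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# A horizontal intensity ramp has gradients pointing along +x
# (0 degrees), so the dominant orientation falls in bin 0.
ramp = np.tile(np.arange(8.0), (8, 1))
hist = orientation_histogram(ramp)
print(int(np.argmax(hist)))
```

In SIFT proper, the peak of this histogram becomes the keypoint's canonical orientation, and the descriptor is computed relative to it to obtain rotation invariance.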

Proceedings Article
08 Dec 2014
TL;DR: This paper proposes a novel deep neural net, named the multi-view perceptron (MVP), which can untangle identity and view features and, at the same time, infer a full spectrum of multi-view images given a single 2D face image.
Abstract: Various factors, such as identity, view, and illumination, are coupled in face images. Disentangling the identity and view representations is a major challenge in face recognition. Existing face recognition systems either use handcrafted features or learn features discriminatively to improve recognition accuracy. This is different from the behavior of the primate brain. Recent studies [5, 19] discovered that the primate brain has a face-processing network in which view and identity are processed by different neurons. Motivated by this, this paper proposes a novel deep neural net, named the multi-view perceptron (MVP), which can untangle identity and view features and, at the same time, infer a full spectrum of multi-view images given a single 2D face image. The identity features of MVP achieve superior performance on the MultiPIE dataset. MVP is also capable of interpolating and predicting images under viewpoints that are unobserved in the training data.

258 citations

Journal ArticleDOI
28 Jul 2014
TL;DR: This paper presents the first publicly available face database based on the Kinect sensor, and conducts benchmark evaluations on the proposed database using standard face recognition methods, and demonstrates the gain in performance when integrating the depth data with the RGB data via score-level fusion.
Abstract: The recent success of emerging RGB-D cameras such as the Kinect sensor depicts a broad prospect of 3-D data-based computer applications. However, due to the lack of a standard testing database, it is difficult to evaluate how face recognition technology can benefit from this up-to-date imaging sensor. In order to establish the connection between the Kinect and face recognition research, in this paper, we present the first publicly available face database based on the Kinect sensor (i.e., KinectFaceDB, online at http://rgb-d.eurecom.fr). The database consists of different data modalities (well-aligned and processed 2-D, 2.5-D, 3-D, and video-based face data) and multiple facial variations. We conducted benchmark evaluations on the proposed database using standard face recognition methods, and demonstrated the gain in performance when integrating the depth data with the RGB data via score-level fusion. We also compared the 3-D images of the Kinect (from the KinectFaceDB) with traditional high-quality 3-D scans (from the FRGC database) in the context of face biometrics, which reveals the imperative need for the proposed database in face recognition research.

257 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • Tables VIII and IX illustrate the fusion results from both the RGB and depth data using PCA, LBP, and LGBP (SIFT is not used because it cannot capture the correct information from depth images, as shown in Section IV-D2).

  • Because the depth map is highly smooth, the SIFT-based method is inappropriate for 2.5-...

  • PCA [19] (i.e., the Eigenface method), LBP [21], SIFT [81], and LGBP [34]-based methods are selected as the baseline techniques for the 2-...

  • The SIFT-based method extracts the key points from all training and testing images, where the similarity measure is achieved by key-point matching.

  • On the contrary, since LBP, SIFT, and LGBP are local-based methods, they are more robust to such local distortions.

Journal ArticleDOI
TL;DR: Comprehensive evaluation of efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.
Abstract: Extracting well-distributed, reliable, and precisely aligned point pairs for accurate image registration is a difficult task, particularly for multisource remote sensing images that have significant illumination, rotation, and scene differences. The scale-invariant feature transform (SIFT) approach, as a well-known feature-based image matching algorithm, has been successfully applied to the automatic registration of remote sensing images. Regardless of its distinctiveness and robustness, the SIFT algorithm suffers from some problems in the quality, quantity, and distribution of extracted features, particularly in multisource remote sensing imagery. In this paper, an improved SIFT algorithm is introduced that is fully automated and applicable to various kinds of optical remote sensing images, even those with up to a five-fold difference in scale. The key to the proposed approach is a selection strategy for SIFT features over the full distribution of location and scale, where feature quality is guaranteed by stability and distinctiveness constraints. The extracted features are then passed to an initial cross-matching process followed by a consistency check under a projective transformation model. Comprehensive evaluation of the efficiency, distribution quality, and positional accuracy of the extracted point pairs proves the capabilities of the proposed matching algorithm on a variety of optical remote sensing images.

255 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...In our research, regarding [21], the feature contrast (i....

  • Therefore, extrema in which the ratio of the eigenvalues of H is above a threshold, for example, r = 10 (proposed in [21]), are considered points corresponding to edges and discarded (for more explanation, see [21]).

  • ...03 (a threshold proposed by Lowe [21]).

  • SIFT features are scale invariant and accurate, and they are robust against illumination differences, changes in 3-D viewpoint, and image noise [21].

  • The famous SIFT algorithm proposed by Lowe [21] consists of three main modules: feature extraction, feature description,...
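The two rejection tests referenced in the snippets above, a contrast threshold (0.03 in Lowe's paper, for image values normalized to [0, 1]) and an edge test on the ratio of the Hessian's eigenvalues with r = 10, can be combined into a single stability check. The sketch below is an illustrative NumPy version, not Lowe's implementation; the example Hessians are made up to show one accepted blob-like point and one rejected edge-like point.

```python
import numpy as np

def is_stable_keypoint(contrast, hessian_2x2, contrast_thresh=0.03, r=10.0):
    """Lowe-style stability tests for a candidate extremum.

    - Contrast test: |D| at the extremum must exceed a small threshold
      (0.03 for images normalized to [0, 1] in Lowe's paper).
    - Edge test: with H the 2x2 spatial Hessian of D, require
      Tr(H)^2 / Det(H) < (r + 1)^2 / r   (r = 10 in the paper),
      which bounds the ratio of principal curvatures.
    """
    if abs(contrast) < contrast_thresh:
        return False
    tr = hessian_2x2[0, 0] + hessian_2x2[1, 1]
    det = np.linalg.det(hessian_2x2)
    if det <= 0:  # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r

# A blob-like point (similar curvatures in both directions) passes,
# while an edge-like point (one curvature much larger) is rejected.
blob = np.array([[1.0, 0.0], [0.0, 1.0]])
edge = np.array([[20.0, 0.0], [0.0, 0.5]])
print(is_stable_keypoint(0.05, blob), is_stable_keypoint(0.05, edge))
```

Note the edge test mirrors the Harris corner criterion: it needs only the trace and determinant of H, never the eigenvalues themselves.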

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
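The final verification step described above, a least-squares solution for consistent pose parameters, can be illustrated by fitting a 2-D similarity transform (rotation, uniform scale, translation) to matched point pairs. Lowe's paper solves for a more general affine pose, so this narrower model is an assumption made here for brevity.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src -> dst.

    Solves for (a, b, tx, ty) in
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    which models rotation + uniform scale + translation, giving one
    linear equation pair per matched keypoint.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved x'0, y'0, x'1, y'1, ...
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params  # a, b, tx, ty

# Points rotated 90 degrees (a=0, b=1) and shifted by (1, 2).
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
dst = np.array([[1, 2], [1, 3], [0, 2], [0, 3]])
print(np.round(fit_similarity(src, dst), 6))
```

In the recognition pipeline, points whose residual under the fitted transform is too large would be discarded as outliers and the fit repeated, which is the "verification" role this step plays.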

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
