
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
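The pipeline the abstract describes (detect keypoints, compute descriptors, match them) can be exercised with off-the-shelf tooling. Below is a minimal, illustrative sketch using OpenCV's SIFT implementation (shipped in opencv-python >= 4.4); the image file names are placeholders, and the 0.75 ratio threshold is a common choice rather than a mandated value.

```python
import cv2

img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints and 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# k-NN matching plus Lowe's ratio test to discard ambiguous matches
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable matches")
```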
Citations
Journal ArticleDOI
TL;DR: In this article, a graphical representation based HFR method (G-HFR) is proposed that represents heterogeneous image patches separately while taking the spatial compatibility between neighboring image patches into consideration.
Abstract: Heterogeneous face recognition (HFR) refers to matching face images acquired from different sources (i.e., different sensors or different wavelengths) for identification. HFR plays an important role in both biometrics research and industry. In spite of promising progress in recent years, HFR is still a challenging problem due to the difficulty of representing two heterogeneous images in a homogeneous manner. Existing HFR methods either represent an image while ignoring its spatial information or rely on a transformation procedure that complicates the recognition task. Considering these problems, we propose a novel graphical representation based HFR method (G-HFR) in this paper. Markov networks are employed to represent heterogeneous image patches separately, taking the spatial compatibility between neighboring image patches into consideration. A coupled representation similarity metric (CRSM) is designed to measure the similarity between the obtained graphical representations. Extensive experiments conducted on multiple HFR scenarios (viewed sketch, forensic sketch, near-infrared image, and thermal-infrared image) show that the proposed method outperforms state-of-the-art methods.

122 citations

Journal ArticleDOI
TL;DR: This paper proposes a local feature detector (MeshDOG) and region descriptor (MeshHOG) for polygonal meshes, and provides a methodological framework for analyzing real-valued functions defined over a 2D manifold, embedded in the 3D Euclidean space.
Abstract: This paper addresses the problem of describing surfaces using local features and descriptors. While methods for the detection of interest points in images and their description based on local image features are very well understood, their extension to discrete manifolds has not been well investigated. We provide a methodological framework for analyzing real-valued functions defined over a 2D manifold embedded in the 3D Euclidean space, e.g., photometric information or local curvature. Our work is motivated by recent advancements in multiple-camera reconstruction and image-based rendering of 3D objects: there is a growing need for describing object surfaces, matching two surfaces, or tracking them over time. Considering polygonal meshes, we propose a new methodological framework for the scale-space representations of scalar functions defined over such meshes. We propose a local feature detector (MeshDOG) and region descriptor (MeshHOG). Unlike standard image features, the proposed surface features capture both the local geometry of the underlying manifold and the scale-space differential properties of the real-valued function itself. We provide a thorough experimental evaluation: the repeatability of the feature detector and the robustness of the feature descriptor are tested by applying a large number of deformations to the manifold or to the scalar function.
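As a rough illustration of the MeshDOG idea (not the authors' code), a Difference-of-Gaussians-style band-pass response can be computed for a scalar function on a mesh by differencing two smoothed copies, with Gaussian smoothing approximated by repeated one-ring (umbrella) averaging. The inputs `f` (per-vertex values) and `neighbors` (adjacency lists) are assumed to be supplied by the caller.

```python
import numpy as np

def umbrella_smooth(f, neighbors, steps, lam=0.5):
    """Approximate Gaussian smoothing by repeated one-ring averaging;
    more steps correspond to a larger smoothing scale."""
    f = np.asarray(f, dtype=float).copy()
    for _ in range(steps):
        avg = np.array([f[nbrs].mean() for nbrs in neighbors])
        f = (1 - lam) * f + lam * avg
    return f

def mesh_dog(f, neighbors, steps_fine=2, steps_coarse=4):
    """Band-pass response whose local extrema over the mesh are
    candidate keypoints, analogous to the image-domain DoG."""
    return (umbrella_smooth(f, neighbors, steps_fine)
            - umbrella_smooth(f, neighbors, steps_coarse))

# f: per-vertex scalar (e.g., curvature or intensity); neighbors[i]:
# index array of the vertices adjacent to vertex i.
```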

122 citations


Cites background or methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...MeshDOG is a generalization of the Difference of Gaussian (DOG) operator (Marr and Hildreth, 1980; Lowe, 2004) and it is used to build a discrete Laplacian operator on a mesh....


  • ...Observe that the MeshHOG descriptor generates very few false positives in comparison with the SIFT equivalent, clearly demonstrating the advantages of the proposed approach....


  • ...However, when the threshold response is known a priori for a particular scalar function, such as it is the case in Lowe (2004) with image intensity, it can be easily used instead....


  • ...2D feature descriptors are generally designed to be robust to changes in illumination and invariant to image transformations such as translation, rotation, or scale (Matas et al, 2004; Lowe, 2004; Dufournaud et al, 2004; Dalal and Triggs, 2005; Bay et al, 2008) and, more generally, to 2D affine transformations (Mikolajczyk and Schmid, 2004)....


  • ...More recently, feature-based image analysis has become very popular (Lowe, 2004; Mikolajczyk and Schmid, 2005)....


Proceedings ArticleDOI
24 Mar 2014
TL;DR: The Urban Tracker algorithm is validated on four outdoor urban videos involving mixed traffic that includes pedestrians, cars, large vehicles, etc., and compares favorably to a current state-of-the-art feature-based tracker for urban traffic scenes on pedestrians and mixed traffic.
Abstract: In this paper, we study the problem of detecting and tracking multiple objects of various types in outdoor urban traffic scenes. This problem is especially challenging due to the large variation of road user appearances. To handle that variation, our system uses background subtraction to detect moving objects. In order to build the object tracks, an object model is built and updated through time inside a state machine using feature points and spatial information. When an occlusion occurs between multiple objects, the positions of feature points at previous observations are used to estimate the positions and sizes of the individual occluded objects. Our Urban Tracker algorithm is validated on four outdoor urban videos involving mixed traffic that includes pedestrians, cars, large vehicles, etc. Our method compares favorably to a current state-of-the-art feature-based tracker for urban traffic scenes on pedestrians and mixed traffic.
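A hedged sketch of the detection front end described above (not the authors' Urban Tracker code): background subtraction isolates moving objects, and feature points are then extracted inside the foreground mask for tracking. Standard OpenCV building blocks are used; "traffic.mp4" is a placeholder path.

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")
bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = bg.apply(frame)                                     # 255 = foreground, 127 = shadow
    fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Feature points restricted to moving regions; in a tracker these
    # would be matched frame to frame to keep identities through occlusion.
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01,
                                  minDistance=5, mask=fg)
cap.release()
```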

122 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The tests defined in [12], the ratio and the symmetry test, are run for each pair of points to filter bad matches....

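For reference, the two tests the excerpt mentions can be sketched as follows: Lowe's ratio test keeps a match only when it is clearly better than the second-best candidate, and the symmetry (cross-check) test keeps a pair only if the match holds in both directions. This is an illustrative OpenCV sketch, not the citing paper's code; the 0.75 threshold is a common choice.

```python
import cv2

def filtered_matches(des1, des2, ratio=0.75):
    bf = cv2.BFMatcher(cv2.NORM_L2)

    def ratio_test(da, db):
        kept = {}
        for pair in bf.knnMatch(da, db, k=2):
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                kept[pair[0].queryIdx] = pair[0]
        return kept

    fwd = ratio_test(des1, des2)   # matches des1 -> des2
    bwd = ratio_test(des2, des1)   # matches des2 -> des1
    # Symmetry test: keep (i, j) only if j also matches back to i.
    return [m for m in fwd.values()
            if m.trainIdx in bwd and bwd[m.trainIdx].trainIdx == m.queryIdx]
```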

Proceedings ArticleDOI
01 May 2017
TL;DR: A variant of Euler angles named Euler6 is proposed to represent orientation; a data augmentation method named pose synthesis is designed to reduce the sparsity of poses in the whole pose space and cope with overfitting in training; and a multi-task CNN named BranchNet is proposed to deal with the complex coupling of orientation and translation.
Abstract: Convolutional Neural Networks (CNNs) have been applied to camera relocalization, which is to infer the pose of the camera given a single monocular image. However, there are still many open problems for camera relocalization with CNNs. We delve into the CNNs for camera relocalization. First, a variant of Euler angles named Euler6 is proposed to represent orientation. Second, a data augmentation method named pose synthesis is designed to reduce the sparsity of poses in the whole pose space and cope with overfitting in training. Third, a multi-task CNN named BranchNet is proposed to deal with the complex coupling of orientation and translation. The network consists of several shared convolutional layers and splits into two branches which predict orientation and translation, respectively. Experiments on the 7Scenes dataset show that incorporating these techniques one by one into an existing model, PoseNet, always leads to better results. Together these techniques reduce the orientation error by 15.9% and the translation error by 38.3% compared to the state-of-the-art model Bayesian PoseNet. We implement BranchNet on an Intel NUC mobile platform and reach a speed of 43 fps, which meets the real-time requirement of many robotic applications.
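The shared-trunk/two-branch structure described in the abstract can be sketched in PyTorch as follows. This is an illustrative toy, not the paper's architecture: the layer sizes are invented, and Euler6 is modeled as a 6-dimensional output (sine and cosine of three Euler angles).

```python
import torch
import torch.nn as nn

class TwoBranchPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(               # shared convolutional trunk
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.orientation = nn.Linear(64, 6)        # Euler6: (sin, cos) per angle
        self.translation = nn.Linear(64, 3)        # x, y, z

    def forward(self, x):
        h = self.shared(x)
        return self.orientation(h), self.translation(h)

# Training would minimize a weighted sum of the two branch losses, so the
# coupled orientation/translation targets are regressed separately.
model = TwoBranchPoseNet()
ori, trans = model(torch.randn(1, 3, 224, 224))
```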

122 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Local features such as SIFT [9] and ORB [10] are exploited to register points....


Journal ArticleDOI
TL;DR: Dual-regularized KISS (DR-KISS) metric learning improves on KISS by reducing overestimation of large eigenvalues of the two estimated covariance matrices and, in doing so, guarantees that each covariance matrix is invertible.
Abstract: Person re-identification aims to match the images of pedestrians across different camera views from different locations. This is a challenging intelligent video surveillance problem that remains an active area of research due to the need for performance improvement. Person re-identification involves two main steps: feature representation and metric learning. Although the keep-it-simple-and-straightforward (KISS) metric learning method for discriminative distance metric learning has been shown to be effective for person re-identification, the estimation of the inverse of a covariance matrix is unstable and indeed may not exist when the training set is small, resulting in poor performance. Here, we present dual-regularized KISS (DR-KISS) metric learning. By regularizing the two covariance matrices, DR-KISS improves on KISS by reducing overestimation of their large eigenvalues and, in doing so, guarantees that each covariance matrix is invertible. Furthermore, we provide theoretical analyses supporting these choices. Specifically, we first prove why the regularization is necessary. Then, we prove that the proposed method is robust for generalization. We conduct extensive experiments on three challenging person re-identification datasets, VIPeR, GRID, and CUHK01, and show that DR-KISS achieves new state-of-the-art performance.
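A hedged numpy sketch of the underlying idea: KISS learns a Mahalanobis-like metric M = Σ_S^{-1} − Σ_D^{-1} from covariances of similar-pair and dissimilar-pair feature differences, and regularizing both covariances keeps them well conditioned and invertible. The shrinkage form used here is illustrative; the paper's exact regularizer may differ.

```python
import numpy as np

def shrinkage_cov(diffs, alpha=0.1):
    """Covariance of pairwise feature differences, shrunk toward a scaled
    identity so the estimate stays well conditioned and invertible."""
    sigma = diffs.T @ diffs / len(diffs)
    d = sigma.shape[0]
    return (1 - alpha) * sigma + alpha * (np.trace(sigma) / d) * np.eye(d)

def kiss_metric(similar_diffs, dissimilar_diffs, alpha=0.1):
    """M such that (x - y)^T M (x - y) scores pair dissimilarity."""
    sigma_s = shrinkage_cov(similar_diffs, alpha)
    sigma_d = shrinkage_cov(dissimilar_diffs, alpha)
    return np.linalg.inv(sigma_s) - np.linalg.inv(sigma_d)
```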

121 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
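The staged filtering step can be illustrated with a short sketch: build a Gaussian scale space and difference adjacent levels; local extrema of the resulting DoG stack are the candidate stable points. The parameters below (number of levels, base sigma, scale factor) are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def dog_stack(gray, n_levels=5, sigma0=1.6, k=2 ** 0.5):
    """Gaussian scale space and its adjacent-level differences."""
    levels = [cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma0 * k ** i)
              for i in range(n_levels)]
    return [levels[i + 1] - levels[i] for i in range(n_levels - 1)]
```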

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
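The feature extraction this abstract refers to is the Harris/Stephens corner detector. A minimal sketch with OpenCV's built-in implementation follows; the image path is a placeholder and the threshold fraction is a common heuristic rather than a value from the paper.

```python
import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # (row, col) pairs
```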

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
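The evaluation criterion described (recall with respect to precision) can be sketched as follows: sweep a match-acceptance threshold over the descriptor distances and, using ground-truth correspondence labels, record recall against 1 − precision at each threshold. The function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def recall_vs_precision(distances, is_correct, n_steps=50):
    """distances: match distances; is_correct: ground-truth match labels."""
    distances = np.asarray(distances)
    is_correct = np.asarray(is_correct, dtype=bool)
    total = is_correct.sum()
    curve = []
    for t in np.linspace(distances.min(), distances.max(), n_steps):
        accepted = distances <= t
        tp = (accepted & is_correct).sum()
        recall = tp / total
        precision = tp / max(accepted.sum(), 1)
        curve.append((recall, 1 - precision))
    return curve
```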

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
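A minimal sketch of MSER detection with OpenCV's built-in implementation (parameters left at their defaults; the image path is a placeholder):

```python
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)  # pixel lists and bounding boxes
print(f"{len(regions)} stable extremal regions")
```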

3,422 citations
