Distinctive Image Features from Scale-Invariant Keypoints

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.

read less

Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.

...read moreread less

Citations

PDF

Open Access

More filters

Book Chapter•DOI•

DetNet: Design Backbone for Object Detection

[...]

Zeming Li¹, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng¹, Jian Sun - Show less +2 more•Institutions (1)

Tsinghua University¹

08 Sep 2018

TL;DR: DetNet is proposed, which is a novel backbone network specifically designed for object detection that includes the extra stages against traditional backbone network for image classification, while maintains high spatial resolution in deeper layers.

...read moreread less

Abstract: Recent CNN based object detectors, either one-stage methods like YOLO, SSD, and RetinaNet, or two-stage detectors like Faster R-CNN, R-FCN and FPN, are usually trying to directly finetune from ImageNet pre-trained models designed for the task of image classification. However, there has been little work discussing the backbone feature extractor specifically designed for the task of object detection. More importantly, there are several differences between the tasks of image classification and object detection. (i) Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales. (ii) Object detection not only needs to recognize the category of the object instances but also spatially locate them. Large downsampling factors bring large valid receptive field, which is good for image classification, but compromises the object location ability. Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection. Moreover, DetNet includes the extra stages against traditional backbone network for image classification, while maintains high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. Codes will be released (https://github.com/zengarden/DetNet).

...read moreread less

233 citations

Cites methods from "Distinctive Image Features from Sca..."

...Old detectors extract image features by using hand-engineered object component descriptors, such as HOG [5], SIFT [26], Selective Search [37], Edge Box [41]....
[...]

Proceedings Article•DOI•

To Aggregate or Not to aggregate: Selective Match Kernels for Image Search

[...]

Giorgos Tolias¹, Yannis Avrithis¹, Hervé Jégou•Institutions (1)

National Technical University of Athens¹

01 Dec 2013

TL;DR: A match kernel is proposed that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel, providing a large scale image search both precise and scalable, as shown by the experiments on several benchmarks.

...read moreread less

Abstract: This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing a large scale image search both precise and scalable, as shown by our experiments on several benchmarks.

...read moreread less

233 citations

Journal Article•DOI•

Remote Sensing of Vegetation Structure Using Computer Vision

[...]

Jonathan P. Dandois, Erle C. Ellis

21 Apr 2010-Remote Sensing

TL;DR: It is demonstrated that computer vision can support ultra-low-cost, user-deployed high spatial resolution 3D remote sensing of vegetation structure and spectral characteristics from ordinary digital photographs acquired using inexpensive hobbyist aerial platforms.

...read moreread less

Abstract: High spatial resolution measurements of vegetation structure in three-dimensions (3D) are essential for accurate estimation of vegetation biomass, carbon accounting, forestry, fire hazard evaluation and other land management and scientific applications. Light Detection and Ranging (LiDAR) is the current standard for these measurements but requires bulky instruments mounted on commercial aircraft. Here we demonstrate that high spatial resolution 3D measurements of vegetation structure and spectral characteristics can be produced by applying open-source computer vision algorithms to ordinary digital photographs acquired using inexpensive hobbyist aerial platforms. Digital photographs were acquired using a kite aerial platform across two 2.25 ha test sites in Baltimore, MD, USA. An open-source computer vision algorithm generated 3D point cloud datasets with RGB spectral attributes from the photographs and these were geocorrected to a horizontal precision of 0.82) than computer vision (R2 > 0.64), primarily because of difficulties observing terrain under closed canopy forest. Results confirm that computer vision can support ultra-low-cost, user-deployed high spatial resolution 3D remote sensing of vegetation structure.

...read moreread less

232 citations

Cites methods from "Distinctive Image Features from Sca..."

...Bundler is a new open-source SfM software package [19] that combines the SIFT algorithm (Scale Invariant Feature Transform) [20] for keypoint extraction with bundle adjustment using the Sparse Remote Sens. 2010, 2 1159 Bundle Adjustment package (SBA) [21]....
[...]
...Bundler is a new open-source SfM software package [19] that combines the SIFT algorithm (Scale Invariant Feature Transform) [20] for keypoint extraction with bundle adjustment using the Sparse...
[...]

Proceedings Article•DOI•

Probabilistic Elastic Matching for Pose Variant Face Verification

[...]

Haoxiang Li¹, Gang Hua¹, Zhe Lin², Jonathan Brandt², Jianchao Yang² - Show less +1 more•Institutions (2)

Stevens Institute of Technology¹, Adobe Systems²

23 Jun 2013

TL;DR: This work proposes a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy.

...read moreread less

Abstract: Pose variation remains to be a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Face in the Wild (LFW) and the YouTube video face database by a significant margin.

...read moreread less

232 citations

Cites background from "Distinctive Image Features from Sca..."

...Each face image is densely partitioned into overlapping patches at multiple scales, from each of which a local feature such as Local Binary Pattern (LBP) [1] or SIFT [19] is extracted....
[...]
...Our method only employed simple visual features such as LBP and SIFT....
[...]
...As shown in Figure 1, SIFT and LBP features are extracted over each scale for a 3-scale Gaussian image pyramid with scaling factor 0.9....
[...]
...We take a part based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches....
[...]
...SIFT features are extracted from patches from a 8x8 sliding window with 4-pixel spacing, and LBP features2 are extracted from a 32x32 sliding window with 4-pixel spacing....
[...]

Book Chapter•DOI•

Low-Cost and open-source solutions for automated image orientation --- a critical overview

[...]

Fabio Remondino¹, Silvio Del Pizzo², Thomas P. Kersten³, Salvatore Troisi²•Institutions (3)

Kessler Foundation¹, Parthenope University of Naples², HafenCity University Hamburg³

29 Oct 2012

TL;DR: The article presents an investigation of automated image orientation packages in order to clarify potentialities and performances when dealing with large and complex datasets.

...read moreread less

Abstract: The recent developments in automated image processing for 3D reconstruction purposes have led to the diffusion of low-cost and open-source solutions which can be nowadays used by everyone to produce 3D models. The level of automation is so high that many solutions are black-boxes with poor repeatability and low reliability. The article presents an investigation of automated image orientation packages in order to clarify potentialities and performances when dealing with large and complex datasets.

...read moreread less

231 citations

Cites background from "Distinctive Image Features from Sca..."

...- Bundler: developed by the University of Washington & Microsoft, it was created with the aim of reconstructing 3D scene using a huge number of images downloaded approach, with RANSAC to estimate the F matrix from the extracted SIFT features and reject possible outlier for every couple of images....
[...]
...Nowadays SIFT [28] and SURF [29] algorithms provide highly distinctive features invariant to image scaling and rotations with an associate descriptor (64- or 128-dimensional vector) to each extracted image feature....
[...]
...This software uses a modified SIFT++ feature extractor [35] and allows to choose between several camera model (Brown’s, fisheye, etc.)....
[...]
...The numerical solution to the problem of function minimization is generally sough with methods like Levenberg-Marquardt, Gauss-Newton or Gauss-Markov. quadratic matching or kd-tree feature extraction: SIFT-like, SURF, etc. descriptor comparison and pairwise correspondences extraction son robust outlier rejection: E/F matrix or T tensor with RANSAC, MAPSAC or LMedS method bundle adjustment concatenation of all image combinations and extraction of image correspondences for the entire image block the datum (or gauge) definition problem....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
…
40
41
42
43
44
45
46
…
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

A performance evaluation of local descriptors

[...]

Krystian Mikolajczyk¹, Cordelia Schmid²•Institutions (2)

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

7,057 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations