Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011

TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images; these features can then be used to reliably match objects in differing images.

Abstract: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and was further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.

Citations

Book Chapter

Visual Link Retrieval in a Database of Paintings

Benoit Seguin¹, Carlotta Striolo¹, Isabella di Lenardo¹, Frédéric Kaplan¹

École Polytechnique Fédérale de Lausanne¹

08 Oct 2016

TL;DR: It is shown that a pre-trained convolutional neural network can perform better for this task than other machine vision methods aimed at photograph analysis, and that retrieval performance can be significantly improved by fine-tuning a network specifically for this task.

Abstract: This paper examines how far state-of-the-art machine vision algorithms can be used to retrieve common visual patterns shared by series of paintings. The search for such visual patterns, central to art history research, is challenging because of the diversity of similarity criteria that could relevantly demonstrate genealogical links. We design a methodology and a tool to efficiently annotate clusters of similar paintings and test various algorithms in a retrieval task. We show that a pre-trained convolutional neural network can perform better for this task than other machine vision methods aimed at photograph analysis. We also show that retrieval performance can be significantly improved by fine-tuning a network specifically for this task.

76 citations

Cites methods from "Distinctive Image Features from Sca..."

...We computed the SIFT descriptors for every image of the dataset....
[...]
...The main class of algorithms used very successfully in the problem of visual instance retrieval are based on local visual descriptors (mainly SIFT [24])....
[...]
...However, previous works on cross-domain matching [5,12,33] have shown that while these methods perform well on photographs, the performance of SIFT across domains drops drastically....
[...]
...The extreme variability in patterns, style and colors seems to be too strong for a dictionary of SIFT descriptor to handle....
[...]

Journal Article

SPHORB: A Fast and Robust Binary Feature on the Sphere

Qiang Zhao¹, Wei Feng¹, Liang Wan², Jiawan Zhang¹

Tianjin University¹, Civil Aviation University of China²

01 Jun 2015-International Journal of Computer Vision

TL;DR: Extensive experiments show that SPHORB consistently outperforms other existing spherical features in accuracy, efficiency and robustness to camera movements, and has been validated by real-world matching tests.

Abstract: In this paper, we propose SPHORB, a new fast and robust binary feature detector and descriptor for spherical panoramic images. In contrast to state-of-the-art spherical features, our approach stems from the geodesic grid, a nearly equal-area hexagonal grid parametrization of the sphere used in climate modeling. It enables us to directly build fine-grained pyramids and construct robust features on the hexagonal spherical grid, thus avoiding the costly computation of spherical harmonics and their associated bandwidth limitation. We further study how to achieve scale and rotation invariance for the proposed SPHORB feature. Extensive experiments show that SPHORB consistently outperforms other existing spherical features in accuracy, efficiency and robustness to camera movements. The superior performance of SPHORB has also been validated by real-world matching tests.

76 citations

Journal Article

Adaptive hash retrieval with kernel based similarity

Xiao Bai¹, Cheng Yan¹, Haichuan Yang¹, Lu Bai², Jun Zhou³, Edwin R. Hancock⁴

Beihang University¹, Central University of Finance and Economics², Griffith University³, University of York⁴

01 Mar 2018-Pattern Recognition

TL;DR: A novel adaptive similarity measure which is consistent with k-nearest neighbor search is presented, and it is proved that it leads to a valid kernel if the original similarity function is a kernel function.

76 citations

Journal Article

SLAM-based dense surface reconstruction in monocular Minimally Invasive Surgery and its application to Augmented Reality

Long Chen¹, Wen Tang¹, Nigel W. John², Tao Ruan Wan³, Jian J. Zhang¹

Bournemouth University¹, University of Chester², University of Bradford³

08 Feb 2018-Computer Methods and Programs in Biomedicine

TL;DR: A novel intra-operative dense surface reconstruction framework that is capable of providing geometry information from only monocular MIS videos for geometry-aware AR applications such as site measurements and depth cues is presented.

76 citations

Cites background or methods from "Distinctive Image Features from Sca..."

...Since ORB [40] is a binary feature point descriptor, it is an order of magnitude faster than SURF [1] and more than two orders faster than SIFT [27] with better accuracy....
[...]
...2016] [27] Lowe DG (2004) Distinctive image features from scale-invariant keypoints....
[...]
...Traditional tracking methods for AR in MIS usually involve feature points based tracking such as Scale-Invariant Feature Transform (SIFT) [18], Speeded Up Robust Features (SURF) [22], Optical Flow tracking [38] or other approaches specifically designed to work with soft tissues that account for changes in scale, rotation and brightness [31]....
[...]

Posted Content

Visual descriptors for content-based retrieval of remote sensing images

Paolo Napoletano¹

University of Milano-Bicocca¹

02 Feb 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images is presented, which includes global hand-crafted, local hand-crafted, and convolutional neural network (CNN) features coupled with four different content-based image retrieval schemes.

Abstract: In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global hand-crafted, local hand-crafted, and Convolutional Neural Network (CNN) features coupled with four different Content-Based Image Retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images might be quite heterogeneous, ranging from images containing fine-grained textures, to coarse-grained ones, or to images containing objects. It is therefore not obvious in this domain which descriptor should be employed to describe images having such variability. Results demonstrate that CNN-based features perform better than both global and local hand-crafted features, whatever retrieval scheme is adopted. Features extracted from SatResNet-50, a residual CNN suitably fine-tuned on the RS domain, show much better performance than a residual CNN pre-trained on multimedia scene and object images. Features extracted from NetVLAD, a CNN that considers both CNN and local features, work better than other CNN solutions on those images that contain fine-grained textures and objects.

76 citations

References

Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe¹

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
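The first recognition step named in this abstract, matching individual features against a database of known features, can be illustrated with a minimal sketch. It assumes descriptors are plain numeric vectors compared by Euclidean distance, and it uses an exhaustive search rather than the fast nearest-neighbor algorithm the abstract refers to. The function name `nearest_neighbor_matches` and the toy 2-D vectors are hypothetical, chosen only for illustration.

```python
import math


def nearest_neighbor_matches(query, database):
    """For each query descriptor, find the closest database descriptor.

    Returns a list of (database_index, distance) pairs, one per query.
    This is a brute-force illustration, not the fast search from the paper.
    """
    matches = []
    for q in query:
        best_index, best_dist = -1, math.inf
        for i, d in enumerate(database):
            dist = math.dist(q, d)  # Euclidean distance between two vectors
            if dist < best_dist:
                best_index, best_dist = i, dist
        matches.append((best_index, best_dist))
    return matches


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real descriptors would have many more dimensions.
    database = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    query = [(1.0, 0.0), (9.0, 1.0)]
    for index, dist in nearest_neighbor_matches(query, database):
        print(index, round(dist, 3))
```

The exhaustive loop costs one distance computation per query-database pair, which is the cost a fast nearest-neighbor method is meant to avoid on large databases.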

46,906 citations

Proceedings Article

Object recognition from local scale-invariant features

David G. Lowe¹

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings Article

A Combined Corner and Edge Detector

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal Article

A performance evaluation of local descriptors

Krystian Mikolajczyk¹, Cordelia Schmid²

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
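The evaluation criterion mentioned in this abstract, recall with respect to precision, can be sketched as follows. The sketch assumes the usual definitions: recall is the fraction of ground-truth correspondences that were correctly matched, and precision is the fraction of returned matches that are correct. The function name `recall_and_precision` and the example counts are hypothetical and are not taken from the paper.

```python
def recall_and_precision(num_correct, num_matches, num_correspondences):
    """Compute (recall, precision) from match counts.

    num_correct:         matches that agree with the ground truth
    num_matches:         all matches returned by the descriptor matcher
    num_correspondences: all ground-truth correspondences between the images
    """
    # Guard against empty inputs so the function never divides by zero.
    recall = num_correct / num_correspondences if num_correspondences else 0.0
    precision = num_correct / num_matches if num_matches else 0.0
    return recall, precision


if __name__ == "__main__":
    # Made-up counts for illustration: 30 correct out of 40 returned matches,
    # with 60 ground-truth correspondences available.
    recall, precision = recall_and_precision(30, 40, 60)
    print(recall, precision)
```

Sweeping the matching threshold and recording one such pair per threshold value would trace out a curve of the kind used to compare descriptors.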

7,057 citations

Journal Article

Robust wide-baseline stereo from maximally stable extremal regions

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations