scispace - formally typeset

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
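The matching step described above can be illustrated with a minimal nearest-neighbor matcher using Lowe's ratio test, sketched here in NumPy on toy descriptors (not the paper's implementation; the 0.8 threshold follows the paper's recommendation):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test.

    A match is kept only when the closest descriptor in desc_b is
    significantly closer than the second-closest one, which rejects
    ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all candidates
        nn = np.argsort(dists)[:2]                  # two nearest neighbors
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# toy 4-D "descriptors": row 0 of a matches row 1 of b unambiguously
a = np.array([[1.0, 0.0, 0.0, 0.0]])
b = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(match_descriptors(a, b))  # the only surviving match pairs a[0] with b[1]
```

Real SIFT descriptors are 128-dimensional, but the ratio-test logic is identical.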
Citations
Journal ArticleDOI
TL;DR: A new algorithm is proposed for estimating the relative translation and orientation of an inertial measurement unit and a camera that requires no additional hardware except a piece of paper with a checkerboard pattern on it; experiments show that it works well in practice for both perspective and spherical cameras.
Abstract: This paper is concerned with the problem of estimating the relative translation and orientation of an inertial measurement unit and a camera, which are rigidly connected. The key is to realize that this problem is in fact an instance of a standard problem within the area of system identification, referred to as a gray-box problem. We propose a new algorithm for estimating the relative translation and orientation, which does not require any additional hardware, except a piece of paper with a checkerboard pattern on it. The method is based on a physical model which can also be used in solving, for example, sensor fusion problems. The experimental results show that the method works well in practice, both for perspective and spherical cameras.

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Common examples are SIFT [Lowe, 2004] and more recently SURF [Bay et al., 2008] and FERNS [Ozuysal et al., 2007]....

Proceedings ArticleDOI
01 Dec 2010
TL;DR: A design overview of the ClassX system and the evaluation results of a 3-month pilot deployment demonstrate that the system is a low-cost, efficient and pragmatic solution to interactive online lecture viewing.
Abstract: ClassX is an interactive online lecture viewing system developed at Stanford University. Unlike existing solutions that restrict the user to watch only a pre-defined view, ClassX allows interactive pan/tilt/zoom while watching the video. The interactive video streaming paradigm avoids sending the entire field-of-view in the recorded high resolution, thus reducing the required data rate. To alleviate the navigation burden on the part of the online viewer, ClassX offers automatic tracking of the lecturer. ClassX also employs slide recognition technology, which allows automatic synchronization of digital presentation slides with those appearing in the lecture video. This paper presents a design overview of the ClassX system and the evaluation results of a 3-month pilot deployment at Stanford University. The results demonstrate that our system is a low-cost, efficient and pragmatic solution to interactive online lecture viewing.

87 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: It is shown that if the photographer takes a burst of images, a modality available in virtually all modern digital cameras, the images can be combined into a clean, sharp version without explicitly solving any blur estimation and subsequent inverse problem.
Abstract: Numerous recent approaches attempt to remove image blur due to camera shake, either with one or multiple input images, by explicitly solving an inverse and inherently ill-posed deconvolution problem. If the photographer takes a burst of images, a modality available in virtually all modern digital cameras, we show that it is possible to combine them to get a clean sharp version. This is done without explicitly solving any blur estimation and subsequent inverse problem. The proposed algorithm is strikingly simple: it performs a weighted average in the Fourier domain, with weights depending on the Fourier spectrum magnitude. The method's rationale is that camera shake has a random nature and therefore each image in the burst is generally blurred differently. Experiments with real camera data show that the proposed Fourier Burst Accumulation algorithm achieves state-of-the-art results an order of magnitude faster, with simplicity for on-board implementation on camera phones.
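The weighted Fourier average the abstract describes can be sketched in a few lines of NumPy. This is a simplified reading of the algorithm; the exponent `p` and the epsilon guarding all-zero frequencies are assumptions here:

```python
import numpy as np

def fourier_burst_accumulation(burst, p=11):
    """Sketch of Fourier Burst Accumulation.

    Each frame is weighted, per frequency, by the magnitude of its
    Fourier spectrum raised to a power p: frequencies that survived the
    blur in some frame dominate the average, so the result keeps the
    sharpest content of the burst without estimating any blur kernel.
    """
    specs = np.fft.fft2(np.asarray(burst, dtype=float), axes=(-2, -1))
    mags = np.abs(specs) ** p
    weights = mags / np.maximum(mags.sum(axis=0), 1e-12)  # normalize over the burst
    fused = (weights * specs).sum(axis=0)                 # weighted average per frequency
    return np.real(np.fft.ifft2(fused))

# sanity check: a burst of identical frames fuses back to the same image
frame = np.outer(np.arange(4.0), np.arange(4.0))
out = fourier_burst_accumulation([frame, frame, frame])
```

With differently blurred frames, each frequency is dominated by whichever frame preserved it best, which is the method's rationale stated in the abstract.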

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Image correspondences are found using SIFT features [19] and then filtered out through the orsa algorithm [21], a variant of the so called ransac method [10]....

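The ORSA filter mentioned in the snippet is a variant of RANSAC; the generic RANSAC idea can be sketched on a toy line-fitting problem (illustrative only, not the authors' geometric model; the threshold and iteration count are arbitrary choices):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Minimal RANSAC: fit y = a*x + b to points with outliers.

    Repeatedly fits a model to a random minimal sample (2 points here)
    and keeps the model supported by the most inliers. Filtering SIFT
    matches works the same way, with a geometric model instead of a line.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# 8 points on y = 2x plus 2 gross outliers
xs = np.arange(8.0)
pts = np.column_stack([xs, 2 * xs])
pts = np.vstack([pts, [[1.0, 10.0], [5.0, -3.0]]])
mask = ransac_line(pts, rng=0)
```

The returned mask flags the consistent matches and discards the outliers, which is exactly the role ORSA plays in the pipeline above.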
Journal ArticleDOI
TL;DR: A novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines is proposed for high-resolution broad-area remote-sensing images and produces more robust results in complex scenes.
Abstract: Efficient airport detection and aircraft recognition are essential due to the strategic importance of these regions and targets in economic and military construction. In this paper, a novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines (SVMs) is proposed for high-resolution broad-area remote-sensing images. In the first layer saliency (FLS) model, we introduce a spatial-frequency visual saliency analysis algorithm that is based on a CIE Lab color space to reduce the interference of backgrounds and efficiently detect well-defined airport regions in broad-area remote-sensing images. In the second layer saliency model, we propose a saliency analysis strategy that is based on an edge feature preserving wavelet transform and high-frequency wavelet coefficient reconstruction to complete the pre-extraction of aircraft candidates from airport regions that are detected by the FLS and crudely extract as many aircraft candidates as possible for additional classification in detected airport regions. Then, we utilize feature descriptors that are based on a dense SIFT and Hu moment to accurately describe these features of the aircraft candidates. Finally, these object features are inputted to the SVM, and the aircraft are recognized. The experimental results indicate that the proposed method not only reliably and effectively detects targets in high-resolution broad-area remote-sensing images but also produces more robust results in complex scenes.
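The Hu moments used alongside the dense SIFT descriptor are classical shape invariants; a minimal NumPy sketch of the first two is below (the paper does not publish code, so this is a generic implementation of the standard definitions):

```python
import numpy as np

def hu_moments(img):
    """First two Hu moment invariants of a grayscale image (sketch).

    Hu moments are functions of normalized central moments that stay
    constant under translation and scale, which is why they suit
    describing aircraft shapes regardless of position in the frame.
    """
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar, ybar = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):  # normalized central moment
        mu = ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2

# a square blob yields the same invariants wherever it sits in the frame
a = np.zeros((16, 16)); a[2:6, 2:6] = 1.0
b = np.zeros((16, 16)); b[9:13, 10:14] = 1.0
```

Translation invariance follows because the moments are taken about the blob's own centroid.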

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...[38] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int....

  • ...(16) The SIFT is a local feature descriptor that was initially presented by Lowe [38] as a feature of keypoints....

Proceedings ArticleDOI
20 May 2018
TL;DR: This paper investigates the feasibility of processing surveillance video streams at the network edge for real-time, uninterrupted tracking of moving human objects, and proposes an efficient multi-object tracking algorithm based on Kernelized Correlation Filters.
Abstract: Allowing computation to be performed at the edge of a network, edge computing has been recognized as a promising approach to address some challenges in the cloud computing paradigm, particularly to the delay-sensitive and mission-critical applications like real-time surveillance. Prevalence of networked cameras and smart mobile devices enable video analytics at the network edge. However, human objects detection and tracking are still conducted at cloud centers, as real-time, online tracking is computationally expensive. In this paper, we investigated the feasibility of processing surveillance video streaming at the network edge for real-time, uninterrupted moving human objects tracking. Moving human detection based on Histogram of Oriented Gradients (HOG) and linear Support Vector Machine (SVM) is illustrated for features extraction, and an efficient multi-object tracking algorithm based on Kernelized Correlation Filters (KCF) is proposed. Implemented and tested on Raspberry Pi 3, our experimental results are very encouraging, which validated the feasibility of the proposed approach toward a real-time surveillance solution at the edge of networks.
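The HOG features named in the abstract reduce each image cell to a magnitude-weighted orientation histogram; a simplified single-cell sketch in NumPy follows (the bin count and the unsigned 0-180 degree convention follow common HOG practice, not necessarily this paper's settings):

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Orientation histogram for one HOG cell (simplified sketch).

    Gradients are computed with centered differences and each pixel
    votes for an orientation bin with a weight equal to its gradient
    magnitude. Full HOG tiles the image into such cells and normalizes
    them over overlapping blocks before feeding the SVM.
    """
    patch = np.asarray(patch, dtype=float)
    gx = np.gradient(patch, axis=1)
    gy = np.gradient(patch, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())    # magnitude-weighted votes
    return hist

# a vertical step edge: all gradient energy is horizontal (bin 0)
patch = np.zeros((8, 8)); patch[:, 4:] = 1.0
h = hog_cell_histogram(patch)
```

Concatenating such histograms over a detection window yields the feature vector the linear SVM classifies.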

86 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection [7] and HOG+SVM algorithm [14] has better performance in human detection....

  • ...Scale invariance feature transformation (SIFT) provides an alternative algorithm for human detection through extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene [14]....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
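The verification stage the abstract mentions solves a least-squares problem for consistent pose parameters; for a similarity transform (rotation, scale, translation) that step reduces to a small linear system, sketched here on toy data (not Lowe's code):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform from matched point locations.

    Solves for [a, b, tx, ty] in  x' = a*x - b*y + tx
                                  y' = b*x + a*y + ty,
    i.e. rotation+scale+translation, which is the pose model verified
    after the Hough clustering step.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    A = np.zeros((2 * len(src), 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1.0
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1], src[:, 0], 1.0
    params, *_ = np.linalg.lstsq(A, dst.ravel(), rcond=None)
    return params  # a, b, tx, ty

# points rotated 90 degrees (a=0, b=1) and shifted by (1, 2)
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = np.column_stack([-src[:, 1] + 1.0, src[:, 0] + 2.0])
p = fit_similarity(src, dst)
```

Matches whose residual under the recovered transform is large are rejected, completing the verification.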

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
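The staged filtering that identifies stable points in scale space can be sketched as difference-of-Gaussians extrema detection. This is a simplified reading: the scale sampling, contrast threshold, and blur implementation below are assumptions, and the real detector adds sub-pixel refinement and edge rejection:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (helper)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    pad = np.pad(img, r, mode="reflect")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, "valid"), 0, tmp)

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Difference-of-Gaussians keypoint candidates (simplified sketch).

    Stable points are local extrema of the DoG images across both space
    and scale: each candidate is compared against its 26 neighbors in
    the 3x3x3 cube around it, with a small contrast threshold.
    """
    blurred = np.stack([gaussian_blur(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]  # adjacent-scale differences
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]
                if abs(v) > 0.01 and (v >= cube.max() or v <= cube.min()):
                    keypoints.append((y, x, s))
    return keypoints

# a Gaussian blob centered at (10, 10) fires an extremum at that point
ys, xs = np.mgrid[0:21, 0:21]
img = np.exp(-((ys - 10.0)**2 + (xs - 10.0)**2) / (2 * 2.0**2))
kps = dog_extrema(img)
```

The surviving extrema are the candidate keypoints to which orientation assignment and descriptor computation are then applied.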

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
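The feature extraction underlying this approach is the Harris corner detector introduced in this paper; its corner/edge response can be sketched directly from the structure tensor (the box window and k = 0.04 are common choices, not taken from the text, which uses a Gaussian window):

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris/Plessey corner response map (sketch).

    Builds the local structure tensor from image gradients and scores
    each pixel with R = det(M) - k * trace(M)^2: large positive R means
    a corner, negative R an edge, near-zero a flat region.
    """
    img = np.asarray(img, dtype=float)
    gx = np.gradient(img, axis=1)
    gy = np.gradient(img, axis=0)

    def box(a, r=1):  # sum over a (2r+1)^2 window via padding and shifts
        p = np.pad(a, r)
        return sum(p[r+dy:r+dy+a.shape[0], r+dx:r+dx+a.shape[1]]
                   for dy in range(-r, r+1) for dx in range(-r, r+1))

    sxx, syy, sxy = box(gx*gx), box(gy*gy), box(gx*gy)
    return sxx * syy - sxy**2 - k * (sxx + syy)**2

# a white square on black: the response peaks at the square's corners
img = np.zeros((12, 12)); img[4:8, 4:8] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
```

Non-maximum suppression over R then yields the discrete corner features that are tracked across the image sequence.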

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
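The evaluation criterion, recall with respect to precision, reduces to two ratios per threshold setting; a small sketch follows (the function name and toy numbers are mine, the definitions follow the standard usage in this evaluation):

```python
import numpy as np

def recall_one_minus_precision(is_correct, n_correspondences):
    """Point on a recall vs 1-precision curve (sketch).

    Given matcher output flagged correct/false against ground-truth
    region correspondences:
        recall        = #correct matches / #correspondences
        1 - precision = #false matches  / #all matches
    Sweeping the matching threshold traces out the full curve.
    """
    is_correct = np.asarray(is_correct, dtype=bool)
    recall = is_correct.sum() / n_correspondences
    one_minus_precision = (~is_correct).sum() / len(is_correct)
    return recall, one_minus_precision

# 8 of 10 returned matches correct, out of 16 true correspondences
r, omp = recall_one_minus_precision([True] * 8 + [False] * 2, 16)
```

A descriptor is better when its curve gives higher recall at the same 1-precision, which is how the SIFT-based descriptors come out on top.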

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
