Distinctive Image Features from Scale-Invariant Keypoints

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.

read less

Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•

Robot learning manipulation action plans by Watching unconstrained videos from the world wide web

[...]

Yezhou Yang¹, Yi Li², Cornelia Fermüller¹, Yiannis Aloimonos¹•Institutions (2)

University of Maryland, College Park¹, NICTA²

25 Jan 2015

TL;DR: A system that learns manipulation action plans by processing unconstrained videos from the World Wide Web to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots.

...read moreread less

Abstract: In order to advance action generation and creation in robots beyond simple learned schemas we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by "watching" unconstrained videos with high accuracy.

...read moreread less

202 citations

Proceedings Article•DOI•

BRIEF-Gist - closing the loop by simple means

[...]

Niko Sünderhauf¹, Peter Protzel¹•Institutions (1)

Chemnitz University of Technology¹

05 Dec 2011

TL;DR: BRIEF-Gist, a very simplistic appearance-based place recognition system based on the BRIEF descriptor, is proposed, which is much more easy to implement and more efficient compared to recent approaches like FAB-Map.

...read moreread less

Abstract: The ability to recognize known places is an essential competence of any intelligent system that operates autonomously over longer periods of time. Approaches that rely on the visual appearance of distinct scenes have recently been developed and applied to large scale SLAM scenarios. FAB-Map is maybe the most successful of these systems. Our paper proposes BRIEF-Gist, a very simplistic appearance-based place recognition system based on the BRIEF descriptor. BRIEF-Gist is much more easy to implement and more efficient compared to recent approaches like FAB-Map. Despite its simplicity, we can show that it performs comparably well as a front-end for large scale SLAM. We benchmark our approach using two standard datasets and perform SLAM on the 66 km long urban St. Lucia dataset.

...read moreread less

202 citations

Cites methods from "Distinctive Image Features from Sca..."

...It was found to be superior to the established SIFT [12] or SURF [2] descriptors, both in recognition performance and runtime behaviour....
[...]

Journal Article•DOI•

Automated Diagnosis of Epilepsy Using Key-Point-Based Local Binary Pattern of EEG Signals

[...]

Ashwani Kumar Tiwari, Ram Bilas Pachori¹, Vivek Kanhangad¹, Bijaya Ketan Panigrahi²•Institutions (2)

Indian Institute of Technology Indore¹, Indian Institute of Technology Delhi²

01 Jul 2017-IEEE Journal of Biomedical and Health Informatics

TL;DR: The proposed methodology based on the LBP computed at key points is simple and easy to implement for real-time epileptic seizure detection and has been compared with existing methods for the classification of the aforementioned problems.

...read moreread less

Abstract: The electroencephalogram (EEG) signals are commonly used for diagnosis of epilepsy. In this paper, we present a new methodology for EEG-based automated diagnosis of epilepsy. Our method involves detection of key points at multiple scales in EEG signals using a pyramid of difference of Gaussian filtered signals. Local binary patterns (LBPs) are computed at these key points and the histogram of these patterns are considered as the feature set, which is fed to the support vector machine (SVM) for the classification of EEG signals. The proposed methodology has been investigated for the four well-known classification problems namely, 1) normal and epileptic seizure, 2) epileptic seizure and seizure free, 3) normal, epileptic seizure, and seizure free, and 4) epileptic seizure and nonseizure EEG signals using publically available university of Bonn EEG database. Our experimental results in terms of classification accuracies have been compared with existing methods for the classification of the aforementioned problems. Further, performance evaluation on another EEG dataset shows that our approach is effective for classification of seizure and seizure-free EEG signals. The proposed methodology based on the LBP computed at key points is simple and easy to implement for real-time epileptic seizure detection.

...read moreread less

202 citations

Cites methods from "Distinctive Image Features from Sca..."

...In order to detect key points in EEG signals, we have adopted a technique employed in scale invariant feature transformation [22], which has been a very successful approach for image matching....
[...]

Journal Article•DOI•

Gaussian Processes for Object Categorization

[...]

Ashish Kapoor¹, Kristen Grauman², Raquel Urtasun³, Trevor Darrell³•Institutions (3)

Microsoft¹, University of Texas at Austin², University of California, Berkeley³

01 Jun 2010-International Journal of Computer Vision

TL;DR: This work shows that with an appropriate combination of kernels a significant boost in classification performance is possible, and indicates the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.

...read moreread less

Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.

...read moreread less

202 citations

Cites background or methods from "Distinctive Image Features from Sca..."

...To extract local features, we can exploit a wealth of interest operators designed to detect a sparse set of salient regions (e.g., Lowe 2004; Mikolajczyk and Schmid 2004; Kadir and Brady 2003), or simply sample densely at regular intervals and at multiple scales....
[...]
...We use PCA to reduce the dimensionality of the SIFT descriptors to 10 before adding the position, yielding features having a total of 12 dimensions....
[...]
...For example, F might be the space of SIFT (Lowe 2004) descriptors (d = 128), or image coordinate positions (d = 2), etc.; a set F contains a collection of these descriptors extracted from a single image or object....
[...]
...To describe each region or patch, we can choose from an array of descriptors designed to capture local texture while maintaining some invariance to small shifts and rotations, such as SIFT (Lowe 2004), shape context (Belongie et al. 2001), or geometric blur (Berg and Malik 2001)....
[...]
...For example, F might be the space of SIFT (Lowe 2004) descriptors (d = 128), or image coordinate positions (d = 2), etc....
[...]

Journal Article•DOI•

Deformable Medical Image Registration: Setting the State of the Art with Discrete Methods*

[...]

Ben Glocker¹, Aristeidis Sotiras, Nikos Komodakis, Nikos Paragios•Institutions (1)

Technische Universität München¹

14 Jul 2011-Annual Review of Biomedical Engineering

TL;DR: A novel deformable image registration paradigm that exploits Markov random field formulation and powerful discrete optimization algorithms is introduced, leading to a modular, powerful, and flexible formulation that can account for arbitrary image-matching criteria, various local deformation models, and regularization constraints.

...read moreread less

Abstract: This review introduces a novel deformable image registration paradigm that exploits Markov random field formulation and powerful discrete optimization algorithms. We express deformable registration as a minimal cost graph problem, where nodes correspond to the deformation grid, a node's connectivity corresponds to regularization constraints, and labels correspond to 3D deformations. To cope with both iconic and geometric (landmark-based) registration, we introduce two graphical models, one for each subproblem. The two graphs share interconnected variables, leading to a modular, powerful, and flexible formulation that can account for arbitrary image-matching criteria, various local deformation models, and regularization constraints. To cope with the corresponding optimization problem, we adopt two optimization strategies: a computationally efficient one and a tight relaxation alternative. Promising results demonstrate the potential of this approach. Discrete methods are an important new trend in medical image registration, as they provide several improvements over the more traditional continuous methods. This is illustrated with several key examples where the presented framework outperforms existing general-purpose registration methods in terms of both performance and computational complexity. Our methods become of particular interest in applications where computation time is a critical issue, as in intraoperative imaging, or where the huge variation in data demands complex and application-specific matching criteria, as in large-scale multimodal population studies. The proposed registration framework, along with a graphical interface and corresponding publications, is available for download for research purposes (for Windows and Linux platforms) from http://www.mrf-registration.net.

...read moreread less

202 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
…
52
53
54
55
56
57
58
…
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

A performance evaluation of local descriptors

[...]

Krystian Mikolajczyk¹, Cordelia Schmid²•Institutions (2)

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

7,057 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations