Distinctive Image Features from Scale-Invariant Keypoints

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.

read less

Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Benefit of large field-of-view cameras for visual odometry

[...]

Zichao Zhang¹, Henri Rebecq¹, Christian Forster¹, Davide Scaramuzza¹•Institutions (1)

University of Zurich¹

16 May 2016

TL;DR: This work analyzes two common operational environments in mobile robotics: an urban environment and an indoor scene and implements a state-of-the-art VO pipeline that works with large FoV fisheye and catadioptric cameras.

...read moreread less

Abstract: The transition of visual-odometry technology from research demonstrators to commercial applications naturally raises the question: “what is the optimal camera for vision-based motion estimation?” This question is crucial as the choice of camera has a tremendous impact on the robustness and accuracy of the employed visual odometry algorithm. While many properties of a camera (e.g. resolution, frame-rate, global-shutter/rolling-shutter) could be considered, in this work we focus on evaluating the impact of the camera field-of-view (FoV) and optics (i.e., fisheye or catadioptric) on the quality of the motion estimate. Since the motion-estimation performance depends highly on the geometry of the scene and the motion of the camera, we analyze two common operational environments in mobile robotics: an urban environment and an indoor scene. To confirm the theoretical observations, we implement a state-of-the-art VO pipeline that works with large FoV fisheye and catadioptric cameras. We evaluate the proposed VO pipeline in both synthetic and real experiments. The experiments point out that it is advantageous to use a large FoV camera (e.g., fisheye or catadioptric) for indoor scenes and a smaller FoV for urban canyon environments.

...read moreread less

140 citations

Additional excerpts

..., SIFT [11]) to establish feature correspondence....
[...]
...Most VO algorithms for omnidirectional cameras [7], [8], [9], [10] rely on robust feature descriptors (e.g., SIFT [11]) to establish feature correspondence....
[...]

Proceedings Article•DOI•

Object recognition as ranking holistic figure-ground hypotheses

[...]

Fuxin Li¹, Joao Carreira¹, Cristian Sminchisescu¹•Institutions (1)

University of Bonn¹

13 Jun 2010

TL;DR: Results are demonstrated beyond the current state of the art for image classification, object detection and semantic segmentation, in a number of challenging datasets including Caltech-101, ETHZ-Shape and PASCAL VOC 2009.

...read moreread less

Abstract: We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object independent process. Decisions are performed based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, hence to the extent entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation, in a number of challenging datasets including Caltech-101, ETHZ-Shape and PASCAL VOC 2009.

...read moreread less

140 citations

Cites methods from "Distinctive Image Features from Sca..."

...In order to model object appearance we extracted four bags of words of gray-level SIFT [19] and color SIFT [25], computed on a regular grid, two on the foreground and two on the background of each segment....
[...]
...In order to model the object appearance we extract four bags of words of gray-level SIFT [39] and color SIFT [49], on a regular grid, two on the foreground and two on the background of each segment....
[...]

Journal Article•DOI•

Structure from Motion (SfM) Photogrammetry with Drone Data: A Low Cost Method for Monitoring Greenhouse Gas Emissions from Forests in Developing Countries

[...]

Reason Mlambo, Iain Woodhouse, Karen Anderson

03 Mar 2017-Forests

TL;DR: The study concluded that although SfM from UAVs performs poorly in closed canopies, it can still provide a low cost solution in those developing countries where forests have sparse canopy cover (<50%) with individual tree crowns and ground surfaces well-captured by S fM photogrammetry.

...read moreread less

Abstract: Structure from Motion (SfM) photogrammetry applied to photographs captured from Unmanned Aerial Vehicle (UAV) platforms is increasingly being utilised for a wide range of applications including structural characterisation of forests. The aim of this study was to undertake a first evaluation of whether SfM from UAVs has potential as a low cost method for forest monitoring within developing countries in the context of Reducing Emissions from Deforestation and forest Degradation (REDD+). The project evaluated SfM horizontal and vertical accuracy for measuring the height of individual trees. Aerial image data were collected for two test sites; Meshaw (Devon, UK) and Dryden (Scotland, UK) using a Quest QPOD fixed wing UAV and DJI Phantom 2 quadcopter UAV, respectively. Comparisons were made between SfM and airborne LiDAR point clouds and surface models at the Meshaw site, while at Dryden, SfM tree heights were compared to ground measured tree heights. Results obtained showed a strong correlation between SfM and LiDAR digital surface models (R2 = 0.89) and canopy height models (R2 = 0.75). However, at Dryden, a poor correlation was observed between SfM tree heights and ground measured heights (R2 = 0.19). The poor results at Dryden were explained by the fact that the forest plot had a closed canopy structure such that SfM failed to generate enough below-canopy ground points. Finally, an evaluation of UAV surveying methods was also undertaken to determine their usefulness and cost-effectiveness for plot-level forest monitoring. The study concluded that although SfM from UAVs performs poorly in closed canopies, it can still provide a low cost solution in those developing countries where forests have sparse canopy cover (<50%) with individual tree crowns and ground surfaces well-captured by SfM photogrammetry. Since more than half of the forest covered areas of the world have canopy cover <50%, we can conclude that SfM has enormous potential for forest mapping in developing countries.

...read moreread less

140 citations

Cites methods from "Distinctive Image Features from Sca..."

... These proces s were executed using 3 different algorithms, namely SiftGPU [51,52], Clustering View for Multi‐view Stereo (CMVS) and Patch‐based Multi‐view Stereo (PMVS2) [53], all of which are packaged as part of VisualSFM....
[...]
...These proces s were executed using 3 different algorithms, namely SiftGPU [51,52], Clustering View for Multi‐view Stereo (CMVS) and Patch‐based Multi‐view Stereo (PMVS2) [53], all of which are packaged as part of VisualSFM....
[...]
...These processes were executed using 3 different algorithms, namely SiftGPU [51,52], Clustering View for Multi-view Stereo (CMVS) and Patch-based Multi-view Stereo (PMVS2) [53], all of which are packaged as part of VisualSFM....
[...]

Book Chapter•DOI•

Semantic Match Consistency for Long-Term Visual Localization

[...]

Carl Toft¹, Erik Stenborg¹, Lars Hammarstrand¹, Lucas Brynte¹, Marc Pollefeys², Torsten Sattler², Fredrik Kahl¹ - Show less +3 more•Institutions (2)

Chalmers University of Technology¹, ETH Zurich²

08 Sep 2018

TL;DR: This paper presents a method for scoring the individual correspondences by exploiting semantic information about the query image and the scene, and shows that the localization performance can be significantly improved compared to the state-of-the-art, as evaluated on two challenging long-term localization benchmarks.

...read moreread less

Abstract: Robust and accurate visual localization across large appearance variations due to changes in time of day, seasons, or changes of the environment is a challenging problem which is of importance to application areas such as navigation of autonomous robots. Traditional feature-based methods often struggle in these conditions due to the significant number of erroneous matches between the image and the 3D model. In this paper, we present a method for scoring the individual correspondences by exploiting semantic information about the query image and the scene. In this way, erroneous correspondences tend to get a low semantic consistency score, whereas correct correspondences tend to get a high score. By incorporating this information in a standard localization pipeline, we show that the localization performance can be significantly improved compared to the state-of-the-art, as evaluated on two challenging long-term localization benchmarks.

...read moreread less

140 citations

Cites background or methods from "Distinctive Image Features from Sca..."

...Such structure-based methods assign one or more feature descriptors, e.g., SIFT [30] or LIFT [53], to each 3D point....
[...]
...test to filter out ambiguous matches [30]....
[...]
..., SIFT [30] or LIFT [53], to each 3D point....
[...]
...1 shows our full localization pipeline: Given a query image, we extract local (SIFT [30]) features and compute its semantic segmentation....
[...]

Proceedings Article•DOI•

A textured object recognition pipeline for color and depth image data

[...]

Jie Tang¹, Stephen Miller¹, Arjun Singh¹, Pieter Abbeel¹•Institutions (1)

University of California, Berkeley¹

14 May 2012

TL;DR: An object recognition system which leverages the additional sensing and calibration information available in a robotics setting together with large amounts of training data to build high fidelity object models for a dataset of textured household objects is presented.

...read moreread less

Abstract: We present an object recognition system which leverages the additional sensing and calibration information available in a robotics setting together with large amounts of training data to build high fidelity object models for a dataset of textured household objects. We then demonstrate how these models can be used for highly accurate detection and pose estimation in an end-to-end robotic perception system incorporating simultaneous segmentation, object classification, and pose fitting. The system can handle occlusions, illumination changes, multiple objects, and multiple instances of the same object. The system placed first in the ICRA 2011 Solutions in Perception instance recognition challenge. We believe the presented paradigm of building rich 3D models at training time and including depth information at test time is a promising direction for practical robotic perception systems.

...read moreread less

140 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
…
90
91
92
93
94
95
96
…
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

A performance evaluation of local descriptors

[...]

Krystian Mikolajczyk¹, Cordelia Schmid²•Institutions (2)

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

7,057 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations