
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
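
As a concrete illustration of the extract-and-match pipeline described above, the following minimal Python sketch uses OpenCV's SIFT implementation (cv2.SIFT_create, available in opencv-python 4.4+); the image file names are placeholders:

```python
import cv2

# Load the two images to be matched (placeholder file names).
img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute their 128-D descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors with a brute-force matcher (L2 distance suits SIFT).
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} cross-checked matches")
```
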
Citations
Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper considers the pairwise geometric relations between correspondences and proposes a strategy to incorporate these relations at significantly reduced computational cost, which makes it suitable for large-scale object retrieval.
Abstract: Spatial verification is a key step in boosting the performance of object-based image retrieval. It serves to eliminate unreliable correspondences between salient points in a given pair of images, and is typically performed by analyzing the consistency of spatial transformations between the image regions involved in individual correspondences. In this paper, we consider the pairwise geometric relations between correspondences and propose a strategy to incorporate these relations at significantly reduced computational cost, which makes it suitable for large-scale object retrieval. In addition, we combine the information on geometric relations from both the individual correspondences and pairs of correspondences to further improve the verification accuracy. Experimental results on three reference datasets show that the proposed approach results in a substantial performance improvement compared to the existing methods, without making concessions regarding computational efficiency.
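
The citation contexts below single out the Hough-voting style of verification inherited from Lowe. As a hedged illustration (a sketch of the general technique, not this paper's reduced-cost method), the following Python fragment bins OpenCV keypoint matches by relative rotation and log scale ratio and keeps the correspondences in the dominant bin; `matches`, `kp1`, and `kp2` are assumed to come from a SIFT matching step like the one sketched earlier:

```python
import numpy as np

def hough_verify(matches, kp1, kp2, angle_bins=12, scale_bins=8):
    """Keep matches whose rotation/scale change agrees with the dominant vote."""
    # Relative rotation (degrees) and log scale ratio for each correspondence.
    d_angle = np.array([(kp2[m.trainIdx].angle - kp1[m.queryIdx].angle) % 360
                        for m in matches])
    d_scale = np.array([np.log2(kp2[m.trainIdx].size / kp1[m.queryIdx].size)
                        for m in matches])

    # Vote in a coarse 2-D histogram over the transformation space;
    # log scale ratios are clipped to [-2, 2] (a factor of 4 either way).
    a_idx = (d_angle / (360 / angle_bins)).astype(int)
    s_idx = np.clip(((d_scale + 2) / (4 / scale_bins)).astype(int),
                    0, scale_bins - 1)
    hist = np.zeros((angle_bins, scale_bins), dtype=int)
    for a, s in zip(a_idx, s_idx):
        hist[a, s] += 1

    # Retain only the correspondences voting for the dominant bin.
    best_a, best_s = np.unravel_index(hist.argmax(), hist.shape)
    return [m for m, a, s in zip(matches, a_idx, s_idx)
            if a == best_a and s == best_s]
```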

108 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Then, similarly to [17, 14], we reduce this set even further, by deploying Hough voting in the scaling and rotation transformation space....

  • ...For instance, in the SIFT [17] scheme, which is widely deployed for this purpose, salient points are detected by a Difference of Gaussians (DOG) function applied in the scale space....

  • ...Different from RANSAC-based methods, Lowe [17] applied Hough transform to the geometric transformation space to find groups of consistently transformed correspondences prior to estimating the transformation model....

  • ...We now depart from the set C1vs1 and follow the strategy from [17, 14] to apply a Hough voting scheme in search for dominant ranges of the target transformation parameters, specifically for the rotation and scaling, in the transformation space....

Journal ArticleDOI
TL;DR: An improved version of a visual servo controller is proposed that uses feedback linearization to overcome the chattering phenomenon present in sliding mode-based controllers used previously.
Abstract: The ability to follow a human is an important requirement for a service robot designed to work alongside humans in homes or in workplaces. This paper describes the development and implementation of a novel robust visual controller for a human-following robot. This visual controller consists of two parts: 1) a robust algorithm that tracks a human visible in its camera view and 2) a servo controller that generates the motion commands needed for the robot to follow the target human. The tracking algorithm uses point-based features, such as speeded-up robust features (SURF), to detect a human under challenging conditions such as variation in illumination, pose change, full or partial occlusion, and abrupt camera motion. The novel contributions in the tracking algorithm include the following: 1) a dynamic object model that evolves over time to deal with short-term changes while maintaining stability over the long run; 2) an online K-D tree-based classifier, combined with a Kalman filter, that differentiates a case of pose change from a case of partial or full occlusion; and 3) a method to detect pose change due to out-of-plane rotations, a difficult problem that leads to frequent tracking failures in a human-following robot. An improved version of a visual servo controller is proposed that uses feedback linearization to overcome the chattering phenomenon present in the sliding mode-based controllers used previously. The efficacy of the proposed approach is demonstrated through various simulations and real-life experiments with an actual mobile robot platform.
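
The controller details are specific to the paper, but the feedback-linearization idea it invokes can be illustrated on the standard unicycle (differential-drive) model, which becomes exactly linearizable about an off-axis reference point. The sketch below is that generic textbook construction, not the paper's controller; the gain `k`, offset `l`, and target are illustrative assumptions:

```python
import numpy as np

def servo_command(pose, target, l=0.2, k=0.8):
    """Feedback-linearized (v, w) command that drives the point a distance l
    ahead of the robot toward the target position; pose = (x, y, theta)."""
    x, y, th = pose
    # Off-axis point whose kinematics are linear in (v, w) for l != 0.
    p = np.array([x + l * np.cos(th), y + l * np.sin(th)])
    u = k * (np.array(target) - p)          # desired velocity of that point
    # Invert the decoupling matrix in dp/dt = B(theta) @ [v, w].
    B = np.array([[np.cos(th), -l * np.sin(th)],
                  [np.sin(th),  l * np.cos(th)]])
    v, w = np.linalg.solve(B, u)
    return v, w   # linear and angular velocity commands
```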

108 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...As far as human tracking is concerned, tracking-by-detection framework [7], [9], [30] is becoming increasingly popular that uses locally invariant features like SURF [8] or scale invariant feature transformation (SIFT) [31], [32] to recognize target in each frame....

  • ...Since SURF is known to be computationally more efficient compared to SIFT features, the current review is limited only to SURF-based methods....

Proceedings ArticleDOI
08 Dec 2014
TL;DR: This work proposes a fully automated geo-registration pipeline with a novel viewpoint-dependent matching method that handles ground-to-aerial viewpoint variation, demonstrates a high success rate for the task, and dramatically outperforms state-of-the-art techniques.
Abstract: We address the problem of geo-registering ground-based multi-view stereo models by ground-to-aerial image matching. The main contribution is a fully automated geo-registration pipeline with a novel viewpoint-dependent matching method that handles ground-to-aerial viewpoint variation. We conduct large-scale experiments covering many popular outdoor landmarks in Rome. The proposed approach demonstrates a high success rate for the task and dramatically outperforms state-of-the-art techniques, yielding geo-registration at pixel-level accuracy.

108 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The contrast mismatch has a significant effect in SIFT matching....

  • ...Invariant features (e.g., SIFT) are typically used to tackle viewpoint changes....

  • ...As expected, SIFT and ASIFT are not capable of handling the drastic viewpoint changes (See Fig....

  • ...Shan et al. [21] obtain best results to date using SIFT feature matching [18]....

Journal ArticleDOI
TL;DR: The continuous shearlet transform, a novel directional multiscale transform recently introduced by the authors and their collaborators, provides a precise geometrical characterization for the boundary curves of very general planar regions.
Abstract: This paper shows that the continuous shearlet transform, a novel directional multiscale transform recently introduced by the authors and their collaborators, provides a precise geometrical characterization for the boundary curves of very general planar regions. This study is motivated by imaging applications, where such boundary curves represent edges of images. The shearlet approach is able to characterize both locations and orientations of the edge points, including corner points and junctions, where the edge curves exhibit abrupt changes in tangent or curvature. Our results encompass and greatly extend previous results based on the shearlet and curvelet transforms which were limited to very special cases such as polygons and smooth boundary curves with nonvanishing curvature.
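
For reference, the continuous shearlet transform mentioned above is commonly defined as follows (normalization conventions vary across the literature; this form uses parabolic scaling):

```latex
% Continuous shearlet transform of f at scale a > 0, shear s, location t:
\[
  \mathcal{SH}_\psi f(a,s,t) = \langle f, \psi_{a,s,t} \rangle ,
  \qquad
  \psi_{a,s,t}(x) = a^{-3/4}\, \psi\bigl(A_a^{-1} S_s^{-1}(x - t)\bigr),
\]
% with anisotropic dilation and shear matrices
\[
  A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix},
  \qquad
  S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}.
\]
```

The geometric characterization rests on the asymptotic decay of the transform as a approaches 0: it decays slowly (of order a^{3/4}) when t lies on a boundary curve and the shear s matches the edge orientation, and rapidly otherwise, which is what localizes both edge positions and orientations.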

107 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Indeed such points (frequently indicated as corner points or junctions) usually provide the most conspicuous and useful features for many algorithms of edge analysis and feature extraction [17, 12]....

Journal ArticleDOI
TL;DR: A multi-scale multi-view feature aggregation (MSMVFA) scheme that can aggregate high-level semantic features, mid-level attribute features, and deep visual features into a unified representation for food recognition and achieves state-of-the-art recognition performance on three popular large-scale food benchmark datasets.
Abstract: Recently, food recognition has received more and more attention in image processing and computer vision for its great potential applications in human health. Most existing methods directly extract deep visual features via convolutional neural networks (CNNs) for food recognition. Such methods ignore the characteristics of food images and are thus hard pressed to achieve optimal recognition performance. In contrast to general object recognition, food images typically do not exhibit a distinctive spatial arrangement or common semantic patterns. In this paper, we propose a multi-scale multi-view feature aggregation (MSMVFA) scheme for food recognition. MSMVFA can aggregate high-level semantic features, mid-level attribute features, and deep visual features into a unified representation. These three types of features describe the food image at different granularities. Therefore, the aggregated features can capture the semantics of food images with the greatest probability. To achieve this, we utilize additional ingredient knowledge to obtain mid-level attribute representations via ingredient-supervised CNNs. High-level semantic features and deep visual features are extracted from class-supervised CNNs. Considering that food images do not exhibit a distinctive spatial layout in many cases, MSMVFA fuses multi-scale CNN activations for each type of feature to make the aggregated features more discriminative and invariant to geometric deformation. Finally, the aggregated features are made more robust, comprehensive, and discriminative via two-level fusion, namely multi-scale fusion for each type of feature and multi-view aggregation across the different types of features. In addition, MSMVFA is general, and different deep networks can easily be applied in this scheme. Extensive experiments and evaluations demonstrate that our method achieves state-of-the-art Top-1 recognition accuracy on three popular large-scale food benchmark datasets. Furthermore, we expect this paper to further the agenda of food recognition in the image processing and computer vision community.
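
As a hedged sketch of the general idea of multi-scale feature aggregation (not the authors' MSMVFA implementation), the following PyTorch fragment pools activations from three stages of a torchvision ResNet-50 (torchvision 0.13+ assumed) and concatenates them into a single descriptor:

```python
import torch
import torchvision.models as models

# Backbone whose intermediate stages supply multi-scale activations.
backbone = models.resnet50(weights=None).eval()

feats = {}
def save(name):
    def hook(module, inp, out):
        feats[name] = out
    return hook

# Hook three stages of increasing receptive field.
backbone.layer2.register_forward_hook(save("mid"))
backbone.layer3.register_forward_hook(save("high"))
backbone.layer4.register_forward_hook(save("top"))

with torch.no_grad():
    backbone(torch.randn(1, 3, 224, 224))   # dummy image batch

# Global-average-pool each scale, L2-normalize, and concatenate.
pooled = [torch.nn.functional.normalize(f.mean(dim=(2, 3)), dim=1)
          for f in feats.values()]
descriptor = torch.cat(pooled, dim=1)
print(descriptor.shape)   # torch.Size([1, 3584]) = 512 + 1024 + 2048
```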

107 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
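
The fast nearest-neighbor matching step can be illustrated with this paper's widely used ratio test, which compares each query descriptor's two closest database neighbors. A minimal OpenCV sketch, assuming `des1` and `des2` are SIFT descriptor arrays from two images (the 0.75 threshold is a common choice close to the 0.8 suggested in the paper):

```python
import cv2

# k-NN matching with k=2 so each query keeps its two nearest neighbors.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test: accept a match only if the best neighbor is much
# closer than the second best, which filters out ambiguous correspondences.
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
```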

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
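
The staged filtering described here can be sketched as a single-octave difference-of-Gaussians (DoG) scale space with local extremum detection. This is a simplified illustration of the idea, not the paper's full implementation; only maxima are kept, the sigma values and 0.03 threshold are illustrative assumptions, and the image path is a placeholder:

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# One octave of Gaussian blurs at geometrically spaced sigmas.
k = 2 ** 0.5
sigmas = [1.6 * k**i for i in range(5)]
gauss = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

# Difference-of-Gaussians approximates the scale-normalized Laplacian.
dog = np.stack([g2 - g1 for g1, g2 in zip(gauss, gauss[1:])])

# Keypoint candidates: local maxima over space and scale above a threshold.
is_max = (dog == maximum_filter(dog, size=(3, 3, 3))) & (dog > 0.03 * dog.max())
scale_idx, ys, xs = np.nonzero(is_max)
print(f"{len(xs)} candidate extrema")
```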

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
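
The feature extraction underlying this reference is the Harris corner response R = det(M) - k * trace(M)^2, computed from a Gaussian-weighted structure tensor M of image gradients. A minimal sketch (the window sigma, k = 0.04, and the response threshold are customary choices, not values from the paper; the image path is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Image gradients.
Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)

# Gaussian-weighted second-moment (structure tensor) entries.
w = lambda a: cv2.GaussianBlur(a, (0, 0), 2.0)
Sxx, Syy, Sxy = w(Ix * Ix), w(Iy * Iy), w(Ix * Iy)

# Harris response: det(M) - k * trace(M)^2, large at corners.
k = 0.04
R = (Sxx * Syy - Sxy**2) - k * (Sxx + Syy)**2
corners = np.argwhere(R > 0.01 * R.max())
print(f"{len(corners)} corner candidates")
```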

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
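
The evaluation criterion used in this comparison, recall plotted against 1-precision as the match-acceptance threshold varies, can be sketched as follows; `dists`, `is_correct`, and `n_correspondences` are assumed inputs derived from ground-truth region correspondences:

```python
import numpy as np

def recall_vs_precision(dists, is_correct, n_correspondences, thresholds):
    """Recall and 1-precision at each descriptor-distance threshold."""
    curve = []
    for t in thresholds:
        accepted = dists < t
        tp = np.sum(accepted & is_correct)      # correct matches kept
        fp = np.sum(accepted & ~is_correct)     # false matches kept
        recall = tp / n_correspondences
        one_minus_precision = fp / max(tp + fp, 1)
        curve.append((one_minus_precision, recall))
    return curve
```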

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
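
As a brief illustration of the detector this reference introduces, OpenCV ships an MSER implementation; a minimal sketch with a placeholder image path:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Maximally Stable Extremal Regions: connected components whose area
# stays nearly constant across a range of intensity thresholds.
mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(img)
print(f"{len(regions)} MSER regions")
```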

3,422 citations
