Distinctive Image Features from Scale-Invariant Keypoints

Home
/
Papers
/
Distinctive Image Features from Scale-Invariant Keypoints

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-

TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.

read less

Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Local Intensity Order Pattern for feature description

[...]

Zhenhua Wang¹, Bin Fan¹, Fuchao Wu¹•Institutions (1)

Chinese Academy of Sciences¹

06 Nov 2011

TL;DR: It is shown that the proposed descriptor is not only invariant to monotonic intensity changes and image rotation but also robust to many other geometric and photometric transformations such as viewpoint change, image blur and JEPG compression.

...read moreread less

Abstract: This paper presents a novel method for feature description based on intensity order. Specifically, a Local Intensity Order Pattern(LIOP) is proposed to encode the local ordinal information of each pixel and the overall ordinal information is used to divide the local patch into subregions which are used for accumulating the LIOPs respectively. Therefore, both local and overall intensity ordinal information of the local patch are captured by the proposed LIOP descriptor so as to make it a highly discriminative descriptor. It is shown that the proposed descriptor is not only invariant to monotonic intensity changes and image rotation but also robust to many other geometric and photometric transformations such as viewpoint change, image blur and JEPG compression. The proposed descriptor has been evaluated on the standard Oxford dataset and four additional image pairs with complex illumination changes. The experimental results show that the proposed descriptor obtains a significant improvement over the existing state-of-the-art descriptors.

...read moreread less

340 citations

Cites background or methods or result from "Distinctive Image Features from Sca..."

...In our experiments (Intel Core2 Quad CPU 2.83GHz), the average time for constructing a feature descriptor is: 2.1ms for SIFT, 3.8ms for DAISY, 5.3ms for HRI-CSLTP and 5.5ms for LIOP....
[...]
...It is worth noting that different from other methods such as [13, 11, 22, 9], we do not rotate the local patch according to the local consistent orientation(e....
[...]
...For example, Harris corner [10] and DoG(Difference of Gaussian) [13] for interest point detection, and Harris-affine [15], Hessian-affine [17], MSER(Maximally Stable Extremal Region) [14], IBR(Intensity-Based Region) and EBR(EdgeBased Region) [24] for affine covariant region detection....
[...]
...More recently, the local binary pattern [19] based methods are proposed and achieve high performance comparable to SIFT....
[...]
...Local image features have been widely used in many computer vision applications such as object recognition [13] , texture recognition [12], wide baseline matching [24], image retrieval [18] and panoramic image stitching [3]....
[...]

Journal Article•DOI•

Information Forensics: An Overview of the First Decade

[...]

Matthew C. Stamm¹, Min Wu¹, K. J. Ray Liu¹•Institutions (1)

University of Maryland, College Park¹

10 May 2013-IEEE Access

TL;DR: An overview on what has been done over the last decade in the new and emerging field of information forensics regarding theories, methodologies, state-of-the-art techniques, major applications, and to provide an outlook of the future is provided.

...read moreread less

Abstract: In recent decades, we have witnessed the evolution of information technologies from the development of VLSI technologies, to communication and networking infrastructure, to the standardization of multimedia compression and coding schemes, to effective multimedia content search and retrieval. As a result, multimedia devices and digital content have become ubiquitous. This path of technological evolution has naturally led to a critical issue that must be addressed next, namely, to ensure that content, devices, and intellectual property are being used by authorized users for legitimate purposes, and to be able to forensically prove with high confidence when otherwise. When security is compromised, intellectual rights are violated, or authenticity is forged, forensic methodologies and tools are employed to reconstruct what has happened to digital content in order to answer who has done what, when, where, and how. The goal of this paper is to provide an overview on what has been done over the last decade in the new and emerging field of information forensics regarding theories, methodologies, state-of-the-art techniques, major applications, and to provide an outlook of the future.

...read moreread less

340 citations

Proceedings Article•DOI•

A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK

[...]

Shaharyar Ahmed Khan Tareen¹, Zahra Saleem²•Institutions (2)

University of the Sciences¹, Fatima Jinnah Women University²

03 Mar 2018

TL;DR: SIFT and BRISK are found to be the most accurate algorithms while ORB and BRK are most efficient and a benchmark for researchers, regardless of any particular area is set.

...read moreread less

Abstract: Image registration is the process of matching, aligning and overlaying two or more images of a scene, which are captured from different viewpoints. It is extensively used in numerous vision based applications. Image registration has five main stages: Feature Detection and Description; Feature Matching; Outlier Rejection; Derivation of Transformation Function; and Image Reconstruction. Timing and accuracy of feature-based Image Registration mainly depend on computational efficiency and robustness of the selected feature-detector-descriptor, respectively. Therefore, choice of feature-detector-descriptor is a critical decision in feature-matching applications. This article presents a comprehensive comparison of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK algorithms. It also elucidates a critical dilemma: Which algorithm is more invariant to scale, rotation and viewpoint changes? To investigate this problem, image matching has been performed with these features to match the scaled versions (5% to 500%), rotated versions (0° to 360°), and perspective-transformed versions of standard images with the original ones. Experiments have been conducted on diverse images taken from benchmark datasets: University of OXFORD, MATLAB, VLFeat, and OpenCV. Nearest-Neighbor-Distance-Ratio has been used as the feature-matching strategy while RANSAC has been applied for rejecting outliers and fitting the transformation models. Results are presented in terms of quantitative comparison, feature-detection-description time, feature-matching time, time of outlier-rejection and model fitting, repeatability, and error in recovered results as compared to the ground-truths. SIFT and BRISK are found to be the most accurate algorithms while ORB and BRISK are most efficient. The article comprises rich information that will be very useful for making important decisions in vision based applications and main aim of this work is to set a benchmark for researchers, regardless of any particular area.

...read moreread less

339 citations

Cites methods from "Distinctive Image Features from Sca..."

...Lowe for matching SIFT features in [13] and by K....
[...]
...In this article, SIFT (blobs) [13], SURF (blobs) [14], KAZE (blobs) [15], AKAZE (blobs) [16], ORB (corners) [17], and BRISK (corners) [18] algorithms are compared for image matching and registration....
[...]
...Lowe introduced Scale Invariant Feature Transform (SIFT) in 2004 [13], which is the most renowned featuredetection-description algorithm....
[...]

Journal Article•DOI•

Image completion using planar structure guidance

[...]

Jia-Bin Huang¹, Sing Bing Kang², Narendra Ahuja¹, Johannes Kopf²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Microsoft²

27 Jul 2014

TL;DR: This work proposes a method for automatically guiding patch-based image completion using mid-level structural cues by first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes.

...read moreread less

Abstract: We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.

...read moreread less

337 citations

Proceedings Article•DOI•

Multi-target tracking by on-line learned discriminative appearance models

[...]

Cheng-Hao Kuo¹, Chang Huang¹, Ramakant Nevatia¹•Institutions (1)

University of Southern California¹

13 Jun 2010

TL;DR: OLDAMs have significantly higher discrimination between different targets than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve the tracking accuracy, particularly reducing the false alarms and identity switches.

...read moreread less

Abstract: We present an approach for online learning of discriminative appearance models for robust multi-target tracking in a crowded scene from a single camera. Although much progress has been made in developing methods for optimal data association, there has been comparatively less work on the appearance models, which are key elements for good performance. Many previous methods either use simple features such as color histograms, or focus on the discriminability between a target and the background which does not resolve ambiguities between the different targets. We propose an algorithm for learning a discriminative appearance model for different targets. Training samples are collected online from tracklets within a time sliding window based on some spatial-temporal constraints; this allows the models to adapt to target instances. Learning uses an Ad-aBoost algorithm that combines effective image descriptors and their corresponding similarity measurements. We term the learned models as OLDAMs. Our evaluations indicate that OLDAMs have significantly higher discrimination between different targets than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve the tracking accuracy, particularly reducing the false alarms and identity switches.

...read moreread less

333 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
…
23
24
25
26
27
28
29
…
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Proceedings Article•DOI•

Object recognition from local scale-invariant features

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

16,989 citations

Proceedings Article•DOI•

A Combined Corner and Edge Detector

[...]

Chris Harris, Mike Stephens

01 Jan 1988

TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.

...read moreread less

Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

...read moreread less

13,993 citations

Journal Article•DOI•

A performance evaluation of local descriptors

[...]

Krystian Mikolajczyk¹, Cordelia Schmid²•Institutions (2)

University of Oxford¹, French Institute for Research in Computer Science and Automation²

01 Oct 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

...read moreread less

7,057 citations

Journal Article•DOI•

Robust wide-baseline stereo from maximally stable extremal regions

[...]

Jiri Matas¹, Ondrej Chum, Martin Urban, Tomas Pajdla•Institutions (1)

University of Surrey¹

01 Sep 2004-Image and Vision Computing

TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

...read moreread less

3,422 citations