
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
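
As a rough illustration of the extract-and-match pipeline, here is a minimal sketch using OpenCV's SIFT implementation (assuming opencv-python >= 4.4, where SIFT lives in the main module; the image file names are placeholders):

```python
# Minimal SIFT extract-and-match sketch (OpenCV). File names are placeholders.
import cv2

img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with L2 distance; crossCheck keeps mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("matches.jpg", vis)
```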
Citations
Posted Content
TL;DR: A novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information.
Abstract: Matching pedestrians across multiple camera views, known as human re-identification (re-id), is a challenging problem in visual surveillance. In the existing works concentrating on feature extraction, representations are formed locally and independent of other regions. We present a novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information. The feedback connections and internal gating mechanism of the LSTM cells enable our model to memorize the spatial dependencies and selectively propagate relevant contextual information through the network. We demonstrate improved performance compared to the baseline algorithm with no LSTM units and promising results compared to state-of-the-art methods on Market-1501, CUHK03 and VIPeR datasets. Visualization of the internal mechanism of LSTM cells shows meaningful patterns can be learned by our method.
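
The core idea can be sketched in a few lines of PyTorch; this is an illustrative toy, not the authors' implementation, and the feature dimension, number of regions, and contrastive margin below are all assumptions:

```python
# Toy siamese LSTM matcher (illustrative, not the authors' code). Each image
# is a sequence of local-region feature vectors; a shared LSTM propagates
# spatial context, and the distance between final states scores the pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def encode(self, regions):              # regions: (B, num_regions, feat_dim)
        _, (h, _) = self.lstm(regions)      # h: (1, B, hidden_dim)
        return h.squeeze(0)

    def forward(self, regions_a, regions_b):
        ha, hb = self.encode(regions_a), self.encode(regions_b)
        return F.pairwise_distance(ha, hb)  # small distance => same identity

model = SiameseLSTM()
a = torch.randn(4, 6, 128)                  # 4 pairs, 6 horizontal strips each
b = torch.randn(4, 6, 128)
dist = model(a, b)
label = torch.tensor([1., 0., 1., 0.])      # 1 = same person, 0 = different
margin = 2.0                                # illustrative contrastive margin
loss = (label * dist.pow(2) +
        (1 - label) * F.relu(margin - dist).pow(2)).mean()
loss.backward()
```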

468 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Color histograms [33, 57, 64, 65], Local Binary Patterns [39,57], Color Names [59,68], Scale Invariant Feature Transforms [35,64,65] etc. are commonly used features for re-identification in order to address the changes in view-point, illumination and pose....


Journal ArticleDOI
TL;DR: ProExM as mentioned in this paper is a variant of ExM in which proteins are anchored to the swellable gel, allowing the use of conventional fluorescently labeled antibodies and streptavidin, and fluorescent proteins.
Abstract: Expansion microscopy (ExM) enables imaging of preserved specimens with nanoscale precision on diffraction-limited instead of specialized super-resolution microscopes. ExM works by physically separating fluorescent probes after anchoring them to a swellable gel. The first ExM method did not result in the retention of native proteins in the gel and relied on custom-made reagents that are not widely available. Here we describe protein retention ExM (proExM), a variant of ExM in which proteins are anchored to the swellable gel, allowing the use of conventional fluorescently labeled antibodies and streptavidin, and fluorescent proteins. We validated and demonstrated the utility of proExM for multicolor super-resolution (∼70 nm) imaging of cells and mammalian tissues on conventional microscopes.

458 citations

01 Jan 2015
TL;DR: A thorough discussion of several state-of-the-art techniques in image retrieval by considering the associated subproblems: image description, descriptor compression, nearest-neighbor search and query expansion, and the combined use of deep architectures and hand-crafted image representations for accurate and efficient image retrieval.
Abstract: This seminar report focuses on using convolutional neural networks for image retrieval. Firstly, we give a thorough discussion of several state-of-the-art techniques in image retrieval by considering the associated subproblems: image description, descriptor compression, nearest-neighbor search and query expansion. We discuss both the aggregation of local descriptors using clustering and metric learning techniques as well as global descriptors. Subsequently, we briefly introduce the basic concepts of deep convolutional neural networks, focusing on the architecture proposed by Krizhevsky et al. [KSH12]. We discuss different types of layers commonly used in recent architectures, for example convolutional layers, non-linearity and rectification layers, pooling layers as well as local contrast normalization layers. Finally, we shortly review supervised training techniques based on stochastic gradient descent and regularization techniques such as dropout and weight decay. Finally, following Babenko et al. [BSCL14], we discuss the use of feature activations in intermediate layers as image representation for image retrieval. After presenting experiments and comparing convolutional neural networks for image retrieval with other state-of-the-art techniques, we conclude by motivating the combined use of deep architectures and hand-crafted image representations for accurate and efficient image retrieval.
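
In the spirit of the Babenko et al. [BSCL14] approach the report discusses, a minimal sketch of using intermediate activations as a retrieval descriptor might look as follows; it uses torchvision's AlexNet (the Krizhevsky et al. [KSH12] architecture), and the layer cut, pooling, and file names are assumptions:

```python
# Sketch: intermediate CNN activations as a global image descriptor for
# retrieval. Layer choice and pooling are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

net = models.alexnet(weights="DEFAULT").eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def describe(path):
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = net.features(x)             # convolutional-layer activations
    vec = fmap.mean(dim=(2, 3)).flatten()  # spatial average pooling
    return F.normalize(vec, dim=0)         # L2-normalize for cosine search

query, database = describe("query.jpg"), describe("db_image.jpg")
print("cosine similarity:", float(query @ database))
```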

457 citations

Journal ArticleDOI
TL;DR: This paper presents an approach for large scale image-based localization that is both efficient and effective and demonstrates that it offers the best combination of efficiency and effectiveness among current state-of-the-art approaches for localization.
Abstract: Accurately determining the position and orientation from which an image was taken, i.e., computing the camera pose, is a fundamental step in many Computer Vision applications. The pose can be recovered from 2D-3D matches between 2D image positions and points in a 3D model of the scene. Recent advances in Structure-from-Motion allow us to reconstruct large scenes and thus create the need for image-based localization methods that efficiently handle large-scale 3D models while still being effective, i.e., while localizing as many images as possible. This paper presents an approach for large scale image-based localization that is both efficient and effective. At the core of our approach is a novel prioritized matching step that enables us to first consider features more likely to yield 2D-to-3D matches and to terminate the correspondence search as soon as enough matches have been found. Matches initially lost due to quantization are efficiently recovered by integrating 3D-to-2D search. We show how visibility information from the reconstruction process can be used to improve the efficiency of our approach. We evaluate the performance of our method through extensive experiments and demonstrate that it offers the best combination of efficiency and effectiveness among current state-of-the-art approaches for localization.
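
The final pose-recovery step described in the abstract can be sketched with OpenCV's RANSAC PnP solver; the prioritized 2D-to-3D and 3D-to-2D search itself is omitted here, and the correspondences and camera intrinsics below are placeholders:

```python
# Sketch of the pose-recovery step: camera pose from 2D-3D matches via
# RANSAC PnP. The matches would come from the paper's prioritized search,
# omitted here; the arrays and intrinsics are placeholders.
import numpy as np
import cv2

pts3d = np.random.rand(50, 3).astype(np.float64)         # model points (placeholder)
pts2d = np.random.rand(50, 2).astype(np.float64) * 1600  # matched image positions
K = np.array([[1200., 0., 800.],                         # assumed intrinsics
              [0., 1200., 800.],
              [0., 0., 1.]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d, pts2d, K, distCoeffs=None,
    reprojectionError=4.0, iterationsCount=1000)

if ok:
    R, _ = cv2.Rodrigues(rvec)    # rotation matrix from axis-angle vector
    cam_center = -R.T @ tvec      # camera position in model coordinates
    print("inliers:", len(inliers), "camera center:", cam_center.ravel())
```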

455 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...In this paper, we rely on SIFT descriptors [25] and use a common quantization that stores each descriptor entry using a 1 byte integer value....


  • ...is passed [25], where t is typically from the range [0.6, 0.8]....


  • ...All query images have a maximum size of 1,600 × 1,600 pixels and come with pre-computed SIFT descriptors [25]....


  • ...Lowe’s ratio test [25], used to detect ambiguous correspondences, rejects more and more correct matches for larger models [26].... (see the ratio-test sketch after this list)

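
The ratio test quoted above is straightforward to state in code: a match is kept only if its descriptor distance is clearly smaller than the distance to the second-best candidate. A minimal sketch with OpenCV's k-nearest-neighbor matcher, using t = 0.8 from the quoted range (the descriptor arrays are stand-ins):

```python
# Lowe's ratio test [25]: accept a match only if d1 < t * d2, i.e. the best
# candidate is clearly better than the second best. t = 0.8 here.
import cv2
import numpy as np

# Stand-in SIFT descriptors (128-D, float32); in practice these come from
# sift.detectAndCompute as in the earlier sketch.
des1 = np.random.rand(200, 128).astype(np.float32)
des2 = np.random.rand(300, 128).astype(np.float32)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]  # reject ambiguous correspondences
print(f"{len(good)} unambiguous matches")
```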

Journal ArticleDOI
TL;DR: The field of semantic segmentation as pertaining to deep convolutional neural networks is reviewed and comprehensive coverage of the top approaches is provided and the strengths, weaknesses and major challenges are summarized.
Abstract: During the long history of computer vision, one of the grand challenges has been semantic segmentation which is the ability to segment an unknown image into different parts and objects (e.g., beach, ocean, sun, dog, swimmer). Furthermore, segmentation is even deeper than object recognition because recognition is not necessary for segmentation. Specifically, humans can perform image segmentation without even knowing what the objects are (for example, in satellite imagery or medical X-ray scans, there may be several objects which are unknown, but they can still be segmented within the image typically for further investigation). Performing segmentation without knowing the exact identity of all objects in the scene is an important part of our visual understanding process, which can give us a powerful model to understand the world and can also be used to improve or augment existing computer vision techniques. In this work, we review the field of semantic segmentation as pertaining to deep convolutional neural networks. We provide comprehensive coverage of the top approaches and summarize the strengths, weaknesses and major challenges.
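
As a concrete anchor for what such reviews cover, here is a minimal inference sketch with a pretrained fully convolutional network from torchvision (one representative architecture family; the model choice and file name are assumptions):

```python
# Minimal semantic-segmentation inference sketch with a pretrained FCN.
# The model choice and input file name are placeholders.
import torch
from torchvision import models, transforms
from PIL import Image

net = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()
prep = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = prep(Image.open("beach.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = net(img)["out"]              # (1, num_classes, H, W)
labels = logits.argmax(dim=1).squeeze(0)  # per-pixel class ids
print("distinct classes found:", labels.unique().tolist())
```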

451 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Although the CNN feature has been repeatedly shown to give higher performance as compared to conventional hand-crafted features like SIFT [28] and HOG [29], it is not specifically designed for the image segmentation task....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
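
The match-then-verify pipeline described here can be approximated in a few lines; note that the RANSAC homography estimation below is a common stand-in for the paper's Hough-transform clustering and least-squares pose verification, not the paper's exact method (file names are placeholders):

```python
# Sketch of the verification stage: after nearest-neighbor matching with the
# ratio test, keep only matches consistent with one geometric model. RANSAC
# homography estimation stands in for Hough clustering + least-squares pose.
import cv2
import numpy as np

img_obj = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder files
img_scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img_obj, None)
kp2, des2 = sift.detectAndCompute(img_scene, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(f"{int(mask.sum())} of {len(good)} matches survive verification")
```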

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
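
A back-of-the-envelope sketch of the staged filtering idea, building a difference-of-Gaussians stack and keeping scale-space extrema; the sigma schedule and contrast threshold below are illustrative, not the paper's exact parameters:

```python
# Sketch of staged filtering: build a difference-of-Gaussians (DoG) stack and
# keep points that are extrema among their 3x3x3 scale-space neighbors.
# Parameters here are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

img = np.random.rand(256, 256)                     # placeholder grayscale image
sigmas = [1.6 * (2 ** (i / 3)) for i in range(5)]  # geometric sigma schedule
blurred = np.stack([gaussian_filter(img, s) for s in sigmas])
dog = blurred[1:] - blurred[:-1]                   # adjacent-scale differences

is_max = dog == maximum_filter(dog, size=3)        # 3x3x3 scale-space maxima
is_min = dog == minimum_filter(dog, size=3)
strong = np.abs(dog) > 0.03                        # contrast threshold
scales, ys, xs = np.nonzero((is_max | is_min) & strong)
print(f"{len(xs)} candidate keypoints across {dog.shape[0]} DoG levels")
```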

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
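
A minimal extract-and-track sketch in that spirit, using OpenCV's Harris-based corner detector and Lucas-Kanade optical flow on two consecutive frames (frame file names are placeholders):

```python
# Sketch of extract-and-track on a monocular pair: Harris-style corners via
# cv2.goodFeaturesToTrack, then Lucas-Kanade optical flow to follow them.
import cv2

f0 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
f1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

corners = cv2.goodFeaturesToTrack(f0, maxCorners=400, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True, k=0.04)
moved, status, err = cv2.calcOpticalFlowPyrLK(f0, f1, corners, None)

tracked = moved[status.ravel() == 1]  # keep corners the flow could follow
print(f"tracked {len(tracked)} of {len(corners)} corners into the next frame")
```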

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
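
The evaluation criterion, recall with respect to precision under a varying descriptor-distance threshold, can be sketched as follows; the match distances and ground-truth flags below are synthetic placeholders:

```python
# Sketch of the evaluation criterion: sweep a descriptor-distance threshold
# and report recall against 1-precision, given ground-truth correctness flags
# for candidate matches. Both arrays are synthetic placeholders.
import numpy as np

distances = np.random.rand(1000)       # distance of each candidate match
correct = np.random.rand(1000) < 0.3   # ground-truth flag per match
total_correspondences = correct.sum()  # all true region correspondences

for t in np.linspace(0.1, 1.0, 10):
    accepted = distances < t
    tp = (accepted & correct).sum()    # correct matches below threshold
    fp = (accepted & ~correct).sum()   # false matches below threshold
    recall = tp / total_correspondences
    one_minus_precision = fp / max(accepted.sum(), 1)
    print(f"t={t:.1f}  recall={recall:.2f}  1-precision={one_minus_precision:.2f}")
```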

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
