Object recognition from local scale-invariant features

doi:10.1109/ICCV.1999.790410

Home
/
Papers
/
Object recognition from local scale-invariant features

Proceedings Article•DOI•

Object recognition from local scale-invariant features

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999-Vol. 2, pp 1150-1157

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

read less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Cachier: Edge-Caching for Recognition Applications

[...]

Utsav Drolia¹, Katherine Guo², Jiaqi Tan², Rajeev Gandhi², Priya Narasimhan² - Show less +1 more•Institutions (2)

Carnegie Mellon University¹, Bell Labs²

01 Jun 2017

TL;DR: This work proposes Cachier, a system that uses the caching model along with novel optimizations to minimize latency by adaptively balancing load between the edge and the cloud, by leveraging spatiotemporal locality of requests, using offline analysis of applications, and online estimates of network conditions.

...read moreread less

Abstract: Recognition and perception based mobile applications, such as image recognition, are on the rise. These applications recognize the user's surroundings and augment it with information and/or media. These applications are latency-sensitive. They have a soft-realtime nature - late results are potentially meaningless. On the one hand, given the compute-intensive nature of the tasks performed by such applications, execution is typically offloaded to the cloud. On the other hand, offloading such applications to the cloud incurs network latency, which can increase the user-perceived latency. Consequently, edge computing has been proposed to let devices offload intensive tasks to edge servers instead of the cloud, to reduce latency. In this paper, we propose a different model for using edge servers. We propose to use the edge as a specialized cache for recognition applications and formulate the expected latency for such a cache. We show that using an edge server like a typical web cache, for recognition applications, can lead to higher latencies. We propose Cachier, a system that uses the caching model along with novel optimizations to minimize latency by adaptively balancing load between the edge and the cloud, by leveraging spatiotemporal locality of requests, using offline analysis of applications, and online estimates of network conditions. We evaluate Cachier for image-recognition applications and show that our techniques yield 3x speedup in responsiveness, and perform accurately over a range of operating conditions. To the best of our knowledge, this is the first work that models edge servers as caches for compute-intensive recognition applications, and Cachier is the first system that uses this model to minimize latency for these applications.

...read moreread less

135 citations

Cites background from "Object recognition from local scale..."

...Some notable local features are SIFT [28], SURF [13] and ORB [34]....
[...]

Posted Content•

A Survey of Structure from Motion

[...]

Onur Ozyesil, Vladislav Voroninski, Ronen Basri¹, Amit Singer²•Institutions (2)

Weizmann Institute of Science¹, Princeton University²

30 Jan 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors provide a detailed account of some of the recent camera location estimation methods in the literature, followed by discussion of notable techniques for $3$D structure recovery.

...read moreread less

Abstract: The structure from motion (SfM) problem in computer vision is the problem of recovering the three-dimensional ($3$D) structure of a stationary scene from a set of projective measurements, represented as a collection of two-dimensional ($2$D) images, via estimation of motion of the cameras corresponding to these images. In essence, SfM involves the three main stages of (1) extraction of features in images (e.g., points of interest, lines, etc.) and matching these features between images, (2) camera motion estimation (e.g., using relative pairwise camera positions estimated from the extracted features), and (3) recovery of the $3$D structure using the estimated motion and features (e.g., by minimizing the so-called reprojection error). This survey mainly focuses on relatively recent developments in the literature pertaining to stages (2) and (3). More specifically, after touching upon the early factorization-based techniques for motion and structure estimation, we provide a detailed account of some of the recent camera location estimation methods in the literature, followed by discussion of notable techniques for $3$D structure recovery. We also cover the basics of the simultaneous localization and mapping (SLAM) problem, which can be viewed as a specific case of the SfM problem. Further, our survey includes a review of the fundamentals of feature extraction and matching (i.e., stage (1) above), various recent methods for handling ambiguities in $3$D scenes, SfM techniques involving relatively uncommon camera models and image features, and popular sources of data and SfM software.

...read moreread less

135 citations

Posted Content•

Invariant Representations without Adversarial Training

[...]

Daniel Moyer, Shuyang Gao, Rob Brekelmans, Greg Ver Steeg, Aram Galstyan - Show less +1 more

24 May 2018-arXiv: Learning

TL;DR: The authors show that adversarial training is unnecessary and sometimes counter-productive, and instead cast invariant representation learning as a single information-theoretic objective that can be directly optimized.

...read moreread less

Abstract: Representations of data that are invariant to changes in specified factors are useful for a wide range of problems: removing potential biases in prediction problems, controlling the effects of covariates, and disentangling meaningful factors of variation. Unfortunately, learning representations that exhibit invariance to arbitrary nuisance factors yet remain useful for other tasks is challenging. Existing approaches cast the trade-off between task performance and invariance in an adversarial way, using an iterative minimax optimization. We show that adversarial training is unnecessary and sometimes counter-productive; we instead cast invariant representation learning as a single information-theoretic objective that can be directly optimized. We demonstrate that this approach matches or exceeds performance of state-of-the-art adversarial approaches for learning fair representations and for generative modeling with controllable transformations.

...read moreread less

135 citations

Journal Article•DOI•

Status quo and open challenges in vision-based sensing and tracking of temporary resources on infrastructure construction sites

[...]

Jochen Teizer

01 Apr 2015-Advanced Engineering Informatics

TL;DR: An overview to vision-based sensing technology available for temporary resource tracking at infrastructure construction sites is presented and the status quo of research applications is provided by highlighting exemplary case.

...read moreread less

134 citations

Journal Article•DOI•

Keyframe-based monocular SLAM

[...]

Georges Younes¹, Daniel Asmar², Elie Shammas², John Zelek¹•Institutions (2)

University of Waterloo¹, American University of Beirut²

01 Dec 2017-Robotics and Autonomous Systems

TL;DR: This paper presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation, and critically assessing the specific strategies made in each proposed solution.

...read moreread less

134 citations

Cites background from "Object recognition from local scale..."

...For higher order feature descriptors —such as ORB, SIFT, or SURF—the L1-norm, the L2-norm, or Hamming distances may be used; however, establishing matches using these measures is computationally intensive and may, if not carefully applied, degrade real-time performance....
[...]
...While the KD-tree is meant to accelerate SIFT feature matching, updating it with new features is computationally intensive....
[...]
...The system extracts phonySIFT descriptors as described in Wagner et al. (2010) and establishes feature correspondences using an accelerated matching method through hierarchical k-means....
[...]
...The first step is an offline procedure during which, SIFT features are780 extracted and triangulated using SfM methods into 3D landmarks....
[...]
...In the first pass, SIFT features from the current frame are matched to the selected keyframes, using the offline-generated vocabulary790 tree....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
…
135
136
137
138
139
140
141
…
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Color indexing

[...]

Michael J. Swain, Dana H. Ballard

01 Nov 1991-International Journal of Computer Vision

TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.

...read moreread less

Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.

...read moreread less

5,672 citations

Journal Article•DOI•

Generalizing the hough transform to detect arbitrary shapes

[...]

Dana H. Ballard¹•Institutions (1)

University of Rochester¹

01 Jan 1987-Pattern Recognition

TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Houghtransform a kind of universal transform which can beused to find arbitrarily complex shapes.

...read moreread less

4,310 citations

Journal Article•DOI•

Visual learning and recognition of 3-D objects from appearance

[...]

Hiroshi Murase, Shree K. Nayar¹•Institutions (1)

Columbia University¹

01 Jan 1995-International Journal of Computer Vision

TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.

...read moreread less

Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.

...read moreread less

2,037 citations

Journal Article•DOI•

Local grayvalue invariants for image retrieval

[...]

Cordelia Schmid, Roger Mohr

01 May 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.

...read moreread less

Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

...read moreread less

1,756 citations

"Object recognition from local scale..." refers background or methods in this paper

...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....
[...]
...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....
[...]
..., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....
[...]
...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....
[...]

Journal Article•DOI•

A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry

[...]

Zhengyou Zhang, Rachid Deriche, Olivier Faugeras, Quang-Tuan Luong

15 Oct 1995-Artificial Intelligence

TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

...read moreread less

1,574 citations