Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999-Vol. 2, pp 1150-1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
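The nearest neighbor indexing step described in the abstract can be sketched compactly in NumPy. This is an illustrative toy, not Lowe's actual key format or index structure: the descriptor arrays, the Euclidean metric, and the 0.8 distance-ratio acceptance threshold (a heuristic popularised by later SIFT work) are all assumptions made here for the example.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with a distance-ratio check.

    desc_a: (N, D) array of query descriptors.
    desc_b: (M, D) array of model descriptors.
    Returns (i, j) pairs where the best match in desc_b is sufficiently
    closer than the second-best, which filters ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

Candidate matches produced this way would then feed the paper's final verification stage, which fits the unknown model parameters by least squares and checks the residual.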


Citations
Proceedings ArticleDOI
11 Nov 2010
TL;DR: A SIFT algorithm adapted for 3D surfaces (called meshSIFT) and its applications to 3D face pose normalisation and recognition that outperform most other algorithms found in the literature.
Abstract: This paper presents a SIFT algorithm adapted for 3D surfaces (called meshSIFT) and its applications to 3D face pose normalisation and recognition. The algorithm allows reliable detection of scale space extrema as local feature locations. The scale space contains the mean curvature in each vertex on different smoothed versions of the input mesh. The meshSIFT algorithm then describes the neighbourhood of every scale space extremum in a feature vector consisting of concatenated histograms of shape indices and slant angles. The feature vectors are reliably matched by comparing the angle in feature space. Using RANSAC, the best rigid transformation can be estimated based on the matched features, leading to 84% correct pose normalisation of 3D faces from the Bosphorus database. Matches are mostly found between two face surfaces of the same person, allowing the algorithm to be used for 3D face recognition. Simply counting the number of matches allows 93.7% correct identification for face surfaces in the Bosphorus database and 97.7% when only frontal images are considered. In the verification scenario, we obtain an equal error rate between 5.1% and 15.0% (depending on the investigated face surfaces). These results outperform most other algorithms found in the literature.
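The matching criterion above, comparing the angle between descriptors in feature space, is simple to illustrate. In this sketch the function names and the 30-degree acceptance threshold are my own illustrative assumptions; the paper states only that angles are compared, not a specific cutoff.

```python
import numpy as np

def feature_angle(u, v):
    """Angle in degrees between two feature vectors.

    Smaller angles mean more similar descriptors; clipping guards
    against floating-point values marginally outside [-1, 1].
    """
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def is_match(u, v, threshold_deg=30.0):
    # Hypothetical acceptance rule: the threshold is an assumption
    # for this example, not a value from the meshSIFT paper.
    return feature_angle(u, v) < threshold_deg
```

Matched pairs produced this way would then go into the RANSAC stage described above to estimate the best rigid transformation between the two face surfaces.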

122 citations


Cites methods from "Object recognition from local scale..."

  • ...This ratio was determined empirically, and is the same as in the original SIFT algorithm [20]....

  • ...The use of meshSIFT for 3D face recognition is a natural way to compare faces based on characteristic features in the human face [20]....

Proceedings ArticleDOI
05 Jan 2015
TL;DR: This work presents an automated computer vision system for logging food and calorie intake using images and introduces a key insight that addresses this problem specifically: restaurant plates are often both nutritionally and visually consistent across many servings.
Abstract: Logging food and calorie intake has been shown to facilitate weight management. Unfortunately, current food logging methods are time-consuming and cumbersome, which limits their effectiveness. To address this limitation, we present an automated computer vision system for logging food and calorie intake using images. We focus on the "restaurant" scenario, which is often a challenging aspect of diet management. We introduce a key insight that addresses this problem specifically: restaurant plates are often both nutritionally and visually consistent across many servings. This insight provides a path to robust calorie estimation from a single RGB photograph: using a database of known food items together with restaurant-specific classifiers, calorie estimation can be achieved through identification followed by calorie lookup. As demonstrated on a challenging Menu-Match dataset and an existing third party dataset, our approach outperforms previous computer vision methods and a commercial calorie estimation app. Our Menu-Match dataset of realistic restaurant meals is made publicly available.

122 citations


Cites background from "Object recognition from local scale..."

  • ...The gradient-based HOG [11] and SIFT [24] that are widely used for object recognition are weaker, supporting the intuition that texture and color are the most useful features to describe food images....

  • ...For SIFT they use sparse coding, and mean pooling across the whole image plane....

  • ...The improved pooling and encoding scheme may explain why our SIFT descriptor is significantly stronger....

  • ...The SIFT base feature was extracted at patch sizes of 8, 16, and 24 pixels at each location in the image....

  • ...In the first step, five types of base features are extracted from the images: color [19], histogram of oriented gradients (HOG) [11], scale-invariant feature transforms (SIFT) [24], local binary patterns (LBP) [27], and filter responses from the MR8 filter bank [33]....

Proceedings ArticleDOI
04 Jun 2008
TL;DR: This work presents an approach that is able to distinguish between multiple weather situations based on the classification of single monocular color images, without any additional assumptions or prior knowledge.
Abstract: Present vision based driver assistance systems are designed to perform under good-natured weather conditions. However, limited visibility caused by heavy rain or fog strongly affects vision systems. To improve machine vision in bad weather situations, a reliable detection system is necessary as a ground base. We present an approach that is able to distinguish between multiple weather situations based on the classification of single monocular color images, without any additional assumptions or prior knowledge. The proposed image descriptor clearly outperforms existing descriptors for that task. Experimental results on real traffic images are characterized by high accuracy, efficiency, and versatility with respect to driver assistance systems.

122 citations


Cites background from "Object recognition from local scale..."

  • ...In order to benchmark its performance, we additionally extracted color wavelets as well as a combination of SIFT features and color histograms and compared the classification results....

  • ...Different kinds of local features have been proposed with histogram-based features like SIFT [12], HOG [3], and shape context [1] being among the most discriminant....

Proceedings ArticleDOI
01 Jan 2010
TL;DR: A head-mounted, stereo-vision based navigational assistance device for the visually impaired that enables subjects to stand and scan the scene for integrating wide-field information, compared to shoulder or waist-mounted designs in literature which require body rotations.
Abstract: We present a head-mounted, stereo-vision based navigational assistance device for the visually impaired. The head-mounted design enables our subjects to stand and scan the scene for integrating wide-field information, compared to shoulder or waist-mounted designs in literature which require body rotations. In order to extract and maintain orientation information for creating a sense of egocentricity in blind users, we incorporate visual odometry and feature based metric-topological SLAM into our system. Using camera pose estimates with dense 3D data obtained from stereo triangulation, we build a vicinity map of the user's environment. On this map, we perform 3D traversability analysis to steer subjects away from obstacles in the path. A tactile interface consisting of microvibration motors provides cues for taking evasive action, as determined by our vision processing algorithms. We report experimental results of our system (running at 10 Hz) and conduct mobility tests with blindfolded subjects to demonstrate the usefulness of our approach over conventional navigational aids like the white cane.

121 citations


Cites methods from "Object recognition from local scale..."

  • ...The local submap level estimates state information corresponding to the six dimensional camera trajectory st and sparse map mt, given feature observations (KLT/SIFT) zt and camera motion estimates ut collected until the current time t....

  • ...The SLAM implementation is a Rao-Blackwellised particle filter (RBPF) [25] in a FastSLAM [24, 23] framework using a combination of KLT [20] and SIFT [19] tracking to solve for data association....

Book ChapterDOI
01 Jan 2007
TL;DR: A hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle the degraded performance due to background noise in training images and the in-plane rotation variant detection.
Abstract: Hand posture understanding is essential to human robot interaction. The existing hand detection approaches using a Viola-Jones detector have two fundamental issues, the degraded performance due to background noise in training images and the in-plane rotation variant detection. In this paper, a hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle these issues simultaneously. In addition, we apply a sharing feature concept to increase the accuracy of multi-class hand posture recognition. The experimental results demonstrate that the proposed approach successfully recognizes three hand posture classes and can deal with the background noise issues. Our detector is in-plane rotation invariant, and achieves satisfactory multi-view hand detection.

121 citations


Cites methods from "Object recognition from local scale..."

  • ...Lowe also provided a matching algorithm for recognizing the same object in different images....

  • ...The Scale Invariant Feature Transform (SIFT) feature introduced by Lowe [7] consists of a histogram representing gradient orientation and magnitude information within a small image patch....

  • ...In this paper, a hand posture recognition system using the discrete Adaboost learning algorithm with Lowe’s scale invariant feature transform (SIFT) features is proposed to tackle these issues simultaneously....

  • ...In this paper, a discrete Adaboost learning algorithm with Lowe’s SIFT features [8] is proposed and applied to achieve in-plane rotation invariant hand detection....

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
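The Histogram Intersection technique described above reduces to an elementwise minimum between the model and image histograms, normalised by the model histogram's total mass. A minimal NumPy sketch (function name mine; the formula follows the standard Swain and Ballard definition):

```python
import numpy as np

def histogram_intersection(model_hist, image_hist):
    """Fraction of the model histogram's mass that overlaps the
    image histogram; 1.0 means the model is fully contained."""
    return np.minimum(model_hist, image_hist).sum() / model_hist.sum()
```

Indexing then amounts to ranking database models by this score against the image histogram; the score degrades gracefully under occlusion because only the occluded colour bins lose mass.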

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
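The eigenspace construction above (compress a large pose/illumination image set to a low-dimensional subspace, then project unknown images into it for recognition) can be sketched with an SVD. Function names here are illustrative, and using an SVD of the centred data rather than an explicit covariance eigendecomposition is a convenience assumption; the two give the same subspace.

```python
import numpy as np

def build_eigenspace(images, k):
    """images: (n, d) array of flattened training images.
    Returns the mean image and the top-k principal directions."""
    mean = images.mean(axis=0)
    centred = images - mean
    # Rows of vt are orthonormal principal directions of the data,
    # ordered by decreasing singular value (variance captured).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Coordinates of a flattened image in the eigenspace."""
    return basis @ (image - mean)
```

In the recognition scheme described above, each object's training projections trace out a manifold in this space; an unknown image is classified by which manifold its projection lies nearest, and its position along that manifold estimates pose.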

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....