Proceedings ArticleDOI

Object recognition from local scale-invariant features

20 Sep 1999 - Vol. 2, pp. 1150-1157
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
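The staged filtering step described in the abstract — finding stable points as extrema in a difference-of-Gaussian scale space — can be illustrated in a few lines. This is a simplified, hypothetical sketch, not Lowe's implementation: it uses a fixed list of scales with no image pyramid, no orientation assignment, and no low-contrast or edge-response rejection.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Toy difference-of-Gaussian keypoint detector.

    Blur the image at successive scales, subtract adjacent levels,
    and keep pixels that are strict extrema over their 3x3x3
    (scale, y, x) neighborhood.
    """
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    keypoints = []
    # only interior scale levels have both a coarser and a finer neighbor
    for k in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                patch = dog[k - 1:k + 2, y - 1:y + 2, x - 1:x + 2]
                v = dog[k, y, x]
                # strict extremum: v is unique and is the patch max or min
                if (patch == v).sum() == 1 and (v == patch.max() or v == patch.min()):
                    keypoints.append((y, x, sigmas[k]))
    return keypoints
```

A bright Gaussian blob produces a detection near its center, at the scale level whose DoG response is strongest.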


Citations
Journal ArticleDOI
TL;DR: A new image retrieval technique using the local neighborhood difference pattern (LNDP) is proposed for local features; experiments show a significant improvement over existing methods.
Abstract: A new image retrieval technique using local neighborhood difference pattern (LNDP) has been proposed for local features. The conventional local binary pattern (LBP) transforms every pixel of an image into a binary pattern based on its relationship with neighboring pixels. The proposed feature descriptor differs from the local binary pattern as it transforms the mutual relationship of all neighboring pixels into a binary pattern. LBP and LNDP are complementary, as they extract different information from local pixel intensities. In the proposed work, the LBP and LNDP features are combined to extract most of the information that can be captured using local intensity differences. To demonstrate the effectiveness of the proposed method, experiments have been conducted on four different databases of texture images and natural images. The performance has been measured using the well-known evaluation measures precision and recall, and compared with state-of-the-art local patterns. The comparison shows a significant improvement of the proposed method over existing methods.
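The conventional LBP that this work builds on thresholds each of the eight neighbors against the center pixel and packs the results into one byte. A minimal sketch for a single 3x3 patch — the neighbor ordering and the `>=` tie rule are common implementation choices, not taken from the paper, and the LNDP variant itself (mutual neighbor-to-neighbor differences) is not reproduced here:

```python
import numpy as np

def lbp_code(patch3x3):
    """Classic 8-neighbor local binary pattern for one 3x3 patch.

    Each neighbor is thresholded against the center pixel and the
    resulting bits are packed, starting at the top-left corner and
    walking clockwise around the center.
    """
    center = patch3x3[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch3x3[y, x] >= center else 0 for y, x in order]
    # pack bit i into position i of the resulting code
    return sum(b << i for i, b in enumerate(bits))
```

Sliding this over every pixel and histogramming the codes yields the LBP feature vector of an image.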

123 citations


Cites background from "Object recognition from local scale..."

  • ...SIFT transforms image data into scale-invariant coordinates relative to local features [25]....

  • ..., human detection [7], person re-identification [49, 50], object recognition [4, 25], etc....

Proceedings Article
25 Jan 2012
TL;DR: A general Bag of Words model is used to compare two classification methods, K-Nearest-Neighbor and Support-Vector-Machine; the SVM classifier is observed to outperform the KNN classifier.
Abstract: In order for a robot or a computer to perform tasks, it must recognize what it is looking at. Given an image a computer must be able to classify what the image represents. While this is a fairly simple task for humans, it is not an easy task for computers. Computers must go through a series of steps in order to classify a single image. In this paper, we used a general Bag of Words model in order to compare two different classification methods. Both K-Nearest-Neighbor (KNN) and Support-Vector-Machine (SVM) classification are well known and widely used. We were able to observe that the SVM classifier outperformed the KNN classifier. For future work, we hope to use more categories for the objects and to use more sophisticated classifiers.
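The bag-of-words pipeline the abstract describes — quantize local descriptors against a codebook, represent each image as a normalized word histogram, then classify the histograms — can be sketched compactly. This is an illustrative NumPy sketch with a k-NN classifier standing in for both methods compared in the paper (its SVM would replace `knn_classify`), and the codebook is assumed given rather than learned by k-means:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest codeword and
    return a normalized bag-of-words histogram."""
    # pairwise squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def knn_classify(query_hist, train_hists, train_labels, k=1):
    """k-nearest-neighbor majority vote over bag-of-words histograms."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]
```

In the paper's setting, the descriptors fed to `bow_histogram` would be SIFT features extracted from each image.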

122 citations


Additional excerpts

  • ...We extracted features using SIFT [8]....

Book ChapterDOI
08 Sep 2018
TL;DR: Experiments on the KITTI 2015 dataset show that the estimated geometry, 3D motion, and moving object masks not only are constrained to be consistent but also significantly outperform other SOTA algorithms, demonstrating the benefits of the approach.
Abstract: Learning to estimate 3D geometry in a single image by watching unlabeled videos via a deep convolutional network has made significant progress recently. Current state-of-the-art (SOTA) methods are based on the learning framework of rigid structure-from-motion, where only 3D camera ego-motion is modeled for geometry estimation. However, moving objects also exist in many videos, e.g., moving cars in a street scene. In this paper, we tackle such motion by additionally incorporating per-pixel 3D object motion into the learning framework, which provides holistic 3D scene flow understanding and helps single-image geometry estimation. Specifically, given two consecutive frames from a video, we adopt a motion network to predict their relative 3D camera pose and a segmentation mask distinguishing moving objects from the rigid background. An optical flow network is used to estimate dense 2D per-pixel correspondence. A single-image depth network predicts depth maps for both images. The four types of information, i.e., 2D flow, camera pose, segmentation mask, and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), where per-pixel 3D motion for the rigid background and moving objects is recovered. We design various losses w.r.t. the two types of 3D motion for training the depth and motion networks, yielding further error reduction for the estimated geometry. Finally, in order to resolve the 3D motion ambiguity of monocular videos, we incorporate stereo images into joint training. Experiments on the KITTI 2015 dataset show that our estimated geometry, 3D motion, and moving object masks not only are constrained to be consistent but also significantly outperform other SOTA algorithms, demonstrating the benefits of our approach.
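The rigid-background part of such a motion parser reduces to classical multi-view geometry: backproject each pixel with its depth, apply the relative camera pose, and reproject. A hedged sketch of that ego-motion-induced flow (names and the dense formulation are illustrative; the paper's HMP additionally recovers per-pixel object motion and masks):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """2D flow induced by camera ego-motion alone.

    Backprojects every pixel using its depth and intrinsics K,
    applies the relative pose (R, t), reprojects, and returns the
    per-pixel displacement as an (h, w, 2) array.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[:h, :w]
    # homogeneous pixel coordinates, shape (3, N)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    cam = np.linalg.inv(K) @ pix * depth.reshape(-1)   # 3D points
    moved = R @ cam + t[:, None]                       # apply pose
    proj = K @ moved
    proj = proj[:2] / proj[2]                          # perspective divide
    flow = proj - pix[:2]
    return flow.T.reshape(h, w, 2)
```

With identity pose the flow is zero everywhere; a pure sideways translation of a fronto-parallel unit-depth plane shifts every pixel by the same amount.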

122 citations


Cites result from "Object recognition from local scale..."

  • ...58,1,40] yields more robust matching and shows additional improvement on depth estimation. Structural matching has long been a center area for computer vision or optical flow based on SIFT [59] or HOG [60] descriptors. Most recently, unsupervised learning of dense matching [8] using deep CNN which integrates local and global context achieves impressive results according to the KITTI benchmark 1. In our...

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper presents a comprehensive extension of the Scale Invariant Feature Transform (SIFT), originally introduced in 2D, to volumetric images, and achieves, for the first time, full 3D orientation invariance of the descriptors, which is essential for 3D feature matching.
Abstract: This paper presents a comprehensive extension of the Scale Invariant Feature Transform (SIFT), originally introduced in 2D, to volumetric images. While tackling the significant computational efforts required by such multiscale processing of large data volumes, our implementation addresses two important mathematical issues related to the 2D-to-3D extension. It includes efficient steps to filter out extracted point candidates that have low contrast or are poorly localized along edges or ridges. In addition, it achieves, for the first time, full 3D orientation invariance of the descriptors, which is essential for 3D feature matching. An application of this technique is demonstrated to the feature-based automated registration and segmentation of clinical datasets in the context of radiation therapy.

122 citations


Cites methods from "Object recognition from local scale..."

  • ...It can be used, for example, for feature-based image registration [1, 11], object recognition [9], image segmentation, atlas generation and variability analysis [14], and image retrieval in databases....

  • ...Extending from [9], it is efficient to detect stable feature point locations in the 4D scale space using extrema out of the convolution of the difference-of-Gaussian (DoG) function with the image, D(x, y, z, kσ)....

Journal ArticleDOI
TL;DR: A general and comprehensive overview of the state of the art in self-contained (i.e., GPS-denied) odometry systems is provided, and open challenges that demand further research are identified.
Abstract: The development of a navigation system is one of the major challenges in building a fully autonomous platform. Full autonomy requires a dependable navigation capability not only under ideal conditions with clear GPS signals but also in situations where GPS is unreliable. Therefore, self-contained odometry systems have attracted much attention recently. This paper provides a general and comprehensive overview of the state of the art in self-contained, i.e., GPS-denied, odometry systems, and identifies open challenges that demand further research. Self-contained odometry methods are categorized into five main types, i.e., wheel, inertial, laser, radar, and visual, where the categorization is based on the type of sensor data used for odometry. Most research in the field focuses on analyzing the sensor data, exhaustively or partially, to extract the vehicle pose. Different combinations and fusions of sensor data, coupled tightly or loosely and fused by filtering or optimization, have been investigated. We analyze the advantages and weaknesses of each approach in terms of different evaluation metrics, such as performance, response time, energy efficiency, and accuracy, which can serve as a useful guideline for researchers and engineers in the field. Finally, some future research challenges in the field are discussed.

122 citations


Cites methods from "Object recognition from local scale..."

  • ...[64] N. M. Suaib, M. H. Marhaban, M. I. Saripan, and S. A. Ahmad, ‘‘Performance evaluation of feature detection and feature matching for stereo visual odometry using SIFT and SURF,’’ in Proc....

  • ...[99] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ‘‘Orb: An efficient alternative to SIFT or SURF,’’ in Proc....

  • ...Consequently, several proposed methods are based on corners, for instance, Harris detector [53], SIFT [96], SURF [97], FAST [98], and ORB [99])....

  • ...In [40], the amplitude gridmap accumulated from the radar scan is transformed into a grayscale image and then interesting points are detected using feature extraction techniques, e.g, SIFT....

  • ...In addition, the feature detection and description are based on ORB rather than using a more robust but slow descriptor such as SIFT and FAST....

References
Journal ArticleDOI
TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.
Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.
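Histogram Intersection as introduced here is a one-line measure: sum the element-wise minimum of the image and model histograms and normalize by the model histogram, so a model fully contained in the image scores 1.0. A minimal sketch:

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Swain & Ballard's histogram intersection.

    Sums the bin-wise minimum of the two histograms and normalizes
    by the model histogram; 1.0 means every model pixel is matched,
    0.0 means the color distributions are disjoint.
    """
    return np.minimum(image_hist, model_hist).sum() / model_hist.sum()
```

Indexing into a model database then amounts to ranking stored model histograms by this score against the image histogram.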

5,672 citations

Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.

4,310 citations

Journal ArticleDOI
TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.
Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.
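The eigenspace recipe described here — center the flattened training images, compute a low-dimensional basis, and project new images into it — is a few lines of linear algebra. A simplified sketch via SVD; recognition by nearest manifold point and the pose interpolation are omitted, and the function names are illustrative:

```python
import numpy as np

def build_eigenspace(images, n_dims):
    """Learn a low-dimensional eigenspace from flattened images
    (one image per row), as in appearance-based recognition."""
    mean = images.mean(axis=0)
    centered = images - mean
    # rows of vt are the principal axes (eigenvectors of the
    # covariance matrix, ordered by decreasing singular value)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_dims]

def project(image, mean, basis):
    """Project a new flattened image into the learned eigenspace."""
    return basis @ (image - mean)
```

Projecting each training image yields the sampled appearance manifold; an unknown image is then recognized by the manifold its projection lies closest to.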

2,037 citations

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Object recognition from local scale..." refers background or methods in this paper

  • ...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....

  • ...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....

  • ...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....

Journal ArticleDOI
TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

1,574 citations


"Object recognition from local scale..." refers methods in this paper

  • ...[23] used the Harris corner detector to identify feature locations for epipolar alignment of images taken from differing viewpoints....