Object recognition from local scale-invariant features

doi:10.1109/ICCV.1999.790410

Home
/
Papers
/
Object recognition from local scale-invariant features

Proceedings Article•DOI•

Object recognition from local scale-invariant features

David G. Lowe¹•Institutions (1)

University of British Columbia¹

20 Sep 1999-Vol. 2, pp 1150-1157

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

read less

Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Cluster Boosted Tree Classifier for Multi-View, Multi-Pose Object Detection

[...]

Bo Wu¹, Ramakant Nevatia¹•Institutions (1)

University of Southern California¹

26 Dec 2007

TL;DR: This paper proposes a boosting based learning method, called Cluster Boosted Tree (CBT), to automatically construct tree structured object detectors, and shows that this approach outperforms the state-of-the-art methods.

...read moreread less

Abstract: Detection of object of a known class is a fundamental problem of computer vision. The appearance of objects can change greatly due to illumination, view point, and articulation. For object classes with large intra-class variation, some divide-and-conquer strategy is necessary. Tree structured classifier models have been used for multi-view multi- pose object detection in previous work. This paper proposes a boosting based learning method, called Cluster Boosted Tree (CBT), to automatically construct tree structured object detectors. Instead of using predefined intra-class sub- categorization based on domain knowledge, we divide the sample space by unsupervised clustering based on discriminative image features selected by boosting algorithm. The sub-categorization information of the leaf nodes is sent back to refine their ancestors' classification functions. We compare our approach with previous related methods on several public data sets. The results show that our approach outperforms the state-of-the-art methods.

...read moreread less

222 citations

Proceedings Article•DOI•

From co-saliency to co-segmentation: An efficient and fully unsupervised energy minimization model

[...]

Kai-Yueh Chang¹, Tyng-Luh Liu¹, Shang-Hong Lai²•Institutions (2)

Academia Sinica¹, National Tsing Hua University²

20 Jun 2011

TL;DR: This work addresses two key issues of co-segmentation over multiple images by establishing an MRF optimization model that has an energy function with nice properties and can be shown to effectively resolve the two difficulties.

...read moreread less

Abstract: We address two key issues of co-segmentation over multiple images. The first is whether a pure unsupervised algorithm can satisfactorily solve this problem. Without the user's guidance, segmenting the foregrounds implied by the common object is quite a challenging task, especially when substantial variations in the object's appearance, shape, and scale are allowed. The second issue concerns the efficiency if the technique can lead to practical uses. With these in mind, we establish an MRF optimization model that has an energy function with nice properties and can be shown to effectively resolve the two difficulties. Specifically, instead of relying on the user inputs, our approach introduces a co-saliency prior as the hint about possible foreground locations, and uses it to construct the MRF data terms. To complete the optimization framework, we include a novel global term that is more appropriate to co-segmentation, and results in a submodular energy function. The proposed model can thus be optimally solved by graph cuts. We demonstrate these advantages by testing our method on several benchmark datasets.

...read moreread less

221 citations

Cites methods from "Object recognition from local scale..."

...And in such distinct areas, we sample a point every five pixels and describe it by a SIFT feature [20]....
[...]

Proceedings Article•DOI•

Multi-task low-rank affinity pursuit for image segmentation

[...]

Bin Cheng¹, Guangcan Liu¹, Jingdong Wang², Zhongyang Huang³, Shuicheng Yan¹ - Show less +1 more•Institutions (3)

National University of Singapore¹, Microsoft², Panasonic³

06 Nov 2011

TL;DR: Compared to previous methods, which are usually based on a single type of features, the proposed method seamlessly integrates multiple types of features to jointly produce the affinity matrix within a single inference step, and produces more accurate and reliable segmentation results.

...read moreread less

Abstract: This paper investigates how to boost region-based image segmentation by pursuing a new solution to fuse multiple types of image features. A collaborative image segmentation framework, called multi-task low-rank affinity pursuit, is presented for such a purpose. Given an image described with multiple types of features, we aim at inferring a unified affinity matrix that implicitly encodes the segmentation of the image. This is achieved by seeking the sparsity-consistent low-rank affinities from the joint decompositions of multiple feature matrices into pairs of sparse and low-rank matrices, the latter of which is expressed as the production of the image feature matrix and its corresponding image affinity matrix. The inference process is formulated as a constrained nuclear norm and l 2;1 -norm minimization problem, which is convex and can be solved efficiently with the Augmented Lagrange Multiplier method. Compared to previous methods, which are usually based on a single type of features, the proposed method seamlessly integrates multiple types of features to jointly produce the affinity matrix within a single inference step, and produces more accurate and reliable segmentation results. Experiments on the MSRC dataset and Berkeley segmentation dataset well validate the superiority of using multiple features over single feature and also the superiority of our method over conventional methods for feature fusion. Moreover, our method is shown to be very competitive while comparing to other state-of-the-art methods.

...read moreread less

221 citations

Patent•

Image base inquiry system for search engines for mobile telephones with integrated camera

[...]

Sr. Hartmut Neven

20 Feb 2004

TL;DR: In this article, it was shown that it is possible to convert the graphical information into a symbolic format, for example, plain text, in order to then access information about the object shown.

...read moreread less

Abstract: An increasing number of mobile telephones and computers are being equipped with a camera. Thus, instead of simple text strings, it is also possible to send images as queries to search engines or databases. Moreover, advances in image recognition allow a greater degree of automated recognition of objects, strings of letters, or symbols in digital images. This makes it possible to convert the graphical information into a symbolic format, for example, plain text, in order to then access information about the object shown.

...read moreread less

221 citations

Proceedings Article•DOI•

Deep Sketch Hashing: Fast Free-Hand Sketch-Based Image Retrieval

[...]

Li Liu¹, Fumin Shen, Yuming Shen¹, Xianglong Liu², Ling Shao¹ - Show less +1 more•Institutions (2)

University of East Anglia¹, Beihang University²

21 Jul 2017

TL;DR: This paper introduces a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework, and is the first hashing work specifically designed for category-level SBIR with an end to end deep architecture.

...read moreread less

Abstract: Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed with natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these can neither cope well with the geometric distortion between sketches and images nor be feasible for large-scale SBIR due to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets of TU-Berlin Extension and Sketchy, and the experiments consistently show DSHs superior SBIR accuracies over several state-of-the-art methods, while achieving significantly reduced retrieval time and memory footprint.

...read moreread less

221 citations

Cites methods from "Object recognition from local scale..."

..., SIFT [40], HOG [8], gradient field HOG [18, 19], histogram of edge local orientations (HELO) [51, 49] and Learned KeyShapes...
[...]
...After that, hand-crafted features (e.g., SIFT [39], HOG [8], gradient field HOG [18, 19], histogram of edge local orientations (HELO) [48, 46] and Learned KeyShapes ar X iv :1 70 3....
[...]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
…
80
81
82
83
84
85
86
…
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Color indexing

[...]

Michael J. Swain, Dana H. Ballard

01 Nov 1991-International Journal of Computer Vision

TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.

...read moreread less

Abstract: Computer vision is moving into a new era in which the aim is to develop visual skills for robots that allow them to interact with a dynamic, unconstrained environment. To achieve this aim, new kinds of vision algorithms need to be developed which run in real time and subserve the robot's goals. Two fundamental goals are determining the identity of an object with a known location, and determining the location of a known object. Color can be successfully used for both tasks. This dissertation demonstrates that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models. It shows that color histograms are stable object representations in the presence of occlusion and over change in view, and that they can differentiate among a large number of objects. For solving the identification problem, it introduces a technique called Histogram Intersection, which matches model and image histograms and a fast incremental version of Histogram Intersection which allows real-time indexing into a large database of stored models. It demonstrates techniques for dealing with crowded scenes and with models with similar color signatures. For solving the location problem it introduces an algorithm called Histogram Backprojection which performs this task efficiently in crowded scenes.

...read moreread less

5,672 citations

Journal Article•DOI•

Generalizing the hough transform to detect arbitrary shapes

[...]

Dana H. Ballard¹•Institutions (1)

University of Rochester¹

01 Jan 1987-Pattern Recognition

TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Houghtransform a kind of universal transform which can beused to find arbitrarily complex shapes.

...read moreread less

4,310 citations

Journal Article•DOI•

Visual learning and recognition of 3-D objects from appearance

[...]

Hiroshi Murase, Shree K. Nayar¹•Institutions (1)

Columbia University¹

01 Jan 1995-International Journal of Computer Vision

TL;DR: A near real-time recognition system with 20 complex objects in the database has been developed and a compact representation of object appearance is proposed that is parametrized by pose and illumination.

...read moreread less

Abstract: The problem of automatically learning object models for recognition and pose estimation is addressed. In contrast to the traditional approach, the recognition problem is formulated as one of matching appearance rather than shape. The appearance of an object in a two-dimensional image depends on its shape, reflectance properties, pose in the scene, and the illumination conditions. While shape and reflectance are intrinsic properties and constant for a rigid object, pose and illumination vary from scene to scene. A compact representation of object appearance is proposed that is parametrized by pose and illumination. For each object of interest, a large set of images is obtained by automatically varying pose and illumination. This image set is compressed to obtain a low-dimensional subspace, called the eigenspace, in which the object is represented as a manifold. Given an unknown input image, the recognition system projects the image to eigenspace. The object is recognized based on the manifold it lies on. The exact position of the projection on the manifold determines the object's pose in the image. A variety of experiments are conducted using objects with complex appearance characteristics. The performance of the recognition and pose estimation algorithms is studied using over a thousand input images of sample objects. Sensitivity of recognition to the number of eigenspace dimensions and the number of learning samples is analyzed. For the objects used, appearance representation in eigenspaces with less than 20 dimensions produces accurate recognition results with an average pose estimation error of about 1.0 degree. A near real-time recognition system with 20 complex objects in the database has been developed. The paper is concluded with a discussion on various issues related to the proposed learning and recognition methodology.

...read moreread less

2,037 citations

Journal Article•DOI•

Local grayvalue invariants for image retrieval

[...]

Cordelia Schmid, Roger Mohr

01 May 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper addresses the problem of retrieving images from large image databases with a method based on local grayvalue invariants which are computed at automatically detected interest points and allows for efficient retrieval from a database of more than 1,000 images.

...read moreread less

Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

...read moreread less

1,756 citations

"Object recognition from local scale..." refers background or methods in this paper

...This allows for the use of more distinctive image descriptors than the rotation-invariant ones used by Schmid and Mohr, and the descriptor is further modified to improve its stability to changes in affine projection and illumination....
[...]
...For the object recognition problem, Schmid & Mohr [19] also used the Harris corner detector to identify interest points, and then created a local image descriptor at each interest point from an orientation-invariant vector of derivative-of-Gaussian image measurements....
[...]
..., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....
[...]
...However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations....
[...]

Journal Article•DOI•

A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry

[...]

Zhengyou Zhang, Rachid Deriche, Olivier Faugeras, Quang-Tuan Luong

15 Oct 1995-Artificial Intelligence

TL;DR: A robust approach to image matching by exploiting the only available geometric constraint, namely, the epipolar constraint, is proposed and a new strategy for updating matches is developed, which only selects those matches having both high matching support and low matching ambiguity.

...read moreread less

1,574 citations