
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame, and uses data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses.
Abstract: We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of indoor structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. We present a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame. We then use data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses. The likelihood function is efficiently computable by comparing the predicted location of point features on the environment model to their actual tracked locations in the image stream. Our method runs in real-time, and it avoids the need of extensive prior training and the Manhattan-world assumption, which makes it more practical and efficient for an intelligent robot to understand its surroundings compared to most previous scene understanding methods. Experimental results on a collection of indoor videos suggest that our method is capable of an unprecedented combination of accuracy and efficiency.
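The per-frame hypothesis update described above is a standard discrete Bayesian filter: multiply the prior over hypotheses by the frame's likelihoods and renormalize. A minimal numpy sketch (the hypothesis set and likelihood values below are illustrative placeholders, not the paper's actual data):

```python
import numpy as np

def update_posterior(prior, likelihoods):
    """One Bayesian update: posterior is prior times likelihood, renormalized."""
    posterior = prior * likelihoods
    return posterior / posterior.sum()

# Three structure hypotheses, initially equally plausible.
posterior = np.ones(3) / 3.0

# Per-frame likelihoods, e.g. from comparing predicted vs. tracked
# feature locations (values here are made up for illustration).
for frame_likelihoods in [np.array([0.9, 0.4, 0.1]),
                          np.array([0.8, 0.5, 0.2])]:
    posterior = update_posterior(posterior, frame_likelihoods)

print(posterior.argmax())  # hypothesis 0 dominates
```

Because the likelihood only requires comparing predicted and tracked feature locations, each update is cheap, which is what makes the real-time claim plausible.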

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Any method can be used but in this paper, we use KLT [20] tracking because it is more efficient than SIFT [15] and SURF [2], and it works well in our experiments....


  • ...Bundler [21] has trouble with simple forward motion because it only considered SIFT points that frequently appear among the image set for 3D reconstructions and camera pose estimation....


Proceedings ArticleDOI
03 Dec 2010
TL;DR: This work presents a fast and efficient geometric re-ranking method that can be incorporated into a feature-based image retrieval system that utilizes a Vocabulary Tree (VT), and shows in experiments that re-ranking schemes can substantially improve recognition accuracy.
Abstract: We present a fast and efficient geometric re-ranking method that can be incorporated into a feature-based image retrieval system that utilizes a Vocabulary Tree (VT). We form feature pairs by comparing descriptor classification paths in the VT and calculate a geometric similarity score of these pairs. We propose a location geometric similarity scoring method that is invariant to rotation, scale, and translation, and can be easily incorporated in mobile visual search and augmented reality systems. We compare the performance of the location geometric scoring scheme to orientation and scale geometric scoring schemes. We show in our experiments that re-ranking schemes can substantially improve recognition accuracy. We can also reduce the worst-case server latency by up to 1 second while still improving recognition performance.
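A location-based geometric score of this kind can be built from log ratios of pairwise distances: inter-feature distances are unchanged by rotation and translation, and a common scale change shifts all log ratios by the same constant, so consistency of the ratios measures geometric agreement. This is a hedged reconstruction from the abstract, not the authors' exact formulation:

```python
import numpy as np

def location_geometric_score(query_pts, db_pts, tol=0.1):
    """Score matched keypoint locations by the consistency of pairwise
    log distance ratios (invariant to rotation, scale, and translation)."""
    q = np.asarray(query_pts, float)
    d = np.asarray(db_pts, float)
    i, j = np.triu_indices(len(q), k=1)
    dq = np.linalg.norm(q[i] - q[j], axis=1)
    dd = np.linalg.norm(d[i] - d[j], axis=1)
    log_ratio = np.log(dq / dd)
    # Geometrically consistent matches share one log ratio (the log of the
    # scale change); count the pairs close to the median ratio.
    return np.mean(np.abs(log_ratio - np.median(log_ratio)) < tol)

# Query is the database constellation rotated 90 degrees and scaled by 2.
db = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
R = np.array([[0, -1], [1, 0]])
query = 2.0 * db @ R.T
print(location_geometric_score(query, db))  # 1.0: perfectly consistent
```

Random mismatches produce scattered log ratios and a low score, which is why such a measure can re-rank a VT candidate list cheaply without full geometric verification.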

72 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...In this process, features of the query object are matched with features of the database objects using nearest descriptor or the ratio test [7]....


  • ...By representing images or objects using sets of local features [7, 8, 9], recognition can be achieved by matching features between the query image and candidate database image....

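The ratio test cited in the excerpt above is Lowe's heuristic: accept a match only if the nearest database descriptor is much closer than the second nearest. A minimal numpy sketch with random stand-in descriptors (real systems would use 128-d SIFT vectors):

```python
import numpy as np

def ratio_test_matches(query_desc, db_desc, ratio=0.8):
    """Return (query_idx, db_idx) pairs that pass Lowe's ratio test."""
    matches = []
    for qi, q in enumerate(query_desc):
        dists = np.linalg.norm(db_desc - q, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((qi, int(nearest)))
    return matches

rng = np.random.default_rng(0)
db = rng.normal(size=(50, 128))          # stand-in 128-d descriptors
query = db[:5] + rng.normal(scale=0.01, size=(5, 128))  # noisy copies
print(ratio_test_matches(query, db))     # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

The test rejects ambiguous descriptors (ones with two similar neighbors) rather than thresholding absolute distance, which is what makes it robust across images with different statistics.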

Journal ArticleDOI
TL;DR: A novel transductive transfer subspace learning method for cross-domain facial expression recognition that achieves much better recognition performance compared with the state-of-the-art methods.
Abstract: Facial expression recognition across domains, e.g., where training and testing facial images come from different facial poses, is very challenging due to the different marginal distributions between training and testing facial feature vectors. To deal with this challenging cross-domain facial expression recognition problem, a novel transductive transfer subspace learning method is proposed in this paper. In this method, a labelled facial image set from the source domain is combined with an unlabelled auxiliary facial image set from the target domain to jointly learn a discriminative subspace and predict the class labels of the unlabelled facial images, where a transductive transfer regularized least-squares regression (TTRLSR) model is proposed to this end. Then, based on the auxiliary facial image set, we train an SVM classifier for classifying the expressions of other facial images in the target domain. Moreover, we also investigate the use of color facial features to evaluate the recognition performance of the proposed facial expression recognition method, where color scale invariant feature transform (CSIFT) features associated with 49 landmark facial points are extracted to describe each color facial image. Finally, extensive experiments on the BU-3DFE and Multi-PIE multiview color facial expression databases are conducted to evaluate the cross-database and cross-view facial expression recognition performance of the proposed method. Comparisons with state-of-the-art domain adaptation methods are also included in the experiments. The experimental results demonstrate that the proposed method achieves much better recognition performance compared with the state-of-the-art methods.

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In this paper, we will use SIFT features [7], [24] to describe each facial image....


Proceedings ArticleDOI
11 Oct 2009
TL;DR: A novel framework of facial appearance and shape information extraction for facial expression recognition that provides holistic characteristics for the local texture and shape features by enhancing structure-based spatial information, and makes it possible to use local descriptors in facial expression recognition for the first time.
Abstract: A novel framework of facial appearance and shape information extraction for facial expression recognition is proposed. For appearance extraction, a facial-component-based bag of words method is presented. We segment face images into 4 component regions, and sub-divide them into 4×4 sub-regions. Dense SIFT (Scale-Invariant Feature Transform) features are calculated over the sub-regions and vector quantized into 4×4 sets of codeword distributions. For shape extraction, PHOG (Pyramid Histogram of Oriented Gradient) descriptors are computed on the 4 facial component regions to obtain the spatial distribution of edges. Our framework provides holistic characteristics for the local texture and shape features by enhancing structure-based spatial information, and makes it possible to use local descriptors in facial expression recognition for the first time. The recognition rate achieved by fusing appearance and shape features at the decision level on the Cohn-Kanade database is 96.33%, which outperforms the state of the art.
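The bag-of-words step above (vector-quantizing dense descriptors into per-sub-region codeword histograms) can be sketched as follows; the codebook and descriptors here are random placeholders, whereas the paper would use a trained codebook over dense SIFT features:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Vector-quantize descriptors to nearest codewords and histogram them."""
    # (n_desc, n_codewords) distances -> nearest codeword per descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
codebook = rng.normal(size=(16, 128))        # 16 codewords (placeholder size)
# One histogram per sub-region of a facial component region (4x4 = 16 here).
sub_region_hists = [bow_histogram(rng.normal(size=(30, 128)), codebook)
                    for _ in range(16)]
feature = np.concatenate(sub_region_hists)   # final appearance vector
print(feature.shape)  # (256,)
```

Keeping one histogram per sub-region, rather than pooling over the whole face, is what preserves the structure-based spatial information the abstract emphasizes.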

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We firstly segment face images into 4 regions which contain different facial components, then equally divide each region into 4 sub-regions and calculate SIFT [19] (Scale-Invariant Feature Transform) descriptors on a sliding grid over each sub-region....


Journal ArticleDOI
TL;DR: A fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments is proposed, capable of generating a wide variation of shapes as instances of a given object category.
Abstract: We describe a top-down object detection and segmentation approach that uses a skeleton-based shape model and that works directly on real images. The approach is based on three components. First, we propose a fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments. The model is capable of generating a wide variation of shapes as instances of a given object category. Second, we develop a progressive selection mechanism to search among the generated shapes for the category instances that are present in the image. The search begins with a large pool of candidates identified by a dynamic programming (DP) algorithm and progressively reduces it in size by applying a series of criteria, namely, the local minimum criterion, extent of shape overlap, and thresholding of the objective function to select the final object candidates. Third, we propose the Partitioned Chamfer Matching (PCM) measure to capture the support of image edges for a hypothesized shape. This measure overcomes the shortcomings of Oriented Chamfer Matching and is robust against spurious edges, missing edges, and accidental alignment between the image edges and the shape boundary contour. We have evaluated our approach on the ETHZ dataset and found it to perform well in both object detection and object segmentation tasks.
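Partitioned Chamfer Matching is the authors' contribution; the plain chamfer distance it builds on can be sketched with brute-force numpy, averaging each hypothesized shape point's distance to its nearest image edge point (a sketch only; real implementations use a precomputed distance transform for speed):

```python
import numpy as np

def chamfer_distance(shape_pts, edge_pts):
    """Mean distance from each hypothesized shape point to its nearest edge."""
    diffs = shape_pts[:, None, :] - edge_pts[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1).mean()

edges = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # image edge pixels
good = np.array([[0.0, 0.1], [1.0, 0.1], [2.0, 0.1]])    # nearby hypothesis
bad = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]])     # distant hypothesis
print(chamfer_distance(good, edges) < chamfer_distance(bad, edges))  # True
```

The weakness the paper targets is visible even here: a shape can score well by accidentally aligning with unrelated edges, which is what partitioning the measure is meant to penalize.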

72 citations


Cites methods from "Distinctive Image Features from Sca..."


  • ...Appearance-based methods generally rely on feature points such as SIFT (Lowe 2004) and others (Mikolajczyk and Schmid 2005), and have had remarkable success in detecting the presence of objects (Dorkó and Schmid 2003; Fergus et al. 2003; Csurka et al. 2004; Leibe and Schiele 2004; Jurie and Triggs 2005; Berg et al. 2005; Kumar et al. 2005; Winn and Jojic 2005; Lazebnik et al. 2006; Shotton et al. 2006; Todorovic and Ahuja 2006), some of these methods also localize objects (Viola and Jones 2001; Leibe and Schiele 2004; Torralba et al. 2004; Berg et al. 2005; Kumar et al. 2005)....



References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
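The final verification stage described above, a least-squares solution for consistent pose parameters, amounts to fitting a transform (e.g. a 2-D affine map) to the clustered matches. A numpy sketch of that fit (the affine parameterization is a standard choice, not necessarily the paper's exact one):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares 2-D affine transform mapping model points to image points."""
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    b = image_pts.reshape(-1)
    A[0::2, 0:2] = model_pts   # x' = m00*x + m01*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # y' = m10*x + m11*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)
    t = params[4:]
    return M, t

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Image points: the model rotated 90 degrees and shifted by (2, 3).
image = model @ np.array([[0.0, 1.0], [-1.0, 0.0]]) + np.array([2.0, 3.0])
M, t = fit_affine(model, image)
print(np.allclose(model @ M.T + t, image))  # True
```

Matches whose residual under the fitted transform is large can then be discarded, which is how the verification step rejects accidental Hough-cluster members.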

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
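The "stable points in scale space" are detected as extrema of a difference-of-Gaussians (DoG) stack: a candidate must exceed or fall below all 26 neighbors in its 3×3×3 scale-space neighborhood. A sketch of the extremum test on a synthetic stack (the blob response is planted by hand; building a real DoG stack requires repeated Gaussian blurring, omitted for brevity):

```python
import numpy as np

def dog_extrema(dog):
    """Return (scale, row, col) of strict local extrema in a DoG stack."""
    keypoints = []
    s_max, h, w = dog.shape
    for s in range(1, s_max - 1):
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                cube = dog[s-1:s+2, r-1:r+2, c-1:c+2]
                v = dog[s, r, c]
                # strict extremum over the 26 scale-space neighbors
                if (v == cube.max() or v == cube.min()) and np.sum(cube == v) == 1:
                    keypoints.append((s, r, c))
    return keypoints

dog = np.zeros((3, 8, 8))      # (scales, height, width) synthetic stack
dog[1, 4, 4] = 5.0             # a single planted blob response
print(dog_extrema(dog))        # [(1, 4, 4)]
```

Requiring an extremum across scale as well as space is what ties each keypoint to a characteristic scale, the basis of the method's scale invariance.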

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
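The feature extraction this Alvey paper introduced is the Harris corner response, R = det(M) − k·trace(M)², computed from a smoothed second-moment matrix of image gradients. A minimal numpy sketch using a box filter in place of the Gaussian weighting of the original:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    iy, ix = np.gradient(img.astype(float))
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box_smooth(a):  # crude 3x3 box filter standing in for a Gaussian
        p = np.pad(a, 1, mode="edge")
        return sum(p[dr:dr + a.shape[0], dc:dc + a.shape[1]]
                   for dr in range(3) for dc in range(3)) / 9.0

    sxx, syy, sxy = box_smooth(ixx), box_smooth(iyy), box_smooth(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# A white square on black: corners should outscore edge midpoints.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
R = harris_response(img)
print(R[4, 4] > R[4, 8])  # corner response beats edge response
```

Along an edge only one gradient direction is strong, so det(M) stays near zero and R goes negative; at a corner both directions are strong and R is large and positive, which is the behavior the sketch demonstrates.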

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
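The evaluation criterion, recall with respect to precision over a sweep of matching thresholds, can be sketched as follows, given ground-truth correctness labels for candidate matches (the distances and labels below are placeholders):

```python
import numpy as np

def recall_vs_precision(distances, is_correct, thresholds):
    """For each distance threshold, compute (recall, 1 - precision)."""
    total_correct = is_correct.sum()
    curve = []
    for t in thresholds:
        accepted = distances < t
        tp = (accepted & is_correct).sum()
        recall = float(tp / total_correct)
        one_minus_precision = float(1.0 - tp / max(accepted.sum(), 1))
        curve.append((recall, one_minus_precision))
    return curve

# Placeholder candidate matches: descriptor distance + ground-truth label.
distances = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
is_correct = np.array([True, True, False, True, False])
print(recall_vs_precision(distances, is_correct, [0.25, 0.45]))
```

Sweeping the threshold traces out the recall versus 1−precision curves the paper uses, so descriptors can be compared independently of any single operating point.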

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
