Journal Article • DOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene; the resulting recognition approach robustly identifies objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
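As a rough illustration of the matching stage the abstract describes, the nearest-neighbor comparison with the distance-ratio heuristic can be sketched in NumPy. The 0.8 threshold and the toy 2-D descriptors below are illustrative choices, not the paper's tuned 128-D setup:

```python
import numpy as np

def ratio_test_match(query, database, ratio=0.8):
    """Match each query descriptor to its nearest database descriptor,
    keeping only matches where the nearest neighbor is clearly closer
    than the second nearest (the distance-ratio test)."""
    matches = []
    for i, q in enumerate(query):
        dists = np.linalg.norm(database - q, axis=1)  # Euclidean distances
        nn = np.argsort(dists)[:2]                    # two closest neighbors
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# toy example: database of 4 descriptors, one query near database index 2
db = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
q = np.array([[0.1, 9.9]])
print(ratio_test_match(q, db))  # [(0, 2)]
```

Ambiguous queries (roughly equidistant to two database entries) are discarded rather than matched, which is what makes the surviving matches distinctive.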


Citations
Journal Article • DOI
TL;DR: This article discusses the inherent difficulties in PIFR and presents a comprehensive review of established techniques, grouped into four categories: pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches.
Abstract: The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.

269 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Similarly, Biswas et al (2013) described each landmark with SIFT features (Lowe 2004) and concatenated the SIFT features of all landmarks as the face representation. More recent engineered features benefit from the rapid progress in face alignment (Wang et al 2014a), which makes dense landmark detection more reliable. For example, Chen et al (2013) extracted multi-scale Local Binary Patterns (LBP) features from patches around 27 landmarks. LBP features for all patches are concatenated to become a high-dimensional feature vector as the pose-robust feature. A similar idea is adopted for feature extraction in (Prince et al 2008; Zhang et al 2013b). Intuitively, the larger the number of landmarks employed, the tighter semantic correspondence that can be achieved. Li et al (2009) proposed the detection of a number of landmarks with the help of a generic 3D face model. In comparison, Yi et al (2013) proposed a more accurate approach by employing a deformable 3D face model with 352 pre-labeled landmarks. Similar to (Li et al 2009), the 2D face image is aligned to the deformable 3D face model using the weak perspective projection model, after which the dense landmarks on the 3D model are projected to the 2D image. Lastly, Gabor magnitude coefficients at all landmarks are extracted and concatenated as the pose-robust feature. Concatenating the features of all landmarks across the face brings about highly non-linear intra-personal variation. To relieve this problem, Ding et al (2014) combined the component-level and landmark-level methods. In their approach, the Dual-Cross Patterns (DCP) (Ding et al 2014) features of landmarks belonging to the same facial component are concatenated as the description of the component. The pose-robust face representation incorporates a set of features of facial components. 
While the above methods crop patches centered around facial landmarks, Fischer et al (2012) found that the location of the patches for non-frontal faces has a noticeable impact on the recognition results. For example, the positions of patches around some landmarks, e.g., the nose tip and mouth corners, for face images of extreme pose should be adjusted so that fewer background pixels are included. The accuracy and reliability of dense landmark detection are critical for building semantic correspondence. However, accurate landmark detection in unconstrained images is still challenging. To handle this problem, Zhao and Gao (2009); Liao et al (2013b); Weng et al (2013) proposed alignment-free approaches to extract features around the so-called facial keypoints. For example, Liao et al (2013b) proposed the extraction of Multi-Keypoint Descriptors (MKD) around keypoints detected by SIFT-like detectors....

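The landmark-based features surveyed above share one mechanism: compute a descriptor on a patch around each landmark and concatenate the results into one long vector. A minimal sketch, with a toy orientation-histogram descriptor standing in for SIFT/LBP/Gabor (all names, patch sizes, and landmark coordinates here are illustrative):

```python
import numpy as np

def patch_histogram(patch, bins=8):
    """Toy per-patch descriptor: a histogram of gradient orientations,
    a simplified stand-in for SIFT/LBP/Gabor features."""
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx)                      # orientation at each pixel
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                 # normalize to sum to 1

def landmark_feature(image, landmarks, half=8):
    """Concatenate per-landmark patch descriptors into one long
    pose-robust feature vector, as in the surveyed methods."""
    parts = []
    for (r, c) in landmarks:                         # (row, col) per landmark
        patch = image[r - half:r + half, c - half:c + half]
        parts.append(patch_histogram(patch))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
img = rng.random((64, 64))                           # stand-in face image
feat = landmark_feature(img, [(16, 16), (16, 48), (48, 32)])
print(feat.shape)  # (24,): 3 landmarks x 8 orientation bins
```

This also shows why the concatenated vector grows quickly with the landmark count, which is the high-dimensionality issue the survey mentions.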

Proceedings Article • DOI
17 Jun 2007
TL;DR: A fast and principled solution to the discovery of significant spatial co-occurrent patterns using frequent itemset mining; a pattern summarization method that deals with the compositional uncertainties in visual phrases; and a top-down refinement scheme of the visual word lexicon by feeding back discovered phrases to tune the similarity measure through metric learning.
Abstract: A visual word lexicon can be constructed by clustering primitive visual features, and a visual object can be described by a set of visual words. Such a "bag-of-words" representation has led to many significant results in various vision tasks including object recognition and categorization. However, in practice, the clustering of primitive visual features tends to result in synonymous visual words that over-represent visual patterns, as well as polysemous visual words that bring large uncertainties and ambiguities in the representation. This paper aims at generating a higher-level lexicon, i.e. visual phrase lexicon, where a visual phrase is a meaningful spatially co-occurrent pattern of visual words. This higher-level lexicon is much less ambiguous than the lower-level one. The contributions of this paper include: (1) a fast and principled solution to the discovery of significant spatial co-occurrent patterns using frequent itemset mining; (2) a pattern summarization method that deals with the compositional uncertainties in visual phrases; and (3) a top-down refinement scheme of the visual word lexicon by feeding back discovered phrases to tune the similarity measure through metric learning.
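The frequent-itemset step in contribution (1) treats each image's set of visual words as a transaction. A minimal sketch of mining frequent co-occurring pairs, a simplified stand-in for a full frequent-itemset miner such as Apriori (the word names below are made up for illustration):

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support=2):
    """Count co-occurring visual-word pairs across images and keep those
    appearing in at least `min_support` images (frequent 2-itemsets)."""
    counts = Counter()
    for words in transactions:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# each "transaction" is the set of visual words detected in one image
images = [
    {"wheel", "window", "door"},
    {"wheel", "window", "tree"},
    {"wheel", "window"},
    {"tree", "sky"},
]
print(frequent_pairs(images, min_support=3))  # {('wheel', 'window'): 3}
```

A surviving pair is a candidate "visual phrase": a spatial co-occurrence pattern more discriminative than either word alone (the real method also checks spatial proximity, omitted here).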

269 citations

Journal Article • DOI
TL;DR: This work comprehensively reviews existing small object detection methods based on deep learning from five aspects: multi-scale feature learning, data augmentation, training strategy, context-based detection, and GAN-based detection.

269 citations

Journal Article • DOI
TL;DR: This work presents the design and implementation of a deep neural network model referred to as ElemNet; it automatically captures the physical and chemical interactions and similarities between different elements, allowing it to predict materials properties with better accuracy and speed.
Abstract: Conventional machine learning approaches for predicting material properties from elemental compositions have emphasized the importance of leveraging domain knowledge when designing model inputs. Here, we demonstrate that by using a deep learning approach, we can bypass such manual feature engineering requiring domain knowledge and achieve much better results, even with only a few thousand training samples. We present the design and implementation of a deep neural network model referred to as ElemNet; it automatically captures the physical and chemical interactions and similarities between different elements, allowing it to predict materials properties with better accuracy and speed. The speed and best-in-class accuracy of ElemNet enable us to perform a fast and robust screening for new material candidates in a huge combinatorial space, where we predict hundreds of thousands of chemical systems that could contain yet-undiscovered compounds.

268 citations

Proceedings Article • DOI
28 Sep 2009
TL;DR: This work studies the feasibility of using periocular images of an individual as a biometric trait, applying texture and point operators to produce a feature set that can be used for matching.
Abstract: Periocular biometric refers to the facial region in the immediate vicinity of the eye. Acquisition of the periocular biometric does not require high user cooperation and close capture distance unlike other ocular biometrics (e.g., iris, retina, and sclera). We study the feasibility of using periocular images of an individual as a biometric trait. Global and local information are extracted from the periocular region using texture and point operators resulting in a feature set that can be used for matching. The effect of fusing these feature sets is also studied. The experimental results show a 77% rank-1 recognition accuracy using 958 images captured from 30 different subjects.
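The rank-1 recognition accuracy reported above can be computed from a probe-versus-gallery similarity matrix; a small sketch (the identities and scores below are made up for illustration):

```python
import numpy as np

def rank1_accuracy(similarity, probe_ids, gallery_ids):
    """Fraction of probes whose highest-scoring gallery entry
    has the correct subject identity (rank-1 recognition rate)."""
    best = np.argmax(similarity, axis=1)             # top match per probe
    hits = [probe_ids[i] == gallery_ids[j] for i, j in enumerate(best)]
    return sum(hits) / len(hits)

# toy 3-probe by 3-gallery similarity matrix (rows: probes, cols: gallery)
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.6, 0.2, 0.4]])
print(rank1_accuracy(sim, ["a", "b", "c"], ["a", "b", "c"]))  # 0.666...: 2 of 3 probes correct
```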

267 citations

References
Proceedings Article • DOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
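The final verification step (a low-residual least-squares solution for the unknown model parameters) can be sketched for an affine model: each feature match contributes two linear equations in the six affine parameters, solved in one shot. This is a simplified stand-in for the full pose solution; the point correspondences below are illustrative:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of the 2x3 affine transform mapping src -> dst.
    Each point match (x, y) -> (u, v) gives two rows of A @ p = b."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 0, 0, 1, 0]); b.append(u)
        A.append([0, 0, x, y, 0, 1]); b.append(v)
    p, _, _, _ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    m1, m2, m3, m4, tx, ty = p
    return np.array([[m1, m2, tx], [m3, m4, ty]])

# three matches generated by a pure translation of (5, -2)
src = [(0, 0), (1, 0), (0, 1)]
dst = [(5, -2), (6, -2), (5, -1)]
print(fit_affine(src, dst))  # recovers identity rotation and translation (5, -2)
```

With more than three matches the system is overdetermined and the residual of the solution indicates whether the cluster of matches is geometrically consistent.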

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

    [...]

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....

    [...]

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...

    [...]

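The difference-of-Gaussian function quoted above, D = L(kσ) − L(σ), can be sketched in one dimension with plain NumPy; the kernel truncation radius and the k = √2 scale spacing are illustrative choices:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three sigma."""
    radius = max(int(3 * sigma), 1)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian; edges handled by reflection."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(signal, r, mode="reflect")
    return np.convolve(padded, k, mode="valid")

def difference_of_gaussian(signal, sigma, k=np.sqrt(2)):
    """D(x, sigma) = L(x, k*sigma) - L(x, sigma): the difference of the
    signal blurred at two nearby scales separated by factor k."""
    return blur(signal, k * sigma) - blur(signal, sigma)

step = np.concatenate([np.zeros(32), np.ones(32)])   # a 1-D step edge
dog = difference_of_gaussian(step, sigma=1.6)
print(len(dog))  # 64
```

On flat regions D is near zero; it responds around edges and blobs, and the keypoint detector takes extrema of D across both position and scale.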

Book
01 Jan 2000
TL;DR: In this book, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms in a unified framework, covering geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

    [...]

Proceedings Article • DOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal Article • DOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
