
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
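
The extract-and-match loop described above is straightforward to try out. The following minimal Python sketch assumes OpenCV (opencv-python 4.4 or later, where SIFT ships in the main module) and placeholder image paths; it detects keypoints in two images and keeps only matches that pass Lowe's ratio test. It is an illustration, not the reference implementation.

```python
# Minimal SIFT extract-and-match sketch using OpenCV. Image paths are placeholders.
import cv2

img1 = cv2.imread("scene_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its two nearest neighbours, then keep matches that
# pass Lowe's ratio test, i.e., the best match is markedly closer than the
# second-best candidate.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
print(f"{len(good)} distinctive matches")
```

The 0.75 ratio threshold follows the paper's observation that a distinctive feature should be much closer to its best match than to its second-best one.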
Citations
Proceedings ArticleDOI
05 Nov 2012
TL;DR: A system for easily preparing arbitrary wide-area environments for subsequent real-time tracking with a handheld device and shows that minimal user effort is required to initialize a camera tracking session in an unprepared environment.
Abstract: We propose a system for easily preparing arbitrary wide-area environments for subsequent real-time tracking with a handheld device. Our system evaluation shows that minimal user effort is required to initialize a camera tracking session in an unprepared environment. We combine panoramas captured using a handheld omnidirectional camera from several viewpoints to create a point cloud model. After the offline modeling step, live camera pose tracking is initialized by feature point matching, and continuously updated by aligning the point cloud model to the camera image. Given a reconstruction made with less than five minutes of video, we achieve below 25 cm translational error and 0.5 degrees rotational error for over 80% of images tested. In contrast to camera-based simultaneous localization and mapping (SLAM) systems, our methods are suitable for handheld use in large outdoor spaces.
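
The paper's pipeline is not reproduced here, but its localization step reduces to a standard problem: recovering camera pose from 2D-3D correspondences between image features and the point cloud. Below is a hedged sketch using OpenCV's solvePnPRansac; the points and the calibration matrix K are synthetic placeholders, not the authors' implementation.

```python
# Generic "pose from 2D-3D matches" step such a tracker relies on.
import numpy as np
import cv2

object_pts = np.random.rand(50, 3).astype(np.float32)  # 3D points from the model
image_pts = np.random.rand(50, 2).astype(np.float32)   # matched 2D detections
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)

# Robustly estimate camera rotation and translation despite outlier matches.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
if ok:
    print("rotation (Rodrigues):", rvec.ravel(), "translation:", tvec.ravel())
```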

77 citations

Proceedings ArticleDOI
TL;DR: This work presents the largest kinship recognition dataset to date, Families in the Wild, and demonstrates that a pre-trained Convolutional Neural Network as an off-the-shelf feature extractor outperforms the other feature types.
Abstract: We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captivates the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool designed to mark complex hierarchical relationships and local label information in a quick and efficient manner. We include several benchmarks for two image-based tasks, kinship verification and family recognition. For this, we incorporate several visual features and metric learning methods as baselines. Also, we demonstrate that a pre-trained Convolutional Neural Network (CNN) as an off-the-shelf feature extractor outperforms the other feature types. Then, results were further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss function was learned on top of the network of pre-trained weights; (2) for family recognition, a family-specific softmax classifier was added to the network.
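
As a rough illustration of the verification fine-tuning described above (a triplet loss learned on top of pre-trained weights), here is a minimal PyTorch sketch. The ResNet-18 backbone, 128-D embedding head, and random tensors standing in for face crops are all assumptions, not the authors' setup.

```python
# Sketch of fine-tuning with a triplet loss on top of a pre-trained CNN
# (assumes PyTorch and torchvision >= 0.13).
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 128)  # embedding head

criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)

# One illustrative optimization step on synthetic batches:
anchor, positive, negative = (torch.randn(8, 3, 224, 224) for _ in range(3))
loss = criterion(backbone(anchor), backbone(positive), backbone(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```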

76 citations


Additional excerpts

  • ...First, we review handcrafted features, Scale Invariant Feature Transformation (SIFT) and Local Binary Patterns (LBP), which are both widely used in kinship verification [16] and facial recognition [20]....


  • ...While some relations are relatively easy to recognize, e.g., B-B, SIBS, and S-S through SIFT, LBP, and VGG-Face features, results for other relations such as parent-child are still below 70.0%....


  • ...SIFT [15] features have been widely applied in object and face recognition....


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A feature voting-based landmark detector is more robust than previous local appearance-based detectors; combined with nonparametric shape regularization, it forms a novel facial landmark localization pipeline that is robust to scale, in-plane rotation, occlusion, expression, and, most importantly, extreme head pose.
Abstract: We propose a data-driven approach to facial landmark localization that models the correlations between each landmark and its surrounding appearance features. At runtime, each feature casts a weighted vote to predict landmark locations, where the weight is precomputed to take into account the feature's discriminative power. This feature voting-based landmark detection is more robust than previous local appearance-based detectors; we combine it with nonparametric shape regularization to build a novel facial landmark localization pipeline that is robust to scale, in-plane rotation, occlusion, expression, and, most importantly, extreme head pose. We achieve state-of-the-art performance on two especially challenging in-the-wild datasets populated by faces with extreme head pose and expression.
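
To make the voting idea concrete, here is a toy NumPy sketch of weighted vote accumulation for a single landmark. The vote locations and weights are synthetic placeholders, not the learned quantities used in the paper.

```python
# Each local feature casts a vote for a predicted landmark location, scaled by
# a precomputed weight reflecting the feature's discriminative power.
import numpy as np

rng = np.random.default_rng(0)
votes = rng.uniform(0, 100, size=(200, 2))   # (x, y) locations voted for
weights = rng.uniform(0, 1, size=200)        # per-feature discriminative weight

acc = np.zeros((100, 100))                   # vote accumulator over the image
for (x, y), w in zip(votes.astype(int), weights):
    acc[y, x] += w

# The landmark estimate is the cell that gathered the most weighted votes.
y_hat, x_hat = np.unravel_index(acc.argmax(), acc.shape)
print("predicted landmark:", (x_hat, y_hat))
```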

76 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Dense SIFT descriptors are then extracted over the test face at multiple orientations....


  • ...Following the approach in [20], we quantize each SIFT descriptor using fast approximate k-means [16], which efficiently maps each descriptor to a visual word.... (A sketch of this quantization step appears after these excerpts.)


  • ...Each exemplar has four components: a face image, a set of dense quantized SIFT [15] features, a sparse set of semantic facial landmarks corresponding to mouth corners, nose tip, chin contour, etc., and a unique set of weights, one weight per {feature, landmark} pair....

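As a rough sketch of the quantization step referenced above, the following snippet maps SIFT descriptors to visual words using scikit-learn's MiniBatchKMeans as a stand-in for the fast approximate k-means the paper cites; the descriptor data is synthetic.

```python
# Build a visual-word codebook from pooled descriptors, then quantize new ones.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

train_desc = np.random.rand(10000, 128).astype(np.float32)  # pooled SIFT descriptors
codebook = MiniBatchKMeans(n_clusters=1000).fit(train_desc)

query_desc = np.random.rand(500, 128).astype(np.float32)
words = codebook.predict(query_desc)  # one visual-word id per descriptor
```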

Journal ArticleDOI
TL;DR: Experimental results confirm that the use of proposed fusion significantly improves the recognition accuracy.
Abstract: This paper presents an efficient ear recognition technique which derives benefits from the local features of the ear and attempts to handle the problems due to pose, poor contrast, change in illumination, and lack of registration. It uses (1) three image enhancement techniques in parallel to neutralize the effect of poor contrast, noise, and illumination, and (2) a local feature extraction technique (SURF) on enhanced images to minimize the effect of pose variations and poor image registration. SURF feature extraction is carried out on enhanced images to obtain three sets of local features, one for each enhanced image. Three nearest neighbor classifiers are trained on these three sets of features. Matching scores generated by all three classifiers are fused for the final decision. The technique has been evaluated on two public databases, namely the IIT Kanpur ear database and the University of Notre Dame ear database (Collection E). Experimental results confirm that the use of the proposed fusion significantly improves the recognition accuracy.
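
The fusion step can be illustrated in a few lines of Python. This sketch applies min-max normalization and a simple sum rule to three sets of synthetic match scores; the paper's exact normalization and fusion rule may differ.

```python
# Score-level fusion of three classifiers, one per enhanced image.
import numpy as np

def minmax(s):
    # Rescale scores to [0, 1] so the three classifiers are comparable.
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

scores = [np.random.rand(20) for _ in range(3)]  # match scores per classifier
fused = sum(minmax(s) for s in scores)           # sum-rule fusion
best_match = int(np.argmax(fused))
print("best gallery match:", best_match)
```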

76 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The technique proposed in [7] has treated ear as a planar surface and has created a homography transform using SIFT [19] feature points to register ears accurately....


Proceedings ArticleDOI
24 Mar 2017
TL;DR: A new dataset containing around 47,500 cropped X-ray images of 32 × 32 pixels with defects and no-defects in automotive components is released, and 24 computer vision techniques including deep learning, sparse representations, local descriptors, and texture features are evaluated and compared.
Abstract: To ensure safety in the construction of important metallic components for roadworthiness, it is necessary to check every component thoroughly using non-destructive testing. In recent decades, X-ray testing has been adopted as the principal non-destructive testing method to identify defects within a component which are undetectable to the naked eye. Nowadays, modern computer vision techniques, such as deep learning and sparse representations, are opening new avenues in automatic object recognition in optical images. These techniques have been broadly used in object and texture recognition by the computer vision community with promising results in optical images. However, a comprehensive evaluation in X-ray testing is required. In this paper, we release a new dataset containing around 47,500 cropped X-ray images of 32 × 32 pixels with defects and no-defects in automotive components. Using this dataset, we evaluate and compare 24 computer vision techniques including deep learning, sparse representations, local descriptors, and texture features, among others. We show in our experiments that the best performance was achieved by a simple LBP descriptor with a linear SVM classifier, obtaining 97% precision and 94% recall. We believe that the methodology presented could be used in similar projects that have to deal with automated detection of defects.
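
The winning baseline (uniform LBP histograms with a linear SVM) is easy to approximate. The sketch below uses scikit-image and scikit-learn on random 32 × 32 patches standing in for the X-ray crops; it mirrors the general recipe, not the paper's exact parameters.

```python
# Uniform LBP histogram per patch, classified with a linear SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_hist(patch, P=8, R=1):
    # "uniform" LBP yields integer codes in [0, P+1], hence P+2 histogram bins.
    lbp = local_binary_pattern(patch, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

patches = (np.random.rand(200, 32, 32) * 255).astype(np.uint8)  # stand-in crops
X = np.array([lbp_hist(p) for p in patches])
y = np.random.randint(0, 2, size=200)  # defect / no-defect labels
clf = LinearSVC().fit(X, y)
print("train accuracy:", clf.score(X, y))
```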

76 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The descriptors are invariant to scale, rotation, lighting, noise and minor changes in viewpoint [24]....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
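
The final verification stage lends itself to a small worked example: solving least-squares for a 2D affine pose from matched keypoint locations, then checking residuals. The NumPy sketch below uses synthetic correspondences; the paper applies this over Hough-clustered matches.

```python
# Least-squares fit of an affine pose (A, t) to matched point locations.
import numpy as np

model_pts = np.random.rand(10, 2) * 100  # keypoint locations on the model
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
t_true = np.array([5.0, -3.0])
image_pts = model_pts @ A_true.T + t_true  # where they appear in the image

# Linear system: [x y 1] @ params = image coordinates, solved per axis.
M = np.hstack([model_pts, np.ones((10, 1))])
params, _, _, _ = np.linalg.lstsq(M, image_pts, rcond=None)
max_residual = np.abs(M @ params - image_pts).max()
print("max residual:", max_residual)  # ~0 indicates a consistent pose
```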

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
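
The staged filtering idea, keeping points that are stable across scale, can be sketched with a difference-of-Gaussian stack. The snippet below assumes SciPy, a synthetic image, and an illustrative contrast threshold; it is a simplification of the detector described here.

```python
# Find extrema across space and scale in a difference-of-Gaussian stack.
import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)
sigmas = [1.0, 1.6, 2.56, 4.1]  # successive scales, factor ~1.6 apart
blurred = [ndimage.gaussian_filter(img, s) for s in sigmas]
dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])

# A candidate is the maximum of its 3x3x3 (scale, y, x) neighbourhood and
# clears a small contrast threshold.
maxima = (dog == ndimage.maximum_filter(dog, size=3)) & (dog > 0.02)
print("candidate keypoints:", int(maxima[1:-1].sum()))  # skip boundary scales
```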

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
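
The extract-and-track loop the abstract outlines can be approximated with OpenCV: corner extraction (goodFeaturesToTrack with the Harris measure this paper introduced) followed by pyramidal Lucas-Kanade tracking. The frames below are synthetic placeholders.

```python
# Detect Harris corners in one frame and track them into the next.
import numpy as np
import cv2

frame0 = (np.random.rand(240, 320) * 255).astype(np.uint8)
frame1 = np.roll(frame0, 2, axis=1)  # fake camera motion: 2-pixel shift

pts0 = cv2.goodFeaturesToTrack(frame0, maxCorners=200, qualityLevel=0.01,
                               minDistance=7, useHarrisDetector=True)
pts1, status, err = cv2.calcOpticalFlowPyrLK(frame0, frame1, pts0, None)
print("tracked:", int(status.sum()), "of", len(pts0), "features")
```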

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
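
The evaluation criterion, recall with respect to precision as a match-acceptance threshold varies, is simple to compute. The NumPy sketch below uses synthetic match distances and ground-truth flags purely for illustration.

```python
# Recall versus 1-precision for a sweep of distance thresholds.
import numpy as np

dist = np.random.rand(1000)              # match distances
is_correct = np.random.rand(1000) < 0.3  # ground-truth correspondence flags

for t in (0.2, 0.4, 0.6, 0.8):
    accepted = dist < t
    recall = (accepted & is_correct).sum() / is_correct.sum()
    one_minus_precision = (accepted & ~is_correct).sum() / max(accepted.sum(), 1)
    print(f"t={t}: recall={recall:.2f}, 1-precision={one_minus_precision:.2f}")
```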

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
