
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work presents a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame, and uses data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses.
Abstract: We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of indoor structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. We present a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment from an initial frame. We then use data from subsequent frames to update a Bayesian posterior probability distribution over the set of hypotheses. The likelihood function is efficiently computable by comparing the predicted location of point features on the environment model to their actual tracked locations in the image stream. Our method runs in real-time, and it avoids the need of extensive prior training and the Manhattan-world assumption, which makes it more practical and efficient for an intelligent robot to understand its surroundings compared to most previous scene understanding methods. Experimental results on a collection of indoor videos suggest that our method is capable of an unprecedented combination of accuracy and efficiency.
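The per-frame hypothesis update described above is a standard discrete Bayesian filter: multiply the prior over hypotheses by the frame's likelihoods and renormalize. A minimal numpy sketch (the hypothesis set and likelihood values below are illustrative placeholders, not the paper's actual data):

```python
import numpy as np

def update_posterior(prior, likelihoods):
    """One Bayesian update: posterior is prior times likelihood, renormalized."""
    posterior = prior * likelihoods
    return posterior / posterior.sum()

# Three structure hypotheses, initially equally plausible.
posterior = np.ones(3) / 3.0

# Per-frame likelihoods, e.g. from comparing predicted vs. tracked
# feature locations (values here are made up for illustration).
for frame_likelihoods in [np.array([0.9, 0.4, 0.1]),
                          np.array([0.8, 0.5, 0.2])]:
    posterior = update_posterior(posterior, frame_likelihoods)

print(posterior.argmax())  # hypothesis 0 dominates
```

Because the likelihood only requires comparing predicted and tracked feature locations, each update is cheap, which is what makes the real-time claim plausible.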

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Any method can be used but in this paper, we use KLT [20] tracking because it is more efficient than SIFT [15] and SURF [2], and it works well in our experiments....


  • ...Bundler [21] has trouble with simple forward motion because it only considered SIFT points that frequently appear among the image set for 3D reconstructions and camera pose estimation....


Proceedings ArticleDOI
03 Dec 2010
TL;DR: This work presents a fast and efficient geometric re-ranking method that can be incorporated into a feature-based image retrieval system that utilizes a Vocabulary Tree (VT), and shows in experiments that re-ranking schemes can substantially improve recognition accuracy.
Abstract: We present a fast and efficient geometric re-ranking method that can be incorporated into a feature-based image retrieval system that utilizes a Vocabulary Tree (VT). We form feature pairs by comparing descriptor classification paths in the VT and calculate a geometric similarity score of these pairs. We propose a location geometric similarity scoring method that is invariant to rotation, scale, and translation, and can be easily incorporated in mobile visual search and augmented reality systems. We compare the performance of the location geometric scoring scheme to orientation and scale geometric scoring schemes. We show in our experiments that re-ranking schemes can substantially improve recognition accuracy. We can also reduce the worst-case server latency by up to 1 second while still improving recognition performance.
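A location-based geometric score of this kind can be built from log ratios of pairwise distances: inter-feature distances are unchanged by rotation and translation, and a common scale change shifts all log ratios by the same constant, so consistency of the ratios measures geometric agreement. This is a hedged reconstruction from the abstract, not the authors' exact formulation:

```python
import numpy as np

def location_geometric_score(query_pts, db_pts, tol=0.1):
    """Score matched keypoint locations by the consistency of pairwise
    log distance ratios (invariant to rotation, scale, and translation)."""
    q = np.asarray(query_pts, float)
    d = np.asarray(db_pts, float)
    i, j = np.triu_indices(len(q), k=1)
    dq = np.linalg.norm(q[i] - q[j], axis=1)
    dd = np.linalg.norm(d[i] - d[j], axis=1)
    log_ratio = np.log(dq / dd)
    # Geometrically consistent matches share one log ratio (the log of the
    # scale change); count the pairs close to the median ratio.
    return np.mean(np.abs(log_ratio - np.median(log_ratio)) < tol)

# Query is the database constellation rotated 90 degrees and scaled by 2.
db = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
R = np.array([[0, -1], [1, 0]])
query = 2.0 * db @ R.T
print(location_geometric_score(query, db))  # 1.0: perfectly consistent
```

Random mismatches produce scattered log ratios and a low score, which is why such a measure can re-rank a VT candidate list cheaply without full geometric verification.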

72 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...In this process, features of the query object are matched with features of the database objects using nearest descriptor or the ratio test [7]....


  • ...By representing images or objects using sets of local features [7, 8, 9], recognition can be achieved by matching features between the query image and candidate database image....

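The ratio test cited in the excerpt above is Lowe's heuristic: accept a match only if the nearest database descriptor is much closer than the second nearest. A minimal numpy sketch with random stand-in descriptors (real systems would use 128-d SIFT vectors):

```python
import numpy as np

def ratio_test_matches(query_desc, db_desc, ratio=0.8):
    """Return (query_idx, db_idx) pairs that pass Lowe's ratio test."""
    matches = []
    for qi, q in enumerate(query_desc):
        dists = np.linalg.norm(db_desc - q, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((qi, int(nearest)))
    return matches

rng = np.random.default_rng(0)
db = rng.normal(size=(50, 128))          # stand-in 128-d descriptors
query = db[:5] + rng.normal(scale=0.01, size=(5, 128))  # noisy copies
print(ratio_test_matches(query, db))     # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

The test rejects ambiguous descriptors (ones with two similar neighbors) rather than thresholding absolute distance, which is what makes it robust across images with different statistics.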

Journal ArticleDOI
TL;DR: A novel transductive transfer subspace learning method for cross-domain facial expression recognition that achieves much better recognition performance compared with the state-of-the-art methods.
Abstract: Facial expression recognition across domains, e.g., where training and testing facial images come from different facial poses, is very challenging due to the different marginal distributions between training and testing facial feature vectors. To deal with this challenging cross-domain facial expression recognition problem, a novel transductive transfer subspace learning method is proposed in this paper. In this method, a labelled facial image set from the source domain is combined with an unlabelled auxiliary facial image set from the target domain to jointly learn a discriminative subspace and predict the class labels of the unlabelled facial images, where a transductive transfer regularized least-squares regression (TTRLSR) model is proposed to this end. Then, based on the auxiliary facial image set, we train an SVM classifier for classifying the expressions of other facial images in the target domain. Moreover, we also investigate the use of color facial features to evaluate the recognition performance of the proposed facial expression recognition method, where color scale invariant feature transform (CSIFT) features associated with 49 landmark facial points are extracted to describe each color facial image. Finally, extensive experiments on the BU-3DFE and Multi-PIE multiview color facial expression databases are conducted to evaluate the cross-database and cross-view facial expression recognition performance of the proposed method. Comparisons with state-of-the-art domain adaptation methods are also included in the experiments. The experimental results demonstrate that the proposed method achieves much better recognition performance compared with the state-of-the-art methods.

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In this paper, we will use SIFT features [7], [24] to describe each facial image....


Proceedings ArticleDOI
11 Oct 2009
TL;DR: A novel framework of facial appearance and shape information extraction for facial expression recognition that provides holistic characteristics for the local texture and shape features by enhancing structure-based spatial information, and makes it possible to use local descriptors in facial expression recognition for the first time.
Abstract: A novel framework of facial appearance and shape information extraction for facial expression recognition is proposed. For appearance extraction, a facial-component-based bag of words method is presented. We segment face images into 4 component regions, and sub-divide them into 4×4 sub-regions. Dense SIFT (Scale-Invariant Feature Transform) features are calculated over the sub-regions and vector quantized into 4×4 sets of codeword distributions. For shape extraction, PHOG (Pyramid Histogram of Oriented Gradient) descriptors are computed on the 4 facial component regions to obtain the spatial distribution of edges. Our framework provides holistic characteristics for the local texture and shape features by enhancing structure-based spatial information, and makes it possible to use local descriptors in facial expression recognition for the first time. The recognition rate achieved by fusing appearance and shape features at the decision level on the Cohn-Kanade database is 96.33%, which outperforms the state of the art.
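The bag-of-words step above (vector-quantizing dense descriptors into per-sub-region codeword histograms) can be sketched as follows; the codebook and descriptors here are random placeholders, whereas the paper would use a trained codebook over dense SIFT features:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Vector-quantize descriptors to nearest codewords and histogram them."""
    # (n_desc, n_codewords) distances -> nearest codeword per descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
codebook = rng.normal(size=(16, 128))        # 16 codewords (placeholder size)
# One histogram per sub-region of a facial component region (4x4 = 16 here).
sub_region_hists = [bow_histogram(rng.normal(size=(30, 128)), codebook)
                    for _ in range(16)]
feature = np.concatenate(sub_region_hists)   # final appearance vector
print(feature.shape)  # (256,)
```

Keeping one histogram per sub-region, rather than pooling over the whole face, is what preserves the structure-based spatial information the abstract emphasizes.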

72 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We firstly segment face images into 4 regions which contain different facial components, then equally divide each region into 4 sub-regions and calculate SIFT [19] (Scale-Invariant Feature Transform) descriptors on a sliding grid over each sub-region....


Journal ArticleDOI
TL;DR: A fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments is proposed, capable of generating a wide variation of shapes as instances of a given object category.
Abstract: We describe a top-down object detection and segmentation approach that uses a skeleton-based shape model and that works directly on real images. The approach is based on three components. First, we propose a fragment-based generative model for shape that is based on the shock graph and has minimal dependency among its shape fragments. The model is capable of generating a wide variation of shapes as instances of a given object category. Second, we develop a progressive selection mechanism to search among the generated shapes for the category instances that are present in the image. The search begins with a large pool of candidates identified by a dynamic programming (DP) algorithm and progressively reduces it in size by applying a series of criteria, namely, the local minimum criterion, extent of shape overlap, and thresholding of the objective function to select the final object candidates. Third, we propose the Partitioned Chamfer Matching (PCM) measure to capture the support of image edges for a hypothesized shape. This measure overcomes the shortcomings of Oriented Chamfer Matching and is robust against spurious edges, missing edges, and accidental alignment between the image edges and the shape boundary contour. We have evaluated our approach on the ETHZ dataset and found it to perform well in both object detection and object segmentation tasks.
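Partitioned Chamfer Matching is the authors' contribution; the plain chamfer distance it builds on can be sketched with brute-force numpy, averaging each hypothesized shape point's distance to its nearest image edge point (a sketch only; real implementations use a precomputed distance transform for speed):

```python
import numpy as np

def chamfer_distance(shape_pts, edge_pts):
    """Mean distance from each hypothesized shape point to its nearest edge."""
    diffs = shape_pts[:, None, :] - edge_pts[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1).mean()

edges = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # image edge pixels
good = np.array([[0.0, 0.1], [1.0, 0.1], [2.0, 0.1]])    # nearby hypothesis
bad = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]])     # distant hypothesis
print(chamfer_distance(good, edges) < chamfer_distance(bad, edges))  # True
```

The weakness the paper targets is visible even here: a shape can score well by accidentally aligning with unrelated edges, which is what partitioning the measure is meant to penalize.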

72 citations


Cites methods from "Distinctive Image Features from Sca..."


  • ...Appearance-based methods generally rely on feature points such as SIFT (Lowe 2004) and others (Mikolajczyk and Schmid 2005), and have had remarkable success in detecting the presence of objects (Dorkó and Schmid 2003; Fergus et al. 2003; Csurka et al. 2004; Leibe and Schiele 2004; Jurie and Triggs 2005; Berg et al. 2005; Kumar et al. 2005; Winn and Jojic 2005; Lazebnik et al. 2006; Shotton et al. 2006; Todorovic and Ahuja 2006), some of these methods also localize objects (Viola and Jones 2001; Leibe and Schiele 2004; Torralba et al. 2004; Berg et al. 2005; Kumar et al. 2005)....



References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
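The final verification stage described above, a least-squares solution for consistent pose parameters, amounts to fitting a transform (e.g. a 2-D affine map) to the clustered matches. A numpy sketch of that fit (the affine parameterization is a standard choice, not necessarily the paper's exact one):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares 2-D affine transform mapping model points to image points."""
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    b = image_pts.reshape(-1)
    A[0::2, 0:2] = model_pts   # x' = m00*x + m01*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # y' = m10*x + m11*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)
    t = params[4:]
    return M, t

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Image points: the model rotated 90 degrees and shifted by (2, 3).
image = model @ np.array([[0.0, 1.0], [-1.0, 0.0]]) + np.array([2.0, 3.0])
M, t = fit_affine(model, image)
print(np.allclose(model @ M.T + t, image))  # True
```

Matches whose residual under the fitted transform is large can then be discarded, which is how the verification step rejects accidental Hough-cluster members.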

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
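The "stable points in scale space" are detected as extrema of a difference-of-Gaussians (DoG) stack: a candidate must exceed or fall below all 26 neighbors in its 3×3×3 scale-space neighborhood. A sketch of the extremum test on a synthetic stack (the blob response is planted by hand; building a real DoG stack requires repeated Gaussian blurring, omitted for brevity):

```python
import numpy as np

def dog_extrema(dog):
    """Return (scale, row, col) of strict local extrema in a DoG stack."""
    keypoints = []
    s_max, h, w = dog.shape
    for s in range(1, s_max - 1):
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                cube = dog[s-1:s+2, r-1:r+2, c-1:c+2]
                v = dog[s, r, c]
                # strict extremum over the 26 scale-space neighbors
                if (v == cube.max() or v == cube.min()) and np.sum(cube == v) == 1:
                    keypoints.append((s, r, c))
    return keypoints

dog = np.zeros((3, 8, 8))      # (scales, height, width) synthetic stack
dog[1, 4, 4] = 5.0             # a single planted blob response
print(dog_extrema(dog))        # [(1, 4, 4)]
```

Requiring an extremum across scale as well as space is what ties each keypoint to a characteristic scale, the basis of the method's scale invariance.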

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
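The feature extraction this Alvey paper introduced is the Harris corner response, R = det(M) − k·trace(M)², computed from a smoothed second-moment matrix of image gradients. A minimal numpy sketch using a box filter in place of the Gaussian weighting of the original:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    iy, ix = np.gradient(img.astype(float))
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box_smooth(a):  # crude 3x3 box filter standing in for a Gaussian
        p = np.pad(a, 1, mode="edge")
        return sum(p[dr:dr + a.shape[0], dc:dc + a.shape[1]]
                   for dr in range(3) for dc in range(3)) / 9.0

    sxx, syy, sxy = box_smooth(ixx), box_smooth(iyy), box_smooth(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# A white square on black: corners should outscore edge midpoints.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
R = harris_response(img)
print(R[4, 4] > R[4, 8])  # corner response beats edge response
```

Along an edge only one gradient direction is strong, so det(M) stays near zero and R goes negative; at a corner both directions are strong and R is large and positive, which is the behavior the sketch demonstrates.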

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
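The evaluation criterion, recall with respect to precision over a sweep of matching thresholds, can be sketched as follows, given ground-truth correctness labels for candidate matches (the distances and labels below are placeholders):

```python
import numpy as np

def recall_vs_precision(distances, is_correct, thresholds):
    """For each distance threshold, compute (recall, 1 - precision)."""
    total_correct = is_correct.sum()
    curve = []
    for t in thresholds:
        accepted = distances < t
        tp = (accepted & is_correct).sum()
        recall = float(tp / total_correct)
        one_minus_precision = float(1.0 - tp / max(accepted.sum(), 1))
        curve.append((recall, one_minus_precision))
    return curve

# Placeholder candidate matches: descriptor distance + ground-truth label.
distances = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
is_correct = np.array([True, True, False, True, False])
print(recall_vs_precision(distances, is_correct, [0.25, 0.45]))
```

Sweeping the threshold traces out the recall versus 1−precision curves the paper uses, so descriptors can be compared independently of any single operating point.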

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
