
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
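As a rough illustration of the matching step the abstract describes, here is a minimal numpy sketch of nearest-neighbour descriptor matching with Lowe's ratio test; the 4-D descriptors below are synthetic stand-ins for real 128-D SIFT vectors:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its two nearest neighbours in
    desc_b and keep the match only if the nearest is clearly closer than
    the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

# Toy 4-D descriptors: desc_b holds slightly perturbed copies of desc_a
# plus one ambiguous distractor row.
desc_a = np.eye(4)
desc_b = np.vstack([0.99 * np.eye(4), np.full((1, 4), 0.5)])
matches = ratio_test_matches(desc_a, desc_b)
```

Each descriptor in `desc_a` matches its perturbed copy, while the distractor is rejected by the ratio threshold.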
Citations
Proceedings ArticleDOI
16 May 2011
TL;DR: A performance evaluation of the state-of-the-art in 3D key point detection, mainly addressing the task of 3D object recognition, is carried out by analyzing the performance of several prominent methods in terms of robustness to noise, presence of clutter, occlusions, and point-of-view variations.
Abstract: Intense research activity on 3D data analysis tasks, such as object recognition and shape retrieval, has recently fostered the proposal of many techniques to perform detection of repeatable and distinctive key points in 3D surfaces. This high number of proposals has not been accompanied yet by a comprehensive comparative evaluation of the methods. Motivated by this, our work proposes a performance evaluation of the state-of-the-art in 3D key point detection, mainly addressing the task of 3D object recognition. The evaluation is carried out by analyzing the performance of several prominent methods in terms of robustness to noise (real and synthetic), presence of clutter, occlusions and point-of-view variations.
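A common criterion in such evaluations is detector repeatability under a known ground-truth transform. The following numpy sketch is a simplified version of that idea (the toy keypoints and rigid transform are invented for illustration; real protocols also account for clutter and occlusion):

```python
import numpy as np

def repeatability(model_kp, scene_kp, R, t, eps=0.05):
    """Fraction of model keypoints that, after applying the ground-truth
    rigid transform (R, t), fall within eps of some scene keypoint."""
    transformed = model_kp @ R.T + t
    # Distance from each transformed keypoint to its closest scene keypoint.
    d = np.linalg.norm(transformed[:, None, :] - scene_kp[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < eps))

# Toy example: 90-degree rotation about z, with one keypoint missed in the scene.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, 0.0, 0.0])
model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.]])
scene = model[:3] @ R.T + t          # the detector found 3 of the 4 keypoints
rep = repeatability(model, scene, R, t)
```

With three of four keypoints redetected, the repeatability is 0.75.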

191 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The method in [23] uses as quality measure the displacement of each vertex from its original position after the application of the Difference-of-Gaussians (DoG) filter [24]....

    [...]

  • ...In MeshDoG additional filtering steps are introduced after detection: a maximum number of keypoints is detected, corresponding to a percentage value of the number of vertices of the mesh; as done in [24], noncorner responses are eliminated....

    [...]
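The DoG-displacement quality measure quoted above can be illustrated in one dimension with numpy; a sampled height profile stands in for mesh geometry, which is an illustrative simplification of the per-vertex case:

```python
import numpy as np

def gauss_kernel(sigma):
    # Normalized discrete Gaussian kernel with radius 4*sigma.
    r = int(4 * sigma)
    t = np.arange(-r, r + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def dog_displacement(signal, sigma=2.0, k=1.6):
    """Per-sample saliency: how far the Difference-of-Gaussians filter
    displaces each value, i.e. |G(k*sigma)*x - G(sigma)*x|."""
    lo = np.convolve(signal, gauss_kernel(sigma), mode="same")
    hi = np.convolve(signal, gauss_kernel(k * sigma), mode="same")
    return np.abs(hi - lo)

# A flat profile with one sharp spike: saliency should peak at the spike.
profile = np.zeros(101)
profile[50] = 1.0
saliency = dog_displacement(profile)
```

Flat regions are barely displaced by the filter, while the spike at index 50 produces the largest response.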

Journal ArticleDOI
Hongwei Qin, Xiu Li, Jian Liang, Yigang Peng, Changshui Zhang
TL;DR: A framework to recognize fish from videos captured by underwater cameras deployed in the ocean observation network is proposed, using a deep architecture to extract features of the foreground fish images and a linear SVM classifier for classification.

191 citations


Additional excerpts

  • ...For example, SIFT and HOG features are used for object recognition, LBP and Gabor features are used for texture and face classification....

    [...]

  • ...Actually, the histogram offers some degree of translation invariance in the extracted features, just like in hand-crafted features such as Scale-invariant Feature Transform (SIFT) [33] or Histogram of Oriented Gradients (HOG) [34], and average or max pooling process in ConvNet [20,19,35,16,36]....

    [...]

  • ...We use PHOW features (dense multi-scale SIFT descriptors), Elkan k-means for fast visual word dictionary construction, spatial histograms as image descriptors, a homogeneous kernel map to transform a Chi2 support vector machine (SVM) into a linear one, and SVM classifiers....

    [...]

  • ...LDA+SVM 80.14, Raw-pixel SVM 82.92, Raw-pixel Softmax 87.56, Raw-pixel Nearest Neighbor 89.79, VLFeat Dense-SIFT 93.58, DeepFish-SVM (our) 98.23, DeepFish-SVM-aug (our) 98.59 ... enlarge the training set for the species whose image number is less than 300....

    [...]
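One step of the pipeline quoted above, the spatial histogram image descriptor, can be sketched with numpy. The tiny vocabulary, positions, and descriptors are invented for illustration, and the dense-SIFT extraction and homogeneous kernel map stages are omitted:

```python
import numpy as np

def spatial_histogram(positions, descriptors, vocab, img_size, grid=2):
    """Quantize each local descriptor to its nearest visual word, then
    build one word histogram per spatial grid cell and concatenate them."""
    # Nearest visual word for every descriptor.
    d = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    words = d.argmin(axis=1)
    # Which grid cell each descriptor's position falls in.
    cell = (positions // (img_size / grid)).astype(int).clip(0, grid - 1)
    cell_idx = cell[:, 0] * grid + cell[:, 1]
    hist = np.zeros((grid * grid, len(vocab)))
    for c, w in zip(cell_idx, words):
        hist[c, w] += 1
    h = hist.ravel()
    return h / max(h.sum(), 1)  # L1-normalize

# Toy setup: 4 descriptors, a 2-word vocabulary, a 2x2 grid on a 10x10 image.
vocab = np.array([[0., 0.], [1., 1.]])
pos = np.array([[1, 1], [1, 8], [8, 1], [8, 8]])     # one point per cell
desc = np.array([[0.1, 0.], [0.9, 1.], [0., 0.2], [1., 0.8]])
h = spatial_histogram(pos, desc, vocab, img_size=10.0)
```

The concatenated histogram preserves coarse layout: each cell records which visual words occur there, unlike a single global bag of words.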

Proceedings ArticleDOI
20 Jun 2011
TL;DR: The proposed associate-predict model is built on an extra generic identity data set, in which each identity contains multiple images with large intra-personal variation, and can substantially improve the performance of most existing face recognition methods.
Abstract: Handling intra-personal variation is a major challenge in face recognition. It is difficult to appropriately measure the similarity between human faces under significantly different settings (e.g., pose, illumination, and expression). In this paper, we propose a new model, called “Associate-Predict” (AP) model, to address this issue. The associate-predict model is built on an extra generic identity data set, in which each identity contains multiple images with large intra-personal variation. When considering two faces under significantly different settings (e.g., non-frontal and frontal), we first “associate” one input face with alike identities from the generic identity data set. Using the associated faces, we generatively “predict” the appearance of one input face under the setting of another input face, or discriminatively “predict” the likelihood whether two input faces are from the same person or not. We call the two proposed prediction methods “appearance-prediction” and “likelihood-prediction”. By leveraging an extra data set (“memory”) and the “associate-predict” model, the intra-personal variation can be effectively handled. To improve the generalization ability of our model, we further add a switching mechanism — we directly compare the appearances of two faces if they have close intra-personal settings; otherwise, we use the associate-predict model for the recognition. Experiments on two public face benchmarks (Multi-PIE and LFW) demonstrate that our final model can substantially improve the performance of most existing face recognition methods.

191 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We evaluate four representative low-level descriptors: LBP [18], SIFT [16], Gabor [31], and Learning-based (LE) descriptor [3]....

    [...]

  • ...The descriptor-based methods [3, 5, 9, 16, 18, 24, 31, 32] and subspace-based methods [1, 14, 15, 17, 19, 26, 28, 29, 30, 35] are two representative appearance-based approaches....

    [...]

  • ...We use the default parameter described in [31] for the Gabor descriptor and 32-orientation quantization for the SIFT descriptor....

    [...]

Proceedings ArticleDOI
16 Jun 2012
TL;DR: AASC seeks an optimal combination of affinity matrices so that it is more immune to ineffective affinities and irrelevant features, making the choice of similarity or distance-metric measures for clustering less critical.
Abstract: Spectral clustering makes use of the spectral-graph structure of an affinity matrix to partition data into disjoint meaningful groups. Because of its elegance, efficiency and good performance, spectral clustering has become one of the most popular clustering methods. Traditional spectral clustering assumes a single affinity matrix. However, in many applications, there could be multiple potentially useful features and thereby multiple affinity matrices. To apply spectral clustering in these cases, a possible way is to aggregate the affinity matrices into a single one. Unfortunately, affinity measures constructed from different features could have different characteristics. Careless aggregation might even worsen clustering performance. This paper proposes an affinity aggregation spectral clustering (AASC) algorithm which extends spectral clustering to a setting with multiple affinities available. AASC seeks an optimal combination of affinity matrices so that it is more immune to ineffective affinities and irrelevant features. This makes the choice of similarity or distance-metric measures for clustering less critical. Experiments show that AASC is effective in simultaneous clustering and feature fusion, thus enhancing the performance of spectral clustering by employing multiple affinities.
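A stripped-down numpy sketch of the aggregation idea follows. AASC learns the combination weights; here they are fixed by hand, and the two-way partition uses the sign of the Fiedler vector of the normalized Laplacian — both are simplifications for illustration:

```python
import numpy as np

def spectral_bipartition(W):
    """2-way spectral clustering of affinity matrix W: sign of the second
    eigenvector of the symmetric normalized Laplacian."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

def aggregate(affinities, weights):
    """Weighted sum of multiple affinity matrices (AASC optimizes these
    weights; they are hand-picked here)."""
    return sum(w * A for w, A in zip(weights, affinities))

# Two well-separated 1-D blobs, two features: one informative, one noisy.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
A1 = np.exp(-(x[:, None] - x[None, :])**2)          # informative affinity
rng = np.random.default_rng(1)
noise = rng.random(6)
A2 = np.exp(-(noise[:, None] - noise[None, :])**2)  # uninformative affinity
labels = spectral_bipartition(aggregate([A1, A2], [0.9, 0.1]))
```

Down-weighting the noisy affinity lets the informative one dominate, so the partition recovers the two blobs.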

191 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...As in MMSC, five types of features were used: LBP [21], GIST [22], CENTRIST [31], DoG-SIFT [18], and HOG [9]....

    [...]

  • ...We denote SCL, SCG, SCC , SCD and SCH as the single-affinity spectral clustering methods with five different affinity matrices derived from the above five features (LBP, GIST, CENTRIST, DoG-SIFT, and HOG), respectively....

    [...]

Journal ArticleDOI
TL;DR: A conceptual categorization and metrics for an evaluation of such methods are presented, followed by a comprehensive survey of relevant publications, and technical considerations and tradeoffs of the surveyed methods are discussed.
Abstract: Recently, researchers found that the intended generalizability of (deep) face recognition systems increases their vulnerability against attacks. In particular, the attacks based on morphed face images pose a severe security risk to face recognition systems. In the last few years, the topic of (face) image morphing and automated morphing attack detection has sparked the interest of several research laboratories working in the field of biometrics and many different approaches have been published. In this paper, a conceptual categorization and metrics for an evaluation of such methods are presented, followed by a comprehensive survey of relevant publications. In addition, technical considerations and tradeoffs of the surveyed methods are discussed along with open issues and challenges in the field.

191 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In particular the combination of features reflecting different information, e.g., LBP and SIFT, leads to improvements....

    [...]

  • ...E.g., for Scale-Invariant Feature Transform (SIFT) [106] the number of extracted keypoints has been shown to be suitable for the task of morph detection [38], [78]....

    [...]

  • ...Therefore, LBP, BSIF, SIFT, Speeded Up Robust Features (SURF) [107], Histogram of Oriented Gradients (HOG) [108] and the deep features of Openface [109] were fused and evaluated in [79]....

    [...]

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
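The final verification stage described in the abstract solves a small least-squares problem for consistent pose parameters. A numpy sketch for the 2-D similarity case (the matched point pairs below are synthetic, generated by a known transform):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform (a, b, tx, ty) mapping
    src -> dst, where [x', y'] = [a*x - b*y + tx, b*x + a*y + ty]."""
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.ravel()                      # interleaved [x0', y0', x1', y1', ...]
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    residual = np.linalg.norm(A @ params - rhs)
    return params, residual

# Matches generated by a known transform: scale 2, rotation 90 deg, shift (1, 0).
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
a, b = 0.0, 2.0                            # 2 * (cos 90deg, sin 90deg)
dst = np.column_stack([a * src[:, 0] - b * src[:, 1] + 1.0,
                       b * src[:, 0] + a * src[:, 1]])
params, residual = fit_similarity(src, dst)
```

In a full pipeline, a large residual would reject the Hough-clustered match hypothesis; here the fit is exact, so the residual is essentially zero.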

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
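The staged filtering that "identifies stable points in scale space" can be illustrated in one dimension with numpy: build a stack of Gaussian blurs, take differences, and keep samples that are extrema against both spatial and scale neighbours. This is a deliberate simplification of the actual 2-D image pyramid:

```python
import numpy as np

def gaussian_blur(x, sigma):
    # Normalized discrete Gaussian kernel with radius 4*sigma.
    r = int(4 * sigma)
    t = np.arange(-r, r + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return np.convolve(x, k / k.sum(), mode="same")

def dog_extrema(signal, sigmas):
    """Stable points in scale space: samples whose Difference-of-Gaussians
    response is a local extremum against both spatial and scale neighbours."""
    blurred = np.array([gaussian_blur(signal, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]                 # DoG stack
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for i in range(1, dog.shape[1] - 1):
            patch = dog[s - 1:s + 2, i - 1:i + 2]
            if dog[s, i] in (patch.max(), patch.min()):
                keypoints.append((i, s))
    return keypoints

# A Gaussian bump of width 3: its centre surfaces as a scale-space extremum
# at the intermediate DoG level, whose scales bracket the bump's own scale.
t = np.arange(101)
bump = np.exp(-(t - 50) ** 2 / (2 * 3.0 ** 2))
kp = dog_extrema(bump, sigmas=[1.5, 3.0, 6.0, 12.0])
```

Structures are detected at the scale that matches their own size, which is what makes the resulting keypoints scale-covariant.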

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
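This is the Harris-Stephens corner detector. A minimal numpy sketch of its corner measure, using a box filter in place of the usual Gaussian window (an illustrative simplification):

```python
import numpy as np

def box_blur(img, r=2):
    # Separable (2r+1)x(2r+1) box average.
    k = np.ones(2 * r + 1) / (2 * r + 1)
    out = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="same")

def harris_response(img, k=0.05):
    """Harris-Stephens corner measure R = det(M) - k * trace(M)^2, where M
    is the locally averaged gradient structure tensor."""
    gy, gx = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box_blur(gx * gx), box_blur(gy * gy), box_blur(gx * gy)
    det = Sxx * Syy - Sxy**2
    return det - k * (Sxx + Syy)**2

# A bright square on black: R is positive at corners, negative along edges,
# and zero in flat regions.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

The sign pattern of R is what makes the measure usable for detection: only corners, where the structure tensor has two large eigenvalues, score positively.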

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
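The evaluation criterion (recall with respect to precision) reduces to simple counting over ground-truth correspondences; a sketch with made-up match sets:

```python
def recall_precision(matches, ground_truth, n_correspondences):
    """Mikolajczyk-Schmid style criterion: recall is the fraction of
    ground-truth correspondences recovered by the matcher, and 1-precision
    is the fraction of returned matches that are false."""
    correct = sum(1 for m in matches if m in ground_truth)
    recall = correct / n_correspondences
    one_minus_precision = (len(matches) - correct) / max(len(matches), 1)
    return recall, one_minus_precision

# Toy data: 4 ground-truth correspondences; the matcher returns 2 correct
# matches and 1 false one.
gt = {(0, 0), (1, 1), (2, 2), (3, 3)}
matches = [(0, 0), (1, 1), (2, 5)]
recall, fp = recall_precision(matches, gt, n_correspondences=4)
```

Sweeping the matching threshold and plotting recall against 1-precision yields the curves used to rank descriptors in this evaluation.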

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
