scispace - formally typeset
Search or ask a question

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.
Citations
More filters
Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel and effective method to learn a rotation-invariant and Fisher discriminative CNN (RIFD-CNN) model that is applicable to object detection on a public available aerial image dataset and the PASCAL VOC 2007 dataset.
Abstract: Thanks to the powerful feature representations obtained through deep convolutional neural network (CNN), the performance of object detection has recently been substantially boosted. Despite the remarkable success, the problems of object rotation, within-class variability, and between-class similarity remain several major challenges. To address these problems, this paper proposes a novel and effective method to learn a rotation-invariant and Fisher discriminative CNN (RIFD-CNN) model. This is achieved by introducing and learning a rotation-invariant layer and a Fisher discriminative layer, respectively, on the basis of the existing high-capacity CNN architectures. Specifically, the rotation-invariant layer is trained by imposing an explicit regularization constraint on the objective function that enforces invariance on the CNN features before and after rotating. The Fisher discriminative layer is trained by imposing the Fisher discrimination criterion on the CNN features so that they have small within-class scatter but large between-class separation. In the experiments, we comprehensively evaluate the proposed method for object detection task on a public available aerial image dataset and the PASCAL VOC 2007 dataset. State-of-the-art results are achieved compared with the existing baseline methods.

156 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The success of R-CNN method [5] is largely attributed to the ability of CNN model to extract more richer high-level object representation features as opposed to handengineered low-level features such as SIFT [31] and HOG [32]....

    [...]

Journal ArticleDOI
TL;DR: This work proposes an algorithm for automatically obtaining a segmentation of a rigid object in a sequence of images that are calibrated for camera pose and intrinsic parameters and implicitly exploits a 3D rigidity constraint, expressed as silhouette coherency, which significantly improves silhouette quality over independent 2D segmentations.

156 citations

Journal ArticleDOI
TL;DR: These experiments demonstrate the benefit of using pixel-wise posteriors rather than likelihoods, and showcase the qualities, such as robustness to occlusions and motion blur (and also some failure modes), of the tracker.
Abstract: We formulate a probabilistic framework for simultaneous region-based 2D segmentation and 2D to 3D pose tracking, using a known 3D model. Given such a model, we aim to maximise the discrimination between statistical foreground and background appearance models, via direct optimisation of the 3D pose parameters. The foreground region is delineated by the zero-level-set of a signed distance embedding function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities (as opposed to likelihoods). We derive the differentials of this energy with respect to the pose parameters of the 3D object, meaning we can conduct a search for the correct pose using standard gradient-based non-linear minimisation techniques. We propose novel enhancements at the pixel level based on temporal consistency and improved online appearance model adaptation. Furthermore, straightforward extensions of our method lead to multi-camera and multi-object tracking as part of the same framework. The parallel nature of much of the processing in our algorithm means it is amenable to GPU acceleration, and we give details of our real-time implementation, which we use to generate experimental results on both real and artificial video sequences, with a number of 3D models. These experiments demonstrate the benefit of using pixel-wise posteriors rather than likelihoods, and showcase the qualities, such as robustness to occlusions and motion blur (and also some failure modes), of our tracker.

156 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The rise of effective descriptors for point features that exhibit a degree of viewpoint invariance (e.g. Lowe 2004 or Lepetit et al. 2005) has seen realtime methods for point-based model-based tracking become more effective, such as Wagner et al. (2008) or Ozuysal et al. (2006) (where the model is…...

    [...]

Proceedings ArticleDOI
03 Dec 2010
TL;DR: Gradient Field HoG (GF-HOG) is introduced as a depiction invariant image descriptor, encapsulating local spatial structure in the sketch and facilitating efficient codebook based retrieval.
Abstract: We present an image retrieval system driven by free-hand sketched queries depicting shape. We introduce Gradient Field HoG (GF-HOG) as a depiction invariant image descriptor, encapsulating local spatial structure in the sketch and facilitating efficient codebook based retrieval. We show improved retrieval accuracy over 3 leading descriptors (Self Similarity, SIFT, HoG) across two datasets (Flickr160, ETHZ extended objects), and explain how GF-HOG can be combined with RANSAC to localize sketched objects within relevant images. We also demonstrate a prototype sketch driven photo montage application based on our system.

156 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...If descriptor A in a sketch is assigned to B in the image, then B must also be nearest to A for a valid match [16]....

    [...]

  • ...4 we evaluate SSIM, alongside SIFT and HoG descriptors for the purpose of sketch based shape retrieval and show our adapted GF-HOG descriptor outperforms these on both our own Flickr160 dataset1 and the ETHZ extended object dataset (a subset of which is used in [11])....

    [...]

  • ...We compare retrieval performance with an otherwise identical system incorporating alternative descriptors: the Self-Similarity (SSIM) [10, 11], SIFT [16], and HoG [14] descriptor....

    [...]

  • ...Here, SIFT is computed on a region of radius 16 pixels....

    [...]

  • ...This simple technique yields significant improvements in performance when matching sketches to photos, compared to three leading descriptors: Self-Similarity Descriptor (SSIM); SIFT; and HoG (sec....

    [...]

Journal ArticleDOI
TL;DR: An image analysis scheme can automate the detection of landmarks and events in large image collections, significantly improving the content-consumption experience.
Abstract: An image analysis scheme can automate the detection of landmarks and events in large image collections, significantly improving the content-consumption experience.

156 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The original 128-dimensional SIFT descriptors were introduced by Lowe.2 In our experiment, we accomplish the extraction using the software implementation of Van de Sande, Gevers, and Snoek.3 We use a bag-ofwords model with a vocabulary of 500 words and perform the clustering process using the k-means algorithm....

    [...]

  • ...The original 128-dimensional SIFT descriptors were introduced by Lowe.(2) In our experiment, we accomplish the extraction using the software implementation of Van de Sande, Gevers, and Snoek....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations

Trending Questions (1)
How can distinctive features theory be applied to elision?

The provided information does not mention anything about the application of distinctive features theory to elision.