
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
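The matching step the abstract refers to pairs each query descriptor with its nearest neighbour in a database of descriptors, accepting the pair only when the best match is clearly closer than the second-best (Lowe's ratio test). A minimal pure-Python sketch of that test, with toy low-dimensional vectors standing in for real 128-D SIFT descriptors (function names and the 0.8 ratio are illustrative, not taken from this page):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two descriptor vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(query, database, ratio=0.8):
    """For each query descriptor, return the index of its nearest database
    descriptor, or None when the match is ambiguous (ratio test fails)."""
    matches = []
    for q in query:
        order = sorted(range(len(database)), key=lambda i: euclidean(q, database[i]))
        best, second = order[0], order[1]
        # Accept only if the best match is clearly closer than the runner-up
        if euclidean(q, database[best]) < ratio * euclidean(q, database[second]):
            matches.append(best)
        else:
            matches.append(None)
    return matches
```

With a distinctive nearest neighbour the match is kept; when the two closest candidates are nearly equidistant, the ratio test rejects the match rather than risk a false correspondence.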
Citations
Proceedings Article
01 Nov 2012
TL;DR: A new metric learning algorithm is proposed, which alleviates this limitation by considering both positive and negative constraints and using them efficiently to learn a discriminative latent space.
Abstract: This paper proposes a new approach for Cross Modal Matching, i.e. the matching of patterns represented in different modalities, when pairs of same/different data are available for training (e.g. faces of same/different persons). In this situation, standard approaches such as Partial Least Squares (PLS) or Canonical Correlation Analysis (CCA) map the data into a common latent space that maximizes the covariance, using the information brought by positive pairs only. Our contribution is a new metric learning algorithm, which alleviates this limitation by considering both positive and negative constraints and using them efficiently to learn a discriminative latent space. The contribution is validated on several datasets, for which the proposed approach consistently outperforms PLS/CCA as well as more recent discriminative approaches.

77 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...They are SIFT [29] features extracted on 9 face key-points (eye corners, nose corners, nose tip and mouth corners) at 3 different scales....


Journal ArticleDOI
TL;DR: An unsupervised image GPS location estimation approach with hierarchical global feature clustering and local feature refinement; the effectiveness of the proposed hierarchical structure and inverted file structure is demonstrated.
Abstract: Social media has become a very popular way for people to share their photos with friends. Because most social images are attached with GPS geo-tags, a photo's GPS location can be estimated with the help of a large geo-tagged image set using a visual-search-based approach. This paper proposes an unsupervised image GPS location estimation approach with hierarchical global feature clustering and local feature refinement. It consists of two parts: an offline system and an online system. In the offline system, a hierarchical structure is constructed for a large-scale offline social image set with GPS information. Representative images are selected for each GPS-location-refined cluster, and an inverted file structure is proposed. In the online system, given an input image, its GPS location is estimated by hierarchical global cluster selection and local feature refinement. Both the computational cost and the GPS estimation performance demonstrate the effectiveness of the proposed hierarchical structure and inverted file structure.

77 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The representative-image selection for each GPS-location-refined centroid is a greedy loop: while the candidate image set is not empty, select the image with the most matched images in the set, then count the matched SIFT points between it and each remaining image; a remaining image with enough matches is viewed as a near-duplicate of the selected image and removed, and the selected image is assigned as a representative image for the centroid. Output: representative images for the GPS-location-refined centroid....


  • ...3) Scale Invariant Feature Transform (SIFT): The images could be further described via the local interest point descriptors given by SIFT [5]....


  • ...There are too many SIFT points in the background, and only a few on the mountain, so local feature refinement cannot improve the GPS estimation performance....


  • ...If two images have sufficient matched SIFT point pairs [7], [36], they are considered a match....


  • ...Then for the offline dataset, each SIFT point is quantized into one of the centroids....

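The greedy representative-selection step cited above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the pairwise match-count table, the function name, and the threshold of 20 matched pairs are all assumptions.

```python
def select_representatives(match_counts, threshold=20):
    """Greedy representative-image selection.

    match_counts: dict mapping frozenset({a, b}) -> number of matched SIFT
    point pairs between images a and b. Returns representative image ids.
    """
    images = {i for pair in match_counts for i in pair}
    reps = []
    while images:
        # Pick the image with the most matched images remaining in the set
        def degree(i):
            return sum(1 for pair, count in match_counts.items()
                       if i in pair and pair <= images and count >= threshold)
        rep = max(images, key=degree)
        reps.append(rep)
        # Its near-duplicates (enough matched SIFT pairs) are removed with it
        dups = {j for j in images if j != rep
                and match_counts.get(frozenset({rep, j}), 0) >= threshold}
        images -= dups | {rep}
    return reps
```

Each iteration keeps one well-connected image as a representative and discards its near-duplicates, so the loop terminates with a small, diverse set per centroid.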

Journal Article
TL;DR: A description of an easy-to-use system that can reconstruct an object as a realistically textured 3D model from images taken with an uncalibrated camera, together with a survey of all steps of the reconstruction pipeline.
Abstract: The problem addressed in this paper is the reconstruction of an object in the form of a realistically textured 3D model from images taken with an uncalibrated camera. We especially focus on reconstructions from short image sequences. By means of a description of an easy-to-use system, which is able to accomplish this in a fast and reliable way, we give a survey of all steps of the reconstruction pipeline. For the purpose of developing a coherent reconstruction system it is necessary to integrate a number of different techniques such as feature detection, algorithms of the RANSAC family, and methods for auto-calibration. We describe and review recent developments of distinct strands of these techniques. While developing our system, the necessity of improving several steps of the state-of-the-art reconstruction pipeline emerged. Two of these innovations are introduced in detail in this paper: an advanced SIFT-based feature detector and a two-stage RANSAC process facilitating a faster selection of relevant object points. In addition, we give a recommendation regarding auto-calibration for short image sequences.

77 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...In [26], the descriptor is rotated after its computation to fit that main orientation....


  • ...The most prominent and widely successful one is the SIFT feature detector [26]....


Posted Content
TL;DR: This paper provides a comprehensive review of current approaches to building appearance descriptors for person re-identification; the most relevant techniques are described in detail and categorised according to the body models and features used.
Abstract: In video surveillance, person re-identification is the task of recognising whether an individual has already been observed over a network of cameras. Typically, this is achieved by exploiting the clothing appearance, as classical biometric traits like the face are impractical in real-world video surveillance scenarios. Clothing appearance is represented by means of low-level local and/or global features of the image, usually extracted according to some part-based body model to treat different body parts (e.g. torso and legs) independently. This paper provides a comprehensive review of current approaches to building appearance descriptors for person re-identification. The most relevant techniques are described in detail and categorised according to the body models and features used. The aim of this work is to provide a structured body of knowledge and a starting point for researchers willing to conduct novel investigations on this challenging topic.

77 citations


Cites background from "Distinctive Image Features from Sca..."


  • ...The most famous among them is SIFT (Scale Invariant Feature Transform) [68], where at first salient points of the image are chosen via an interest operator that looks for “stable” locations in the image (i.e. locations that are identifiable over different scales and rotations)....


Journal ArticleDOI
Wei Yang, Zhentai Lu, Mei Yu, Meiyan Huang, Qianjin Feng, Wufan Chen
TL;DR: Preliminary results demonstrate that the BoW representation is effective and feasible for retrieval of liver lesions in contrast-enhanced CT images.
Abstract: This paper is aimed at developing and evaluating a content-based retrieval method for contrast-enhanced liver computed tomographic (CT) images using bag-of-visual-words (BoW) representations of single and multiple phases. The BoW histograms are extracted using the raw intensity as the local patch descriptor for each enhancement phase by densely sampling the image patches within the liver lesion regions. Distance metric learning algorithms are employed to obtain the semantic similarity on the Hellinger kernel feature map of the BoW histograms. The different visual vocabularies for BoW and the learned distance metrics are evaluated on a contrast-enhanced CT image dataset comprising 189 patients with three types of focal liver lesions: 87 hepatomas, 62 cysts, and 60 hemangiomas. For each single enhancement phase, the mean average precision (mAP) of BoW representations for retrieval can reach above 90 %, which is significantly higher than that of intensity histograms and Gabor filters. Furthermore, the combined BoW representations of the three enhancement phases improve mAP to 94.5 %. These preliminary results demonstrate that the BoW representation is effective and feasible for retrieval of liver lesions in contrast-enhanced CT images.
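The bag-of-visual-words representation described above quantizes each local patch descriptor against a learned vocabulary and counts the assignments. A minimal pure-Python sketch under assumed inputs (toy 2-D descriptors and a hand-picked vocabulary; in practice the vocabulary comes from k-means clustering):

```python
def bow_histogram(descriptors, vocabulary):
    """Build an L1-normalized bag-of-visual-words histogram.

    Each descriptor is assigned to its nearest vocabulary word (visual word),
    then the word counts are normalized to sum to 1.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Nearest-centroid assignment: the quantization step
        word = min(range(len(vocabulary)), key=lambda i: sq_dist(d, vocabulary[i]))
        counts[word] += 1
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]
```

The resulting fixed-length histogram is what the metric learning step would then compare across images, regardless of how many patches each image contributed.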

77 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...In the computer vision community, the most popular local patch descriptor is SIFT [23]....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
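The "staged filtering approach that identifies stable points in scale space" amounts to smoothing the image at neighbouring scales, differencing the results (a difference of Gaussians), and keeping local extrema. A 1-D pure-Python illustration of that idea, reduced from images to signals for brevity (function names and the specific sigmas are illustrative assumptions, not from the paper):

```python
import math

def gaussian_smooth(signal, sigma):
    # Convolve a 1-D signal with a normalized Gaussian kernel,
    # clamping indices at the borders.
    radius = int(3 * sigma)
    kernel = [math.exp(-(x * x) / (2 * sigma * sigma))
              for x in range(-radius, radius + 1)]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - radius, 0), len(signal) - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out

def dog_extrema(signal, sigma1=1.0, sigma2=2.0):
    """Return indices of local extrema of the difference of Gaussians:
    candidate 'stable points' in this 1-D analogue of scale space."""
    dog = [a - b for a, b in zip(gaussian_smooth(signal, sigma1),
                                 gaussian_smooth(signal, sigma2))]
    return [i for i in range(1, len(dog) - 1)
            if (dog[i] > dog[i - 1] and dog[i] > dog[i + 1])
            or (dog[i] < dog[i - 1] and dog[i] < dog[i + 1])]
```

An isolated bump in the input survives fine-scale smoothing better than coarse-scale smoothing, so the difference peaks exactly at the bump; the same band-pass logic, in 2-D and across an octave of scales, yields SIFT's keypoint candidates.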

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector, that the SIFT-based descriptors perform best, and that moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses recall with respect to precision as its criterion and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
