
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
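
As a concrete illustration of the extraction step the abstract describes, here is a minimal sketch using the SIFT implementation shipped with OpenCV (assuming OpenCV 4.4 or later; the image path is a placeholder). The matching half of the pipeline is sketched under the Lowe reference in the References section below.

```python
import cv2

# Load a grayscale image (placeholder path)
image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Create a SIFT detector with OpenCV's default parameters
sift = cv2.SIFT_create()

# Detect scale-invariant keypoints and compute their 128-D descriptors
keypoints, descriptors = sift.detectAndCompute(image, None)

print(f"{len(keypoints)} keypoints, descriptor shape {descriptors.shape}")
```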
Citations
Journal ArticleDOI
TL;DR: This work proposes a latent-to-full palmprint matching system, as needed in forensics, that uses minutiae as features; a robust algorithm for estimating ridge direction and frequency in palmprints facilitates minutiae extraction even in poor-quality palmprints.
Abstract: The evidential value of palmprints in forensics is clear, as about 30% of the latents recovered from crime scenes are from palms. While palmprint-based personal authentication systems have been developed, they mostly deal with low-resolution (about 100 ppi) palmprints and only perform full-to-full matching. We propose a latent-to-full palmprint matching system, which is needed in forensics. Our system deals with palmprints captured at 500 ppi and uses minutiae as features. Latent palmprint matching is a challenging problem because latents lifted at crime scenes are of poor quality, cover a small area of the palm, and have complex backgrounds. Other difficulties include the presence of many creases and a large number of minutiae in palmprints. A robust algorithm to estimate ridge direction and frequency in palmprints is developed. This facilitates minutiae extraction even in poor-quality palmprints. A fixed-length minutia descriptor, MinutiaCode, is utilized to capture distinctive information around each minutia, and an alignment-based matching algorithm is used to match palmprints. Two sets of partial palmprints (150 live-scan partial palmprints and 100 latents) are matched to a background database of 10,200 full palmprints to test the proposed system. Rank-1 recognition rates of 78.7% and 69% were achieved for live-scan palmprints and latents, respectively.
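
To make the alignment-based matching idea concrete, here is a toy NumPy sketch; it is not the paper's MinutiaCode descriptor or its matcher, just a hypothetical illustration of matching minutiae (x, y, theta) after aligning on candidate anchor pairs:

```python
import numpy as np

def align(minutiae, ref):
    """Translate and rotate (x, y, theta) minutiae so ref sits at the origin with theta = 0."""
    x0, y0, t0 = ref
    c, s = np.cos(-t0), np.sin(-t0)
    xy = minutiae[:, :2] - (x0, y0)
    rot = xy @ np.array([[c, -s], [s, c]]).T
    theta = (minutiae[:, 2] - t0) % (2 * np.pi)
    return np.column_stack([rot, theta])

def match_score(latent, full, dist_tol=15.0, ang_tol=np.pi / 12):
    """Best count of minutiae agreeing in position and direction over all anchor pairs."""
    best = 0
    for ref_l in latent:            # try every latent minutia as an anchor...
        for ref_f in full:          # ...against every full-print minutia (toy O(n^2) search)
            a, b = align(latent, ref_l), align(full, ref_f)
            d = np.linalg.norm(a[:, None, :2] - b[None, :, :2], axis=2)
            dt = np.abs((a[:, None, 2] - b[None, :, 2] + np.pi) % (2 * np.pi) - np.pi)
            hits = ((d < dist_tol) & (dt < ang_tol)).any(axis=1).sum()
            best = max(best, int(hits))
    return best
```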

332 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...However, this system has the following limitations: 1) SIFT features cannot be consistently detected in latents and full prints, 2) the minutiae extractor and matcher (VeriFinger 4.2) used in [16] are not suitable for latent palmprint matching, and 3) latent images were not used to evaluate the algorithms....

    [...]

  • ...The first stage of SIFT algorithm, as outlined in [18], constructs the scale space by obtaining Difference of Gaussian (DOG) images at various scales....

    [...]

  • ...texture) information, to improve the identification performance, an additional feature set - Scale Invariant Feature Transformation (SIFT) [18] - utilizing the texture information is used in conjunction with minutiae....

    [...]

  • ...In our experiments, we utilized the SIFT implementation by Lowe in [18], where the parameters, number of octaves, number of scale samples per octave are set to 5 and 3, respectively....

    [...]

  • ...A partial-to-full palmprint matching system was proposed in [16] that used both SIFT [17] and minutiae features in matching....

    [...]
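
The first excerpt above describes the scale-space construction, and the fourth quotes Lowe's settings of 5 octaves and 3 scales per octave. A minimal NumPy/OpenCV sketch of that first stage under those settings follows (a simplification of the actual implementation, which also handles initial upsampling and border effects):

```python
import cv2
import numpy as np

def dog_pyramid(image, n_octaves=5, scales_per_octave=3, sigma=1.6):
    """Build a Difference-of-Gaussian pyramid: per octave, blur at
    successive scales and subtract adjacent Gaussian images."""
    k = 2 ** (1.0 / scales_per_octave)      # scale multiplier between levels
    pyramid = []
    base = image.astype(np.float32)
    for _ in range(n_octaves):
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma * k ** i)
                     for i in range(scales_per_octave + 3)]
        dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
        pyramid.append(dogs)
        # Next octave: downsample the Gaussian image with twice the base sigma
        base = cv2.resize(gaussians[-3], (base.shape[1] // 2, base.shape[0] // 2),
                          interpolation=cv2.INTER_NEAREST)
    return pyramid
```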

Proceedings ArticleDOI
01 Sep 2009
TL;DR: A novel framework for visual saliency detection based on a simple principle: images sharing their global visual appearance are likely to share similar salience; despite its simplicity, the framework outperforms state-of-the-art approaches.
Abstract: We propose a novel framework for visual saliency detection based on a simple principle: images sharing their global visual appearance are likely to share similar salience. Assuming that an annotated image database is available, we first retrieve the images most similar to the target image; second, we build a simple classifier and use it to generate saliency maps. Finally, we refine the maps and extract thumbnails. We show that, in spite of its simplicity, our framework outperforms state-of-the-art approaches. Another advantage is its ability to deal with visual pop-out and application/task-driven saliency, if appropriately annotated images are available.
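
A toy sketch of the retrieval principle (not the authors' system: the global descriptor and k below are placeholder choices, and the database is assumed to pair each image with an annotated saliency map):

```python
import cv2
import numpy as np

def global_descriptor(img, size=(16, 16)):
    """Toy global-appearance descriptor: a tiny, L2-normalized thumbnail."""
    thumb = cv2.resize(img, size).astype(np.float32).ravel()
    return thumb / (np.linalg.norm(thumb) + 1e-8)

def retrieval_saliency(target, database, k=5):
    """database: list of (image, saliency_map) pairs. Retrieve the k globally
    most similar images and average their annotated maps as a saliency prior."""
    q = global_descriptor(target)
    sims = [float(q @ global_descriptor(img)) for img, _ in database]
    nearest = np.argsort(sims)[-k:]
    maps = [cv2.resize(database[i][1], (target.shape[1], target.shape[0]))
            for i in nearest]
    return np.mean(maps, axis=0)
```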

331 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...In all the experiments, we used two types of low level descriptors: SIFT-like gradient orientation histograms [20] and simple local color statistics (RGB mean and variations)....

    [...]
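
The excerpt above mentions SIFT-like gradient orientation histograms; below is a minimal sketch of such a descriptor for a single patch (8 magnitude-weighted orientation bins, the common construction rather than this paper's exact variant):

```python
import cv2
import numpy as np

def orientation_histogram(patch, bins=8):
    """SIFT-like descriptor for one patch: a histogram of gradient
    orientations, weighted by gradient magnitude."""
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)   # L2-normalize
```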

Journal ArticleDOI
TL;DR: Affine-SIFT (ASIFT) simulates a set of sample views of the initial images, obtained by varying the two camera-axis orientation parameters, namely the latitude and longitude angles, which are not treated by the SIFT method.
Abstract: If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence, the solid object recognition problem has often been led back to the computation of affine-invariant image local features. The similarity invariance (invariance to translation, rotation, and zoom) is dealt with rigorously by the SIFT method. The method illustrated and demonstrated in this work, Affine-SIFT (ASIFT), simulates a set of sample views of the initial images, obtainable by varying the two camera-axis orientation parameters, namely the latitude and the longitude angles, which are not treated by the SIFT method. It then applies the SIFT method itself to all the images thus generated. Thus, ASIFT effectively covers all six parameters of the affine transform. Source code: the source code (ANSI C), its documentation, and the online demo are accessible at the IPOL web page of this article.
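
A simplified sketch of the view simulation ASIFT performs: each view rotates the image by a longitude angle phi, then compresses one axis by 1/t to mimic the latitude (tilt) angle, and SIFT is run on every simulated view (the tilt and phi samplings below are placeholders; the real method uses a denser, tilt-dependent sampling):

```python
import cv2
import numpy as np

def simulate_views(image, tilts=(1.0, np.sqrt(2), 2.0), phi_step=72.0):
    """Yield affine-simulated views: rotate by longitude phi, then compress
    the x-axis by 1/t to mimic the camera tilt (latitude)."""
    h, w = image.shape[:2]
    for t in tilts:
        phis = [0.0] if t == 1.0 else np.arange(0.0, 180.0, phi_step)
        for phi in phis:
            M = cv2.getRotationMatrix2D((w / 2, h / 2), phi, 1.0)
            rotated = cv2.warpAffine(image, M, (w, h))
            yield cv2.resize(rotated, (max(1, int(w / t)), h))

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
features = [sift.detectAndCompute(view, None) for view in simulate_views(image)]
```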

329 citations

Journal ArticleDOI
TL;DR: To detect urban areas and buildings in satellite images, the use of the scale-invariant feature transform (SIFT) and graph-theoretical tools is proposed; very promising results on automatically detecting urban areas and buildings are reported.
Abstract: Very high resolution satellite images provide valuable information to researchers. Among these, urban-area boundaries and building locations play crucial roles. For a human expert, manually extracting this valuable information is tedious. One possible solution is to use automated techniques. Unfortunately, the solution is not straightforward if standard image processing and pattern recognition techniques are used. Therefore, to detect urban areas and buildings in satellite images, we propose the use of the scale-invariant feature transform (SIFT) and graph-theoretical tools. SIFT keypoints are powerful in detecting objects under various imaging conditions. However, SIFT alone is not sufficient for detecting urban areas and buildings. Therefore, we formalize the problem in terms of graph theory. In forming the graph, we represent each keypoint as a vertex of the graph. The unary and binary relationships between these vertices (such as spatial distance and intensity values) lead to the edges of the graph. Based on this formalism, we extract the urban area using a novel multiple-subgraph matching method. Then, we extract separate buildings in the urban area using a novel graph-cut method. We form a diverse and representative test set using panchromatic 1-m-resolution Ikonos imagery. Through extensive testing, we report very promising results on automatically detecting urban areas and buildings.
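
To make the graph formalism concrete, here is a minimal sketch of the construction step (vertices are SIFT keypoints carrying unary intensity attributes; edges come from binary spatial-distance relations). The paper's subgraph matching and graph-cut stages are not reproduced, and the radius below is a placeholder:

```python
import cv2
import networkx as nx
import numpy as np

image = cv2.imread("satellite.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
keypoints = cv2.SIFT_create().detect(image, None)

G = nx.Graph()
for i, kp in enumerate(keypoints):
    x, y = map(int, kp.pt)
    G.add_node(i, pos=kp.pt, intensity=int(image[y, x]))  # unary attributes

# Binary relationships: connect keypoints closer than a spatial threshold
radius = 50.0
for i in range(len(keypoints)):
    for j in range(i + 1, len(keypoints)):
        d = np.hypot(*np.subtract(keypoints[i].pt, keypoints[j].pt))
        if d < radius:
            G.add_edge(i, j, distance=d,
                       dintensity=abs(G.nodes[i]["intensity"] - G.nodes[j]["intensity"]))
```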

328 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Lowe [19] proposed the SIFT for object detection based on its template image....

    [...]

  • ...SIFT is a local descriptor extraction method having valuable properties such as invariance to illumination and viewpoint [19]....

    [...]

  • ...More details on SIFT can be found in [19]....

    [...]

Posted Content
TL;DR: In this article, the authors investigate three key mathematical properties of representations: equivariance, invariance, and equivalence, and apply these properties to CNNs to reveal insightful aspects of their structure.
Abstract: Despite the importance of image representations such as histograms of oriented gradients and deep Convolutional Neural Networks (CNN), our theoretical understanding of them remains limited. Aiming at filling this gap, we investigate three key mathematical properties of representations: equivariance, invariance, and equivalence. Equivariance studies how transformations of the input image are encoded by the representation, invariance being a special case where a transformation has no effect. Equivalence studies whether two representations, for example two different parametrisations of a CNN, capture the same visual information or not. A number of methods to establish these properties empirically are proposed, including introducing transformation and stitching layers in CNNs. These methods are then applied to popular representations to reveal insightful aspects of their structure, including clarifying at which layers in a CNN certain geometric invariances are achieved. While the focus of the paper is theoretical, direct applications to structured-output regression are demonstrated too.
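
A toy analogue of the empirical methodology (the paper works inside CNNs with transformation and stitching layers; this sketch merely probes invariance at the level of a whole-image representation, with a placeholder histogram representation):

```python
import cv2
import numpy as np

def rotation_sensitivity(image, representation, angles=(45, 90, 135, 180)):
    """Empirically probe invariance: rotate the input and measure how far
    the representation drifts from that of the unrotated image."""
    h, w = image.shape[:2]
    base = representation(image)
    drifts = []
    for a in angles:
        M = cv2.getRotationMatrix2D((w / 2, h / 2), a, 1.0)
        rep = representation(cv2.warpAffine(image, M, (w, h)))
        drifts.append(np.linalg.norm(rep - base) / (np.linalg.norm(base) + 1e-8))
    return drifts  # near-zero drift at an angle indicates invariance to it

# Example representation: a global intensity histogram (rotation-invariant up to borders)
hist = lambda img: cv2.calcHist([img], [0], None, [32], [0, 256]).ravel()
```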

327 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
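
The nearest-neighbor matching stage is commonly paired with Lowe's distance-ratio test; a minimal OpenCV sketch follows (0.75 is a conventional threshold, the Hough clustering and least-squares verification stages are omitted, and the paths are placeholders):

```python
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor, find its two nearest neighbors in the other image
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)

# Keep a match only if it is clearly better than the runner-up (ratio test)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
```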

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
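
The "stable points in scale space" are found as local extrema across scale and position; below is a simplified (and deliberately slow, loop-based) sketch over one octave of Difference-of-Gaussian images:

```python
import numpy as np

def local_extrema(dogs):
    """Find points that are maxima or minima among their 26 neighbors in a
    stack of Difference-of-Gaussian images (one octave); a simplified version
    of the staged filtering that identifies stable points in scale space."""
    stack = np.stack(dogs)                      # shape: (scales, H, W)
    points = []
    for s in range(1, stack.shape[0] - 1):
        for y in range(1, stack.shape[1] - 1):
            for x in range(1, stack.shape[2] - 1):
                cube = stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                v = stack[s, y, x]
                if v == cube.max() or v == cube.min():
                    points.append((s, y, x))
    return points
```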

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extracting and tracking image features, representations of the 3D analogues of these features can be constructed.
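
The feature extraction this reference describes is the Harris corner detector; a minimal OpenCV sketch follows (the response R = det(M) - k·trace(M)² is thresholded at a placeholder fraction of its maximum):

```python
import cv2
import numpy as np

image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Harris response over a local gradient autocorrelation matrix M
response = cv2.cornerHarris(np.float32(image), blockSize=2, ksize=3, k=0.04)

# Keep strong corners as trackable features for the image sequence
corners = np.argwhere(response > 0.01 * response.max())
```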

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
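
A simplified sketch of the recall-with-respect-to-precision criterion, assuming ground-truth correctness labels for candidate matches are available:

```python
import numpy as np

def recall_precision(distances, is_correct):
    """Evaluate a descriptor in the spirit of the paper's criterion (simplified):
    sort candidate matches by descriptor distance, then trace recall against
    precision as the acceptance threshold loosens.
    distances: (N,) match distances; is_correct: (N,) ground-truth booleans."""
    order = np.argsort(distances)
    correct = np.asarray(is_correct)[order]
    tp = np.cumsum(correct)                       # true positives so far
    recall = tp / max(correct.sum(), 1)
    precision = tp / np.arange(1, len(correct) + 1)
    return recall, precision
```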

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
