scispace - formally typeset
Search or ask a question

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in diering images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in diering images. The algorithm was rst proposed by Lowe [12] and further developed to increase performance resulting in the classic paper [13] that served as foundation for SIFT which has played an important role in robotic and machine vision in the past decade.
Citations
More filters
Proceedings ArticleDOI
06 Nov 2011
TL;DR: It is shown that the proposed descriptor is not only invariant to monotonic intensity changes and image rotation but also robust to many other geometric and photometric transformations such as viewpoint change, image blur and JEPG compression.
Abstract: This paper presents a novel method for feature description based on intensity order. Specifically, a Local Intensity Order Pattern(LIOP) is proposed to encode the local ordinal information of each pixel and the overall ordinal information is used to divide the local patch into subregions which are used for accumulating the LIOPs respectively. Therefore, both local and overall intensity ordinal information of the local patch are captured by the proposed LIOP descriptor so as to make it a highly discriminative descriptor. It is shown that the proposed descriptor is not only invariant to monotonic intensity changes and image rotation but also robust to many other geometric and photometric transformations such as viewpoint change, image blur and JEPG compression. The proposed descriptor has been evaluated on the standard Oxford dataset and four additional image pairs with complex illumination changes. The experimental results show that the proposed descriptor obtains a significant improvement over the existing state-of-the-art descriptors.

340 citations


Cites background or methods or result from "Distinctive Image Features from Sca..."

  • ...In our experiments (Intel Core2 Quad CPU 2.83GHz), the average time for constructing a feature descriptor is: 2.1ms for SIFT, 3.8ms for DAISY, 5.3ms for HRI-CSLTP and 5.5ms for LIOP....

    [...]

  • ...It is worth noting that different from other methods such as [13, 11, 22, 9], we do not rotate the local patch according to the local consistent orientation(e....

    [...]

  • ...For example, Harris corner [10] and DoG(Difference of Gaussian) [13] for interest point detection, and Harris-affine [15], Hessian-affine [17], MSER(Maximally Stable Extremal Region) [14], IBR(Intensity-Based Region) and EBR(EdgeBased Region) [24] for affine covariant region detection....

    [...]

  • ...More recently, the local binary pattern [19] based methods are proposed and achieve high performance comparable to SIFT....

    [...]

  • ...Local image features have been widely used in many computer vision applications such as object recognition [13] , texture recognition [12], wide baseline matching [24], image retrieval [18] and panoramic image stitching [3]....

    [...]

Journal ArticleDOI
TL;DR: An overview on what has been done over the last decade in the new and emerging field of information forensics regarding theories, methodologies, state-of-the-art techniques, major applications, and to provide an outlook of the future is provided.
Abstract: In recent decades, we have witnessed the evolution of information technologies from the development of VLSI technologies, to communication and networking infrastructure, to the standardization of multimedia compression and coding schemes, to effective multimedia content search and retrieval. As a result, multimedia devices and digital content have become ubiquitous. This path of technological evolution has naturally led to a critical issue that must be addressed next, namely, to ensure that content, devices, and intellectual property are being used by authorized users for legitimate purposes, and to be able to forensically prove with high confidence when otherwise. When security is compromised, intellectual rights are violated, or authenticity is forged, forensic methodologies and tools are employed to reconstruct what has happened to digital content in order to answer who has done what, when, where, and how. The goal of this paper is to provide an overview on what has been done over the last decade in the new and emerging field of information forensics regarding theories, methodologies, state-of-the-art techniques, major applications, and to provide an outlook of the future.

340 citations

Proceedings ArticleDOI
03 Mar 2018
TL;DR: SIFT and BRISK are found to be the most accurate algorithms while ORB and BRK are most efficient and a benchmark for researchers, regardless of any particular area is set.
Abstract: Image registration is the process of matching, aligning and overlaying two or more images of a scene, which are captured from different viewpoints. It is extensively used in numerous vision based applications. Image registration has five main stages: Feature Detection and Description; Feature Matching; Outlier Rejection; Derivation of Transformation Function; and Image Reconstruction. Timing and accuracy of feature-based Image Registration mainly depend on computational efficiency and robustness of the selected feature-detector-descriptor, respectively. Therefore, choice of feature-detector-descriptor is a critical decision in feature-matching applications. This article presents a comprehensive comparison of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK algorithms. It also elucidates a critical dilemma: Which algorithm is more invariant to scale, rotation and viewpoint changes? To investigate this problem, image matching has been performed with these features to match the scaled versions (5% to 500%), rotated versions (0° to 360°), and perspective-transformed versions of standard images with the original ones. Experiments have been conducted on diverse images taken from benchmark datasets: University of OXFORD, MATLAB, VLFeat, and OpenCV. Nearest-Neighbor-Distance-Ratio has been used as the feature-matching strategy while RANSAC has been applied for rejecting outliers and fitting the transformation models. Results are presented in terms of quantitative comparison, feature-detection-description time, feature-matching time, time of outlier-rejection and model fitting, repeatability, and error in recovered results as compared to the ground-truths. SIFT and BRISK are found to be the most accurate algorithms while ORB and BRISK are most efficient. The article comprises rich information that will be very useful for making important decisions in vision based applications and main aim of this work is to set a benchmark for researchers, regardless of any particular area.

339 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Lowe for matching SIFT features in [13] and by K....

    [...]

  • ...In this article, SIFT (blobs) [13], SURF (blobs) [14], KAZE (blobs) [15], AKAZE (blobs) [16], ORB (corners) [17], and BRISK (corners) [18] algorithms are compared for image matching and registration....

    [...]

  • ...Lowe introduced Scale Invariant Feature Transform (SIFT) in 2004 [13], which is the most renowned featuredetection-description algorithm....

    [...]

Journal ArticleDOI
27 Jul 2014
TL;DR: This work proposes a method for automatically guiding patch-based image completion using mid-level structural cues by first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes.
Abstract: We propose a method for automatically guiding patch-based image completion using mid-level structural cues. Our method first estimates planar projection parameters, softly segments the known region into planes, and discovers translational regularity within these planes. This information is then converted into soft constraints for the low-level completion algorithm by defining prior probabilities for patch offsets and transformations. Our method handles multiple planes, and in the absence of any detected planes falls back to a baseline fronto-parallel image completion algorithm. We validate our technique through extensive comparisons with state-of-the-art algorithms on a variety of scenes.

337 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: OLDAMs have significantly higher discrimination between different targets than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve the tracking accuracy, particularly reducing the false alarms and identity switches.
Abstract: We present an approach for online learning of discriminative appearance models for robust multi-target tracking in a crowded scene from a single camera. Although much progress has been made in developing methods for optimal data association, there has been comparatively less work on the appearance models, which are key elements for good performance. Many previous methods either use simple features such as color histograms, or focus on the discriminability between a target and the background which does not resolve ambiguities between the different targets. We propose an algorithm for learning a discriminative appearance model for different targets. Training samples are collected online from tracklets within a time sliding window based on some spatial-temporal constraints; this allows the models to adapt to target instances. Learning uses an Ad-aBoost algorithm that combines effective image descriptors and their corresponding similarity measurements. We term the learned models as OLDAMs. Our evaluations indicate that OLDAMs have significantly higher discrimination between different targets than conventional holistic color histograms, and when integrated into a hierarchical association framework, they help improve the tracking accuracy, particularly reducing the false alarms and identity switches.

333 citations

References
More filters
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations

Trending Questions (1)
How can distinctive features theory be applied to elision?

The provided information does not mention anything about the application of distinctive features theory to elision.