
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and subsequently matching distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Journal ArticleDOI
TL;DR: The proposed registration scheme has been tested using data from the Compact High Resolution Imaging Spectrometer (CHRIS) onboard the Project for On-Board Autonomy (Proba) satellite and demonstrates that the proposed method works well in areas with little variation in topography.
Abstract: Subpixel image registration is the key to successful image fusion and superresolution enhancement of multiangle satellite data. Multiangle image registration poses two main challenges: 1) Images captured at large view angles are susceptible to resolution change and blurring, and 2) local geometric distortion caused by topographic effects and/or platform instability may be important. In this paper, we propose a two-step nonrigid automatic registration scheme for multiangle satellite images. In the first step, control points (CPs) are selected in a preregistration process based on the scale-invariant feature transform (SIFT). However, the number of CPs obtained in this first step may be too few and/or CPs may be unevenly distributed. To remediate these problems, in a second step, the preliminary registered image is subdivided into chips of 64 × 64 pixels, and each chip is matched with a corresponding chip in the reference image using normalized cross correlation (NCC). By doing so, more CPs with better spatial distribution are obtained. Two criteria are applied during the generation of CPs to identify outliers. Selected SIFT and NCC CPs are used for defining a nonrigid thin-plate-spline model. The proposed registration scheme has been tested using data from the Compact High Resolution Imaging Spectrometer (CHRIS) onboard the Project for On-Board Autonomy (Proba) satellite. Experimental results demonstrate that the proposed method works well in areas with little variation in topography. Application in areas with more pronounced relief would require the use of orthorectified image data in order to achieve subpixel registration accuracy.
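A minimal sketch of the second-step chip matching described above, assuming OpenCV (`cv2.matchTemplate` with normalized cross correlation). Each 64 × 64 chip of the preregistered image is searched for in a slightly larger window of the reference image, and the NCC peak yields an additional control point. Function names, the search margin, and the correlation threshold are illustrative, not the authors' implementation:

```python
# Sketch of NCC chip matching for control-point (CP) refinement. Inputs are
# grayscale arrays (uint8 or float32, as cv2.matchTemplate requires).
import cv2
import numpy as np

def ncc_control_points(preregistered, reference, chip=64, margin=16, min_ncc=0.8):
    cps = []
    h, w = preregistered.shape
    for y in range(0, h - chip, chip):
        for x in range(0, w - chip, chip):
            patch = preregistered[y:y + chip, x:x + chip]
            # Search window: the chip's neighborhood in the reference image.
            y0, y1 = max(0, y - margin), min(h, y + chip + margin)
            x0, x1 = max(0, x - margin), min(w, x + chip + margin)
            ncc = cv2.matchTemplate(reference[y0:y1, x0:x1], patch,
                                    cv2.TM_CCOEFF_NORMED)
            _, peak, _, loc = cv2.minMaxLoc(ncc)
            if peak >= min_ncc:  # reject weak correlations as likely outliers
                cps.append(((x + chip / 2, y + chip / 2),
                            (x0 + loc[0] + chip / 2, y0 + loc[1] + chip / 2)))
    return cps  # list of (preregistered CP, reference CP) coordinate pairs
```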

94 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The arrow illustrates the increasing space scales [15]....

  • ...Following the suggestion in [15], k is set to √2, which leads to a significant difference in successive scales....

  • ...Scale-space extrema in the Difference of Gaussians are regarded as the most stable scale-invariant features [15]....

  • ...the scale-invariant feature transform (SIFT) [11], [13]–[15]....
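The second excerpt above fixes the scale-space factor k at √2. A minimal sketch of a difference-of-Gaussians stack built with that factor, using SciPy (function and parameter names are illustrative):

```python
# Minimal DoG scale-space sketch with k = sqrt(2), as in the excerpt above.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=np.sqrt(2.0), levels=5):
    """Return D_i = G(k^(i+1)*sigma0)*I - G(k^i*sigma0)*I for i = 0..levels-1."""
    blurred = [gaussian_filter(image.astype(np.float64), sigma0 * k ** i)
               for i in range(levels + 1)]
    # Scale-space extrema are then sought in 3x3x3 neighborhoods of this stack.
    return [blurred[i + 1] - blurred[i] for i in range(levels)]
```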

Proceedings ArticleDOI
03 Dec 2010
TL;DR: The technique produces a highly accurate sparse 3D reconstruction of underwater structures such as corals, built from synchronized high-definition videos collected using a wide-baseline stereo rig.
Abstract: Environmental change is a growing international concern, calling for the regular monitoring, studying and preserving of detailed information about the evolution of underwater ecosystems. For example, fragile coral reefs are exposed to various sources of hazards and potential destruction, and need close observation. Computer vision offers promising technologies to build 3D models of an environment from two-dimensional images. The state of the art techniques have enabled high-quality digital reconstruction of large-scale structures, e.g., buildings and urban environments, but only sparse representations or dense reconstruction of small objects have been obtained from underwater video and still imagery. The application of standard 3D reconstruction methods to challenging underwater environments typically produces unsatisfactory results. Accurate, full camera trajectories are needed to serve as the basis for dense 3D reconstruction. A highly accurate sparse 3D reconstruction is the ideal foundation on which to base subsequent dense reconstruction algorithms. In our application the models are constructed from synchronized high definition videos collected using a wide baseline stereo rig. The rig can be hand-held, attached to a boat, or even to an autonomous underwater vehicle. We solve this problem by employing a smoothing and mapping toolkit developed in our lab specifically for this type of application. The result of our technique is a highly accurate sparse 3D reconstruction of underwater structures such as corals.

94 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...SIFT and SURF descriptor matching are quite reliable in many situations, yet RANSAC is needed to eliminate outliers due to erroneous stereo and temporal matching, as outliers are capable of introducing large error into the solution....

  • ...Given their scale and local affine invariance properties, we opt to use SIFT [15] or SURF [16] instead, as they constitute a better option for matching visual features from varying poses....

  • ...To deal with scale and affine distortions in SIFT, for example, keypoint patches are selected from difference-of-Gaussian images at various scales, for which the dominant gradient orientation and scale are stored....

  • ...Our technique produces similar results whether we use SIFT or SURF, with SURF running significantly faster....
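A minimal sketch of the match-then-RANSAC pattern the excerpts describe, assuming OpenCV's SIFT and fundamental-matrix RANSAC. The ratio and thresholds are illustrative; the authors' full pipeline additionally relies on their own smoothing-and-mapping toolkit, which is not reproduced here:

```python
# Sketch: SIFT matching with a ratio test, then RANSAC on the epipolar geometry
# to discard outliers from erroneous stereo/temporal matches.
import cv2
import numpy as np

def robust_matches(img1, img2, ratio=0.8):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    good = [m for m, n in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]        # Lowe's ratio test
    if len(good) < 8:                                  # 8-point minimum for F
        return []
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    _, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    if mask is None:
        return []
    return [m for m, keep in zip(good, mask.ravel()) if keep]  # inliers only
```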

Journal ArticleDOI
TL;DR: Modifications to the SIFT algorithm are proposed that substantially improve the repeatability of detection and effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation.
Abstract: Keypoint detection and matching is of fundamental importance for many applications in computer and robot vision. The association of points across different views is problematic because image features can undergo significant changes in appearance. Unfortunately, state-of-the-art methods, like the scale-invariant feature transform (SIFT), are not resilient to the radial distortion that often arises in images acquired by cameras with microlenses and/or wide fields of view. This paper proposes modifications to the SIFT algorithm that substantially improve the repeatability of detection and the effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation. The scale-space representation of the image is obtained using adaptive filtering that compensates for the local distortion, and the keypoint description is carried out after implicit image gradient correction. Unlike competing methods, our approach avoids image resampling (the processing is carried out in the original image plane), it does not require accurate camera calibration (an approximate modeling of the distortion is sufficient), and it adds minimal computational overhead. Extensive experiments show the advantages of our method in establishing point correspondences across images with radial distortion.

94 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...The reasons for new detections are explained in [3]....

  • ...2) Measuring Matching Performance: Two keypoints are considered to be a match iff the Euclidean distance between their SIFT descriptors is below a certain threshold λ [3], [7]....

  • ...The original SIFT algorithm that is proposed by Lowe [3], despite being invariant to rotations on the plane P2, is unable to handle the projective transformations due to camera rotation [14]....

  • ...The scale-invariant feature transform (SIFT) [3] is arguably one of the most popular matching algorithms, being broadly used in robotics because of its invariance to common image transformations such as scale, rotation, and moderate viewpoint change [4], [5]....
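The second excerpt above uses the simplest matching rule: two keypoints match iff the Euclidean distance between their SIFT descriptors falls below a threshold λ. A NumPy sketch (the λ value is illustrative and dataset-dependent):

```python
# Sketch of threshold-based descriptor matching: match iff ||d1 - d2|| < lambda.
import numpy as np

def threshold_matches(des1, des2, lam=250.0):          # lambda is illustrative
    # Pairwise Euclidean distances between (N1,128) and (N2,128) descriptor sets.
    dists = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    return np.argwhere(dists < lam)                    # matched index pairs (i, j)
```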

Journal ArticleDOI
TL;DR: It is demonstrated that the performance of specific object retrieval increases with the size of the vocabulary and that large vocabularies increase the speed of the tf-idf scoring step.
Abstract: A novel similarity measure for bag-of-words type large scale image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. The novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 5k, Oxford 105k and Paris datasets/protocols. We study the effect of a fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation is in contradiction with previously published results. We further demonstrate that the large vocabularies increase the speed of the tf-idf scoring step.
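A minimal sketch of the baseline tf-idf scoring step whose speed the abstract discusses, over bag-of-visual-words histograms. Variable names and the cosine scoring are illustrative; the paper's contribution is a learned similarity, not this baseline:

```python
# Baseline tf-idf scoring over bag-of-visual-words histograms.
import numpy as np

def tfidf_rank(query_hist, db_hists):
    """query_hist: (V,) visual-word counts; db_hists: (N, V). Returns scores (N,)."""
    df = np.count_nonzero(db_hists, axis=0)                  # document frequency
    idf = np.log(len(db_hists) / np.maximum(df, 1.0))        # inverse doc. frequency
    q = query_hist * idf
    d = db_hists * idf
    q /= np.linalg.norm(q) + 1e-12                           # cosine normalization
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    return d @ q                                             # higher = more similar
```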

93 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...However, even in the original work on SIFT descriptor matching (Lowe 2004) it is shown that the similarity of the descriptors is not only dependent on the distance of the descriptors, but also on the location of the features in the feature space....

  • ...Most, if not all, recent state-of-the-art methods extend the bag-of-words representation introduced by Sivic and Zisserman (Sivic and Zisserman 2003), who represented the image by a histogram of “visual words”, i.e., discretized SIFT descriptors (Lowe 2004)....

  • ...To avoid bias (by quantization errors, for example), instead of using the vector-quantized form of the descriptors, the conventional image matching (based on the full SIFT (Lowe 2004)) has to be used....

Proceedings ArticleDOI
01 Jan 2011
TL;DR: The descriptor MiC is shown to encode image microscopic configuration by a linear configuration model; because the framework is unsupervised, it avoids the generalization problem suffered by other statistical learning methods.
Abstract: Texture classification is the problem of classifying images according to textural cues, that is, categorizing a texture image obtained under certain illumination and viewpoint conditions as belonging to one of the pre-learned texture classes. It therefore involves two main steps: image representation or description, and classification. In this paper, we focus on the feature extraction part, which aims to extract effective patterns to distinguish different textures. Among various feature extraction methods, local features have performed well in real-world applications, such as LBP [4], SIFT [2] and Histogram of Oriented Gradients (HOG) [1]. Representative methods also include grey level difference or co-occurrence statistics [10], and methods based on multi-channel filtering or wavelet decomposition [3, 5, 7]. To learn representative structural configurations from texture images, Varma et al. proposed texton methods based on the filter response space and the local image patch space [8, 9]. In this paper we present the descriptor MiC, which encodes image microscopic configuration by a linear configuration model. The final local configuration pattern (LCP) feature integrates both the microscopic features, represented by optimal model parameters, and local features, represented by pattern occurrences. To be specific, microscopic features capture the image microscopic configuration, which embodies image configuration and pixel-wise interaction relationships, by a linear model. The optimal model parameters are estimated by an efficient least squares estimator. To achieve rotation invariance, which is a desired property for texture features, the Fourier transform is applied to the estimated parameter vectors. Finally, the transformed vectors are concatenated with local pattern occurrences to construct LCPs. As this framework is unsupervised, it avoids the generalization problem suffered by other statistical learning methods. To model the image configuration with respect to each pattern, we estimate optimal weights, associated with the intensities of neighboring pixels, to linearly reconstruct the central pixel intensity. This can be expressed by:
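The equation itself did not survive extraction. Based on the surrounding description, in which optimal weights $a_i$ linearly reconstruct the central pixel intensity $g_c$ from its $P$ neighboring intensities $g_i$ and are estimated by least squares, a plausible form (our reconstruction, not necessarily the authors' exact notation) is

$$E(a_0, \ldots, a_{P-1}) = \left| g_c - \sum_{i=0}^{P-1} a_i \, g_i \right|^2,$$

minimized over the occurrences of each pattern to obtain the model parameters.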

93 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
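A minimal sketch of the matching step this abstract describes, with SciPy's k-d tree standing in for the paper's best-bin-first approximate search, plus the distance-ratio test (the 0.8 ratio follows the paper; everything else is illustrative):

```python
# Sketch: nearest-neighbor descriptor matching against a database with a k-d tree
# (a stand-in for best-bin-first search) and the distance-ratio test.
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(query_des, db_des, ratio=0.8):
    tree = cKDTree(db_des)
    dist, idx = tree.query(query_des, k=2)     # two nearest database descriptors
    keep = dist[:, 0] < ratio * dist[:, 1]     # reject ambiguous matches
    return np.flatnonzero(keep), idx[keep, 0]  # query indices, matched db indices
```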

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
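A minimal sketch of the evaluation criterion used above, tracing recall against 1 − precision as the descriptor-distance threshold sweeps over the candidate matches. Inputs and names are illustrative; ground-truth correctness comes from the known image transformations:

```python
# Sketch: recall vs. 1-precision curve for descriptor matching evaluation.
import numpy as np

def recall_vs_one_minus_precision(match_dists, is_correct, n_correspondences):
    order = np.argsort(match_dists)             # sweep the distance threshold
    correct = np.cumsum(np.asarray(is_correct)[order])
    total = np.arange(1, len(order) + 1)
    recall = correct / n_correspondences        # fraction of true pairs recovered
    one_minus_precision = (total - correct) / total
    return recall, one_minus_precision
```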

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
