
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
Citations
Book ChapterDOI
05 Sep 2010
TL;DR: The problem of large-scale place-of-interest recognition in cell phone images of urban scenarios is addressed by exploiting the nowadays often available 3D building data and massive street-view-like image data for database creation.
Abstract: We address the problem of large scale place-of-interest recognition in cell phone images of urban scenarios. Here, we go beyond what has been shown in earlier approaches by exploiting the nowadays often available 3D building information (e.g. from extruded floor plans) and massive street-view like image data for database creation. Exploiting vanishing points in query images and thus fully removing 3D rotation from the recognition problem allows then to simplify the feature invariance to a pure homothetic problem, which we show leaves more discriminative power in feature descriptors than classical SIFT. We rerank visual word based document queries using a fast stratified homothetic verification that is tailored for repetitive patterns like window grids on facades and in most cases boosts the correct document to top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real world coordinates ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches on city scale experiments for different sources of street-view like image data and a challenging set of cell phone images.

93 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: The success of the approach shows that the new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus, outperforming conventional visual place-recognition approaches.
Abstract: We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus, outperforming conventional visual place-recognition approaches.

93 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Point feature detectors and descriptors—such as SIFT [17], BRISK [22], etc....

  • ...A comparison between two images is done through the following pipeline: (i) SIFT [17] image features are extracted in both images; (ii) their descriptors are matched; (iii) outliers are rejected through verification of their geometric consistency via fundamental-matrix estimation (e....

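Step (ii) of the pipeline quoted above, descriptor matching, is commonly implemented with Lowe's ratio test: a match is accepted only if the nearest neighbour is clearly closer than the second nearest. A minimal NumPy sketch, where the 128-dimensional descriptors are synthetic stand-ins for real SIFT output:

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match descriptors with Lowe's ratio test: keep a match only if the
    nearest neighbour is significantly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nn = np.argsort(dists)[:2]
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Toy descriptors: row 0 of each set is near-identical, the rest are random,
# so only the (0, 0) pair should survive the ratio test.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(4, 128))
desc2 = np.vstack([desc1[0] + 0.01 * rng.normal(size=128),
                   rng.normal(size=(3, 128))])
print(ratio_test_match(desc1, desc2))
```

Step (iii), the geometric verification via fundamental-matrix estimation, would then run RANSAC over the surviving pairs; that part is omitted here.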
Journal ArticleDOI
Wenping Ma, Jun Zhang, Yue Wu, Licheng Jiao, Hao Zhu, Wei Zhao
TL;DR: An effective coarse-to-fine strategy is introduced and a new two-step registration method based on deep and local features is developed; the first step uses a convolutional neural network to estimate an approximate spatial relationship, and experiments show the method markedly increases the number and ratio of correct correspondences while remaining highly robust and accurate.
Abstract: Automatic remote sensing image registration has achieved great accomplishment. However, it is still a vital challenging problem to develop a robust and accurate registration method due to the negative effects of noise and imaging differences between images. For these images, it is difficult to guarantee the accuracy and robustness at the same time for one-step registration methods. To address this issue, we introduce an effective coarse-to-fine strategy and develop a new two-step registration method based on deep and local features in this paper. The first step is to calculate the approximate spatial relationship, which is obtained by a convolutional neural network. This step makes full use of the deep features to match and can generate stable results. For the second step, a matching strategy considering spatial relationship is applied to the local feature-based method. In addition, this step adopts more accurate features in location to adjust the results of the previous step. A variety of homologous and multimodal remote sensing images, including optical, synthetic aperture radar, and general map images, are used to evaluate the proposed method. The comparison experiments demonstrate that our method can apparently increase the correct correspondences, can improve the ratio of correct correspondences, and is highly robust and accurate.

93 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The registration process of the classic SIFT method contains five steps: scale-space extrema detection, keypoint localization, orientation assignment, keypoint descriptor, and keypoint matching [10]....

  • ...In order to improve the stability of keypoints, Lowe [25] used the Taylor expansion of the scale-space function D(x, y, σ):

    D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x    (3)

    where x = (x, y, σ)ᵀ is the offset....

  • ...Lowe recommends that the descriptors are computed by using gradient information of eight directions in a 4 × 4 window within the keypoint scale space....

  • ...SIFT was first introduced by Lowe [25] in 1999 and then improved in 2004 [10]....

  • ...Scale-invariant feature transform (SIFT) [10] is one of the most commonly used methods among point feature-based methods, and various improved SIFT-based methods are also widely used....

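Setting the derivative of that Taylor expansion to zero gives the sub-pixel/sub-scale offset of the extremum, x̂ = −(∂²D/∂x²)⁻¹ ∂D/∂x. A small NumPy sketch using a synthetic quadratic D whose true extremum is known, so the recovered offset can be checked:

```python
import numpy as np

def refine_extremum(grad, hess):
    """Sub-pixel offset of a DoG extremum from the quadratic fit:
    x_hat = -H^{-1} g, obtained by zeroing the derivative of the
    Taylor expansion of D around the sampled lattice point."""
    return -np.linalg.solve(hess, grad)

# Toy check: D(x) = D0 - 0.5 (x - c)^T A (x - c) peaks exactly at c.
A = np.diag([2.0, 1.0, 0.5])      # curvature; Hessian of D is -A
c = np.array([0.3, -0.2, 0.1])    # true offset in (x, y, sigma)
x0 = np.zeros(3)                  # sampled lattice point
grad = A @ (c - x0)               # dD/dx evaluated at x0
hess = -A
print(refine_extremum(grad, hess))  # ≈ [0.3, -0.2, 0.1]
```

In the actual detector, the gradient and Hessian are approximated by finite differences of neighbouring DoG samples, and keypoints whose offset exceeds 0.5 in any dimension are re-localized to the neighbouring sample.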
Journal ArticleDOI
TL;DR: It is believed that an important contribution of this paper is to show that even a simple decoupled system can provide state-of-the-art performance on the PASCAL VOC 2007, PASCAL VOC 2008 and MSRC 21 datasets.
Abstract: We consider the problem of semantic segmentation, i.e. assigning each pixel in an image to a set of pre-defined semantic object categories. State-of-the-art semantic segmentation algorithms typically consist of three components: a local appearance model, a local consistency model and a global consistency model. These three components are generally integrated into a unified probabilistic framework. While it enables at training time a joint estimation of the model parameters and while it ensures at test time a globally consistent labeling of the pixels, it also comes at a high computational cost. We propose a simple approach to semantic segmentation where the three components are decoupled (this journal submission is an extended version of the following conference paper: G. Csurka and F. Perronnin, "A simple high performance approach to semantic segmentation", BMVC, 2008). For the local appearance model, we make use of the Fisher kernel. While this framework was shown to lead to high accuracy for image classification, to our best knowledge this is its first application to the segmentation problem. The semantic segmentation process is then guided by a low-level segmentation which enforces local consistency. Finally, to enforce image-level consistency we use global image classifiers: if an image as a whole is unlikely to contain an object class, then the corresponding class is not considered in the segmentation pipeline. The decoupling of the components makes our system very efficient both at training and test time. An efficient training enables to estimate the model parameters on large quantities of data. Especially, we explain how our system can leverage weakly labeled data, i.e. images for which we do not have pixel-level labels but either object bounding boxes or even only image-level labels. 
We believe that an important contribution of this paper is to show that even a simple decoupled system can provide state-of-the-art performance on the PASCAL VOC 2007, PASCAL VOC 2008 and MSRC 21 datasets.

92 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...the output of filter banks), color statistics (histogram or moments) and SIFT (Lowe 2004)....

  • ...The SIFT and color maps are then simply averaged for each category....

  • ...We make use of two types of low-level descriptors: 128-dimensional SIFT features (Lowe 2004) and 96-dimensional color descriptors....

  • ...The most popular descriptors include texture (i.e. the output of filter banks), color statistics (histogram or moments) and SIFT (Lowe 2004)....

  • ...We thus obtain one pixel-level probability map per class per feature type, i.e. one for SIFT and one for color in our case....

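The late fusion described in these excerpts (one probability map per class per feature type, then a simple per-category average) can be sketched in a few lines. The map shapes and the Dirichlet sampling below are illustrative assumptions, not the authors' code:

```python
import numpy as np

# Hypothetical per-pixel class probability maps (H x W x num_classes),
# one from SIFT-based appearance and one from colour descriptors.
rng = np.random.default_rng(1)
p_sift = rng.dirichlet(np.ones(3), size=(4, 4))   # each pixel sums to 1
p_color = rng.dirichlet(np.ones(3), size=(4, 4))

# Late fusion as described: average the two maps per category,
# then label each pixel with the arg-max class.
p_fused = 0.5 * (p_sift + p_color)
labels = p_fused.argmax(axis=-1)
print(labels.shape)  # (4, 4)
```

In the full system these pixel labels are further smoothed by the low-level segmentation (local consistency) and gated by global image classifiers (image-level consistency).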
Proceedings ArticleDOI
07 Dec 2015
TL;DR: A fast scalable solution based on the Kernelized Correlation Filter (KCF) framework is presented that integrates the fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates.
Abstract: Correlation filters for long-term visual object tracking have recently seen great interest. Although they present competitive performance results, there is still a need for improving their tracking capabilities. In this paper, we present a fast scalable solution based on the Kernalized Correlation Filter (KCF) framework. We introduce an adjustable Gaussian window function and a keypoint-based model for scale estimation to deal with the fixed size limitation in the Kernelized Correlation Filter. Furthermore, we integrate the fast HoG descriptors and Intel's Complex Conjugate Symmetric (CCS) packed format to boost the achievable frame rates. We test our solution using the Visual Tracker Benchmark and the VOT Challenge datasets. We evaluate our tracker in terms of precision and success rate, accuracy, robustness and speed. The empirical evaluations demonstrate clear improvements by the proposed tracker over the KCF algorithm while ranking among the top state-of-the-art trackers.

92 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...It adopts SIFT [18] features and descriptors to match keypoints and uses singular value decomposition to estimate position, scale and orientation of the matches....

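Estimating position, scale and orientation from keypoint matches via singular value decomposition is typically a Procrustes/Umeyama-style least-squares fit. A hedged sketch of that technique (not the tracker's actual implementation), verified on a synthetic similarity transform:

```python
import numpy as np

def similarity_from_matches(src, dst):
    """Least-squares 2D similarity (scale s, rotation R, translation t)
    mapping src points onto dst, via SVD of the cross-covariance."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy check: rotate by 30 degrees, scale by 1.5, translate by (2, -1).
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = np.random.default_rng(2).normal(size=(10, 2))
dst = 1.5 * src @ R_true.T + np.array([2.0, -1.0])
s, R, t = similarity_from_matches(src, dst)
print(round(s, 3))  # ≈ 1.5
```

With noisy real matches this fit would normally be wrapped in an outlier-rejection loop such as RANSAC.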
References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
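The Hough-transform clustering step described in this abstract can be illustrated with coarse pose-bin voting: each match votes for a quantized pose, and the largest bin collects the matches that agree on a single object pose. This simplified sketch bins only orientation difference and scale ratio (the full method also bins image location), and the match tuples are made-up values:

```python
import numpy as np
from collections import Counter

def hough_pose_bins(matches, ori_bin=30.0, scale_bin=2.0):
    """Cluster (orientation difference in degrees, scale ratio) match
    parameters into coarse bins; return the most-voted bin and its count."""
    votes = Counter()
    for d_ori, scale_ratio in matches:
        b = (int(d_ori // ori_bin),
             int(np.log2(scale_ratio) // np.log2(scale_bin)))
        votes[b] += 1
    return votes.most_common(1)[0]

# Toy matches: four agree on roughly (45 deg, 1.8x); two are outliers.
matches = [(44.0, 1.9), (46.0, 1.8), (43.0, 1.7), (47.0, 1.9),
           (120.0, 0.5), (10.0, 4.0)]
print(hough_pose_bins(matches))  # the 4-vote consensus bin
```

In the paper's pipeline, the matches in the winning bin then go to a least-squares pose verification, which rejects any remaining outliers.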

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
