
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
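The pipeline the abstract describes — detect scale-invariant keypoints, compute descriptors, match them across images — can be sketched with OpenCV's SIFT implementation. A minimal sketch, not the paper's reference code: the file names and the 0.75 ratio threshold are illustrative (the ratio test follows Lowe's usual recommendation).

    import cv2

    # Two views of the same scene (placeholder file names).
    img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

    # Detect keypoints and compute 128-D SIFT descriptors.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match with k-nearest neighbours and keep only matches that pass
    # Lowe's ratio test: the best match must be clearly better than
    # the second best.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print(f"{len(good)} reliable matches")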
Citations
Journal ArticleDOI
TL;DR: A new computer vision-based method for automated 3D energy performance modeling of existing buildings using thermal and digital imagery captured by a single thermal camera; the method expedites the modeling process and has the potential to serve as a rapid and robust building diagnostic tool.

102 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Here, we use the GPU-based implementation [39] of the Scale-invariant feature transform (SIFT) keypoint detection [40]....


Journal ArticleDOI
TL;DR: Thorough experiments suggest that the proposed saliency-inspired fast image retrieval scheme, S-sim, significantly speeds up online retrieval and outperforms state-of-the-art BoW-based image retrieval schemes.
Abstract: The bag-of-visual-words (BoW) model is effective for representing images and videos in many computer vision problems, and achieves promising performance in image retrieval. Nevertheless, the level of retrieval efficiency in a large-scale database is not acceptable for practical usage. Considering that the relevant images for a given query are more likely to be distinctive than ambiguous in the database, this paper defines “database saliency” as a distinctiveness score calculated for every image to measure its overall “saliency” in the database. By taking advantage of database saliency, we propose a saliency-inspired fast image retrieval scheme, S-sim, which significantly improves efficiency while retaining state-of-the-art accuracy in image retrieval. There are two stages in S-sim. The bottom-up saliency mechanism computes the database saliency value of each image by hierarchically decomposing a posterior probability into local patches and visual words; the concurrent information of visual words is then propagated bottom-up to estimate distinctiveness. The top-down saliency mechanism discriminatively expands the query via a very low-dimensional linear SVM trained on the top-ranked images after the initial search; images are then ranked by their distance to the decision boundary as well as by their database saliency values. We comprehensively evaluate S-sim on common retrieval benchmarks, e.g., the Oxford and Paris datasets. Thorough experiments suggest that, because of the offline database saliency computation and the online low-dimensional SVM, our approach significantly speeds up online retrieval and outperforms state-of-the-art BoW-based image retrieval schemes.
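The top-down stage lends itself to a short sketch: treat the top-ranked images from the initial BoW search as positives, fit a low-dimensional linear SVM, and re-rank by distance to the decision boundary combined with the precomputed database-saliency scores. A schematic sketch only; the feature matrix, `top_k`, and the weighting `alpha` are assumptions, not values from the paper.

    import numpy as np
    from sklearn.svm import LinearSVC

    def rerank(features, initial_ranking, saliency, top_k=10, alpha=1.0):
        """Schematic top-down re-ranking in the spirit of S-sim.
        features: N x D array, one row per database image.
        initial_ranking: image indices sorted by the initial BoW search.
        saliency: offline database-saliency score per image."""
        pos, neg = initial_ranking[:top_k], initial_ranking[top_k:]
        X = np.vstack([features[pos], features[neg]])
        y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]
        svm = LinearSVC().fit(X, y)              # discriminative query expansion
        score = svm.decision_function(features)  # distance to decision boundary
        score += alpha * saliency                # combine with database saliency
        return np.argsort(-score)                # best first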

101 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...It is based on a hierarchical model, which separates a salient window into several small patches [21], and then into local features [22]....


Journal ArticleDOI
TL;DR: This study extends state-of-the-art deep convolutional neural networks (CNNs) to the classification of echocardiography video images, aiming to assist clinicians in the diagnosis of heart disease.

101 citations

Book ChapterDOI
05 Sep 2010
TL;DR: A joint energy functional is proposed that integrates spatial and temporal information from two subsequent image pairs subject to an unknown stereo setup, together with a normalisation of image and stereo constraints such that deviations from model assumptions can be interpreted in a geometrical way.
Abstract: We present a novel variational method for the simultaneous estimation of dense scene flow and structure from stereo sequences. In contrast to existing approaches that rely on a fully calibrated camera setup, we assume that only the intrinsic camera parameters are known. To couple the estimation of motion, structure and geometry, we propose a joint energy functional that integrates spatial and temporal information from two subsequent image pairs subject to an unknown stereo setup. We further introduce a normalisation of image and stereo constraints such that deviations from model assumptions can be interpreted in a geometrical way. Finally, we suggest a separate discontinuity-preserving regularisation to improve the accuracy. Experiments on calibrated and uncalibrated data demonstrate the excellent performance of our approach. We even outperform recent techniques for the rectified case that make explicit use of the simplified geometry.
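In the usual variational notation, such a joint functional takes the generic form below. This is a schematic form, not the paper's exact functional: w denotes the optical flow, d the stereo correspondence, the θ factors are the constraint normalisations, Ψ is a robust penaliser (e.g., Ψ(s²) = √(s² + ε²)), and α weights the discontinuity-preserving regulariser.

    E(w, d) = \int_{\Omega} \Psi\big(\theta_{\mathrm{fl}}\,
                |I(\mathbf{x}+w,\,t+1) - I(\mathbf{x},\,t)|^2\big)
            + \Psi\big(\theta_{\mathrm{st}}\,
                |I_r(\mathbf{x}+d,\,t) - I_l(\mathbf{x},\,t)|^2\big)
              \,\mathrm{d}\mathbf{x}
            + \alpha \int_{\Omega} \Psi\big(|\nabla w|^2
                + |\nabla d|^2\big)\,\mathrm{d}\mathbf{x}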

101 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...To this end, we use a variant of the recent optical flow technique of [1] with constraint normalisation and SIFT matches [15] as prior....


Journal ArticleDOI
TL;DR: A deep scene representation is proposed to achieve invariance of CNN features and further enhance their discriminative power; even with a simple linear classifier, it achieves state-of-the-art performance.
Abstract: As a fundamental problem in earth observation, aerial scene classification tries to assign a specific semantic label to an aerial image. In recent years, deep convolutional neural networks (CNNs) have shown advanced performance in aerial scene classification. Successfully pretrained CNNs can be transferred to aerial images. However, global CNN activations may lack geometric invariance and therefore limit improvement in aerial scene classification. To address this problem, this paper proposes a deep scene representation that achieves invariance of CNN features and further enhances their discriminative power. The proposed method: 1) extracts CNN activations from the last convolutional layer of a pretrained CNN; 2) performs multiscale pooling (MSP) on these activations; and 3) builds a holistic representation by the Fisher vector method. MSP is a simple and effective multiscale strategy, which enriches multiscale spatial information in affordable computational time. The proposed representation is particularly suited to aerial scenes and consistently outperforms global CNN activations without requiring feature adaptation. Extensive experiments on five aerial scene data sets indicate that the proposed method, even with a simple linear classifier, can achieve state-of-the-art performance.
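The first two steps of the pipeline (last-conv-layer activations, then multiscale pooling) can be sketched as follows. A hedged sketch, not the paper's implementation: the VGG-16 backbone and the scale list are assumptions, and the Fisher-vector encoding step is only indicated in a comment.

    import torch
    from torchvision import models

    # Keep only the convolutional trunk so the output is a spatial grid
    # of activations rather than a global vector.
    backbone = models.vgg16(weights="IMAGENET1K_V1").features.eval()

    def multiscale_descriptors(img, scales=(224, 320, 448)):
        """Last-conv-layer activations at several input scales, flattened
        so each spatial position becomes one local descriptor (analogous
        to a dense set of SIFT-like features). img: 3 x H x W float tensor."""
        descs = []
        with torch.no_grad():
            for s in scales:
                x = torch.nn.functional.interpolate(
                    img.unsqueeze(0), size=(s, s),
                    mode="bilinear", align_corners=False)
                fmap = backbone(x)                        # 1 x C x h x w
                c = fmap.shape[1]
                descs.append(fmap.squeeze(0).reshape(c, -1).T)
        return torch.cat(descs, dim=0)  # (sum of h*w over scales) x C

    # These local descriptors would then be aggregated into one Fisher
    # vector (with a GMM fitted offline) and fed to a linear classifier.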

101 citations


Cites background or methods from "Distinctive Image Features from Sca..."


  • ...In this paper, the last convolutional features can be pooled at multiple scales and encoded into a single FV just like SIFT....


  • ...Traditionally, aerial scene classification methods rely on hand-crafted features for image description, such as Gabor [42], LBPs [43], and SIFT [44]....


  • ...The low-level methods first extract handcrafted local features (e.g., Gabor [42], local binary patterns (LBPs) [43], and scale-invariant feature transform (SIFT) [44]) and then build a holistic scene representation by local descriptor encoding methods (e.g., BoVW [17], [18], SPM [2], VLAD [19], and FV [20])....


  • ...These feature maps generated by deep convolutional layers are analogous to the local features (e.g., Gabor, LBP, and SIFT) in traditional scene classification methods [42]–[44]....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
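The final verification step — a least-squares solution for consistent pose parameters — reduces, for the 2-D affine model the paper uses, to an overdetermined linear system with two rows per correspondence. A minimal sketch (the function name and array layout are ours):

    import numpy as np

    def fit_affine_pose(src_pts, dst_pts):
        """Least-squares 2-D affine transform mapping model keypoints
        (x, y) to image keypoints (u, v). Each correspondence adds two
        rows to A p = b, solved for the six pose parameters p."""
        n = len(src_pts)
        A = np.zeros((2 * n, 6))
        b = np.zeros(2 * n)
        for i, ((x, y), (u, v)) in enumerate(zip(src_pts, dst_pts)):
            A[2 * i]     = [x, y, 0, 0, 1, 0]
            A[2 * i + 1] = [0, 0, x, y, 0, 1]
            b[2 * i], b[2 * i + 1] = u, v
        p, *_ = np.linalg.lstsq(A, b, rcond=None)
        m = p[:4].reshape(2, 2)   # linear part (rotation/scale/shear)
        t = p[4:]                 # translation
        return m, t

At least three matched keypoints are needed; in the paper, clusters found by the Hough transform supply the correspondences that this solution then verifies.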

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
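The staged filtering that identifies stable points in scale space can be sketched as a difference-of-Gaussian stack whose local extrema across both space and scale become keypoint candidates. A brute-force sketch only; the σ ladder and the threshold are illustrative, not the paper's values.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_extrema(img, sigmas=(1.0, 1.6, 2.6, 4.1), thresh=0.02):
        """Keep points that are extrema of the difference-of-Gaussian
        stack in both space and scale (unoptimised sketch)."""
        blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
        dogs = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
        keypoints = []
        for k in range(1, len(dogs) - 1):
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    v = dogs[k, y, x]
                    patch = dogs[k-1:k+2, y-1:y+2, x-1:x+2]  # 3x3x3 neighbourhood
                    if abs(v) > thresh and (v == patch.max() or v == patch.min()):
                        keypoints.append((x, y, k))
        return keypoints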

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
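The feature extraction underlying this tracking approach is the corner detector now named after the authors: smoothed gradient products form a second-moment matrix M at each pixel, scored by det(M) − k·tr(M)². A minimal sketch (k = 0.04 is the customary value; the Sobel gradients and σ are our assumptions):

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def harris_response(img, sigma=1.5, k=0.04):
        """Combined corner/edge response: large positive values mark
        corners, large negative values mark edges."""
        img = img.astype(float)
        ix = sobel(img, axis=1)                  # horizontal gradient
        iy = sobel(img, axis=0)                  # vertical gradient
        ixx = gaussian_filter(ix * ix, sigma)    # smoothed products form
        iyy = gaussian_filter(iy * iy, sigma)    # the second-moment matrix
        ixy = gaussian_filter(ix * iy, sigma)
        det = ixx * iyy - ixy ** 2
        trace = ixx + iyy
        return det - k * trace ** 2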

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
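The evaluation criterion — recall with respect to precision — can be sketched by sweeping a distance threshold over putative matches with known ground truth. The array names below are ours, not the paper's:

    import numpy as np

    def recall_vs_one_minus_precision(dists, is_correct):
        """Sort putative matches by descriptor distance, then sweep the
        threshold: recall = correct matches recovered / all correct;
        1 - precision = false matches / all matches below threshold."""
        order = np.argsort(dists)
        correct = np.asarray(is_correct, dtype=bool)[order]
        tp = np.cumsum(correct)      # correct matches accepted so far
        fp = np.cumsum(~correct)     # false matches accepted so far
        recall = tp / max(correct.sum(), 1)
        return recall, fp / (tp + fp)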

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
