Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images; these features can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects across differing images. The algorithm was first proposed by Lowe [12] and later refined to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
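
The abstract says the extracted features are used to match objects across images. A rough sketch of that matching step, under stated assumptions: descriptors are plain float sequences (SIFT descriptors are 128-dimensional), each query descriptor is paired with its nearest neighbour, and the pair is kept only if that neighbour is clearly closer than the second-nearest one (the distance-ratio test, using the 0.8 threshold suggested in Lowe's 2004 paper). The brute-force search stands in for the approximate nearest-neighbour search a real system would use, and the function names are illustrative, not from any library.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(
    query: List[Sequence[float]],
    train: List[Sequence[float]],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return (query_index, train_index) pairs that pass the distance-ratio test.

    A match is kept only if the nearest neighbour is clearly closer than the
    second-nearest one; ambiguous descriptors are discarded.
    """
    if len(train) < 2:
        raise ValueError("need at least two train descriptors for a ratio test")
    matches = []
    for qi, q in enumerate(query):
        # Brute-force distances to every train descriptor, closest first.
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


train = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = [(0.5, 0.0), (5.0, 5.0)]
print(ratio_test_matches(query, train))  # [(0, 0)]
```

The second query point is equidistant from all three train points, so it is rejected as ambiguous rather than matched arbitrarily.
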
Citations
Book Chapter
08 Oct 2016
TL;DR: It is shown that a pre-trained convolutional neural network can perform better for this task than other machine vision methods aimed at photograph analysis, and that retrieval performance can be significantly improved by fine-tuning a network specifically for this task.
Abstract: This paper examines how far state-of-the-art machine vision algorithms can be used to retrieve common visual patterns shared by series of paintings. The search for such visual patterns, central to art history research, is challenging because of the diversity of similarity criteria that could relevantly demonstrate genealogical links. We design a methodology and a tool to efficiently annotate clusters of similar paintings and test various algorithms in a retrieval task. We show that a pre-trained convolutional neural network can perform better for this task than other machine vision methods aimed at photograph analysis. We also show that retrieval performance can be significantly improved by fine-tuning a network specifically for this task.

76 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We computed the SIFT descriptors for every image of the dataset....

  • ...The main class of algorithms used very successfully in the problem of visual instance retrieval are based on local visual descriptors (mainly SIFT [24])....

  • ...However, previous works on cross-domain matching [5,12,33] have shown that while these methods perform well on photographs, the performance of SIFT across domains drops drastically....

  • ...The extreme variability in patterns, style and colors seems to be too strong for a dictionary of SIFT descriptor to handle....
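
The excerpts above describe retrieval built on a "dictionary" of SIFT descriptors. A minimal bag-of-visual-words sketch of that idea, assuming a vocabulary of cluster centres has already been learned (normally by k-means over training descriptors; hard-coded in the example): each descriptor is quantized to its nearest visual word, each image becomes a normalized histogram of word counts, and images are compared by cosine similarity. The helper names are hypothetical, and this is the generic technique, not the citing paper's exact pipeline.

```python
import math
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], vocabulary: List[Sequence[float]]) -> int:
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda w: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[w])),
    )


def bow_histogram(
    descriptors: List[Sequence[float]], vocabulary: List[Sequence[float]]
) -> List[float]:
    """L2-normalised histogram of visual-word occurrences for one image."""
    counts = [0.0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts


def cosine_similarity(h1: List[float], h2: List[float]) -> float:
    # Histograms are already L2-normalised, so the dot product is the cosine.
    return sum(a * b for a, b in zip(h1, h2))


vocab = [(0.0, 0.0), (10.0, 10.0)]
h = bow_histogram([(1.0, 1.0), (9.0, 9.0), (0.0, 1.0)], vocab)
print(h)  # two descriptors fall in word 0, one in word 1
```
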

Journal Article
TL;DR: Extensive experiments show that SPHORB consistently outperforms other existing spherical features in accuracy, efficiency and robustness to camera movements, and has been validated by real-world matching tests.
Abstract: In this paper, we propose SPHORB, a new fast and robust binary feature detector and descriptor for spherical panoramic images. In contrast to state-of-the-art spherical features, our approach stems from the geodesic grid, a nearly equal-area hexagonal grid parametrization of the sphere used in climate modeling. It enables us to directly build fine-grained pyramids and construct robust features on the hexagonal spherical grid, thus avoiding the costly computation of spherical harmonics and their associated bandwidth limitation. We further study how to achieve scale and rotation invariance for the proposed SPHORB feature. Extensive experiments show that SPHORB consistently outperforms other existing spherical features in accuracy, efficiency and robustness to camera movements. The superior performance of SPHORB has also been validated by real-world matching tests.
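
SPHORB is a binary descriptor. Binary descriptors such as ORB are conventionally compared with the Hamming distance (XOR, then count the set bits), which is a large part of why they are cheaper to match than floating-point descriptors such as SIFT. The sketch below shows that generic comparison, assuming each descriptor is packed into a Python integer; it is not SPHORB's own matching procedure, and the function names are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors packed into ints."""
    return bin(a ^ b).count("1")


def match_binary(query: List[int], train: List[int]) -> List[Tuple[int, int, int]]:
    """Brute-force nearest neighbour under Hamming distance.

    Returns (query_index, train_index, distance) for every query descriptor.
    """
    out = []
    for qi, q in enumerate(query):
        ti = min(range(len(train)), key=lambda i: hamming(q, train[i]))
        out.append((qi, ti, hamming(q, train[ti])))
    return out


print(match_binary([0b11110000], [0b00001111, 0b11110001]))  # [(0, 1, 1)]
```
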

76 citations

Journal Article
TL;DR: A novel adaptive similarity measure which is consistent with k-nearest neighbor search is presented, and it is proved that it leads to a valid kernel if the original similarity function is a kernel function.

76 citations

Journal Article
TL;DR: A novel intra-operative dense surface reconstruction framework that is capable of providing geometry information from only monocular MIS videos for geometry-aware AR applications such as site measurements and depth cues is presented.

76 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Since ORB [40] is a binary feature point descriptor, it is an order of magnitude faster than SURF [1] and more than two orders faster than SIFT [27] with better accuracy....

  • ...2016] [27] Lowe DG (2004) Distinctive image features from scale-invariant keypoints....

  • ...Traditional tracking methods for AR in MIS usually involve feature points based tracking such as Scale-Invariant Feature Transform (SIFT) [18], Speeded Up Robust Features (SURF) [22], Optical Flow tracking [38] or other approaches specifically designed to work with soft tissues that account for changes in scale, rotation and brightness [31]....

Posted Content
TL;DR: In this article, an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images is presented, which includes global hand-crafted, local hand-crafted, and convolutional neural network (CNN) features coupled with four different Content-Based Image Retrieval schemes.
Abstract: In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global hand-crafted, local hand-crafted, and Convolutional Neural Network (CNN) features coupled with four different Content-Based Image Retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images can be quite heterogeneous, ranging from images containing fine-grained textures, to coarse-grained ones, to images containing objects. It is therefore not obvious which descriptor should be employed in this domain to describe images with such variability. Results demonstrate that CNN-based features perform better than both global and local hand-crafted features, regardless of the retrieval scheme adopted. Features extracted from SatResNet-50, a residual CNN suitably fine-tuned on the RS domain, show much better performance than a residual CNN pre-trained on multimedia scene and object images. Features extracted from NetVLAD, a CNN that considers both CNN and local features, work better than other CNN solutions on images that contain fine-grained textures and objects.
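
The retrieval schemes evaluated here are not detailed in the abstract, but the simplest content-based retrieval step is to rank every database image by the distance between its feature vector and the query's. A sketch under that assumption, paired with average precision as one common way to score a ranking (an assumed metric, not necessarily the paper's); the image ids and feature vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Set


def rank_database(query: Sequence[float], database: Dict[str, Sequence[float]]) -> List[str]:
    """Return database image ids sorted by Euclidean distance to the query feature."""
    return sorted(database, key=lambda img_id: math.dist(query, database[img_id]))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant image appears."""
    hits, total = 0, 0.0
    for k, img_id in enumerate(ranking, start=1):
        if img_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


db = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (0.5, 0.0)}
ranking = rank_database((0.0, 0.0), db)
print(ranking)  # ['a', 'd', 'b', 'c']
print(average_precision(ranking, {"a", "b"}))  # (1/1 + 2/3) / 2
```

Swapping the feature extractor (hand-crafted or CNN) changes only the vectors stored in `db`; the ranking and scoring code stays the same, which is what makes this kind of side-by-side descriptor comparison possible.
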

76 citations

References
Journal Article
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
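
The final verification step mentioned here solves for pose parameters by least squares. In Lowe's 2004 paper the pose is an affine transform taking a model point (x, y) to an image point (u, v) = [[m1, m2], [m3, m4]]·(x, y) + (tx, ty), fitted from three or more matches. The sketch below covers only that least-squares fit, via the normal equations and Cramer's rule in pure Python; Hough clustering and outlier rejection are omitted, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (e.g. collinear points)")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(_det3(mc) / d)
    return sol


def fit_affine(model: List[Point], image: List[Point]) -> Tuple[float, ...]:
    """Least-squares affine fit; returns (m1, m2, m3, m4, tx, ty)."""
    if len(model) != len(image) or len(model) < 3:
        raise ValueError("need >= 3 matched point pairs")
    # Normal equations A^T A p = A^T b with design rows [x, y, 1];
    # the same 3x3 matrix A^T A serves both the u and v coordinates.
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(model, image):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    m1, m2, tx = _solve3(ata, atu)
    m3, m4, ty = _solve3(ata, atv)
    return m1, m2, m3, m4, tx, ty


def residual(params: Sequence[float], model: List[Point], image: List[Point]) -> float:
    """Root-mean-square reprojection error, usable to accept or reject a pose."""
    m1, m2, m3, m4, tx, ty = params
    sq = sum((m1 * x + m2 * y + tx - u) ** 2 + (m3 * x + m4 * y + ty - v) ** 2
             for (x, y), (u, v) in zip(model, image))
    return (sq / len(model)) ** 0.5
```
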

46,906 citations

Proceedings Article
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
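
The "staged filtering approach that identifies stable points in scale space" is, in Lowe's SIFT work, a search for extrema of a difference-of-Gaussian (DoG) stack: blur the input at geometrically increasing scales, subtract adjacent blurs, and keep samples that are larger or smaller than all their neighbours in both position and scale. The toy below does this on a 1-D signal to keep it short; the scale step, number of levels and contrast threshold are illustrative choices, not the paper's parameters, and real SIFT works on 2-D images with octaves and sub-pixel refinement.

```python
import math
from typing import List, Tuple


def gaussian_kernel(sigma: float) -> List[float]:
    """Sampled Gaussian truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur(signal: List[float], sigma: float) -> List[float]:
    """Convolve with a Gaussian, clamping indices at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [
        sum(k[j + r] * signal[min(max(i + j, 0), n - 1)] for j in range(-r, r + 1))
        for i in range(n)
    ]


def dog_extrema(signal: List[float], sigma0: float = 1.0, scale_step: float = math.sqrt(2),
                levels: int = 5, threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return (dog_level, position) of local extrema in a 1-D DoG stack."""
    blurred = [blur(signal, sigma0 * scale_step ** s) for s in range(levels)]
    dog = [[b2 - b1 for b1, b2 in zip(blurred[s], blurred[s + 1])] for s in range(levels - 1)]
    found = []
    for s in range(1, len(dog) - 1):  # need a DoG level above and below
        for x in range(1, len(signal) - 1):
            v = dog[s][x]
            if abs(v) < threshold:  # discard low-contrast responses
                continue
            neighbours = [dog[s + ds][x + dx] for ds in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (ds, dx) != (0, 0)]
            if v > max(neighbours) or v < min(neighbours):
                found.append((s, x))
    return found


blob = [0.0] * 40 + [1.0] * 5 + [0.0] * 40
print(dog_extrema(blob))  # includes an extremum at the blob centre, position 42
```
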

16,989 citations

Proceedings Article
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
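
This 1988 entry is Harris and Stephens' corner and edge detector paper, whose image features are found with a corner response built from the local structure tensor M (windowed sums of Ix², Iy² and Ix·Iy): R = det(M) − k·trace(M)², positive at corners, negative along edges, near zero in flat regions. The sketch below computes R on a list-of-lists grayscale image, assuming central-difference gradients, a 3×3 box window in place of the Gaussian weighting, and the commonly used k = 0.04; treat it as an illustration rather than the paper's exact formulation.

```python
from typing import List

Image = List[List[float]]


def harris_response(img: Image, k: float = 0.04) -> Image:
    """Harris corner response R = det(M) - k * trace(M)^2 at each interior pixel."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference image gradients (border pixels left at zero).
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            # Structure tensor summed over a 3x3 box window.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r


# A bright square in the lower-right quadrant has a corner at (row 5, col 5).
img = [[1.0 if (y >= 5 and x >= 5) else 0.0 for x in range(10)] for y in range(10)]
r = harris_response(img)
print(max((r[y][x], y, x) for y in range(10) for x in range(10)))  # peak at row 5, col 5
```
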

13,993 citations

Journal Article
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector, that the SIFT-based descriptors perform best, and that moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses recall with respect to precision as its criterion and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
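
The evaluation criterion here, recall with respect to precision, is usually plotted as recall against 1−precision while the matching threshold is relaxed, with recall = correct matches / ground-truth correspondences and 1−precision = false matches / (correct + false matches). A sketch of that curve, assuming each candidate match is given as a (descriptor distance, is_correct) pair and the number of ground-truth correspondences is known; the function name and inputs are hypothetical.

```python
from typing import List, Tuple


def recall_precision_curve(
    matches: List[Tuple[float, bool]], correspondences: int
) -> List[Tuple[float, float, float]]:
    """Return (threshold, 1 - precision, recall) as the distance threshold is relaxed."""
    if correspondences <= 0:
        raise ValueError("need a positive number of ground-truth correspondences")
    points = []
    correct = false = 0
    # Accept matches in order of increasing descriptor distance.
    for dist, ok in sorted(matches):
        if ok:
            correct += 1
        else:
            false += 1
        points.append((dist, false / (correct + false), correct / correspondences))
    return points


pts = recall_precision_curve([(0.1, True), (0.2, True), (0.3, False), (0.4, True)], 4)
print(pts[-1])  # (0.4, 0.25, 0.75)
```
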

7,057 citations

Journal Article
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
