
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects across differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
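A minimal sketch of the extract-and-match workflow described above, assuming OpenCV's SIFT implementation (opencv-python 4.4 or later) and two hypothetical input files; the file names and the cross-check filter are illustrative choices, not the paper's exact settings:

```python
import cv2

# Two views of the same scene, loaded in grayscale (hypothetical files).
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect scale-invariant keypoints and compute their 128-D SIFT descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching; cross-checking keeps only mutual nearest neighbors,
# a simple filter for reliable correspondences.
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} mutual nearest-neighbor matches")
```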
Citations
Journal ArticleDOI
TL;DR: The system provides a graphical user interface for visual inspection of the individual steps of the pipeline, i.e., the structure-from-motion result, multi-view stereo depth maps, and rendering of scenes and meshes, and it allows the reconstruction of large datasets containing some detailed regions at much higher resolution than the rest of the scene.

83 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Our system implements and jointly uses both SIFT [14] and SURF [15], which are among the top-performing features in the literature....


Journal ArticleDOI
TL;DR: A new population game dynamics (InImDyn) is proposed, motivated by an analogy with infection and immunization processes within a population of "players," and it is proved that the evolution of the dynamics is governed by a quadratic Lyapunov function representing the average population payoff.
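The quadratic Lyapunov function mentioned here, the average population payoff x^T A x, is easy to observe numerically. The sketch below is not InImDyn itself but the closely related discrete-time replicator dynamics on a made-up symmetric payoff matrix; under these assumptions the same average-payoff quantity is non-decreasing along the trajectory:

```python
import numpy as np

# Hypothetical symmetric payoff matrix for a 3-strategy population game.
A = np.array([[1.0, 0.2, 0.2],
              [0.2, 1.0, 0.2],
              [0.2, 0.2, 1.0]])

x = np.full(3, 1.0 / 3.0)      # uniform initial mixed strategy
for _ in range(100):
    payoff = A @ x             # expected payoff of each pure strategy
    avg = x @ payoff           # average population payoff x^T A x
    x = x * payoff / avg       # replicator update; avg never decreases here
print(x, x @ A @ x)            # limit point and its average payoff
```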

83 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We extract from each image a set of SIFT features [28], each of which is augmented with a scale s, an orientation θ, a position p in the image and a descriptor d (i....


Journal ArticleDOI
TL;DR: This paper employs a graph-based query-specific fusion approach where multiple retrieval results are integrated and reordered based on a fused graph, capable of combining the strengths of local or holistic features adaptively for different inputs.
Abstract: In the analysis of histopathological images, both holistic (e.g., architectural) and local appearance features demonstrate excellent performance, yet their accuracy may vary dramatically across different inputs. This motivates us to investigate how to fuse results from these features to enhance accuracy. In particular, we employ content-based image retrieval approaches to discover morphologically relevant images for image-guided diagnosis, using holistic and local features, both of which are generated from the cell detection results by a stacked sparse autoencoder. Because of the dramatically different characteristics and representations of these heterogeneous features (i.e., holistic and local), their results may not agree with each other, causing difficulties for traditional fusion methods. In this paper, we employ a graph-based query-specific fusion approach where multiple retrieval results (i.e., rank lists) are integrated and reordered based on a fused graph. The proposed method is capable of adaptively combining the strengths of local or holistic features for different inputs. We evaluate our method on a challenging clinical problem, i.e., histopathological image-guided diagnosis of intraductal breast lesions, and it achieves 91.67% classification accuracy on 120 breast tissue images from 40 patients.
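As a rough illustration of combining heterogeneous rank lists, the sketch below uses plain reciprocal-rank fusion. This is a deliberate simplification: the paper's method builds a fused graph from the retrieval results and reorders on that graph, which this stand-in does not attempt; all identifiers are placeholders.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rank_lists, k=60):
    """Fuse several rank lists (each a list of image ids, best first)."""
    scores = defaultdict(float)
    for ranked in rank_lists:
        for rank, image_id in enumerate(ranked, start=1):
            scores[image_id] += 1.0 / (k + rank)   # better rank => larger share
    return sorted(scores, key=scores.get, reverse=True)

holistic = ["img3", "img1", "img7"]   # hypothetical holistic-feature results
local = ["img1", "img9", "img3"]      # hypothetical local-feature results
print(reciprocal_rank_fusion([holistic, local]))
```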

83 citations


Additional excerpts

  • ...For local features, 1500–2000 SIFT descriptors [36] are extracted from each image by detecting key points to describe the cell appearance....


Proceedings ArticleDOI
01 May 2014
TL;DR: A visual place recognition algorithm which uses only straight line features in challenging outdoor environments and is tested with a challenging real-world dataset with more than 10,000 database images acquired in urban driving scenarios.
Abstract: In this paper, we propose a visual place recognition algorithm which uses only straight line features in challenging outdoor environments. Compared to point features used in most existing place recognition methods, line features are easily found in man-made environments and more robust to environmental changes such as illumination, viewing direction, or occlusion. Candidate matches are found using a vocabulary tree and their geometric consistency is verified by a motion estimation algorithm using line segments. The proposed algorithm operates in real-time, and it is tested with a challenging real-world dataset with more than 10,000 database images acquired in urban driving scenarios.
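Candidate retrieval here relies on a vocabulary tree, i.e., hierarchical k-means over local descriptors; the excerpts below mention a branching factor k = 50 and depth l = 3. A toy sketch of that structure, with scikit-learn's KMeans and random placeholder descriptors standing in for the MSLD or SIFT features actually used:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, k, depth):
    """Hierarchical k-means: cluster, then recurse into each cluster."""
    if depth == 0 or len(descriptors) < k:
        return None
    km = KMeans(n_clusters=k, n_init=4).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == i], k, depth - 1)
                for i in range(k)]
    return {"kmeans": km, "children": children}

def quantize(tree, d):
    """Descend the tree to map one descriptor to a visual-word path."""
    path = []
    while tree is not None:
        i = int(tree["kmeans"].predict(d.reshape(1, -1))[0])
        path.append(i)
        tree = tree["children"][i]
    return tuple(path)

descs = np.random.rand(5000, 128)        # placeholder descriptors
tree = build_tree(descs, k=10, depth=2)  # toy sizes, not the paper's k=50, l=3
print(quantize(tree, descs[0]))
```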

83 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...In order to verify the retrieval performance of the vocabulary tree trained with MSLD line descriptors, we performed experimental comparisons with another vocabulary tree trained with SIFT in identical environments....


  • ...Because the tested area was highly structured, it is more reasonable to attribute this result to the test environment rather than to the performance of SIFT or MSLD. Additionally, we ran another experiment comparing the vocabulary trees under strong environmental changes....


  • ...Section II describes algorithms for finding matching hypotheses using a vocabulary tree, and presents an experimental evaluation of the retrieval performance of the tree trained with line descriptors, by comparing it with another tree trained with SIFT....


  • ...In building the vocabulary tree with the SIFT features, we used the same settings that are used to build the vocabulary tree with the MSLD features (i.e., the same eight videos, eight million SIFT features, k = 50, and l = 3)....


  • ...With the strategy, we performed the evaluation varying parameters of line extraction and SIFT keypoint extraction (i.e., a minimum length threshold in line extraction and a maximum number of retaining keypoints in SIFT)....


Posted Content
TL;DR: In this article, an end-to-end trainable convolutional neural network architecture is proposed to identify sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images without the need for a global geometric model.
Abstract: We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs, without the need for costly manual annotation of point-to-point correspondences. Third, we show that the proposed neighbourhood consensus network can be applied to a range of matching tasks, including both category- and instance-level matching, obtaining state-of-the-art results on the PF Pascal dataset and the InLoc indoor visual localization benchmark.
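The "4D space of all possible correspondences" is, concretely, a correlation volume: the feature vector at every location of one map dotted with the vector at every location of the other. A minimal NumPy sketch with random placeholder feature maps; the actual model applies learned 4D convolutions on top of this tensor:

```python
import numpy as np

def correlation_4d(fa, fb):
    """fa: (c, hA, wA), fb: (c, hB, wB); returns (hA, wA, hB, wB) volume."""
    return np.einsum("cij,ckl->ijkl", fa, fb)  # dot product per location pair

fa = np.random.rand(64, 8, 8).astype(np.float32)   # placeholder feature map A
fb = np.random.rand(64, 8, 8).astype(np.float32)   # placeholder feature map B
fa /= np.linalg.norm(fa, axis=0, keepdims=True)    # L2-normalize per location
fb /= np.linalg.norm(fb, axis=0, keepdims=True)
print(correlation_4d(fa, fb).shape)                # (8, 8, 8, 8)
```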

83 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
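A hedged sketch of this matching-and-verification pipeline using OpenCV, with two substitutions made plainly: FLANN stands in for the paper's fast nearest-neighbor (best-bin-first) search, and RANSAC homography estimation replaces the Hough-transform clustering plus least-squares pose verification; file names and thresholds are illustrative:

```python
import cv2
import numpy as np

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # hypothetical model image
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical cluttered scene
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Approximate nearest-neighbor matching (kd-trees) with Lowe's ratio test.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Geometric verification: keep matches consistent with a single homography.
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(f"{int(mask.sum())} of {len(good)} matches are geometrically consistent")
```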

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
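The "staged filtering approach that identifies stable points in scale space" amounts to finding extrema of a difference-of-Gaussian pyramid. A minimal sketch; the scale steps and contrast threshold are illustrative, not the paper's exact values:

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Gaussian pyramid at geometrically spaced scales, then adjacent differences.
sigmas = [1.6 * (2 ** (i / 2.0)) for i in range(5)]
blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]
dog = np.stack([blurred[i + 1] - blurred[i] for i in range(4)])  # (4, h, w)

# Candidate keypoints: maxima of the 3x3x3 scale-space neighborhood that
# also pass a small contrast threshold.
level = 1
neighborhood_max = maximum_filter(dog[level - 1:level + 2], size=3)[1]
candidates = (dog[level] == neighborhood_max) & (dog[level] > 0.01)
print(int(candidates.sum()), "candidate maxima at DoG level", level)
```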

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
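The feature extraction this abstract motivates is what became the Harris corner detector. A minimal sketch using OpenCV's implementation with its customary parameters; the input file and response threshold are placeholders:

```python
import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Harris response from local gradient structure (2x2 block, 3x3 Sobel, k=0.04).
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # (row, col) positions
print(len(corners), "corner candidates")
```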

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
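The evaluation criterion used here, recall with respect to precision as the match-acceptance threshold varies, can be computed directly from match distances and ground-truth correspondence labels. A small sketch with placeholder data:

```python
import numpy as np

def recall_precision_curve(distances, is_correct):
    """distances: one distance per putative match; is_correct: ground truth."""
    order = np.argsort(distances)            # accept matches from best to worst
    correct = np.asarray(is_correct)[order]
    tp = np.cumsum(correct)                  # true positives so far
    fp = np.cumsum(~correct)                 # false positives so far
    recall = tp / max(int(correct.sum()), 1)
    one_minus_precision = fp / (tp + fp)
    return recall, one_minus_precision

d = np.array([0.2, 0.5, 0.3, 0.9, 0.4])          # placeholder match distances
gt = np.array([True, False, True, False, True])  # placeholder labels
print(recall_precision_curve(d, gt))
```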

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
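A minimal sketch of MSER extraction with OpenCV's built-in detector, standing in for the paper's original implementation; the input file is hypothetical:

```python
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)  # pixel lists and bounding boxes
print(len(regions), "maximally stable extremal regions")
```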

3,422 citations
