Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
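The matching stage described here (nearest-neighbor descriptor matching followed by geometric verification) can be sketched with OpenCV's SIFT implementation; the 0.8 distance-ratio threshold follows the paper, while the image paths are placeholders:

```python
# Minimal sketch of SIFT matching with Lowe's distance-ratio test,
# using OpenCV >= 4.4 (where SIFT ships in the main module).
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor, find its two nearest neighbors and keep the match
# only if the closest neighbor is much closer than the second closest
# (the paper rejects matches whose distance ratio exceeds 0.8).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]
print(f"{len(good)} reliable matches")
```

Clustering the surviving matches with a Hough transform over pose parameters, as the paper does next, would then remove the remaining outliers before the least-squares pose fit.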


Citations
Journal ArticleDOI
TL;DR: Off-the-shelf visual features are extracted from a CNN pretrained on ImageNet (more than one million images from 1000 object categories) and used as a generic image representation to tackle cross-modal retrieval.
Abstract: Recently, convolutional neural network (CNN) visual features have demonstrated their powerful ability as a universal representation for various recognition tasks. In this paper, cross-modal retrieval with CNN visual features is implemented with several classic methods. Specifically, off-the-shelf CNN visual features are extracted from a CNN model pretrained on ImageNet, which contains more than one million images from 1000 object categories, and used as a generic image representation to tackle cross-modal retrieval. To further enhance the representational ability of the CNN visual features, a fine-tuning step starting from the ImageNet-pretrained model is performed for each target data set using the open-source Caffe CNN library. In addition, we propose a deep semantic matching method to address cross-modal retrieval for samples annotated with one or multiple labels. Extensive experiments on five popular publicly available data sets demonstrate the superiority of CNN visual features for cross-modal retrieval.
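The cited work extracts its features with Caffe; as a rough illustration of the same idea, a minimal PyTorch sketch of pulling off-the-shelf features from an ImageNet-pretrained CNN might look like this (the ResNet-50 backbone, preprocessing constants, and file path are illustrative assumptions, not the authors' exact setup):

```python
# Sketch: off-the-shelf features from an ImageNet-pretrained CNN.
# The cited paper used Caffe; ResNet-50 here is an illustrative stand-in.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    feature = model(preprocess(img).unsqueeze(0))  # shape: (1, 2048)
```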

329 citations

Journal ArticleDOI
TL;DR: In this paper, a UAV was deployed over the debris-covered tongue of the Lirung Glacier in Nepal and the mass loss and surface velocity of the glacier were derived based on ortho-mosaics and digital elevation models.

328 citations

Journal ArticleDOI
TL;DR: It is argued that the key issue in search is the processing of image patches in the periphery, where visual representation is characterized by summary statistics computed over a sizable pooling region; quantifying these statistics predicts a set of classic search results, as well as the peripheral discriminability of crowded patches such as those found in search displays.
Abstract: Vision is an active process: We repeatedly move our eyes to seek out objects of interest and explore our environment. Visual search experiments capture aspects of this process, by having subjects look for a target within a background of distractors. Search speed often correlates with target-distractor discriminability; search is faster when the target and distractors look quite different. However, there are notable exceptions. A given discriminability can yield efficient searches (where the target seems to "pop-out") as well as inefficient ones (where additional distractors make search significantly slower and more difficult). Search is often more difficult when finding the target requires distinguishing a particular configuration or conjunction of features. Search asymmetries abound. These puzzling results have fueled three decades of theoretical and experimental studies. We argue that the key issue in search is the processing of image patches in the periphery, where visual representation is characterized by summary statistics computed over a sizable pooling region. By quantifying these statistics, we predict a set of classic search results, as well as peripheral discriminability of crowded patches such as those found in search displays.
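As a toy illustration of the central claim (peripheral vision as summary statistics pooled over sizable regions), the sketch below computes one crude statistic per region; the actual model uses a far richer texture-statistic set, so treat the choice of statistic and the pooling size as assumptions:

```python
# Toy sketch: summary statistics pooled over large image regions.
# Mean/variance of gradient energy is a crude stand-in for the much
# richer texture statistics used in the cited model.
import numpy as np

def pooled_summary_stats(img, pool=32):
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    stats = []
    h, w = energy.shape
    for y in range(0, h - pool + 1, pool):
        for x in range(0, w - pool + 1, pool):
            patch = energy[y:y + pool, x:x + pool]
            stats.append((patch.mean(), patch.var()))
    return np.array(stats)  # one (mean, variance) summary per region
```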

328 citations


Cites background from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Recent work in computer vision has in fact suggested that certain object recognition tasks can be performed using texture-like descriptors, often known as a “bag of words” or “bag of features,” while for other tasks this descriptor is insufficient (Lowe, 2004)....


Journal ArticleDOI
TL;DR: To detect urban areas and buildings in satellite images, the use of the scale-invariant feature transform (SIFT) and graph-theoretical tools is proposed, and very promising results on automatically detecting urban areas and buildings are reported.
Abstract: Very high resolution satellite images provide valuable information to researchers. Among these, urban-area boundaries and building locations play crucial roles. For a human expert, manually extracting this valuable information is tedious. One possible solution is to use automated techniques. Unfortunately, the solution is not straightforward if standard image processing and pattern recognition techniques are used. Therefore, to detect urban areas and buildings in satellite images, we propose the use of the scale-invariant feature transform (SIFT) and graph-theoretical tools. SIFT keypoints are powerful in detecting objects under various imaging conditions. However, SIFT alone is not sufficient for detecting urban areas and buildings. Therefore, we formalize the problem in terms of graph theory. In forming the graph, we represent each keypoint as a vertex of the graph. The unary and binary relationships between these vertices (such as spatial distance and intensity values) lead to the edges of the graph. Based on this formalism, we extract the urban area using a novel multiple-subgraph matching method. Then, we extract separate buildings in the urban area using a novel graph-cut method. We form a diverse and representative test set using panchromatic 1-m-resolution Ikonos imagery. Through extensive testing, we report very promising results on automatically detecting urban areas and buildings.
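A rough sketch of the graph formalism described above: each SIFT keypoint becomes a vertex, and edges encode a binary relation between keypoints (plain spatial distance here; the 50-pixel threshold and image path are hypothetical):

```python
# Sketch: build a graph over SIFT keypoints, with vertices at keypoint
# locations and edges from spatial proximity. Threshold is hypothetical.
import cv2
import numpy as np
import networkx as nx

img = cv2.imread("satellite.png", cv2.IMREAD_GRAYSCALE)  # placeholder
keypoints = cv2.SIFT_create().detect(img, None)
pts = np.array([kp.pt for kp in keypoints])

g = nx.Graph()
g.add_nodes_from(range(len(pts)))
for i in range(len(pts)):
    for j in range(i + 1, len(pts)):
        if np.linalg.norm(pts[i] - pts[j]) < 50.0:
            g.add_edge(i, j)
# Subgraph matching (urban-area extraction) and graph cuts (building
# separation) would then operate on g.
```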

328 citations

Journal ArticleDOI
TL;DR: This paper compares several families of space hashing functions in a real setup and reveals that an unstructured quantizer significantly improves the accuracy of LSH, as it closely fits the data in the feature space.
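To make the comparison concrete, here is a minimal sketch of the two quantizer styles being compared: a structured random-projection LSH key versus an unstructured k-means quantizer whose cells adapt to the data (all sizes and parameters are illustrative):

```python
# Sketch: structured LSH (random projections) vs. an unstructured
# quantizer (k-means cells). All sizes and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(10_000, 128))  # stand-in SIFT-like vectors

# Structured: the sign pattern of random projections is the bucket key.
projections = rng.normal(size=(128, 16))
lsh_keys = (descriptors @ projections > 0).astype(np.uint8)

# Unstructured: k-means cells follow the data distribution, which is why
# they fit the feature space more closely than fixed geometric buckets.
kmeans = KMeans(n_clusters=256, n_init=4, random_state=0).fit(descriptors)
kmeans_keys = kmeans.labels_
```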

327 citations

References
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
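A compressed sketch of the staged filtering step (a difference-of-Gaussian scale space with local-extremum detection); the initial sigma, scale factor, and level count are illustrative rather than the paper's exact settings:

```python
# Sketch: stable-point detection via scale-space extrema of a
# difference-of-Gaussian stack. Constants are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(img, sigma=1.6, k=2 ** 0.5, levels=5):
    blurred = [gaussian_filter(img.astype(float), sigma * k ** i)
               for i in range(levels)]
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
    # Keep points that are extrema among their 26 neighbors in space
    # and scale (a 3x3x3 neighborhood over the DoG stack).
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    return np.argwhere(is_max | is_min)  # (scale, y, x) candidates
```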

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....


  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....


  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....


  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative… The full definition is restated below.

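For reference, the difference-of-Gaussian function named in the truncated quotation above is defined in the paper as

```latex
D(x, y, \sigma) = \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma)
```

where G(x, y, σ) is a variable-scale Gaussian, I(x, y) is the input image, L = G ∗ I is the scale-space image, and k is the constant multiplicative factor separating nearby scales.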

Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000).... A sketch of this step follows below.

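For context, a hedged sketch of that step using OpenCV's RANSAC-based estimator; the input arrays are assumed to come from a prior matching step, and the RANSAC settings are illustrative:

```python
# Sketch: solving for the fundamental matrix from matched keypoints, as in
# Hartley & Zisserman. pts1 and pts2 are assumed to be Nx2 float32 arrays
# of corresponding image coordinates produced by a prior matching step.
import cv2
import numpy as np

def estimate_fundamental(pts1: np.ndarray, pts2: np.ndarray):
    # RANSAC settings are illustrative; returns None on degenerate input.
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC,
        ransacReprojThreshold=3.0, confidence=0.99)
    return F, inlier_mask
# For inlier correspondences, x2^T F x1 ≈ 0 in homogeneous coordinates.
```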

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
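The feature extraction this abstract alludes to is the combined corner and edge detector the paper introduced; a minimal OpenCV sketch follows (block size, aperture, k, and the threshold are illustrative values):

```python
# Sketch: Harris corner response R = det(M) - k * trace(M)^2, computed
# with OpenCV. All parameter values below are illustrative.
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # (y, x) points
```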

13,993 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs (maximally stable extremal regions), multiple measurement regions, and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
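OpenCV ships an MSER detector, so a minimal usage sketch might look like this (the image path is a placeholder; detector defaults are used):

```python
# Sketch: detecting maximally stable extremal regions (MSERs) with
# OpenCV's built-in detector, using default parameters.
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(img)
print(f"{len(regions)} stable regions detected")
```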

3,422 citations
