Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
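
A minimal sketch of the matching pipeline described above, using OpenCV's SIFT implementation (an assumption of this example: any opencv-python build from 4.4 onward ships SIFT_create; file names are placeholders). The distance-ratio test keeps a match only when the nearest descriptor is clearly closer than the second nearest, as the paper proposes; the final RANSAC homography fit stands in for the paper's Hough-transform clustering plus least-squares pose verification, and is a simplification rather than the authors' exact method.

    import cv2
    import numpy as np

    img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder files
    img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Nearest-neighbor matching with the distance-ratio test (0.8 is the
    # threshold suggested in the paper).
    good = [m for m, n in cv2.BFMatcher().knnMatch(des1, des2, k=2)
            if m.distance < 0.8 * n.distance]

    # Geometric verification: a RANSAC homography here replaces the paper's
    # Hough clustering + least-squares pose solution (a simplification).
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    if len(good) >= 4:
        H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)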


Citations
Proceedings ArticleDOI
06 Nov 2009
TL;DR: This work proposes a novel classifier-fusion scheme using learning algorithms, i.e., syntactic models, instead of the usual Bayesian or heuristic rules, to address the problem of combining feature extractors and classifiers on the DaimlerChrysler Automotive Dataset.
Abstract: This work proposes a novel classifier-fusion scheme using learning algorithms, i.e., syntactic models, instead of the usual Bayesian or heuristic rules. Moreover, this paper complements previous comparative studies on the DaimlerChrysler Automotive Dataset, offering a set of complementary experiments using combinations of feature extractors and classifiers. The experimental results provide evidence of the effectiveness of our methods in terms of false positive rate, AUC, and accuracy, which reached 96.67%.
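
The abstract does not detail the syntactic fusion models themselves; purely as a hedged point of reference, the sketch below illustrates the general idea of a learned fusion rule, using stacked generalization via scikit-learn (an assumption of this example, not the authors' method), in contrast to fixed Bayesian or heuristic combination rules.

    # Illustration only: a fusion rule that is itself learned, rather than
    # a fixed Bayesian/heuristic vote. The paper's syntactic models differ.
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    fusion = StackingClassifier(
        estimators=[("svm", SVC(probability=True)),
                    ("rf", RandomForestClassifier())],
        final_estimator=LogisticRegression(),  # learned combination rule
    )
    # fusion.fit(X_train, y_train)  # X_train/y_train are placeholders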

242 citations

Book ChapterDOI
06 Sep 2014
TL;DR: Experiments show that, based solely on 3D context and without any image-region category classifier, the proposed whole-room context model achieves performance comparable to the state-of-the-art object detector, demonstrating that when the FOV is large, context is as powerful as object appearance.
Abstract: The field of view of standard cameras is very small, which is one of the main reasons that contextual information is not as useful as it should be for object detection. To overcome this limitation, we advocate the use of 360° full-view panoramas for scene understanding, and propose a whole-room context model in 3D. For an input panorama, our method outputs 3D bounding boxes of the room and all major objects inside, together with their semantic categories. Our method generates 3D hypotheses based on contextual constraints and ranks the hypotheses holistically, combining both bottom-up and top-down context information. To train our model, we construct an annotated panorama dataset and reconstruct the 3D model from a single view using manual annotation. Experiments show that, based solely on 3D context and without any image-region category classifier, we can achieve performance comparable to the state-of-the-art object detector. This demonstrates that when the FOV is large, context is as powerful as object appearance. All data and source code are available online.

242 citations

Journal ArticleDOI
TL;DR: This paper proposes a mutual context model that jointly models objects and human poses in human-object interaction activities, and shows that the model outperforms the state of the art in detecting very difficult objects, estimating human poses, and classifying human-object interaction activities.
Abstract: Detecting objects in cluttered scenes and estimating articulated human body parts from 2D images are two challenging problems in computer vision. The difficulty is particularly pronounced in activities involving human-object interactions (e.g., playing tennis), where the relevant objects tend to be small or only partially visible and the human body parts are often self-occluded. We observe, however, that objects and human poses can serve as mutual context to each other: recognizing one facilitates the recognition of the other. In this paper, we propose a mutual context model to jointly model objects and human poses in human-object interaction activities. In our approach, object detection provides a strong prior for better human pose estimation, while human pose estimation improves the accuracy of detecting the objects that interact with the human. On a six-class sports data set and a 24-class data set of people interacting with musical instruments, we show that our mutual context model outperforms the state of the art in detecting very difficult objects and estimating human poses, as well as in classifying human-object interaction activities.

242 citations

Book ChapterDOI
08 Sep 2018
TL;DR: This work presents a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned, and shows that generating a large number of pose hypotheses leads to more accurate predictions.
Abstract: We present a system for keyframe-based dense camera tracking and depth map estimation that is entirely learned. For tracking, we estimate small pose increments between the current camera image and a synthetic viewpoint. This significantly simplifies the learning problem and alleviates the dataset bias for camera motions. Further, we show that generating a large number of pose hypotheses leads to more accurate predictions. For mapping, we accumulate information in a cost volume centered at the current depth estimate. The mapping network then combines the cost volume and the keyframe image to update the depth prediction, thereby effectively making use of depth measurements and image-based priors. Our approach yields state-of-the-art results with few images and is robust with respect to noisy camera poses. We demonstrate that the performance of our 6-DOF tracking competes with RGB-D tracking algorithms. We compare favorably against strong classic and deep-learning-powered dense depth algorithms.
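
A minimal sketch of the cost-volume idea described above, under a stated simplification: a rectified stereo pair is assumed, so the sweep runs over integer disparities rather than the paper's depth hypotheses around a keyframe estimate, and the paper's learned mapping network that regularizes and decodes the volume is not reproduced here.

    import numpy as np

    def cost_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
        """Photometric matching cost for each disparity hypothesis."""
        h, w = left.shape
        vol = np.full((max_disp, h, w), np.inf)
        for d in range(max_disp):
            # Shift the right image by d pixels; absolute intensity error.
            vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
        return vol

    # Winner-take-all read-out; a learned network would replace this step.
    # disparity = cost_volume(left_img, right_img, 64).argmin(axis=0)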

242 citations

Journal ArticleDOI
TL;DR: A novel approach is presented for copy-move forgery detection and localization based on the JLinkage algorithm, which performs robust clustering in the space of geometric transformations; the method outperforms other similar state-of-the-art techniques.
Abstract: Understanding whether a digital image is authentic is a key purpose of image forensics. There are several different tampering attacks, but one of the most common and immediate is copy-move. A recent and effective approach for detecting copy-move forgeries is to use local visual features such as SIFT. In this kind of method, SIFT matching is often followed by a clustering procedure to group keypoints that are spatially close. This procedure can be unsatisfactory, in particular when the copied patch contains pixels that are spatially distant from one another, or when the pasted area is close to the original source. In such cases, a better estimation of the cloned area is needed to obtain accurate forgery localization. In this paper, a novel approach is presented for copy-move forgery detection and localization based on the JLinkage algorithm, which performs robust clustering in the space of geometric transformations. Experimental results, carried out on different datasets, show that the proposed method outperforms other similar state-of-the-art techniques both in copy-move forgery detection reliability and in the precision of localizing the manipulated patch.
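
A sketch of the SIFT front end for copy-move detection, with loud caveats: the image is matched against itself (k=3 so each keypoint's trivial self-match can be skipped), and a single RANSAC affine fit stands in for the paper's JLinkage clustering over geometric transformations. The file name and thresholds are placeholders.

    import cv2
    import numpy as np

    img = cv2.imread("suspect.png", cv2.IMREAD_GRAYSCALE)  # placeholder file
    kps, des = cv2.SIFT_create().detectAndCompute(img, None)

    # Self-matching: m[0] is the keypoint itself, so the ratio test is
    # applied to the next two nearest neighbors.
    matches = cv2.BFMatcher().knnMatch(des, des, k=3)
    pairs = [(m[1].queryIdx, m[1].trainIdx) for m in matches
             if len(m) == 3 and m[1].distance < 0.8 * m[2].distance]

    src = np.float32([kps[i].pt for i, _ in pairs])
    dst = np.float32([kps[j].pt for _, j in pairs])
    if len(pairs) >= 3:
        # A consistent affine map between two sets of matched keypoints is
        # evidence of a copied-and-pasted region (RANSAC replaces JLinkage).
        A, inlier_mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)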

242 citations

References
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
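
A minimal sketch of the staged filtering step, under stated assumptions (scipy is available; the σ and k values are illustrative, not the paper's exact settings): build a Gaussian scale space, difference adjacent levels, and treat local extrema of the resulting stack as candidate stable keypoints.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_stack(image, sigma0=1.6, k=2 ** 0.5, levels=5):
        """Difference-of-Gaussian stack: adjacent scale-space levels subtracted."""
        blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
                   for i in range(levels)]
        return np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])

    # Candidate keypoints are local extrema of dog_stack(img) across both
    # position and scale (the extrema search itself is omitted here).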

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999; Lowe, 2001; Se, Lowe and Little, 2002)....

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative… (the truncated definition is completed below)
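
The last excerpt above is truncated; in the paper, the difference-of-Gaussian function it refers to is

    D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ),

where G(x, y, σ) is a Gaussian of standard deviation σ, I(x, y) is the input image, L = G ∗ I is the scale-space image, and k is the constant multiplicative factor separating nearby scales.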

Book
01 Jan 2000
TL;DR: In this book, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including the geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision (Hartley and Zisserman)

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

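For context on the excerpt above: the fundamental matrix F is the 3 × 3, rank-2 matrix encoding the epipolar geometry between two views, satisfying

    x′ᵀ F x = 0

for every pair of corresponding points x and x′ in homogeneous image coordinates (the standard formulation from Hartley and Zisserman, stated here for reference).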

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
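
The abstract above gives the project framing; the corner detector this paper is commonly cited for scores each pixel with the response

    R = det(M) − κ · (trace M)²,

where M is the 2 × 2 second-moment matrix of local image gradients and κ is a small empirical constant (commonly around 0.04–0.06). Large positive R indicates a corner, large negative R an edge.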

13,993 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
