Journal ArticleDOI

Distinctive Image Features from Scale-Invariant Keypoints

01 Nov 2004-International Journal of Computer Vision (Kluwer Academic Publishers)-Vol. 60, Iss: 2, pp 91-110
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
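
The first recognition step in the abstract, matching each feature to its nearest neighbor in a database, can be sketched as follows. This is a minimal illustration, not the paper's implementation. A brute-force search stands in for the fast nearest-neighbor algorithm. The function name `match_descriptors`, the `max_ratio` parameter, and its 0.8 default are assumptions of this sketch. The distance-ratio check, which discards ambiguous matches, is also not mentioned in the abstract above.

```python
import math


def match_descriptors(query, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass a distance-ratio check.

    Hypothetical helper: descriptors are plain sequences of floats.
    """
    matches = []
    for qi, q in enumerate(query):
        # Euclidean distance from this query descriptor to every database descriptor.
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        if len(dists) < 2:
            continue  # The ratio check needs a nearest and a second-nearest neighbor.
        (best, best_idx), (second, _) = dists[0], dists[1]
        # Keep the match only if the nearest neighbor is clearly closer than the runner-up.
        if best < max_ratio * second:
            matches.append((qi, best_idx))
    return matches
```

A query descriptor that is equally close to two database entries is rejected as ambiguous, while one with a single clear nearest neighbor is kept. The Hough-transform clustering and least-squares pose verification described in the abstract would operate on the surviving matches and are not shown here.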


Citations
Journal ArticleDOI
TL;DR: A new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking are presented and integrated into the novel real-time system for measuring the position and orientation of a driver's head.
Abstract: Driver distraction and inattention are prominent causes of automotive collisions. To enable driver-assistance systems to address these problems, we require new sensing approaches to infer a driver's focus of attention. In this paper, we present a new procedure for static head-pose estimation and a new algorithm for visual 3-D tracking. They are integrated into the novel real-time (30 fps) system for measuring the position and orientation of a driver's head. This system consists of three interconnected modules that detect the driver's head, provide initial estimates of the head's pose, and continuously track its position and orientation in six degrees of freedom. The head-detection module consists of an array of Haar-wavelet Adaboost cascades. The initial pose estimation module employs localized gradient orientation (LGO) histograms as input to support vector regressors (SVRs). The tracking module provides a fine estimate of the 3-D motion of the head using a new appearance-based particle filter for 3-D model tracking in an augmented reality environment. We describe our implementation that utilizes OpenGL-optimized graphics hardware to efficiently compute particle samples in real time. To demonstrate the suitability of this system for real driving situations, we provide a comprehensive evaluation with drivers of varying ages, race, and sex spanning daytime and nighttime conditions. To quantitatively measure the accuracy of the system, we compare its estimation results to a marker-based cinematic motion-capture system installed in the automotive testbed.

273 citations

Proceedings ArticleDOI
TL;DR: This paper proposes a trajectory-pooled deep-convolutional descriptor (TDD) that combines the merits of hand-crafted features and deep-learned features.
Abstract: Visual features are of vital importance for human action understanding in videos. This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features and deep-learned features. Specifically, we utilize deep architectures to learn discriminative convolutional feature maps, and conduct trajectory-constrained pooling to aggregate these convolutional features into effective descriptors. To enhance the robustness of TDDs, we design two normalization methods to transform convolutional feature maps, namely spatiotemporal normalization and channel normalization. The advantages of our features come from (i) TDDs are automatically learned and contain high discriminative capacity compared with those hand-crafted features; (ii) TDDs take account of the intrinsic characteristics of temporal dimension and introduce the strategies of trajectory-constrained sampling and pooling for aggregating deep-learned features. We conduct experiments on two challenging datasets: HMDB51 and UCF101. Experimental results show that TDDs outperform previous hand-crafted features and deep-learned features. Our method also achieves superior performance to the state of the art on these datasets (HMDB51 65.9%, UCF101 91.5%).

273 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a theoretical framework for analyzing the robustness of classifiers to adversarial perturbations, and show fundamental upper bounds on the adversarial robustness.
Abstract: The goal of this paper is to analyze the intriguing instability of classifiers to adversarial perturbations (Szegedy et al., in: International conference on learning representations (ICLR), 2014). We provide a theoretical framework for analyzing the robustness of classifiers to adversarial perturbations, and show fundamental upper bounds on the robustness of classifiers. Specifically, we establish a general upper bound on the robustness of classifiers to adversarial perturbations, and then illustrate the obtained upper bound on two practical classes of classifiers, namely the linear and quadratic classifiers. In both cases, our upper bound depends on a distinguishability measure that captures the notion of difficulty of the classification task. Our results for both classes imply that in tasks involving small distinguishability, no classifier in the considered set will be robust to adversarial perturbations, even if a good accuracy is achieved. Our theoretical framework moreover suggests that the phenomenon of adversarial instability is due to the low flexibility of classifiers, compared to the difficulty of the classification task (captured mathematically by the distinguishability measure). We further show the existence of a clear distinction between the robustness of a classifier to random noise and its robustness to adversarial perturbations. Specifically, the former is shown to be larger than the latter by a factor that is proportional to $$\sqrt{d}$$ (with d being the signal dimension) for linear classifiers. This result gives a theoretical explanation for the discrepancy between the two robustness properties in high dimensional problems, which was empirically observed by Szegedy et al. in the context of neural networks. We finally show experimental results on controlled and real-world data that confirm the theoretical analysis and extend its spirit to more complex classification schemes.

272 citations

Journal ArticleDOI
TL;DR: In this article, a small, unpiloted aerial system was used to acquire aerial photographs, which were then processed using structure-from-motion (SfM) photogrammetry.

272 citations

Proceedings ArticleDOI
30 Sep 2008
TL;DR: A first experimental evaluation conducted on a publicly available set of low-resolution videos in a commercial mall shows very promising inter-camera person re-identification performance, and the matching method is very fast, making re-identification among hundreds of persons computationally feasible in less than ~1/5 second.
Abstract: We present and evaluate a person re-identification scheme for multi-camera surveillance systems. Our approach uses matching of signatures based on interest-point descriptors collected on short video sequences. One of the originalities of our method is to accumulate interest points on several sufficiently time-spaced images during person tracking within each camera, in order to capture appearance variability. A first experimental evaluation conducted on a publicly available set of low-resolution videos in a commercial mall shows very promising inter-camera person re-identification performance (a precision of 82% for a recall of 78%). It should also be noted that our matching method is very fast: ~1/8 s for re-identification of one target person among 10 previously seen persons, and a logarithmic dependence on the number of stored person models, making re-identification among hundreds of persons computationally feasible in less than ~1/5 second.

272 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...However, contrary to both of them, we do not use SIFT [8] detector and descriptor, but a locally-developed (see §2) and particularly efficient variant of SURF [9]....

    [...]

  • ...SURF itself is a recently proposed and extremely efficient alternative to the more classic and widely used interest point detector and descriptor SIFT [8]....

    [...]


References
Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations


"Distinctive Image Features from Sca..." refers background or methods in this paper

  • ...The initial implementation of this approach (Lowe, 1999) simply located keypoints at the location and scale of the central sample point....

    [...]

  • ...Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance....

    [...]

  • ...To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative…...

    [...]

  • ...More details on applications of these features to recognition are available in other papers (Lowe, 1999, 2001; Se et al., 2002)....

    [...]
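
The difference-of-Gaussian function D(x, y, σ) mentioned in the excerpts above can be sketched in pure Python as the difference between two Gaussian-blurred copies of an image at scales σ and kσ. This is an illustrative sketch only. The value k = √2, the 3σ kernel radius, the edge-replicating border handling, and all function names are assumptions and are not taken from the excerpts.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma (assumed radius)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row of a 2-D list with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the sample index at the image border.
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """D = L(k * sigma) - L(sigma), where L is the Gaussian-blurred image."""
    low = gaussian_blur(image, sigma)
    high = gaussian_blur(image, k * sigma)
    return [[h - l for h, l in zip(h_row, l_row)]
            for h_row, l_row in zip(high, low)]
```

On a constant image the result is zero everywhere up to rounding, and a single bright pixel yields a negative response at its center because the wider blur spreads its energy further. Candidate keypoints would then be searched for among the local extrema of D across position and scale, which this sketch does not do.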

Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
TL;DR: This book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts and it will show the best book collections and completed collections.
Abstract: Downloading the book in this website lists can give you more advantages. It will show you the best book collections and completed collections. So many books can be found in this website. So, this is not only this multiple view geometry in computer vision. However, this book is referred to read because it is an inspiring book to give you more chance to get experiences and also thoughts. This is simple, read the soft file of the book and you get it.

14,282 citations


"Distinctive Image Features from Sca..." refers background in this paper

  • ...A more general solution would be to solve for the fundamental matrix (Luong and Faugeras, 1996; Hartley and Zisserman, 2000)....

    [...]
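
For context, the fundamental matrix named in the excerpt above relates corresponding image points in two views through the epipolar constraint. The notation below (homogeneous points x and x′, matrix F) is the standard textbook convention and is not taken from the excerpt:

```latex
\mathbf{x}'^{\top} F \, \mathbf{x} = 0,
\qquad F \in \mathbb{R}^{3 \times 3},
\qquad \operatorname{rank}(F) = 2
```

Each point correspondence contributes one linear equation in the entries of F, which is why F can be estimated from a set of matched features.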

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
