scispace - formally typeset

Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
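The matching step described above can be illustrated with a minimal nearest-neighbor matcher using Lowe's ratio test, sketched here in NumPy on toy descriptors (not the paper's implementation; the 0.8 threshold follows the paper's recommendation):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test.

    A match is kept only when the closest descriptor in desc_b is
    significantly closer than the second-closest one, which rejects
    ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all candidates
        nn = np.argsort(dists)[:2]                  # two nearest neighbors
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# toy 4-D "descriptors": row 0 of a matches row 1 of b unambiguously
a = np.array([[1.0, 0.0, 0.0, 0.0]])
b = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(match_descriptors(a, b))  # the only surviving match pairs a[0] with b[1]
```

Real SIFT descriptors are 128-dimensional, but the ratio-test logic is identical.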
Citations
Journal ArticleDOI
TL;DR: A new algorithm is proposed for estimating the relative translation and orientation of an inertial measurement unit and a camera that requires no additional hardware except a piece of paper with a checkerboard pattern on it; experiments show that it works well in practice for both perspective and spherical cameras.
Abstract: This paper is concerned with the problem of estimating the relative translation and orientation of an inertial measurement unit and a camera, which are rigidly connected. The key is to realize that this problem is in fact an instance of a standard problem within the area of system identification, referred to as a gray-box problem. We propose a new algorithm for estimating the relative translation and orientation, which does not require any additional hardware, except a piece of paper with a checkerboard pattern on it. The method is based on a physical model which can also be used in solving, for example, sensor fusion problems. The experimental results show that the method works well in practice, both for perspective and spherical cameras.

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Common examples are SIFT [Lowe, 2004] and more recently SURF [Bay et al., 2008] and FERNS [Ozuysal et al., 2007]....

Proceedings ArticleDOI
01 Dec 2010
TL;DR: A design overview of the ClassX system and the evaluation results of a 3-month pilot deployment demonstrate that the system is a low-cost, efficient and pragmatic solution to interactive online lecture viewing.
Abstract: ClassX is an interactive online lecture viewing system developed at Stanford University. Unlike existing solutions that restrict the user to watch only a pre-defined view, ClassX allows interactive pan/tilt/zoom while watching the video. The interactive video streaming paradigm avoids sending the entire field-of-view in the recorded high resolution, thus reducing the required data rate. To alleviate the navigation burden on the part of the online viewer, ClassX offers automatic tracking of the lecturer. ClassX also employs slide recognition technology, which allows automatic synchronization of digital presentation slides with those appearing in the lecture video. This paper presents a design overview of the ClassX system and the evaluation results of a 3-month pilot deployment at Stanford University. The results demonstrate that our system is a low-cost, efficient and pragmatic solution to interactive online lecture viewing.

87 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: It is shown that if the photographer takes a burst of images, a modality available in virtually all modern digital cameras, the images can be combined into a clean, sharp version without explicitly solving any blur estimation and subsequent inverse problem.
Abstract: Numerous recent approaches attempt to remove image blur due to camera shake, either with one or multiple input images, by explicitly solving an inverse and inherently ill-posed deconvolution problem. If the photographer takes a burst of images, a modality available in virtually all modern digital cameras, we show that it is possible to combine them to get a clean sharp version. This is done without explicitly solving any blur estimation and subsequent inverse problem. The proposed algorithm is strikingly simple: it performs a weighted average in the Fourier domain, with weights depending on the Fourier spectrum magnitude. The method's rationale is that camera shake has a random nature and therefore each image in the burst is generally blurred differently. Experiments with real camera data show that the proposed Fourier Burst Accumulation algorithm achieves state-of-the-art results an order of magnitude faster, with simplicity for on-board implementation on camera phones.
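The weighted Fourier average the abstract describes can be sketched in a few lines of NumPy. This is a simplified reading of the algorithm; the exponent `p` and the epsilon guarding all-zero frequencies are assumptions here:

```python
import numpy as np

def fourier_burst_accumulation(burst, p=11):
    """Sketch of Fourier Burst Accumulation.

    Each frame is weighted, per frequency, by the magnitude of its
    Fourier spectrum raised to a power p: frequencies that survived the
    blur in some frame dominate the average, so the result keeps the
    sharpest content of the burst without estimating any blur kernel.
    """
    specs = np.fft.fft2(np.asarray(burst, dtype=float), axes=(-2, -1))
    mags = np.abs(specs) ** p
    weights = mags / np.maximum(mags.sum(axis=0), 1e-12)  # normalize over the burst
    fused = (weights * specs).sum(axis=0)                 # weighted average per frequency
    return np.real(np.fft.ifft2(fused))

# sanity check: a burst of identical frames fuses back to the same image
frame = np.outer(np.arange(4.0), np.arange(4.0))
out = fourier_burst_accumulation([frame, frame, frame])
```

With differently blurred frames, each frequency is dominated by whichever frame preserved it best, which is the method's rationale stated in the abstract.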

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Image correspondences are found using SIFT features [19] and then filtered out through the orsa algorithm [21], a variant of the so called ransac method [10]....

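The ORSA filter mentioned in the snippet is a variant of RANSAC; the generic RANSAC idea can be sketched on a toy line-fitting problem (illustrative only, not the authors' geometric model; the threshold and iteration count are arbitrary choices):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Minimal RANSAC: fit y = a*x + b to points with outliers.

    Repeatedly fits a model to a random minimal sample (2 points here)
    and keeps the model supported by the most inliers. Filtering SIFT
    matches works the same way, with a geometric model instead of a line.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# 8 points on y = 2x plus 2 gross outliers
xs = np.arange(8.0)
pts = np.column_stack([xs, 2 * xs])
pts = np.vstack([pts, [[1.0, 10.0], [5.0, -3.0]]])
mask = ransac_line(pts, rng=0)
```

The returned mask flags the consistent matches and discards the outliers, which is exactly the role ORSA plays in the pipeline above.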
Journal ArticleDOI
TL;DR: A novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines is proposed for high-resolution broad-area remote-sensing images and produces more robust results in complex scenes.
Abstract: Efficient airport detection and aircraft recognition are essential due to the strategic importance of these regions and targets in economic and military construction. In this paper, a novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines (SVMs) is proposed for high-resolution broad-area remote-sensing images. In the first layer saliency (FLS) model, we introduce a spatial-frequency visual saliency analysis algorithm that is based on a CIE Lab color space to reduce the interference of backgrounds and efficiently detect well-defined airport regions in broad-area remote-sensing images. In the second layer saliency model, we propose a saliency analysis strategy that is based on an edge feature preserving wavelet transform and high-frequency wavelet coefficient reconstruction to complete the pre-extraction of aircraft candidates from airport regions that are detected by the FLS and crudely extract as many aircraft candidates as possible for additional classification in detected airport regions. Then, we utilize feature descriptors that are based on a dense SIFT and Hu moment to accurately describe these features of the aircraft candidates. Finally, these object features are inputted to the SVM, and the aircraft are recognized. The experimental results indicate that the proposed method not only reliably and effectively detects targets in high-resolution broad-area remote-sensing images but also produces more robust results in complex scenes.
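The Hu moments used alongside the dense SIFT descriptor are classical shape invariants; a minimal NumPy sketch of the first two is below (the paper does not publish code, so this is a generic implementation of the standard definitions):

```python
import numpy as np

def hu_moments(img):
    """First two Hu moment invariants of a grayscale image (sketch).

    Hu moments are functions of normalized central moments that stay
    constant under translation and scale, which is why they suit
    describing aircraft shapes regardless of position in the frame.
    """
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar, ybar = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):  # normalized central moment
        mu = ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return h1, h2

# a square blob yields the same invariants wherever it sits in the frame
a = np.zeros((16, 16)); a[2:6, 2:6] = 1.0
b = np.zeros((16, 16)); b[9:13, 10:14] = 1.0
```

Translation invariance follows because the moments are taken about the blob's own centroid.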

87 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...[38] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int....

  • ...(16) The SIFT is a local feature descriptor that was initially presented by Lowe [38] as a feature of keypoints....

Proceedings ArticleDOI
20 May 2018
TL;DR: This paper investigates the feasibility of processing surveillance video streams at the network edge for real-time, uninterrupted tracking of moving human objects, and proposes an efficient multi-object tracking algorithm based on Kernelized Correlation Filters.
Abstract: Allowing computation to be performed at the edge of a network, edge computing has been recognized as a promising approach to address some challenges in the cloud computing paradigm, particularly to the delay-sensitive and mission-critical applications like real-time surveillance. Prevalence of networked cameras and smart mobile devices enable video analytics at the network edge. However, human objects detection and tracking are still conducted at cloud centers, as real-time, online tracking is computationally expensive. In this paper, we investigated the feasibility of processing surveillance video streaming at the network edge for real-time, uninterrupted moving human objects tracking. Moving human detection based on Histogram of Oriented Gradients (HOG) and linear Support Vector Machine (SVM) is illustrated for features extraction, and an efficient multi-object tracking algorithm based on Kernelized Correlation Filters (KCF) is proposed. Implemented and tested on Raspberry Pi 3, our experimental results are very encouraging, which validated the feasibility of the proposed approach toward a real-time surveillance solution at the edge of networks.
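The HOG features named in the abstract reduce each image cell to a magnitude-weighted orientation histogram; a simplified single-cell sketch in NumPy follows (the bin count and the unsigned 0-180 degree convention follow common HOG practice, not necessarily this paper's settings):

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Orientation histogram for one HOG cell (simplified sketch).

    Gradients are computed with centered differences and each pixel
    votes for an orientation bin with a weight equal to its gradient
    magnitude. Full HOG tiles the image into such cells and normalizes
    them over overlapping blocks before feeding the SVM.
    """
    patch = np.asarray(patch, dtype=float)
    gx = np.gradient(patch, axis=1)
    gy = np.gradient(patch, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())    # magnitude-weighted votes
    return hist

# a vertical step edge: all gradient energy is horizontal (bin 0)
patch = np.zeros((8, 8)); patch[:, 4:] = 1.0
h = hog_cell_histogram(patch)
```

Concatenating such histograms over a detection window yields the feature vector the linear SVM classifies.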

86 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection [7] and HOG+SVM algorithm [14] has better performance in human detection....

  • ...Scale invariance feature transformation (SIFT) provides an alternative algorithm for human detection through extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene [14]....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
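The verification stage the abstract mentions solves a least-squares problem for consistent pose parameters; for a similarity transform (rotation, scale, translation) that step reduces to a small linear system, sketched here on toy data (not Lowe's code):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform from matched point locations.

    Solves for [a, b, tx, ty] in  x' = a*x - b*y + tx
                                  y' = b*x + a*y + ty,
    i.e. rotation+scale+translation, which is the pose model verified
    after the Hough clustering step.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    A = np.zeros((2 * len(src), 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = src[:, 0], -src[:, 1], 1.0
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = src[:, 1], src[:, 0], 1.0
    params, *_ = np.linalg.lstsq(A, dst.ravel(), rcond=None)
    return params  # a, b, tx, ty

# points rotated 90 degrees (a=0, b=1) and shifted by (1, 2)
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = np.column_stack([-src[:, 1] + 1.0, src[:, 0] + 2.0])
p = fit_similarity(src, dst)
```

Matches whose residual under the recovered transform is large are rejected, completing the verification.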

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
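The staged filtering that identifies stable points in scale space can be sketched as difference-of-Gaussians extrema detection. This is a simplified reading: the scale sampling, contrast threshold, and blur implementation below are assumptions, and the real detector adds sub-pixel refinement and edge rejection:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (helper)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    pad = np.pad(img, r, mode="reflect")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, "valid"), 0, tmp)

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Difference-of-Gaussians keypoint candidates (simplified sketch).

    Stable points are local extrema of the DoG images across both space
    and scale: each candidate is compared against its 26 neighbors in
    the 3x3x3 cube around it, with a small contrast threshold.
    """
    blurred = np.stack([gaussian_blur(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]  # adjacent-scale differences
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                cube = dog[s-1:s+2, y-1:y+2, x-1:x+2]
                if abs(v) > 0.01 and (v >= cube.max() or v <= cube.min()):
                    keypoints.append((y, x, s))
    return keypoints

# a Gaussian blob centered at (10, 10) fires an extremum at that point
ys, xs = np.mgrid[0:21, 0:21]
img = np.exp(-((ys - 10.0)**2 + (xs - 10.0)**2) / (2 * 2.0**2))
kps = dog_extrema(img)
```

The surviving extrema are the candidate keypoints to which orientation assignment and descriptor computation are then applied.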

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
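The feature extraction underlying this approach is the Harris corner detector introduced in this paper; its corner/edge response can be sketched directly from the structure tensor (the box window and k = 0.04 are common choices, not taken from the text, which uses a Gaussian window):

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris/Plessey corner response map (sketch).

    Builds the local structure tensor from image gradients and scores
    each pixel with R = det(M) - k * trace(M)^2: large positive R means
    a corner, negative R an edge, near-zero a flat region.
    """
    img = np.asarray(img, dtype=float)
    gx = np.gradient(img, axis=1)
    gy = np.gradient(img, axis=0)

    def box(a, r=1):  # sum over a (2r+1)^2 window via padding and shifts
        p = np.pad(a, r)
        return sum(p[r+dy:r+dy+a.shape[0], r+dx:r+dx+a.shape[1]]
                   for dy in range(-r, r+1) for dx in range(-r, r+1))

    sxx, syy, sxy = box(gx*gx), box(gy*gy), box(gx*gy)
    return sxx * syy - sxy**2 - k * (sxx + syy)**2

# a white square on black: the response peaks at the square's corners
img = np.zeros((12, 12)); img[4:8, 4:8] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
```

Non-maximum suppression over R then yields the discrete corner features that are tracked across the image sequence.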

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
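The evaluation criterion, recall with respect to precision, reduces to two ratios per threshold setting; a small sketch follows (the function name and toy numbers are mine, the definitions follow the standard usage in this evaluation):

```python
import numpy as np

def recall_one_minus_precision(is_correct, n_correspondences):
    """Point on a recall vs 1-precision curve (sketch).

    Given matcher output flagged correct/false against ground-truth
    region correspondences:
        recall        = #correct matches / #correspondences
        1 - precision = #false matches  / #all matches
    Sweeping the matching threshold traces out the full curve.
    """
    is_correct = np.asarray(is_correct, dtype=bool)
    recall = is_correct.sum() / n_correspondences
    one_minus_precision = (~is_correct).sum() / len(is_correct)
    return recall, one_minus_precision

# 8 of 10 returned matches correct, out of 16 true correspondences
r, omp = recall_one_minus_precision([True] * 8 + [False] * 2, 16)
```

A descriptor is better when its curve gives higher recall at the same 1-precision, which is how the SIFT-based descriptors come out on top.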

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
