Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images; these features can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Journal ArticleDOI
TL;DR: This paper proposes a multiple kernel learning (MKL) algorithm based on the sparse representation-based classification (SRC) method; the proposed method can perform significantly better than many competitive image classification algorithms.
Abstract: In this paper, we propose a multiple kernel learning (MKL) algorithm that is based on the sparse representation-based classification (SRC) method. Taking advantage of the nonlinear kernel SRC in efficiently representing the nonlinearities in the high-dimensional feature space, we propose an MKL method based on the kernel alignment criteria. Our method uses a two-step training method to learn the kernel weights and sparse codes. At each iteration, the sparse codes are updated first while fixing the kernel mixing coefficients, and then the kernel mixing coefficients are updated while fixing the sparse codes. These two steps are repeated until a stopping criterion is met. The effectiveness of the proposed method is demonstrated using several publicly available image classification databases, and it is shown that this method can perform significantly better than many competitive image classification algorithms.
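As a rough illustration of the kernel alignment idea mentioned in this abstract, the sketch below computes the commonly used uncentered alignment between two Gram matrices: their Frobenius inner product divided by the product of their Frobenius norms. The function names, the `ideal_kernel` helper and the toy inputs are illustrative assumptions; the criterion actually used in the paper may differ.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_alignment(k1, k2):
    """Uncentered alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    denom = math.sqrt(frobenius_inner(k1, k1) * frobenius_inner(k2, k2))
    if denom == 0.0:
        raise ValueError("alignment is undefined for an all-zero kernel matrix")
    return frobenius_inner(k1, k2) / denom


def ideal_kernel(labels):
    """Target kernel y y^T for labels in {-1, +1} (hypothetical helper)."""
    return [[yi * yj for yj in labels] for yi in labels]
```

A kernel whose alignment with `ideal_kernel(labels)` is higher agrees more closely with the class structure, which is the intuition behind weighting base kernels by alignment.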

96 citations

Proceedings ArticleDOI
06 Oct 2009
TL;DR: A strategy to efficiently denoise multi-images or video using a complex image processing chain involving accurate registration, video equalization, noise estimation and state-of-the-art denoising methods; the chain is made reliable by the fact that the noise model can be estimated accurately from the image burst.
Abstract: Taking photographs under low light conditions with a hand-held camera is problematic. A long exposure time can cause motion blur due to the camera shaking and a short exposure time gives a noisy image. We consider the new technical possibility offered by cameras that take image bursts. Each image of the burst is sharp but noisy. In this preliminary investigation, we explore a strategy to efficiently denoise multi-images or video. The proposed algorithm is a complex image processing chain involving accurate registration, video equalization, noise estimation and the use of state-of-the-art denoising methods. Yet, we show that this complex chain may become risk free thanks to a key feature: the noise model can be estimated accurately from the image burst. Preliminary tests will be presented. On the technical side, the method can already be used to estimate a non parametric camera noise model from any image burst.

96 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...One of the most robust is the Scale Invariant Feature Transform (SIFT) [9]....

  • ...To this aim we will introduce a precise variant of SIFT [9] and a generalization of ORSA (Optimized Random Sampling Algorithm, [10]) to register all the images together....

Journal ArticleDOI
TL;DR: Bottom-up saliency influences cognitive processes as far removed from the sensory periphery as the conscious choice of what an observer considers interesting.
Abstract: Most natural scenes are too complex to be perceived instantaneously in their entirety. Observers therefore have to select parts of them and process these parts sequentially. We study how this selection and prioritization process is performed by humans at two different levels. One is the overt attention mechanism of saccadic eye movements in a free-viewing paradigm. The second is a conscious decision process in which we asked observers which points in a scene they considered the most interesting. We find in a very large participant population (more than one thousand) that observers largely agree on which points they consider interesting. Their selections are also correlated with the eye movement pattern of different subjects. Both are correlated with predictions of a purely bottom-up saliency map model. Thus, bottom-up saliency influences cognitive processes as far removed from the sensory periphery as in the conscious choice of what an observer considers interesting.

96 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...However, information that is important for object processing, such as figure–ground segmentation (Driver & Baylis, 1996) or invariant features (Lowe, 2004), is not accounted for by the saliency model....

Book ChapterDOI
07 Oct 2012
TL;DR: A new architecture, denoted spatial pyramid matching on the semantic manifold (SPMSM), is proposed for scene recognition and is shown to achieve the best recognition rates in the literature for two large datasets and rates equivalent or superior to the state-of-the-art on a number of smaller datasets.
Abstract: A new architecture, denoted spatial pyramid matching on the semantic manifold (SPMSM), is proposed for scene recognition. SPMSM is based on a recent image representation on a semantic probability simplex, which is now augmented with a rough encoding of spatial information. A connection between the semantic simplex and a Riemannian manifold is established, so as to equip the architecture with a similarity measure that respects the manifold structure of the semantic space. It is then argued that the closed-form geodesic distance between two manifold points is a natural measure of similarity between images. This leads to a conditionally positive definite kernel that can be used with any SVM classifier. An approximation of the geodesic distance reveals connections to the well-known Bhattacharyya kernel, and is explored to derive an explicit feature embedding for this kernel, by simple square-rooting. This enables a low-complexity SVM implementation, using a linear SVM on the embedded features. Several experiments are reported, comparing SPMSM to state-of-the-art recognition methods. SPMSM is shown to achieve the best recognition rates in the literature for two large datasets (MIT Indoor and SUN) and rates equivalent or superior to the state-of-the-art on a number of smaller datasets. In all cases, the resulting SVM also has much smaller dimensionality and requires far fewer support vectors than previous classifiers. This guarantees much smaller complexity and suggests improved generalization beyond the datasets considered.
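The "simple square-rooting" embedding mentioned in this abstract can be sketched in a few lines. For two points p and q on the probability simplex, the Bhattacharyya kernel is the sum of sqrt(p_i * q_i), which equals the ordinary dot product of the element-wise square roots; that identity is what allows a linear classifier to be used on the embedded features. The function names and the assumption that inputs are non-negative vectors summing to 1 are illustrative and not taken from the paper.

```python
import math


def bhattacharyya_kernel(p, q):
    """Bhattacharyya kernel between two probability vectors."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def sqrt_embedding(p):
    """Explicit feature map: element-wise square root of a probability vector."""
    return [math.sqrt(pi) for pi in p]


def dot(u, v):
    """Plain dot product, i.e. the linear kernel on embedded features."""
    return sum(a * b for a, b in zip(u, v))
```

Under these assumptions, `dot(sqrt_embedding(p), sqrt_embedding(q))` matches `bhattacharyya_kernel(p, q)` up to floating-point error, and the kernel of a distribution with itself is 1.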

96 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The appearance representation was based on SIFT(5) descriptors [31], computed on an evenly-spaced 4×4 pixel grid....

Proceedings ArticleDOI
25 Mar 2013
TL;DR: In this paper, the authors present quantitative evidence that end-to-end network bandwidth and latency, a crucial design consideration for meeting interactive performance criteria, limit data center consolidation, and describe an architectural solution that is a seamless extension of today's cloud computing infrastructure.
Abstract: The convergence of mobile computing and cloud computing enables new multimedia applications that are both resource-intensive and interaction-intensive. For these applications, end-to-end network bandwidth and latency matter greatly when cloud resources are used to augment the computational power and battery life of a mobile device. We first present quantitative evidence that this crucial design consideration to meet interactive performance criteria limits data center consolidation. We then describe an architectural solution that is a seamless extension of today's cloud computing infrastructure.

96 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The system extracts key visual elements (SIFT features [10]) from an image, matches these against a database of features from a known set of objects, and finally performs geometric computations to determine the pose of the identified object....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
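The feature-matching step described in this abstract can be illustrated with a minimal sketch. It uses a brute-force nearest-neighbour search rather than the fast approximate search the abstract refers to, and it keeps a match only when the closest database descriptor is clearly closer than the second closest. The function names, the toy descriptors and the `max_ratio=0.8` threshold are assumptions made for this sketch; the Hough clustering and least-squares pose verification stages are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(queries, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio check."""
    matches = []
    for qi, q in enumerate(queries):
        # Rank every database descriptor by its distance to the query.
        ranked = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        if len(ranked) < 2:
            continue  # two neighbours are needed to form a ratio
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best neighbour is distinctly closer.
        if second > 0 and best / second < max_ratio:
            matches.append((qi, best_idx))
    return matches
```

A query lying midway between two database descriptors is rejected as ambiguous, while a query close to exactly one descriptor is kept.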

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best; moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
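The evaluation criterion named in this abstract, recall with respect to precision, can be illustrated with a small helper. It assumes the usual definitions: recall is the number of correct matches divided by the number of ground-truth correspondences, and precision is the number of correct matches divided by all matches returned. The function and parameter names are hypothetical, and the paper's exact protocol may differ in detail.

```python
def recall_and_precision(num_correct, num_false, num_correspondences):
    """Return (recall, precision) for one matching threshold.

    num_correct: matches returned that are true correspondences
    num_false: matches returned that are not true correspondences
    num_correspondences: ground-truth correspondences between the two images
    """
    total_matches = num_correct + num_false
    recall = num_correct / num_correspondences if num_correspondences else 0.0
    precision = num_correct / total_matches if total_matches else 0.0
    return recall, precision
```

Sweeping the matching threshold and plotting the resulting pairs yields one curve per descriptor, which is how descriptors can be ranked against each other.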

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
