
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that serves as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
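The extract-then-match pipeline the abstract describes is usually paired with Lowe's nearest-neighbour ratio test, which discards ambiguous matches. A minimal sketch in plain NumPy — toy 4-dimensional descriptors stand in for real 128-dimensional SIFT descriptors, and the 0.8 ratio follows the value reported in [13]:

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    For each descriptor in desc_a, find its two nearest neighbours in
    desc_b and accept the match only if the closest one is clearly
    better than the runner-up (distance ratio below `ratio`).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# Toy 4-D "descriptors": rows 0 and 1 of `a` should match rows 0 and 1 of `b`.
a = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.9, 0.0, 0.0],
              [0.5, 0.5, 0.5, 0.5]])
print(ratio_test_match(a, b))  # -> [(0, 0), (1, 1)]
```

Real implementations replace the brute-force loop with an approximate nearest-neighbour index (Lowe uses a best-bin-first k-d tree), but the acceptance criterion is exactly this ratio comparison.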
Citations
Journal ArticleDOI
TL;DR: This work proposes a keypoint-based framework to address the keyframe selection problem so that local features can be employed in selecting keyframes, and introduces two criteria, coverage and redundancy, based on keypoint matching in the selection process.
Abstract: Keyframe selection has been crucial for effective and efficient video content analysis. While most existing approaches represent individual frames with global features, we, for the first time, propose a keypoint-based framework to address the keyframe selection problem so that local features can be employed in selecting keyframes. In general, the selected keyframes should both be representative of the video content and contain minimal redundancy. Therefore, we introduce two criteria, coverage and redundancy, based on keypoint matching in the selection process. Comprehensive experiments demonstrate that our approach outperforms the state of the art.

134 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Lowe proposed to improve matching robustness by imposing the ratio test criterion [6] (i....


  • ...Lowe’s SIFT descriptor [6] is utilized for keypoint extraction and representation, though many other local features [8]...


  • ...Recently, local features, such as the scale-invariant feature transform (SIFT) descriptor [6], have played a significant role in many application domains of visual content analysis, such as object recognition and image classification, due to their distinctive representation capacity....


Proceedings ArticleDOI
01 Oct 2014
TL;DR: A real-time non-intrusive monitoring system is developed, which detects the emotional states of the driver by analyzing facial expressions, and which operates very well on simulated data even with generic models.
Abstract: Monitoring the attentive and emotional status of the driver is critical for the safety and comfort of driving. In this work a real-time non-intrusive monitoring system is developed, which detects the emotional states of the driver by analyzing facial expressions. The system considers two negative basic emotions, anger and disgust, as stress related emotions. We detect an individual emotion in each video frame and the decision on the stress level is made on sequence level. Experimental results show that the developed system operates very well on simulated data even with generic models. An additional pose normalization step reduces the impact of pose mismatch due to camera setup and pose variation, and hence improves the detection accuracy further.

134 citations

Journal ArticleDOI
TL;DR: The nature of texts and inherent challenges addressed by word spotting methods are thoroughly examined and the use of retrieval enhancement techniques based on relevance feedback which improve the retrieved results are investigated.

134 citations

Journal ArticleDOI
TL;DR: This study highlights the utility of several off-the-shelf photogrammetric tools for the measurement of structural complexity across a range of scales relevant to ecologists and managers, and provides important information on the accuracy and precision of these systems that should allow for their targeted use by non-experts in computer vision within these contexts.
Abstract: In tropical reef ecosystems, corals are the key habitat builders, providing most of the ecosystem structure that influences coral reef biodiversity and resilience. Remote sensing applications have progressed significantly, and photogrammetry combined with structure-from-motion software is emerging as a leading technique for creating three-dimensional (3D) models of corals and reefs from which biophysical properties of structural complexity can be quantified. This enables a range of important marine research questions to be addressed, such as the role of habitat complexity in driving key ecological processes (e.g., foraging). Yet it is essential to assess the accuracy and precision of photogrammetric measurements to support their application in mapping, monitoring and quantifying coral reef form and structure. This study evaluated the precision (by repeated modeling) and accuracy (by comparison with laser reference models) of geometry and structural complexity metrics derived from photogrammetric 3D models of marine benthic habitat at two ecologically relevant spatial extents: individual coral colonies spanning a range of common morphologies, and patches of reef area of hundreds of square metres. Surface rugosity measurements were generally precise across all morphologies and spatial extents, with average differences in the geometry of replicate models of 1–6 mm for coral colonies and 25 mm for the reef area. Precision decreased with the complexity of the coral morphology: metrics for small massive corals were the most precise (1% coefficient of variation (CV) in surface rugosity) and metrics for bottlebrush corals the least precise (10% CV in surface rugosity). There was no indication, however, that precision was related to complexity for the patch-scale modeling. The 3D geometry of coral models differed by only 1–3 mm from laser reference models. However, high spatial variation in these differences around the model led to a consistent underestimation of surface rugosity values, for all morphs, of between 8% and 37%. This study highlights the utility of several off-the-shelf photogrammetry tools for the measurement of structural complexity across a range of scales relevant to ecologists and managers. It also provides important information on the accuracy and precision of these systems, which should allow for their targeted use by non-experts in computer vision within these contexts.

133 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Feature detection in VisualSFM used a scale invariant feature transform (SIFT) algorithm [34] and generated a network of camera poses from which the sparse 3D point cloud of the model surface was generated....


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A powerful pipeline named Hierarchical Part Matching (HPM) is proposed to cope with fine-grained classification tasks and achieves state-of-the-art classification accuracy on the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Unlike traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating truly discriminative features; it therefore becomes more important for the algorithm to make full use of part information in order to train a robust model. In this paper, we propose a powerful pipeline named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules into the image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.

133 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...We use the VLFeat [18] library to extract OppSIFT descriptors [17]....


  • ...We extract SIFT descriptors [13] on the image and obtain a set of local descriptors D:...


  • ...The description vector dm is a D-dimensional vector, where D = 3× 128 = 384 using OpponentSIFT (OppSIFT) [17] on RGB-images....


  • ...SIFT[13] GCut[15] LLC[20] UCM-SG GPP[21] HSL Max-Pooling...


  • ...Starting from raw image data, we first extract SIFT [13] descriptors as local features....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
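The final verification stage described here — a least-squares solution for consistent pose parameters over the matches in a Hough cluster — can be illustrated with a small sketch. This fits a 2D affine transform to putative point matches via `np.linalg.lstsq`; it is an illustration of the idea, not Lowe's implementation:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst.

    Each match (x, y) -> (u, v) contributes two rows to the linear
    system A @ [m11, m12, m21, m22, tx, ty] = b, where
    u = m11*x + m12*y + tx and v = m21*x + m22*y + ty.
    """
    A = np.zeros((2 * len(src), 6))
    b = np.zeros(2 * len(src))
    for k, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        A[2 * k]     = [x, y, 0, 0, 1, 0]
        A[2 * k + 1] = [0, 0, x, y, 0, 1]
        b[2 * k], b[2 * k + 1] = u, v
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params  # m11, m12, m21, m22, tx, ty

# Points rotated 90 degrees and shifted by (1, 2): (x, y) -> (-y + 1, x + 2)
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(1, 2), (1, 3), (0, 2), (0, 3)]
print(np.round(fit_affine(src, dst), 3))  # ~ [0, -1, 1, 0, 1, 2]
```

In the full pipeline, matches whose residual under the fitted transform is too large would be rejected and the fit repeated, so that only geometrically consistent matches survive verification.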

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
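The staged filtering that identifies stable points in scale space became, in the later SIFT formulation, a search for extrema of a difference-of-Gaussians (DoG) stack. A simplified sketch in plain NumPy — the sigma schedule and threshold are illustrative choices, and the full algorithm's octave handling, sub-pixel interpolation, and edge-response rejection are omitted:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a truncated, normalised kernel."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, out)

def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.003):
    """Keypoint candidates: pixels that are extrema of the DoG stack
    over their 3x3x3 (scale, y, x) neighbourhood."""
    blurs = [gaussian_blur(img, s) for s in sigmas]
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurs, blurs[1:])])
    peaks = []
    for s in range(1, dogs.shape[0] - 1):
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                v = dogs[s, y, x]
                patch = dogs[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > thresh and (v == patch.max() or v == patch.min()):
                    peaks.append((s, y, x))
    return peaks

# A soft blob of scale ~2 centred at (15, 15) should register as an
# extremum in the middle DoG layer (s = 1).
img = np.zeros((31, 31))
img[15, 15] = 1.0
img = gaussian_blur(img, 2.0)
print(dog_extrema(img))
```

The 3x3x3 comparison is the "staged" part: a candidate must beat its 26 neighbours in both space and scale, which selects points whose characteristic scale the blob actually matches.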

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
