
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
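
The extract-and-match workflow the abstract describes can be sketched with OpenCV's SIFT implementation. This is a minimal illustration, not the paper's own code; the image paths are placeholders, and the 0.75 ratio threshold follows Lowe's commonly cited heuristic:

```python
# Minimal SIFT extract-and-match sketch using OpenCV.
# Image paths are placeholders; any pair of overlapping views will do.
import cv2

img1 = cv2.imread("view_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                       # SIFT detector/descriptor
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to keep distinctive matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative SIFT matches")
```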
Citations
Journal ArticleDOI
TL;DR: In this article, a large number of airborne photographs of two different landslides have been acquired and used for ortho-mosaic and digital terrain model (DTM) generation, thus allowing for high-resolution landslide monitoring.
Abstract: In recent years, the application of unmanned aerial vehicles (UAVs) has become more common, and the availability of lightweight digital cameras has enabled UAV systems to serve as affordable and practical remote sensing platforms, allowing flexible and high-resolution remote sensing investigations. In the course of numerous UAV-based remote sensing campaigns, significant numbers of airborne photographs of two different landslides have been acquired. These images were used for ortho-mosaic and digital terrain model (DTM) generation, thus allowing for high-resolution landslide monitoring. Several new open source image- and DTM-processing tools now provide a complete remote sensing working cycle without the use of any commercial hardware or software.
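
As the excerpts below note, displacement analysis can be driven by SIFT matching rather than correlation. A minimal sketch of that idea, assuming two grayscale acquisitions of the same slope at different epochs (the file names are hypothetical):

```python
# Sketch: SIFT-based displacement analysis between two acquisition epochs,
# an alternative to correlation-based matching. Paths are placeholders.
import cv2
import numpy as np

t0 = cv2.imread("landslide_epoch0.png", cv2.IMREAD_GRAYSCALE)
t1 = cv2.imread("landslide_epoch1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, d0 = sift.detectAndCompute(t0, None)
kp1, d1 = sift.detectAndCompute(t1, None)

good = [m for m, n in cv2.BFMatcher(cv2.NORM_L2).knnMatch(d0, d1, k=2)
        if m.distance < 0.7 * n.distance]

# Per-match displacement vectors in pixels; scale by the ground sampling
# distance to obtain metric displacements.
disp = np.array([np.subtract(kp1[m.trainIdx].pt, kp0[m.queryIdx].pt)
                 for m in good])
print("median displacement (px):", np.median(disp, axis=0))
```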

81 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...Displacement analysis can also be managed by applying automated image matching using correlation-based methods (Leprince et al., 2008) or by applying more sophisticated object- or feature-based matching techniques such as 'scale-invariant feature transform' (SIFT) (Lowe, 2004)....


Book ChapterDOI
06 Sep 2014
TL;DR: The UNICT-FD889 dataset is introduced, the first food image dataset composed of over 800 distinct plates of food, which can be used as a benchmark to design and compare representation models of food images; results confirm that both textures and colors are fundamental properties in food representation.
Abstract: It is well-known that people love food. However, an unhealthy diet can cause problems for a person's general health. Since health is strictly linked to diet, advanced computer vision tools to recognize food images (e.g., acquired with mobile/wearable cameras), as well as their properties (e.g., calories), can help diet monitoring by providing useful information to experts (e.g., nutritionists) to assess the food intake of patients (e.g., to combat obesity). Food recognition is a challenging task since food is intrinsically deformable and presents high variability in appearance. Image representation plays a fundamental role. To properly study the peculiarities of image representation in the food application context, a benchmark dataset is needed. These facts motivate the work presented in this paper. In this work we introduce the UNICT-FD889 dataset. It is the first food image dataset composed of over 800 distinct plates of food, which can be used as a benchmark to design and compare representation models of food images. We exploit the UNICT-FD889 dataset for Near Duplicate Image Retrieval (NDIR) purposes by comparing three standard state-of-the-art image descriptors: Bag of Textons, PRICoLBP and SIFT. Results confirm that both textures and colors are fundamental properties in food representation. Moreover, the experiments point out that the Bag of Textons representation computed in the color domain is more accurate than the other two approaches for NDIR.
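
A bare-bones illustration of the SIFT baseline for NDIR described above: rank database images by their count of ratio-test matches with the query. The file names are hypothetical placeholders, not part of UNICT-FD889's actual layout:

```python
# Sketch of SIFT-based near-duplicate image retrieval (NDIR): rank database
# plates by the number of ratio-test matches with the query image.
import cv2

sift = cv2.SIFT_create()
bf = cv2.BFMatcher(cv2.NORM_L2)

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return sift.detectAndCompute(img, None)[1]

def match_score(dq, dd):
    # Count ratio-test matches between query and database descriptors.
    if dq is None or dd is None:
        return 0
    pairs = bf.knnMatch(dq, dd, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < 0.75 * p[1].distance)

query = descriptors("query_plate.jpg")  # hypothetical file names
database = ["plate_001.jpg", "plate_002.jpg", "plate_003.jpg"]
ranked = sorted(database, reverse=True,
                key=lambda p: match_score(query, descriptors(p)))
print("best near-duplicate candidate:", ranked[0])
```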

80 citations


Cites background or methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...We employ three standard state-of-the-art image descriptors as baseline in our tests: Bag of Textons [13], PRICoLBP [14] and SIFT features [15]....


  • ...We benchmark the proposed dataset in the context of NDIR by using three standard state-of-the-art image descriptors: Bag of Textons [13], PRICoLBP [14] and SIFT [15]....


  • ...Scale-Invariant Feature Transform (SIFT) [15] is one of the most popular descriptors used in computer vision....


Journal ArticleDOI
TL;DR: In this article, a discriminant multiple coupled latent subspace framework is proposed to find the sets of projection directions for different poses such that the projected images of the same subject in different poses are maximally correlated in the latent space.

80 citations


Additional excerpts

  • ...Similarly, the associate-predict model [44] divides face images into patches and extracts LBP [39], SIFT [40], Gabor [41] and Learning based descriptors (LE) [49] as features....


Proceedings ArticleDOI
29 Oct 2012
TL;DR: A statistical model for the distribution of incorrect detections output by an image matching algorithm is proposed, which results in a novel scoring criterion in which the weight of correlated keypoint matches is reduced, penalizing irrelevant logo detections.
Abstract: Detecting logos in photos is challenging. A reason is that logos locally resemble patterns frequently seen in random images. We propose to learn a statistical model for the distribution of incorrect detections output by an image matching algorithm. It results in a novel scoring criterion in which the weight of correlated keypoint matches is reduced, penalizing irrelevant logo detections. In experiments on two very different logo retrieval benchmarks, our approach largely improves over the standard matching criterion as well as other state-of-the-art approaches.
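
The paper's exact statistical model is not reproduced here, but the underlying intuition can be sketched: calibrate the rate of spurious keypoint matches on logo-free images, then score a detection by how surprising its match count is under that background rate. The Poisson background model is an assumption of this sketch, and the counts are toy numbers:

```python
# Illustrative sketch (a simplification, not the paper's model): score a
# logo detection by its "surprise" relative to a false-match baseline
# estimated from logo-free background images.
import numpy as np
from scipy.stats import poisson

background_counts = np.array([3, 5, 2, 4, 6, 3])  # matches on logo-free images
mu = background_counts.mean()                     # expected false-match rate

def detection_score(n_matches: int) -> float:
    # Poisson surprise: -log P(X >= n_matches) under the background rate.
    return -poisson.logsf(n_matches - 1, mu)

print(detection_score(4))    # near background level: low score
print(detection_score(20))   # far above background: strong logo evidence
```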

80 citations

Journal ArticleDOI
TL;DR: The adaptive deep sparse semantic modeling (ADSSM) framework combining sparse topics and deep features is proposed for HSR image scene classification and significantly improves the performance when compared with the other state-of-the-art methods.
Abstract: High spatial resolution (HSR) imagery scene classification, which involves labeling an HSR image with a specific semantic class according to its geographical properties, has received increased attention, and many algorithms have been proposed for this task. Employing probabilistic topic models to acquire latent topics and convolutional neural networks (CNNs) to capture deep features for representing HSR images has been an effective way to bridge the semantic gap. However, the midlevel topic features are usually local and significant, whereas the high-level deep features convey more global and detailed information. In this paper, to discover more discriminative semantics for HSR images, the adaptive deep sparse semantic modeling (ADSSM) framework combining sparse topics and deep features is proposed for HSR image scene classification. In ADSSM, the fully sparse topic model and a CNN are integrated. To exploit the multilevel semantics of HSR scenes, the sparse topic features and deep features are fused at the semantic level. Based on the difference between the sparse topic features and the deep features, an adaptive feature normalization strategy is proposed to improve the fusion of the different features. The experimental results obtained with four HSR image classification data sets confirm that the proposed method significantly improves performance when compared with other state-of-the-art methods.
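
A simplified sketch of semantic-level feature fusion: plain L2 normalization stands in for the paper's adaptive normalization strategy, and the feature arrays are random placeholders for per-scene topic histograms and CNN activations:

```python
# Sketch: fuse midlevel topic features and high-level deep features by
# per-modality normalization followed by concatenation.
import numpy as np

rng = np.random.default_rng(0)
topic_feats = rng.random((100, 50))    # e.g., sparse-topic histograms
deep_feats = rng.random((100, 4096))   # e.g., CNN fully-connected activations

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

# Normalize each modality so neither dominates, then concatenate; the fused
# vector would feed a standard classifier (e.g., a linear SVM).
fused = np.hstack([l2_normalize(topic_feats), l2_normalize(deep_feats)])
print(fused.shape)  # (100, 4146)
```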

80 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...2) In ADSSM, the structural features are extracted by employing the SIFT descriptor [40], which is able to overcome affine transformation, noise, and changes...


  • ...The number of training samples was then varied over the range of [80, 60, 40, 20, 10] for the UCM data set and [100, 80, 60, 40, 20] for the Google data set of SIRI-WHU....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
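
The matching-then-verification chain the abstract outlines can be approximated in a short sketch: fast approximate nearest-neighbor matching with a FLANN kd-tree, a ratio test, then geometric verification. A RANSAC-fitted homography stands in here for the paper's Hough clustering plus least-squares pose solve; the image paths are placeholders:

```python
# Sketch: recognize a model object in a cluttered scene via fast
# nearest-neighbor SIFT matching followed by geometric verification.
import cv2
import numpy as np

obj = cv2.imread("model_object.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("cluttered_scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kpo, do = sift.detectAndCompute(obj, None)
kps, ds = sift.detectAndCompute(scene, None)

# FLANN kd-tree index (algorithm=1) for fast approximate NN lookup.
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
good = [m for m, n in flann.knnMatch(do, ds, k=2)
        if m.distance < 0.75 * n.distance]

# Geometric verification: RANSAC homography over the putative matches.
src = np.float32([kpo[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kps[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(f"{int(mask.sum())} geometrically consistent matches")
```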

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
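
The staged filtering idea, finding stable points as extrema in scale space, can be illustrated with a bare-bones difference-of-Gaussians stack. This is a single-octave simplification, not the paper's full implementation, and the input image is a random placeholder:

```python
# Sketch: build a difference-of-Gaussians (DoG) stack and keep points that
# are extrema across space and scale.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

rng = np.random.default_rng(0)
image = rng.random((128, 128))            # placeholder image

k = 2 ** 0.5
sigmas = [1.6 * k ** i for i in range(5)]
gaussians = np.stack([gaussian_filter(image, s) for s in sigmas])
dog = gaussians[1:] - gaussians[:-1]      # DoG stack, shape (4, 128, 128)

# A candidate keypoint is a max or min of its 3x3x3 neighborhood in
# (scale, y, x), with a small contrast threshold.
is_max = dog == maximum_filter(dog, size=3)
is_min = dog == minimum_filter(dog, size=3)
candidates = np.argwhere((is_max | is_min) & (np.abs(dog) > 0.01))
print(f"{len(candidates)} scale-space extrema candidates")
```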

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
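
The feature extraction this abstract alludes to is what the paper is best known for: the Harris corner detector. A compact sketch of its standard response, R = det(M) - k * trace(M)^2 over a smoothed gradient auto-correlation matrix M; the input image is a random placeholder and k = 0.04 is the conventional choice:

```python
# Sketch of the Harris corner response from image gradients.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

rng = np.random.default_rng(0)
image = rng.random((128, 128))

ix = sobel(image, axis=1)                 # horizontal gradient
iy = sobel(image, axis=0)                 # vertical gradient

# Smoothed gradient products form the per-pixel structure matrix M.
ixx = gaussian_filter(ix * ix, 1.5)
iyy = gaussian_filter(iy * iy, 1.5)
ixy = gaussian_filter(ix * iy, 1.5)

k = 0.04
response = (ixx * iyy - ixy ** 2) - k * (ixx + iyy) ** 2
corners = np.argwhere(response > 0.01 * response.max())
print(f"{len(corners)} corner candidates")
```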

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector, that the SIFT-based descriptors perform best, and that moments and steerable filters show the best performance among the low-dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
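
The evaluation criterion described, recall with respect to precision as the match-acceptance threshold is swept, can be sketched as follows; the descriptor distances and ground-truth labels here are synthetic placeholders:

```python
# Sketch: recall vs. 1-precision curve for putative descriptor matches,
# given ground-truth correctness labels for each match.
import numpy as np

rng = np.random.default_rng(0)
distances = rng.random(1000)                           # descriptor distances
is_correct = distances + 0.3 * rng.random(1000) < 0.6  # toy ground truth

order = np.argsort(distances)      # accept closest matches first
correct = is_correct[order]
tp = np.cumsum(correct)            # true positives at each threshold
fp = np.cumsum(~correct)           # false positives at each threshold

recall = tp / is_correct.sum()
one_minus_precision = fp / (tp + fp)
print(recall[-1], one_minus_precision[-1])
```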

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
