
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to improve performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
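The matching step described above is typically implemented as a nearest-neighbour search over descriptor vectors with Lowe's ratio test. A minimal NumPy sketch, using toy 4-D vectors as stand-ins for real 128-D SIFT descriptors (`match_descriptors` is an illustrative name, not a library call):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose nearest distance is below `ratio` times the
    second-nearest distance (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]              # two nearest neighbours
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

# Tiny synthetic example: 4-D stand-ins for 128-D SIFT descriptors.
a = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
b = np.array([[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 5.0, 5.0], [0.0, 0.95, 0.05, 0.0]])
print(match_descriptors(a, b))  # [(0, 0), (1, 2)]
```

The ratio test discards ambiguous matches: a descriptor whose best and second-best candidates are nearly equidistant is more likely a false match than a distinctive one.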
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors present the monitoring of surface movement patterns at the toe of the Potoska planina landslide, which lies at the tectonic contact between the Upper Carboniferous and Permian clastic rocks, and the Upper Triassic to Lower Jurassic carbonate rocks.
Abstract: This paper summarizes the observation of the Potoska planina landslide, which is located in the Karavanke mountain range in NW Slovenia. The landslide lies at the tectonic contact between the Upper Carboniferous and the Permian clastic rocks, and the Upper Triassic to Lower Jurassic carbonate rocks. Due to active tectonics, the clastic rocks are heavily deformed and, consequently, highly prone to fast and deep weathering. The carbonate rocks are also highly fissured due to tectonic disturbances, which result in large quantities of talus and scree material covering the part below the crown. A greater spatial density of springs and wetlands, supplied from the infiltration, is evident at the contact between scree and clastic rocks. Due to prevailing geological, tectonic and hydrological conditions, the Potoska planina area is highly prone to different slope mass movements. This paper presents the monitoring of surface movement patterns at the toe of the Potoska planina landslide. The sliding mass is composed of tectonically deformed and weathered Upper Carboniferous and Permian clastic rocks covered with a large amount of talus material, which is unstable and prone to landslides. Additionally, the Bela torrent causes significant erosion and increases the possibility of mobilization of the sliding mass downstream. Based on said conditions and field survey work, the toe of the landslide is considered to be the most active part of the landslide. In order to estimate surface movement patterns over a monitoring period of 22.5 months and five reconnaissance campaigns, periodic monitoring was conducted using unmanned aerial vehicle (UAV)-based photogrammetry, which provides high-resolution images and tachymetric geodetic measurements that enable accurate control of photogrammetric analysis of surface displacements. 
Using the results of this periodic monitoring, UAV-based displacement patterns, surface elevations and volume changes were modelled for four observation periods. According to our results, the movement pattern at the toe of the Potoska planina landslide indicates a steady downslope movement of the entire area with localized surges of superficial slips.

76 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Candidates were chosen based on the Euclidean distance of their descriptor vectors using the nearest neighbour algorithm (Lowe, 2004)....


  • ...From the images, the SIFT algorithm extracts distinctive invariant feature points that can be used to perform reliable matching between views of an object and scene (Lowe, 1999; Lowe, 2004)....



Book ChapterDOI
05 Sep 2010
TL;DR: The goal is to develop an algorithm to fuse different similarity measures for robust shape retrieval through a semi-supervised learning framework and it works directly on any given similarity measures/metrics.
Abstract: In this paper, we propose a new shape/object retrieval algorithm, co-transduction. The performance of a retrieval system is critically decided by the accuracy of adopted similarity measures (distances or metrics). Different types of measures may focus on different aspects of the objects: e.g. measures computed based on contours and skeletons are often complementary to each other. Our goal is to develop an algorithm to fuse different similarity measures for robust shape retrieval through a semi-supervised learning framework. We name our method co-transduction which is inspired by the co-training algorithm [1]. Given two similarity measures and a query shape, the algorithm iteratively retrieves the most similar shapes using one measure and assigns them to a pool for the other measure to do a re-ranking, and vice-versa. Using co-transduction, we achieved a significantly improved result of 97.72% on the MPEG-7 dataset [2] over the state-of-the-art performances (91% in [3], 93.4% in [4]). Our algorithm is general and it works directly on any given similarity measures/metrics; it is not limited to object shape retrieval and can be applied to other tasks for ranking/retrieval.
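The alternating retrieve-and-re-rank idea can be sketched in a few lines. This is a drastically simplified toy, not the authors' method: co-transduction performs graph transduction (label propagation) in each step, which the sketch below replaces with mean-similarity re-scoring against the pool; all names and numbers are illustrative.

```python
def co_retrieve(sim_a, sim_b, query, k=2, rounds=3):
    """Toy sketch of the alternating idea: the top-k neighbours under one
    similarity measure seed a pool that the OTHER measure uses to re-score
    every candidate (mean similarity to the query plus its pool), and
    vice versa; the final ranking fuses both scores."""
    n = len(sim_a[query])
    pool_a, pool_b = {query}, {query}
    for _ in range(rounds):
        score_b = [sum(sim_b[p][j] for p in pool_a) / len(pool_a) for j in range(n)]
        score_a = [sum(sim_a[p][j] for p in pool_b) / len(pool_b) for j in range(n)]
        pool_b = set(sorted(range(n), key=lambda j: -score_b[j])[:k]) | {query}
        pool_a = set(sorted(range(n), key=lambda j: -score_a[j])[:k]) | {query}
    fused = [(score_a[j] + score_b[j]) / 2 for j in range(n)]
    return sorted(range(n), key=lambda j: -fused[j])

# Items 0 and 1 are close under measure A, items 0 and 2 under measure B;
# item 3 is far from everything under both measures.
sim_a = [[1.0, 0.9, 0.2, 0.1], [0.9, 1.0, 0.2, 0.1],
         [0.2, 0.2, 1.0, 0.1], [0.1, 0.1, 0.1, 1.0]]
sim_b = [[1.0, 0.2, 0.9, 0.1], [0.2, 1.0, 0.2, 0.1],
         [0.9, 0.2, 1.0, 0.1], [0.1, 0.1, 0.1, 1.0]]
ranking = co_retrieve(sim_a, sim_b, query=0)
print(ranking)  # item 3 is ranked last
```

The point of the fusion is visible even in this toy: neither measure alone considers items 1 and 2 both close to the query, but the alternating pools pull both ahead of the dissimilar item 3.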

75 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The image descriptor is a combination of Hessian-Affine region detector [27] and SIFT descriptor [28]....


Journal ArticleDOI
TL;DR: This paper presents a detailed review of prominent algorithms from the perspective of learning generalizable, flexible and efficient statistical eye models from a small number of training images, and organizes the discussion of the global aspects of eye localization in uncontrolled environments towards the development of a robust eye localization system.

75 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...They found that GLOH [73] and SIFT [66] are the two best-performing descriptors among others in their settings....



  • ...Popular feature set descriptors include those in frequency domain, e.g., Haar wavelet features, Gabor features, and those in the spatial domain, especially various gradient-based features, such as Local Binary Patterns (LBP, [2]), Scale Invariant Feature Transform (SIFT [66]) and Gradient Location-Orientation Histogram (GLOH [73])....


Book ChapterDOI
05 Sep 2010
TL;DR: A large margin framework to improve the discrimination of I2C distance especially for small number of local features by learning Per-Class Mahalanobis metrics is proposed and can significantly outperform the original NBNN in several prevalent image datasets.
Abstract: Image-To-Class (I2C) distance is first used in Naive-Bayes Nearest-Neighbor (NBNN) classifier for image classification and has successfully handled datasets with large intra-class variances. However, the performance of this distance relies heavily on the large number of local features in the training set and test image, which need heavy computation cost for nearest-neighbor (NN) search in the testing phase. If using small number of local features for accelerating the NN search, the performance will be poor. In this paper, we propose a large margin framework to improve the discrimination of I2C distance especially for small number of local features by learning Per-Class Mahalanobis metrics. Our I2C distance is adaptive to different class by combining with the learned metric for each class. These multiple Per-Class metrics are learned simultaneously by forming a convex optimization problem with the constraints that the I2C distance from each training image to its belonging class should be less than the distance to other classes by a large margin. A gradient descent method is applied to efficiently solve this optimization problem. For efficiency and performance improved, we also adopt the idea of spatial pyramid restriction and learning I2C distance function to improve this I2C distance. We show in experiments that the proposed method can significantly outperform the original NBNN in several prevalent image datasets, and our best results can achieve state-of-the-art performance on most datasets.
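The Image-To-Class distance that this paper builds on can be sketched in its plain NBNN form. Note the simplification: the paper's contribution is a learned per-class Mahalanobis metric, which the sketch below omits in favour of plain squared Euclidean distance; the 2-D "descriptors" are hypothetical stand-ins for real local features.

```python
import numpy as np

def i2c_distance(image_feats, class_feats):
    """Image-To-Class distance as used by NBNN: for each local descriptor of
    the test image, take the squared Euclidean distance to its nearest
    neighbour among ALL descriptors pooled from the class, and sum.
    (No learned metric here; this is the NBNN baseline.)"""
    total = 0.0
    for f in image_feats:
        total += np.min(np.sum((class_feats - f) ** 2, axis=1))
    return total

def nbnn_classify(image_feats, class_pools):
    """Assign the image to the class with the smallest I2C distance."""
    return min(class_pools, key=lambda c: i2c_distance(image_feats, class_pools[c]))

# Hypothetical 2-D "descriptors": class pools and a test image near class "a".
pools = {"a": np.array([[0.0, 0.0], [0.1, 0.1]]),
         "b": np.array([[5.0, 5.0], [5.1, 4.9]])}
test_image = np.array([[0.05, 0.0], [0.0, 0.12]])
print(nbnn_classify(test_image, pools))  # a
```

The inner nearest-neighbour search over the whole class pool is exactly the expensive step the paper targets: its cost grows with the number of local features per class, which motivates both the metric learning and the feature-reduction discussed above.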

75 citations


Cites methods from "Distinctive Image Features from Sca..."


  • ...When using the same configuration, their approach is worse than ours, as either indicated in [22] as well as implemented by us using their published LibHIK1 code, which is 81.36± 0.54 using CENTRIST [23] and 78.66±0.44 using SIFT [12] in our implementation....


  • ...For feature extraction, we use a dense sampling strategy and SIFT features [12] as our descriptor, which are computed on 16 × 16 patches over a grid with spacing of 8 pixels for all datasets....


  • ...In [1], they extracted SIFT features using multi-scale patches densely sampled from each image, which result in much redundant features on the training set (about 15000 to 20000 features per image)....


Journal ArticleDOI
TL;DR: This paper introduces a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities and significantly outperforms the state-of-the-art multimodality hashing techniques.
Abstract: With the dramatic development of the Internet, how to exploit large-scale retrieval techniques for multimodal web data has become one of the most popular but challenging problems in computer vision and multimedia. Recently, hashing methods are used for fast nearest neighbor search in large-scale data spaces, by embedding high-dimensional feature descriptors into a similarity preserving Hamming space with a low dimension. Inspired by this, in this paper, we introduce a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities. Particularly, in the learning phase, each bit of a code can be sequentially learned with a discrete optimization scheme that jointly minimizes its empirical loss based on a boosting strategy. In a bitwise manner, hash functions are then learned for each modality, mapping the corresponding representations into unified hash codes. We regard this approach as cross-modality sequential discrete hashing (CSDH), which can effectively reduce the quantization errors arisen in the oversimplified rounding-off step and thus lead to high-quality binary codes. In the test phase, a simple fusion scheme is utilized to generate a unified hash code for final retrieval by merging the predicted hashing results of an unseen instance from different modalities. The proposed CSDH has been systematically evaluated on three standard data sets: Wiki, MIRFlickr, and NUS-WIDE, and the results show that our method significantly outperforms the state-of-the-art multimodality hashing techniques.
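The payoff of embedding descriptors into a Hamming space is that distances reduce to XOR and popcount. A minimal sketch of Hamming-space nearest-neighbour search over unified binary codes (brute force for clarity; large-scale systems use multi-index tables or vectorized popcounts, and the 8-bit codes below are illustrative):

```python
def hamming(a, b):
    """Hamming distance between equal-length binary codes stored as int
    bitmasks: XOR the codes, then count the differing bits."""
    return bin(a ^ b).count("1")

def nearest(query_code, database):
    """Index of the database code closest to the query in Hamming space."""
    return min(range(len(database)), key=lambda i: hamming(query_code, database[i]))

codes = [0b00000000, 0b11111111, 0b11110000]  # unified binary codes (8 bits)
print(nearest(0b11100000, codes))  # code 2 differs in only 1 bit
```

Because XOR and bit counting are single machine instructions on packed words, this comparison is orders of magnitude cheaper than Euclidean distance between the high-dimensional real-valued descriptors the codes were learned from.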

75 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
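The Hough-transform stage of the recognition pipeline described above can be sketched as voting in a coarse pose space. The bin widths and the (dx, dy, log_scale, theta) pose tuples below are illustrative assumptions, not the paper's exact quantization:

```python
from collections import defaultdict

def hough_cluster(matches, min_votes=3):
    """Each matched feature votes for a coarse object-pose bin (translation
    quantized to 32 px, scale to whole octaves, rotation to 30 degrees);
    bins collecting at least `min_votes` consistent matches survive as
    object hypotheses for the later least-squares pose verification."""
    bins = defaultdict(list)
    for dx, dy, log_scale, theta in matches:
        key = (round(dx / 32), round(dy / 32), round(log_scale), round(theta / 30))
        bins[key].append((dx, dy, log_scale, theta))
    return [votes for votes in bins.values() if len(votes) >= min_votes]

# Four matches agree on roughly the same pose; the fifth is an outlier.
matches = [(10, 12, 0.1, 5), (11, 13, 0.2, 4), (9, 11, 0.0, 6),
           (12, 10, 0.1, 5), (200, 300, 2.0, 90)]
clusters = hough_cluster(matches)
print(len(clusters), len(clusters[0]))  # 1 4
```

Requiring several consistent votes per bin is what lets the pipeline reject the stray false matches that inevitably survive descriptor matching in cluttered scenes.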

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
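The staged filtering that finds stable points in scale space can be illustrated on a 1-D signal. This is a sketch only: the scale ladder, the contrast threshold, and the 1-D setting are illustrative choices, while real SIFT operates on 2-D Gaussian image pyramids with octave resampling.

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    """NumPy-only Gaussian blur: a 1-D stand-in for the pyramid's filtering."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

def dog_keypoints(signal, sigmas=(1.0, 1.6, 2.56, 4.1), contrast=0.01):
    """Staged filtering: blur at successive scales, subtract adjacent levels
    (difference-of-Gaussians), and keep points that are extrema of their
    space/scale neighbourhood and pass a low-contrast threshold."""
    blurred = [gaussian_blur_1d(signal, s) for s in sigmas]
    dog = np.array([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    keypoints = []
    for lvl in range(1, dog.shape[0] - 1):
        for i in range(1, dog.shape[1] - 1):
            v = dog[lvl, i]
            patch = dog[lvl - 1:lvl + 2, i - 1:i + 2]  # 3x3 scale/space window
            if abs(v) > contrast and (v == patch.max() or v == patch.min()):
                keypoints.append((lvl, i))
    return keypoints

impulse = np.zeros(64)
impulse[32] = 1.0
print(dog_keypoints(impulse))
```

Requiring an extremum across both position and scale is what makes the detected points repeatable when the same structure appears at a different size in another image.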

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
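The evaluation criterion used above, recall with respect to precision, reduces to simple counting once ground-truth correspondences are known. A small sketch with hypothetical numbers (the function name and the toy matches are illustrative):

```python
def recall_and_false_rate(matches, ground_truth, n_correspondences):
    """recall = correct matches / all ground-truth correspondences;
    1 - precision = false matches / all matches the descriptor returned."""
    correct = sum(1 for m in matches if m in ground_truth)
    recall = correct / n_correspondences
    one_minus_precision = (len(matches) - correct) / len(matches) if matches else 0.0
    return recall, one_minus_precision

# Hypothetical case: the descriptor returns 3 matches, 2 of them correct,
# out of 4 ground-truth correspondences between the two images.
r, fp = recall_and_false_rate([(0, 0), (1, 1), (2, 5)], {(0, 0), (1, 1), (3, 3)}, 4)
print(r, fp)  # 0.5 and 1/3
```

Sweeping the matching threshold and plotting recall against 1 - precision yields the curves on which the descriptor comparison above is based.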

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
