Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
Citations
Book Chapter
08 Oct 2016
TL;DR: This paper proposes a novel deep learning-based approach to PROgressive Vehicle re-ID, called “PROVID”, which treats vehicle Re-Id as two specific progressive search processes: coarse-to-fine search in the feature space, and near-to-distant search in the real world surveillance environment.
Abstract: While re-identification (Re-Id) of persons has attracted intensive attention, vehicle, which is a significant object class in urban video surveillance, is often overlooked by the vision community. Most existing methods for vehicle Re-Id only achieve limited performance, as they predominantly focus on the generic appearance of vehicle while neglecting some unique identities of vehicle (e.g., license plate). In this paper, we propose a novel deep learning-based approach to PROgressive Vehicle re-ID, called “PROVID”. Our approach treats vehicle Re-Id as two specific progressive search processes: coarse-to-fine search in the feature space, and near-to-distant search in the real world surveillance environment. The first search process employs the appearance attributes of vehicle for a coarse filtering, and then exploits the Siamese Neural Network for license plate verification to accurately identify vehicles. The near-to-distant search process retrieves vehicles in a manner like human beings, by searching from near to faraway cameras and from close to distant time. Moreover, to facilitate progressive vehicle Re-Id research, we collect to-date the largest dataset named VeRi-776 from large-scale urban surveillance videos, which contains not only massive vehicles with diverse attributes and high recurrence rate, but also sufficient license plates and spatiotemporal labels. A comprehensive evaluation on the VeRi-776 shows that our approach outperforms the state-of-the-art methods by 9.28% in terms of mAP.

450 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...This method adopts the conventional SIFT as the local descriptor....

    [...]

  • ...The texture feature is represented by the conventional descriptors such as Scale-Invariant Feature Transform (SIFT) [21]....

    [...]

  • ...Table 1 shows the search results which demonstrate that the deep learned model is much better than the SIFT feature....

    [...]

  • ...The settings of the two models are as follows: (1) FACT + Plate-SIFT....

    [...]

  • ...To evaluate the Siamese neural network-based plate verification, we compare it with the conventional handcraft features, SIFT [21]....

    [...]

Proceedings Article
01 Jan 2009
TL;DR: RANSAC (Random Sample Consensus) has been popular in regression problems with samples contaminated by outliers, but there are few surveys and performance analyses of the methods derived from it.
Abstract: RANSAC (Random Sample Consensus) has been popular in regression problems with samples contaminated by outliers. It has been a milestone for much research on robust estimators, but there are few surveys and performance analyses of them. This paper categorizes them by their objectives: being accurate, being fast, and being robust. Performance evaluation was performed on line fitting with various data distributions. Planar homography estimation was used to present performance on real data.

449 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...SIFT [17] were used to generate tentative corresponding points....

    [...]
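The line-fitting setting mentioned in the abstract above can be illustrated with a minimal sketch of plain RANSAC, assuming 2D points, a fixed inlier distance threshold, and a fixed iteration count. The names `fit_line` and `ransac_line`, the parameter defaults, and the toy data are all hypothetical choices made for illustration; this is not the evaluated paper's code and it covers none of the RANSAC variants the paper compares.

```python
import random


def fit_line(p, q):
    """Line through p and q as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Keep the two-point hypothesis with the most inliers (assumed parameters)."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # With a normalized (a, b), |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Toy data: ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(1.0, 9.0), (4.0, -3.0), (7.0, 30.0)]
line, inliers = ransac_line(points)
```

A common follow-up, omitted here, is to refit the line to all inliers by least squares once the best hypothesis is found.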

Journal Article
TL;DR: The goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field.
Abstract: This paper provides a comprehensive overview of urban reconstruction. While there exists a considerable body of literature, this topic is still under active research. The work reviewed in this survey stems from the following three research communities: computer graphics, computer vision and photogrammetry and remote sensing. Our goal is to provide a survey that will help researchers to better position their own work in the context of existing solutions, and to help newcomers and practitioners in computer graphics to quickly gain an overview of this vast field. Further, we would like to bring the mentioned research communities to even more interdisciplinary work, since the reconstruction problem itself is by far not solved.

445 citations

Journal Article
TL;DR: This work presents a carefully designed dataset of video sequences of planar textures with ground truth, which includes various geometric changes, lighting conditions, and levels of motion blur, and presents a comprehensive quantitative evaluation of detector-descriptor-based visual camera tracking based on this testbed.
Abstract: Applications for real-time visual tracking can be found in many areas, including visual odometry and augmented reality. Interest point detection and feature description form the basis of feature-based tracking, and a variety of algorithms for these tasks have been proposed. In this work, we present (1) a carefully designed dataset of video sequences of planar textures with ground truth, which includes various geometric changes, lighting conditions, and levels of motion blur, and which may serve as a testbed for a variety of tracking-related problems, and (2) a comprehensive quantitative evaluation of detector-descriptor-based visual camera tracking based on this testbed. We evaluate the impact of individual algorithm parameters, compare algorithms for both detection and description in isolation, as well as all detector-descriptor combinations as a tracking solution. In contrast to existing evaluations, which aim at different tasks such as object recognition and have limited validity for visual tracking, our evaluation is geared towards this application in all relevant factors (performance measures, testbed, candidate algorithms). To our knowledge, this is the first work that comprehensively compares these algorithms in this context, and in particular, on video streams.

441 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Lowe (1999, 2004) proposed to select the local extrema of an image filtered with differences of Gaussians, which are separable and hence faster to compute than the Laplacian....

    [...]

  • ...Affine-invariant detectors provide higher repeatability for large affine distortions (Lowe 2004; Mikolajczyk and Schmid 2002), but are typically expensive to compute (Mikolajczyk et al. 2005; Moreels and Perona 2007)....

    [...]

  • ...Like Lowe (2004)’s DoGs, the filters designed by Agrawal et al. (2008) aim at approximating a Laplacian of a Gaussian filter, though simplified further: In the first step, the filter is reduced to a bi-level filter, i.e., with filter values −1 and 1....

    [...]

  • .../known targets not specified Ferns RANSAC, P-n-P Se et al. (2002) SLAM/trinocular camera DoG [scale, orientation] LSE, Kalman filter Skrypnyk and Lowe (2004) tracking/known scene DoG SIFT RANSAC, non-lin....

    [...]

  • ...For each keypoint p, the SIFT algorithm (Lowe 1999; Lowe 2004) first assigns an orientation αp in order to make the descriptor invariant to image rotation: The gradient magnitude m and orientation α are computed for each pixel around p, and a histogram of these orientations, weighted by m and a Gaussian window around p, is computed....

    [...]
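The orientation-assignment step quoted in the last excerpt can be sketched as follows. This is a rough illustration under stated assumptions, not Lowe's implementation: the patch is a plain list of rows centered on the keypoint, gradients use central differences, and the bin count (`num_bins=36`) and Gaussian width (`sigma=1.5`) are assumed parameters. The function names are hypothetical.

```python
import math


def orientation_histogram(patch, num_bins=36, sigma=1.5):
    """Histogram of gradient orientations, weighted by magnitude and a Gaussian window."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0  # keypoint assumed at the patch center
    hist = [0.0] * num_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            m = math.hypot(dx, dy)                        # gradient magnitude
            alpha = math.atan2(dy, dx) % (2.0 * math.pi)  # orientation in [0, 2*pi)
            dist2 = (x - cx) ** 2 + (y - cy) ** 2
            weight = math.exp(-dist2 / (2.0 * sigma ** 2))
            bin_index = int(alpha / (2.0 * math.pi) * num_bins) % num_bins
            hist[bin_index] += m * weight
    return hist


def dominant_orientation(hist):
    """Center angle (radians) of the highest histogram bin."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    return (peak + 0.5) * 2.0 * math.pi / len(hist)


# Toy patch: intensity increases left to right, so every gradient points along +x.
patch = [[float(x) for x in range(9)] for _ in range(9)]
hist = orientation_histogram(patch)
angle = dominant_orientation(hist)
```

Rotating the descriptor's sampling grid by the dominant angle is what would make the resulting descriptor invariant to image rotation; refinements such as peak interpolation are left out of this sketch.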

Proceedings Article
Kaiming He, Fang Wen, Jian Sun
23 Jun 2013
TL;DR: A novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells, and which outperforms various state-of-the-art hashing encoding methods.
Abstract: In computer vision there has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. The hashing functions play roles in both quantizing the vector space and generating similarity-preserving codes. Most existing hashing methods use hyper-planes (or kernelized hyper-planes) to quantize and encode. In this paper, we present a hashing method adopting the k-means quantization. We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells. The distance between the cells is approximated by the Hamming distance of the cell indices. We further generalize our algorithm to a product space for learning longer codes. Experiments show our method, named as K-means Hashing (KMH), outperforms various state-of-the-art hashing encoding methods.

437 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...The first dataset is SIFT1M from [10], containing 1 million 128-d SIFT features [17] and 10,000 independent queries....

    [...]
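The abstract above compares quantized cells through the Hamming distance of their binary indices. The sketch below shows only that comparison step, assuming each code is stored as a Python integer; it does not learn any codes and is not the K-means Hashing algorithm itself. The function names and the toy codes are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two integer-encoded binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, codes):
    """Indices of `codes`, sorted by Hamming distance to `query` (ties keep input order)."""
    return sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))


# Toy 4-bit codes standing in for learned cell indices.
codes = [0b0000, 0b0011, 0b0111, 0b1111]
order = rank_by_hamming(0b0111, codes)
```

Because the distance reduces to an XOR followed by a bit count, ranking a large database of short codes is cheap compared with computing Euclidean distances between 128-d descriptors.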

References
Journal Article
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Proceedings Article
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings Article
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations

Journal Article
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.

7,057 citations

Journal Article
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
