
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011-
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method for extracting and matching distinctive invariant features from images, which can then be used to reliably match objects across differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
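The matching step the abstract refers to can be made concrete: Lowe's scheme keeps a candidate match only when the nearest descriptor is clearly closer than the second-nearest one. A minimal sketch in Python (the function name, threshold value, and synthetic data are illustrative, not taken from the paper):

```python
import numpy as np

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b,
    keeping a match only when the nearest distance is clearly smaller than
    the second-nearest distance (the distance-ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # Euclidean distance to every candidate
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:                 # ambiguous matches are discarded
            matches.append((i, int(order[0])))
    return matches
```

On synthetic 128-dimensional descriptors, slightly perturbed copies of database rows match back to their originals, while queries whose two nearest neighbors are similar in distance are rejected.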
Citations
Proceedings ArticleDOI
16 Apr 2013
TL;DR: This paper shows how the Map- Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework.
Abstract: Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index is now reaching millions if not billions of items, (ii) the computers we use every day now come with multiple cores, and (iii) hardware becomes more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not, however, guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data, and the scarce resource of RAM has to be shared. Furthermore, algorithms require major redesign and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection and automatic re-runs, and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize the time, effort, and energy invested by other researchers and practitioners working in similar directions.
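The paper's actual Hadoop code is not reproduced on this page. As a toy, single-process illustration of how clustering-based indexing decomposes into the Map-Reduce pattern the abstract describes, the map step could route each descriptor to its nearest cluster and the reduce step could aggregate each cluster's descriptors into one index partition (all names here are ours):

```python
from collections import defaultdict
import numpy as np

def mapper(descriptor, centroids):
    # Map step: emit (cluster id, descriptor), routing the descriptor to the
    # index partition of its nearest cluster centroid.
    cluster_id = int(np.argmin(np.linalg.norm(centroids - descriptor, axis=1)))
    return cluster_id, descriptor

def reducer(pairs):
    # Reduce step: group descriptors by cluster id, yielding one index
    # partition per cluster.
    index = defaultdict(list)
    for cluster_id, descriptor in pairs:
        index[cluster_id].append(descriptor)
    return index
```

In a real deployment the framework shuffles the mapper output across machines before the reduce phase; this sketch only shows the data flow.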

80 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...When the raw descriptor collection is on the order of terabytes, as is the case when indexing tens of millions of real-world images using SIFT [18], then indexing may take days or even weeks....

  • ...Many query images are visually such that only a very small number of SIFT descriptors can be extracted from their contents, e.g., 1% of the images have less than 8 descriptors....

  • ...We evaluate index creation and search using an image collection containing roughly 100 million images; this is about 30 billion SIFT descriptors or about 4 terabytes of data....

  • ...SIFT descriptors were then extracted from these images, resulting in about 30 billion descriptors, i.e., 300 SIFT descriptors per image on average....

  • ...Getting 100% accuracy is impossible as some image variants have zero SIFT descriptors (too dark, e.g.)....

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A supervised method that explores the structure learning techniques to design efficient hash functions and exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video.
Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.
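The paper's structure-regularized hashing objective is not reproduced here; the sketch below instead shows the general Accelerated Proximal Gradient (FISTA-style) scheme the abstract names, applied to a simple L1-regularized least-squares problem for illustration:

```python
import numpy as np

def apg_lasso(A, b, lam, steps=200):
    """FISTA-style Accelerated Proximal Gradient for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1 (an illustrative stand-in
    for the paper's structure-regularized hashing loss)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth gradient
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(steps):
        grad = A.T @ (A @ z - b)               # gradient of the smooth term at the lookahead point
        w = z - grad / L
        x_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-threshold (L1 prox)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)              # momentum extrapolation
        x, t = x_new, t_new
    return x
```

APG keeps the per-iteration cost of plain proximal gradient while improving the convergence rate, which is why it suits large-scale objectives like the one in the paper.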

80 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Since we apply the widely used Bag-of-Words (BoW) model with local SIFT [15] features for video representation in our formulation, such selected feature dimensions, i.e., visual words, correspond to discriminative local visual patterns....

  • ...For each key frame, we extract 128-dimensional SIFT features [15] over key points and perform BoW quantization to derive the image representations [16]....

Journal ArticleDOI
TL;DR: This paper proposes a secure framework for outsourced privacy-preserving storage and retrieval in large shared image repositories based on IES-CBIR, a novel Image Encryption Scheme that exhibits Content-Based Image Retrieval properties.
Abstract: Storage requirements for visual data have been increasing in recent years, following the emergence of many highly interactive multimedia services and applications for mobile devices in both personal and corporate scenarios. This has been a key driving factor for the adoption of cloud-based data outsourcing solutions. However, outsourcing data storage to the Cloud also leads to new security challenges that must be carefully addressed, especially regarding privacy. In this paper we propose a secure framework for outsourced privacy-preserving storage and retrieval in large shared image repositories. Our proposal is based on IES-CBIR, a novel Image Encryption Scheme that exhibits Content-Based Image Retrieval properties. The framework enables both encrypted storage and searching using Content-Based Image Retrieval queries while preserving privacy against honest-but-curious cloud administrators. We have built a prototype of the proposed framework, formally analyzed and proven its security properties, and experimentally evaluated its performance and retrieval precision. Our results show that IES-CBIR is provably secure, allows more efficient operations than existing proposals, both in terms of time and space complexity, and paves the way for new practical application scenarios.

80 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...the CBIR algorithms used in each work: local color histograms [17], SIFT [33], and global color histograms [34]....

  • ...In this experiment, PKHE achieved the best result, as expected due to the use of the SIFT retrieval algorithm [33]....

  • ...SIFT features were originally designed for object recognition, and we believe that their use to search by example in image repositories (such as the ones used in our experiments and in the literature) does not leverage their full potential....

  • ...Retrieval precision results for the PKHE system (in both experiments) were not substantially different from the other systems, even though it uses strong texture-based image features (in particular, SIFT)....

Journal ArticleDOI
TL;DR: A ceiling vision-based simultaneous localization and mapping (SLAM) methodology for solving the global localization problems in multirobot formations is proposed and an efficient data-association method is developed to achieve an optimistic feature match hypothesis quickly and accurately.
Abstract: Localization is a key issue in multirobot formations, but it has not yet been sufficiently studied. In this paper, we propose a ceiling vision-based simultaneous localization and mapping (SLAM) methodology for solving the global localization problems in multirobot formations. First, an efficient data-association method is developed to achieve an optimistic feature match hypothesis quickly and accurately. Then, the relative poses among the robots are calculated utilizing a match-based approach, for local localization. To achieve the goal of global localization, three strategies are proposed. The first strategy is to globally localize one robot only (i.e., leader) and then localize the others based on relative poses among the robots. The second strategy is that each robot globally localizes itself by implementing SLAM individually. The third strategy is to utilize a common SLAM server, which may be installed on one of the robots, to globally localize all the robots simultaneously, based on a shared global map. Experiments are finally performed on a group of mobile robots to demonstrate the effectiveness of the proposed approaches.
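The abstract does not spell out the match-based relative-pose computation; a generic least-squares rigid alignment (the Kabsch procedure) between matched 2D feature sets illustrates the kind of calculation involved. This is our sketch, not the authors' implementation:

```python
import numpy as np

def relative_pose_2d(p, q):
    """Least-squares rigid transform (R, t) with q_i ~ R p_i + t, estimated
    from matched 2D feature coordinates p, q of shape (n, 2)."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)          # centroids
    H = (p - pc).T @ (q - qc)                        # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = qc - R @ pc
    return R, t
```

Given feature matches between two robots' ceiling views, the recovered (R, t) is exactly the relative pose needed for local localization.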

80 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...Se et al. [30] used robust scale-invariant feature transform (SIFT) descriptors to associate features, and Davison et al. [31] employed a patch-matching algorithm and a particle-searching strategy for data association....

  • ...Compared to SIFT [33], Harris corners are more accurate and efficient in textureless environments such as ceilings and walls....

Journal ArticleDOI
TL;DR: The combination of these components within the pictorial structures framework results in a generic model that yields state-of-the-art performance for several datasets on a variety of tasks: people detection, upper body pose estimation, and full body Pose estimation.
Abstract: In this paper we consider people detection and articulated pose estimation, two closely related and challenging problems in computer vision. Conceptually, both of these problems can be addressed within the pictorial structures framework (Felzenszwalb and Huttenlocher in Int. J. Comput. Vis. 61(1):55---79, 2005; Fischler and Elschlager in IEEE Trans. Comput. C-22(1):67---92, 1973), even though previous approaches have not shown such generality. A principal difficulty for such a general approach is to model the appearance of body parts. The model has to be discriminative enough to enable reliable detection in cluttered scenes and general enough to capture highly variable appearance. Therefore, as the first important component of our approach, we propose a discriminative appearance model based on densely sampled local descriptors and AdaBoost classifiers. Secondly, we interpret the normalized margin of each classifier as likelihood in a generative model and compute marginal posteriors for each part using belief propagation. Thirdly, non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between the parts. Additionally, in order to cope with shortcomings of tree-based pictorial structures models, we augment our model with additional repulsive factors in order to discourage overcounting of image evidence. We demonstrate that the combination of these components within the pictorial structures framework results in a generic model that yields state-of-the-art performance for several datasets on a variety of tasks: people detection, upper body pose estimation, and full body pose estimation.

80 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...(6) Boosted Part Detectors. In our model, we represent the image evidence E by a densely computed grid of local image descriptors (e.g., shape context (Belongie et al. 2001) or SIFT (Lowe 2004); see Sect....

  • ...In particular, we compute dense appearance representations based on local image descriptors [4,33,35], and use AdaBoost [19] to train discriminative part classifiers....

  • ...The first interesting outcome of this experiment is that the original SIFT descriptor did not perform well compared to the results obtained with shape context....

  • ...On the other hand, it shows that SIFT- and HOG-based detectors fail to benefit from a richer image description, which is perhaps due to the fact that properties such as texture do not generalize well across object instances....

  • ...We compare the performance of shape context descriptors as previously used in [2] with SIFT descriptors [33], and edge templates obtained using the code from [38] and integrated into our pose estimation framework....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
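The Hough-transform clustering stage mentioned in the abstract can be illustrated with a coarse voting scheme: each tentative match votes for a bin in pose space (translation, scale, orientation), and bins collecting several consistent votes become candidate object hypotheses. A toy sketch (the bin sizes and match tuple format are illustrative, not the paper's exact parameters):

```python
import math
from collections import Counter

def hough_pose_votes(matches, loc_bin=32.0, scale_bin=2.0, ori_bin=30.0):
    """Coarse Hough voting over pose bins. Each match is a tuple
    (dx, dy, scale_ratio, dtheta_deg) relating a model feature to an image
    feature; matches that agree on an object pose fall into the same bin."""
    votes = Counter()
    for dx, dy, scale, theta in matches:
        key = (round(dx / loc_bin), round(dy / loc_bin),
               round(math.log(scale, scale_bin)),     # scale binned in octaves
               round(theta / ori_bin))
        votes[key] += 1
    return votes                                      # bins with several votes are pose candidates
```

In the full pipeline, each high-vote bin is then verified with a least-squares fit of the pose parameters, discarding outlier matches.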

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
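This reference is best known for the corner measure it introduced, commonly written R = det(M) - k * trace(M)^2, where M is the windowed second-moment matrix of image gradients. A minimal sketch (the gradient operator, window, and constant k = 0.04 are common illustrative choices, not necessarily the original implementation):

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    """Harris corner response R = det(M) - k * trace(M)^2, with M the
    windowed second-moment matrix of image gradients."""
    Iy, Ix = np.gradient(img.astype(float))           # finite-difference gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):                                       # brute-force win x win box sum
        r = win // 2
        out = np.zeros_like(a)
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                out[i, j] = a[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].sum()
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace                    # large positive at corners, negative on edges
```

On a synthetic image containing a bright square, the response peaks at the square's corners, while edge pixels score negatively.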

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
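The evaluation criterion, recall with respect to 1-precision as the matching threshold varies, can be computed directly from a list of candidate matches with ground-truth correctness flags. A small sketch (function and variable names are ours):

```python
def recall_precision_curve(distances, labels):
    """Recall vs. 1-precision as the match-distance threshold grows.
    distances: distance of each candidate match; labels: True when the
    match is correct according to ground truth."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    total_correct = sum(labels)
    curve, true_pos = [], 0
    for n, i in enumerate(order, start=1):           # admit matches one by one
        true_pos += labels[i]
        recall = true_pos / total_correct
        curve.append((1 - true_pos / n, recall))     # (1-precision, recall)
    return curve
```

A descriptor is better when its curve reaches high recall at low 1-precision, i.e., it recovers most correct matches before admitting false ones.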

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
