Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images, which can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and subsequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision over the past decade.
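
As a concrete illustration, the following is a minimal sketch of the extract-and-match pipeline using OpenCV's SIFT implementation (an independent implementation of the algorithm, not the authors' original code; the image file names are placeholders):

```python
# Minimal sketch: extract SIFT features from two images and keep the
# matches that pass Lowe's distance-ratio test.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder file names
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                       # available in OpenCV >= 4.4
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its two nearest neighbours; a match is kept
# only if it is clearly better than the runner-up (0.75 is a common choice).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} reliable matches")
```
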
Citations
Journal ArticleDOI
TL;DR: This paper presents a simple but effective scene classification approach based on the incorporation of a multi-resolution representation into a bag-of-features model and shows that the proposed approach performs competitively against previous methods across all data sets.

138 citations


Cites background or methods from "Distinctive Image Features from Sca..."

  • ...Lowe [17] proposed a scale invariant feature transform (SIFT), which was originally applied to perform reliable matching between different views of an object or scene....

  • ...against visual descriptor variances, separate experiments were carried out with SIFT [17], LBP (Local Binary Pattern) [29,30], and WHGO (Weighted Histograms of Gradient Orientation) [31] descriptors for the OT, FP, and LS data sets....

  • ...The bag-of-features model samples an image efficiently with various local interest point detectors [16,17] or dense regions [11,15], and describes it with local descriptors [17,18]....

  • ...This approach constructs multiple resolution images and extracts SIFT features [17] for all resolution images with dense regions (see the sketch after this list)....

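A hedged sketch of that dense multi-resolution sampling, using OpenCV's SIFT implementation; the scale factors, grid step, and patch size below are illustrative choices, not the cited paper's exact settings:

```python
# Sketch: compute SIFT descriptors on a fixed grid at several image
# resolutions (dense sampling rather than interest-point detection).
import cv2

def dense_multires_sift(gray, scales=(1.0, 0.5, 0.25), step=8, size=16.0):
    sift = cv2.SIFT_create()
    descriptors = []
    for s in scales:
        img = cv2.resize(gray, None, fx=s, fy=s)
        grid = [cv2.KeyPoint(float(x), float(y), size)
                for y in range(step, img.shape[0] - step, step)
                for x in range(step, img.shape[1] - step, step)]
        _, des = sift.compute(img, grid)       # descriptors at the grid points
        if des is not None:
            descriptors.append(des)
    return descriptors                         # one array per resolution
```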

Proceedings ArticleDOI
Ji Lin, Liangliang Ren, Jiwen Lu, Jianjiang Feng, Jie Zhou
21 Jul 2017
TL;DR: This paper reaches a globally optimal solution and balances performance across cameras by iteratively optimizing the similarity and the association, and shows that the method obtains significant performance improvements and outperforms state-of-the-art methods by large margins.
Abstract: In this paper, we propose a consistent-aware deep learning (CADL) framework for person re-identification in a camera network. Unlike most existing person re-identification methods which identify whether two body images are from the same person, our approach aims to obtain the maximal correct matches for the whole camera network. Different from recently proposed camera network based re-identification methods which only consider the consistent information in the matching stage to obtain a global optimal association, we exploit such consistent-aware information under a deep learning framework where both feature representation and image matching are automatically learned with certain consistent constraints. Specifically, we reach the global optimal solution and balance the performance between different cameras by optimizing the similarity and association iteratively. Experimental results show that our method obtains significant performance improvement and outperforms the state-of-the-art methods by large margins.

138 citations

Proceedings ArticleDOI
10 Oct 2009
TL;DR: A solution to VPC is presented based upon a recently-developed visual feature known as CENTRIST (CENsus TRansform hISTogram), and a new dataset is described which is believed to be the first significant, realistic dataset for the VPC problem.
Abstract: In this paper we describe the problem of Visual Place Categorization (VPC) for mobile robotics, which involves predicting the semantic category of a place from image measurements acquired from an autonomous platform. For example, a robot in an unfamiliar home environment should be able to recognize the functionality of the rooms it visits, such as kitchen, living room, etc. We describe an approach to VPC based on sequential processing of images acquired with a conventional video camera. We identify two key challenges: Dealing with non-characteristic views and integrating restricted-FOV imagery into a holistic prediction. We present a solution to VPC based upon a recently-developed visual feature known as CENTRIST (CENsus TRansform hISTogram). We describe a new dataset for VPC which we have recently collected and are making publicly available. We believe this is the first significant, realistic dataset for the VPC problem. It contains the interiors of six different homes with ground truth labels. We use this dataset to validate our solution approach, achieving promising results.
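
A simplified sketch of the census-transform histogram that underlies CENTRIST; the comparison convention and bit ordering vary between implementations, and the paper's full descriptor includes components omitted here:

```python
# Sketch: encode each pixel as an 8-bit census code (comparisons with
# its 8 neighbours) and describe the image by the histogram of codes.
import numpy as np

def centrist(gray):
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                          # interior pixels
    codes = np.zeros_like(c)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= (c >= nb).astype(np.int32) << bit   # one bit per neighbour
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()                   # normalised 256-bin histogram
```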

138 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A tree-based reassembly algorithm is proposed that greedily merges components while respecting the geometric constraints of the puzzle problem; it achieves state-of-the-art performance for puzzle assembly, whether or not the orientation of the pieces is known.
Abstract: This paper introduces new types of square-piece jigsaw puzzles: those for which the orientation of each jigsaw piece is unknown. We propose a tree-based reassembly that greedily merges components while respecting the geometric constraints of the puzzle problem. The algorithm has state-of-the-art performance for puzzle assembly, whether or not the orientation of the pieces is known. Our algorithm makes fewer assumptions than past work, and success is shown even when pieces from multiple puzzles are mixed together. For solving puzzles where jigsaw piece location is known but orientation is unknown, we propose a pairwise MRF where each node represents a jigsaw piece's orientation. Other contributions of the paper include an improved measure (MGC) for quantifying the compatibility of potential jigsaw piece matches based on expecting smoothness in gradient distributions across boundaries.
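
A simplified sketch in the spirit of the MGC measure: the colour gradients expected across a seam are modelled from the gradients just inside one piece, and deviations are penalised with a Mahalanobis distance (the paper's measure symmetrises over both pieces and adds details omitted here):

```python
# Sketch: gradient-smoothness compatibility for placing `right`
# immediately to the right of `left`; pieces are (P, P, 3) float arrays.
import numpy as np

def mgc_left_right(left, right, eps=1e-6):
    inside = left[:, -1, :] - left[:, -2, :]   # gradients just inside `left`
    mu = inside.mean(axis=0)
    cov = np.cov(inside, rowvar=False) + eps * np.eye(3)
    cov_inv = np.linalg.inv(cov)
    seam = right[:, 0, :] - left[:, -1, :]     # gradients across the seam
    d = seam - mu
    # total squared Mahalanobis distance over the seam rows
    return float(np.einsum("ij,jk,ik->", d, cov_inv, d))
```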

137 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...and the second-smallest dissimilarity measure for that jigsaw piece’s edge (akin to SIFT feature matching [15])....

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work presents a novel algorithm that finds the bounding contour of the region containing a fixation point, achieving the segmentation of one object, given the fixation, in a cue-independent manner; the performance of the proposed algorithm is evaluated on challenging videos and stereo pairs.
Abstract: The human visual system observes and understands a scene/image by making a series of fixations. Every “fixation point” lies inside a particular region of arbitrary shape and size in the scene which can either be an object or just a part of it. We define as a basic segmentation problem the task of segmenting that region containing the “fixation point”. Segmenting this region is equivalent to finding the enclosing contour - a connected set of boundary edge fragments in the edge map of the scene - around the fixation. We present here a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation. The proposed segmentation framework combines monocular cues (color/intensity/texture) with stereo and/or motion, in a cue independent manner. We evaluate the performance of the proposed algorithm on challenging videos and stereo pairs. Although the proposed algorithm is more suitable for an active observer capable of fixating at different locations in the scene, it applies to a single image as well. In fact, we show that even with monocular cues alone, the introduced algorithm performs as well or better than a number of image segmentation algorithms, when applied to challenging inputs.

137 citations


Cites background from "Distinctive Image Features from Sca..."

  • ...Fixation points amount to features in the scene and the recent literature on features comes in handy [18, 22]....

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
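
The nearest-neighbor matching step uses a distance-ratio criterion to keep only distinctive matches; here is a sketch of that criterion in NumPy, with the 0.8 threshold reported in the paper:

```python
# Sketch: accept a nearest-neighbour match only when it is clearly
# closer than the second-nearest neighbour (Lowe's ratio test).
import numpy as np

def ratio_test_matches(query, database, ratio=0.8):
    matches = []
    for i, d in enumerate(query):                  # query: (N, 128) array
        dists = np.linalg.norm(database - d, axis=1)
        nn1, nn2 = np.partition(dists, 1)[:2]      # two smallest distances
        if nn1 < ratio * nn2:                      # distinctive match only
            matches.append((i, int(np.argmin(dists))))
    return matches
```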

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
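
A hedged sketch of the staged filtering idea: build a Gaussian scale space, difference adjacent levels, and keep scale-space extrema (the contrast and edge rejection stages of the full detector are omitted, and the parameters are illustrative):

```python
# Sketch: difference-of-Gaussian extrema as stable candidate keypoints;
# `gray` is assumed to be a float image scaled to [0, 1].
import numpy as np
from scipy import ndimage

def dog_extrema(gray, sigma0=1.6, k=2 ** 0.5, levels=5, thresh=0.03):
    sigmas = [sigma0 * k ** i for i in range(levels)]
    blurred = np.stack([ndimage.gaussian_filter(gray, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]             # DoG stack, one per scale step
    maxf = ndimage.maximum_filter(dog, size=3)   # 3x3x3 scale-space window
    minf = ndimage.minimum_filter(dog, size=3)
    extrema = ((dog == maxf) | (dog == minf)) & (np.abs(dog) > thresh)
    return np.argwhere(extrema)                  # (level, row, col) candidates
```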

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
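
This reference is the Harris and Stephens combined corner and edge detector; a minimal sketch of its corner response R = det(M) - k*trace(M)^2, where M is the Gaussian-windowed structure tensor of image gradients (sigma and k below are the usual empirical settings, not values fixed by the paper):

```python
# Sketch: Harris corner response from the windowed structure tensor.
import numpy as np
from scipy import ndimage

def harris_response(gray, sigma=1.5, k=0.04):
    ix = ndimage.sobel(gray, axis=1)               # horizontal gradient
    iy = ndimage.sobel(gray, axis=0)               # vertical gradient
    sxx = ndimage.gaussian_filter(ix * ix, sigma)  # windowed second moments
    syy = ndimage.gaussian_filter(iy * iy, sigma)
    sxy = ndimage.gaussian_filter(ix * iy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2                    # large positive -> corner
```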

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Sept. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
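
A minimal sketch of the evaluation criterion described above, recall plotted against 1-precision; representing matches and ground-truth correspondences as sets of index pairs is an assumption made for illustration:

```python
# Sketch: recall = correct matches / ground-truth correspondences,
# 1-precision = false matches / all returned matches.
def recall_one_minus_precision(matches, ground_truth):
    correct = sum(m in ground_truth for m in matches)
    false = len(matches) - correct
    recall = correct / len(ground_truth) if ground_truth else 0.0
    one_minus_precision = false / len(matches) if matches else 0.0
    return recall, one_minus_precision
```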

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
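
A short sketch of detecting MSERs with OpenCV's implementation (an independent implementation of the technique; the file name is a placeholder, and the return values may differ slightly across OpenCV versions):

```python
# Sketch: detect maximally stable extremal regions on a grayscale image.
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)            # pixel lists + boxes
print(f"{len(regions)} maximally stable extremal regions")
```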

3,422 citations
