
Distinctive Image Features from Scale-Invariant Keypoints

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.
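
The extraction-and-matching pipeline the abstract summarizes is available in standard libraries. Below is a minimal sketch using OpenCV's SIFT re-implementation (not Lowe's original code); the image file name is a placeholder.

    # Minimal sketch: SIFT keypoint detection and descriptor extraction with
    # OpenCV's re-implementation of the algorithm. "scene.png" is a placeholder.
    import cv2

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                    # available in opencv-python >= 4.4
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Each keypoint carries a location, scale and orientation; each descriptor
    # is a 128-dimensional histogram of local gradient orientations.
    print(len(keypoints), descriptors.shape)    # N keypoints, (N, 128)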
Citations
Journal ArticleDOI
TL;DR: The development of a low-cost UAV-LiDAR system and an accompanying workflow to produce 3D point clouds and a novel trajectory determination algorithm fusing observations from a GPS receiver, an Inertial Measurement Unit and a High Definition (HD) video camera are presented.
Abstract: We present the development of a low-cost Unmanned Aerial Vehicle-Light Detection and Ranging (UAV-LiDAR) system and an accompanying workflow to produce 3D point clouds. UAV systems provide an unrivalled combination of high temporal and spatial resolution datasets. The TerraLuma UAV-LiDAR system has been developed to take advantage of these properties and in doing so overcome some of the current limitations of the use of this technology within the forestry industry. A modified processing workflow including a novel trajectory determination algorithm fusing observations from a GPS receiver, an Inertial Measurement Unit (IMU) and a High Definition (HD) video camera is presented. The advantages of this workflow are demonstrated using a rigorous assessment of the spatial accuracy of the final point clouds. It is shown that due to the inclusion of video the horizontal accuracy of the final point cloud improves from 0.61 m to 0.34 m (RMS error assessed against ground control). The effect of the very high density point clouds (up to 62 points per m²) produced by the UAV-LiDAR system on the measurement of tree location, height and crown width is also assessed by performing repeat surveys over individual isolated trees. The standard deviation of tree height is shown to reduce from 0.26 m, when using data with a density of 8 points per m², to 0.15 m when the higher density data is used. Improvements in the uncertainty of the measurement of tree location, 0.80 m to 0.53 m, and crown width, 0.69 m to 0.61 m, are also shown.
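
The trajectory determination step fuses a slow absolute sensor (GPS) with fast relative ones (IMU, video). As a rough illustration of that fusion idea only, here is a 1-D complementary filter blending GPS fixes with IMU dead reckoning; it is not the paper's algorithm, and all rates, weights and noise levels below are invented.

    # Illustrative sketch only: a 1-D complementary filter blending slow,
    # absolute GPS fixes with fast, drifting IMU dead reckoning. The paper's
    # actual trajectory algorithm (which also fuses video-derived observations)
    # is more sophisticated; all values here are made up for demonstration.
    import numpy as np

    dt, alpha = 0.01, 0.98          # IMU step (s) and blend weight (assumed)
    pos, vel = 0.0, 0.0
    for k in range(1000):
        accel = np.sin(0.01 * k)            # stand-in IMU acceleration sample
        vel += accel * dt
        imu_pos = pos + vel * dt            # dead-reckoned prediction (drifts)
        if k % 100 == 0:                    # GPS arrives at 1/100 the IMU rate
            gps_pos = imu_pos + np.random.normal(0.0, 0.5)   # noisy absolute fix
            pos = alpha * imu_pos + (1.0 - alpha) * gps_pos  # complementary blend
        else:
            pos = imu_pos
    print(round(pos, 3))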

570 citations


Cites methods from "Distinctive Image Features from Sca..."

  • ...The approach used in the matching of SIFT key features allows preliminary matches that are invariant to large changes in scale and rotation to be made [39]....


  • ...The first stage of the SfM algorithm is then used to identify projections of the same features in space from two or more views using the Scale Invariant Feature Transform (SIFT) technique developed in [39]....


Journal ArticleDOI
TL;DR: A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task, and significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks.
Abstract: We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following four principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the “Vector of Locally Aggregated Descriptors” image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we create a new weakly supervised ranking loss, which enables end-to-end learning of the architecture's parameters from images depicting the same places over time downloaded from Google Street View Time Machine. Third, we develop an efficient training procedure which can be applied on very large-scale weakly labelled tasks. Finally, we show that the proposed architecture and training procedure significantly outperform non-learnt image representations and off-the-shelf CNN descriptors on challenging place recognition and image retrieval benchmarks.
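
NetVLAD generalizes classic VLAD pooling by making the cluster assignment soft and differentiable. For reference, here is a minimal sketch of the classic hard-assignment VLAD it builds on; shapes and data are illustrative.

    # Minimal sketch of classic (hard-assignment) VLAD pooling, the
    # representation NetVLAD generalizes with soft assignment.
    import numpy as np

    def vlad(descriptors, centers):
        """descriptors: (N, D) local features; centers: (K, D) visual words."""
        # Assign each descriptor to its nearest cluster center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        K, D = centers.shape
        v = np.zeros((K, D))
        for k in range(K):
            members = descriptors[assign == k]
            if len(members):
                v[k] = (members - centers[k]).sum(axis=0)   # residual aggregation
        # Intra-normalization per cluster, then global L2 normalization.
        v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12
        flat = v.ravel()
        return flat / (np.linalg.norm(flat) + 1e-12)

    rng = np.random.default_rng(0)
    print(vlad(rng.normal(size=(500, 128)), rng.normal(size=(64, 128))).shape)  # (8192,)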

562 citations


Cites background or methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ..., f(I) corresponds to extracting SIFT descriptors [20], followed by pooling into a bag-of-words vector [21] or a VLAD vector [24]), here we propose to learn the representation f(I) in an end-to-end manner, directly optimized for the task of place recognition....


  • ...Recently, after publication of the first version of this work [79], [77] and [78] achieved even better results on the image retrieval tasks by using stronger supervision in the form of automatically cleaned-up image correspondences obtained with structure-from-motion, i.e., precise matching of RootSIFT descriptors and spatial verification....


  • ...local feature based compact descriptor, which consists of VLAD pooling [24] with intra-normalization [23] on top of densely extracted RootSIFTs [20], [48]....


  • ...7 compares the top ranked images of our method versus the best baseline (RootSIFT+VLAD+whitening); additional examples are shown in the appendix, available online....


  • ...5 also shows that our trained fVLAD representation with whitening based on VGG16 convincingly outperforms RootSIFT+VLAD+whitening, as well as the method of Torii et al. [10], and therefore sets the state-of-the-art for compact descriptors on all benchmarks....


Proceedings ArticleDOI
16 Jun 2012
TL;DR: A method for building an image descriptor using distribution fields (DFs), a representation that allows smoothing the objective function without destroying information about pixel values, is presented, along with experimental evidence that DFs have a wider basin of attraction around the global optimum than other descriptors.
Abstract: Visual tracking of general objects often relies on the assumption that gradient descent of the alignment function will reach the global optimum. A common technique to smooth the objective function is to blur the image. However, blurring the image destroys image information, which can cause the target to be lost. To address this problem we introduce a method for building an image descriptor using distribution fields (DFs), a representation that allows smoothing the objective function without destroying information about pixel values. We present experimental evidence on the superiority of the width of the basin of attraction around the global optimum of DFs over other descriptors. DFs also allow the representation of uncertainty about the tracked object. This helps in disregarding outliers during tracking (like occlusions or small misalignments) without modeling them explicitly. Finally, this provides a convenient way to aggregate the observations of the object through time and maintain an updated model. We present a simple tracking algorithm that uses DFs and obtains state-of-the-art results on standard benchmarks.
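
The core construction is simple: explode the image into one channel per intensity bin, then smooth each channel. A minimal sketch follows; for brevity it smooths spatially only (the paper also smooths along the bin dimension), and the bin count and blur width are assumed values, not taken from the paper's experiments.

    # Minimal sketch of building a distribution field: the image is "exploded"
    # into one channel per intensity bin and then smoothed, so pixel-value
    # information is preserved while the objective function is smoothed.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def distribution_field(img, n_bins=8, sigma=2.0):
        bins = np.minimum((img.astype(float) / 256.0 * n_bins).astype(int), n_bins - 1)
        df = np.zeros((n_bins,) + img.shape)
        for b in range(n_bins):
            df[b] = (bins == b).astype(float)        # Kronecker-delta explosion
            df[b] = gaussian_filter(df[b], sigma)    # spatial smoothing per layer
        return df / (df.sum(axis=0, keepdims=True) + 1e-12)  # keep distributions

    img = (np.arange(64 * 64) % 256).reshape(64, 64).astype(np.uint8)
    print(distribution_field(img).shape)   # (8, 64, 64)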

561 citations


Cites methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...In object detection and recognition, descriptors like HOG [9] and SIFT [17] use histograms of gradients....


Journal ArticleDOI
13 Jun 2010
TL;DR: In this article, a novel optical flow estimation method is proposed, which reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale.
Abstract: We discuss the cause of a severe optical flow estimation problem that fine motion structures cannot always be correctly reconstructed in the commonly employed multi-scale variational framework. Our major finding is that significant and abrupt displacement transition wrecks small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue, which reduces the reliance of the flow estimates on their initial values propagated from the coarser level and enables recovering many motion details in each scale. The contribution of this paper also includes adaption of the objective function and development of a new optimization procedure. The effectiveness of our method is borne out by experiments for both large- and small-displacement optical flow estimation.
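
For context, the standard coarse-to-fine baseline whose failure mode the paper analyzes can be run with off-the-shelf tools. The sketch below uses OpenCV's Farneback estimator (not the paper's proposed method); the file names are placeholders.

    # Illustrative baseline only: Farneback flow is a standard coarse-to-fine
    # estimator of the kind whose failure mode the paper analyzes.
    import cv2

    f0 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    f1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    flow = cv2.calcOpticalFlowFarneback(
        f0, f1, None,
        pyr_scale=0.5, levels=5,      # 5-level pyramid: the coarse-to-fine part
        winsize=15, iterations=3,
        poly_n=5, poly_sigma=1.2, flags=0)
    print(flow.shape)                 # (H, W, 2): per-pixel (dx, dy)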

559 citations

Journal ArticleDOI
TL;DR: This paper investigates a simple but powerful approach to make robust use of HOG features for face recognition by proposing to extract HOG descriptors from a regular grid and identifying the necessity of performing dimensionality reduction to remove noise and make the classification process less prone to overfitting.
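
A minimal sketch of the described pipeline, grid-wise HOG extraction followed by dimensionality reduction, using scikit-image and scikit-learn; the library choices, all parameter values, and the random stand-in data are assumptions for illustration.

    # Sketch: HOG descriptors from a regular grid of non-overlapping patches,
    # followed by PCA to remove noise dimensions before classification.
    import numpy as np
    from skimage.feature import hog
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    faces = rng.random((50, 64, 64))           # stand-in for normalized face crops

    def grid_hog(img, patch=16):
        feats = []
        for y in range(0, img.shape[0], patch):
            for x in range(0, img.shape[1], patch):
                feats.append(hog(img[y:y+patch, x:x+patch],
                                 pixels_per_cell=(8, 8), cells_per_block=(1, 1)))
        return np.concatenate(feats)

    X = np.stack([grid_hog(f) for f in faces])
    X_reduced = PCA(n_components=20).fit_transform(X)
    print(X.shape, X_reduced.shape)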

553 citations


Cites background or methods from "Distinctive Image Features from Scale-Invariant Keypoints"

  • ...The algorithm for extracting HOGs (see Dalal and Triggs, 2005; Lowe, 2004) counts occurrences of edge orientations in a local neighborhood of an image....


  • ...HOGs are extracted from a regular grid of non-overlapped patches covering the whole normalized image....


  • ...1 shows an example patch with their corresponding HOGs....


  • ...Recently, Histograms of Oriented Gradients (HOGs) have proven to be an effective descriptor for object recognition in general and face recognition in particular....


  • ...Histograms of Oriented Gradients (HOGs) (Lowe, 2004) are image descriptors invariant to 2D rotation which have been used in many different problems in computer vision, such as pedestrian detection (Bertozzi et al., 2007; Wang and Lien, 2007; Chuang et al., 2008; Watanabe et al., 2009; Baranda et al., 2008; He et al., 2008; Kobayashi et al., 2008; Suard et al., 2006; Zhu et al., 2006)....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
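
The matching stage described here is easy to reproduce in outline. The sketch below applies nearest-neighbor matching with Lowe's ratio test and then a geometric verification step; note it substitutes OpenCV's RANSAC homography for the paper's Hough-transform clustering and least-squares pose solution, and the file names are placeholders.

    # Sketch: ratio-test matching of SIFT descriptors plus geometric
    # verification (RANSAC homography stands in for the paper's Hough +
    # least-squares verification).
    import cv2
    import numpy as np

    sift = cv2.SIFT_create()
    img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.8 * n.distance]        # Lowe's ratio test

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(f"{int(mask.sum())} geometrically consistent matches of {len(good)}")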

46,906 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
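
The staged filtering amounts to finding extrema of a difference-of-Gaussians stack across space and scale. A minimal sketch follows, with an illustrative sigma schedule and threshold rather than the paper's exact parameters.

    # Sketch: difference-of-Gaussians (DoG) stack; keep points that are extrema
    # against their 26 neighbors in space and scale.
    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

    def dog_extrema(img, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.02):
        img = img.astype(float) / 255.0
        blurred = [gaussian_filter(img, s) for s in sigmas]
        dog = np.stack([b1 - b0 for b0, b1 in zip(blurred, blurred[1:])])
        # A point is kept if it is a max or min over its 3x3x3 neighborhood.
        is_max = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
        is_min = (dog == minimum_filter(dog, size=3)) & (dog < -thresh)
        s, y, x = np.nonzero((is_max | is_min)[1:-1])   # skip boundary scales
        return list(zip(x, y, s + 1))

    rng = np.random.default_rng(0)
    print(len(dog_extrema(rng.random((128, 128)) * 255)))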

16,989 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
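
The feature extraction underlying this work is the Harris corner detector, which scores each pixel by R = det(M) - k·trace(M)², where M is the smoothed second-moment matrix of image gradients. A minimal sketch with conventional (untuned) parameter values:

    # Sketch of the Harris corner response from smoothed gradient products.
    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def harris_response(img, sigma=1.5, k=0.04):
        img = img.astype(float)
        ix, iy = sobel(img, axis=1), sobel(img, axis=0)
        # Entries of the second-moment matrix M, averaged with a Gaussian window.
        sxx = gaussian_filter(ix * ix, sigma)
        syy = gaussian_filter(iy * iy, sigma)
        sxy = gaussian_filter(ix * iy, sigma)
        det = sxx * syy - sxy ** 2
        trace = sxx + syy
        return det - k * trace ** 2      # large positive response at corners

    rng = np.random.default_rng(0)
    print(harris_response(rng.random((64, 64))).shape)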

13,993 citations

Journal ArticleDOI
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Abstract: In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector [Mikolajczyk, K and Schmid, C, 2004]. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [Belongie, S, et al., April 2002], steerable filters [Freeman, W and Adelson, E, Setp. 1991], PCA-SIFT [Ke, Y and Sukthankar, R, 2004], differential invariants [Koenderink, J and van Doorn, A, 1987], spin images [Lazebnik, S, et al., 2003], SIFT [Lowe, D. G., 1999], complex filters [Schaffalitzky, F and Zisserman, A, 2002], moment invariants [Van Gool, L, et al., 1996], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
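
The evaluation criterion, recall plotted against 1-precision as the match-acceptance threshold is swept, can be sketched in a few lines. The inputs below are synthetic; in the real protocol, match correctness is determined from known homographies between the test images.

    # Sketch of recall vs. 1-precision as the matching threshold is swept.
    import numpy as np

    def recall_vs_precision(distances, is_correct, n_correspondences):
        order = np.argsort(distances)          # accept matches best-first
        correct = np.cumsum(is_correct[order])
        accepted = np.arange(1, len(distances) + 1)
        recall = correct / n_correspondences
        one_minus_precision = (accepted - correct) / accepted
        return recall, one_minus_precision

    rng = np.random.default_rng(0)
    d = rng.random(200)
    ok = d + rng.normal(0, 0.2, 200) < 0.5     # synthetic "correct match" labels
    r, omp = recall_vs_precision(d, ok, ok.sum())
    print(r[-1], omp[-1])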

7,057 citations

Journal ArticleDOI
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.

3,422 citations
