
Showing papers on "Scale-invariant feature transform published in 2009"


Journal ArticleDOI
TL;DR: A new texture feature called center-symmetric local binary pattern (CS-LBP) is introduced; it is a modified version of the well-known local binary pattern (LBP) and is computationally simpler than SIFT.

1,172 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper proposes to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series, and proposes a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space.
Abstract: System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using (HOOF) as the output of the NLDS. We evaluate our approach to recognition of human actions in several scenarios and achieve encouraging results.

610 citations
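The HOOF representation described above can be sketched in a few lines: each optical-flow vector is binned by its angle folded about the vertical axis (so mirrored motions share a bin), weighted by magnitude, and the histogram is normalised onto the probability simplex. This is a minimal illustration of the binning idea, not the paper's full kernel machinery; the bin count is an arbitrary choice.

```python
import numpy as np

def hoof(flow, n_bins=8):
    """Histogram of Oriented Optical Flow (sketch).

    Each flow vector (u, v) is binned by its angle folded into
    [-pi/2, pi/2] so that left/right mirrored motions fall in the same
    bin, weighted by magnitude, then L1-normalised."""
    u = flow[..., 0].ravel()
    v = flow[..., 1].ravel()
    mag = np.hypot(u, v)
    ang = np.arctan2(v, np.abs(u))        # folding: range [-pi/2, pi/2]
    bins = np.linspace(-np.pi / 2, np.pi / 2, n_bins + 1)
    idx = np.clip(np.digitize(ang, bins) - 1, 0, n_bins - 1)
    h = np.bincount(idx, weights=mag, minlength=n_bins)
    s = h.sum()
    return h / s if s > 0 else h
```

Because the resulting histograms live on the simplex rather than in a Euclidean space, the paper compares their time series with kernels rather than with LDS metrics.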


Journal ArticleDOI
TL;DR: Experimental work demonstrates that the proposed mean shift/SIFT strategy improves the tracking performance of the classical mean shift and SIFT tracking algorithms in complicated real scenarios.

603 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: A framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate is proposed and it is shown how to efficiently compute distances between descriptors in their compressed representation eliminating the need for decoding.
Abstract: Establishing visual correspondences is an essential component of many computer vision problems, and is often done with robust, local feature-descriptors. Transmission and storage of these descriptors are of critical importance in the context of mobile distributed camera networks and large indexing problems. We propose a framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate. The framework is low complexity and has significant speed-up in the matching stage. We represent gradient histograms as tree structures which can be efficiently compressed. We show how to efficiently compute distances between descriptors in their compressed representation eliminating the need for decoding. We perform a comprehensive performance comparison with SIFT, SURF, and other low bit-rate descriptors and show that our proposed CHoG descriptor outperforms existing schemes.

282 citations
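The "distances without decoding" idea above can be illustrated with a generic compressed-domain scheme: quantise each cell histogram to a small codebook and compare descriptors through a precomputed codeword-distance table, so the compressed indices are never expanded back into histograms. The codebook, metric, and shapes here are illustrative assumptions, not the CHoG tree coder itself.

```python
import numpy as np

def build_lut(codebook, metric=lambda a, b: np.abs(a - b).sum()):
    """Precompute pairwise distances between codewords so two compressed
    descriptors can be compared by table lookup alone (no decoding)."""
    k = len(codebook)
    lut = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            lut[i, j] = metric(codebook[i], codebook[j])
    return lut

def compress(desc, codebook):
    """Map each cell histogram (row of desc) to its nearest codeword index."""
    d = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1).astype(np.uint8)

def compressed_dist(a_idx, b_idx, lut):
    """Descriptor distance computed directly on compressed indices."""
    return lut[a_idx, b_idx].sum()
```

Matching then touches only small integer indices and one table, which is the source of the speed-up in the matching stage.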


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes a method that automatically learns feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together.
Abstract: Several recently-proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally-invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a non-linear transform, such as a point-wise squashing functions, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally-invariant outputs. The learned feature descriptors give comparable results as SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.

273 citations
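The three-module stage described above (filter bank, point-wise non-linearity, spatial pooling) can be sketched in miniature. The filters below are fixed oriented-edge kernels rather than the learned, topographically pooled ones the paper proposes; this only illustrates the pipeline's shape.

```python
import numpy as np

def conv2_valid(img, k):
    """Plain 'valid' 2-D correlation, enough for tiny edge kernels."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

def feature_stage(img, filters, pool=2):
    """One feature-extraction stage: (1) filter bank, (2) point-wise
    non-linearity (abs here), (3) spatial max-pooling over pool x pool
    neighbourhoods. Returns one pooled map per filter."""
    maps = []
    for f in filters:
        r = np.abs(conv2_valid(img, f))                       # (1) + (2)
        H, W = r.shape
        H, W = H - H % pool, W - W % pool
        r = r[:H, :W].reshape(H // pool, pool, W // pool, pool).max((1, 3))  # (3)
        maps.append(r)
    return maps

# two fixed oriented edge filters (the paper learns these instead)
filters = [np.array([[-1.0, 1.0]]), np.array([[-1.0], [1.0]])]
```

The paper's contribution is learning both the filters and which filter outputs get pooled together, so that pooling over similar learned filters yields the local invariance that is hand-designed in SIFT.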


Proceedings ArticleDOI
07 Nov 2009
TL;DR: Two new approaches are proposed: Volume-SIFT (VSIFT) and Partial-Descriptor-SIFT (PDSIFT) for face recognition based on the original SIFT algorithm; PDSIFT achieves performance comparable to the most successful holistic approach, ERE, and significantly outperforms FLDA and NLDA.
Abstract: Scale Invariant Feature Transform (SIFT) has shown to be a powerful technique for general object recognition/detection. In this paper, we propose two new approaches: Volume-SIFT (VSIFT) and Partial-Descriptor-SIFT (PDSIFT) for face recognition based on the original SIFT algorithm. We compare holistic approaches: Fisherface (FLDA), the null space approach (NLDA) and Eigenfeature Regularization and Extraction (ERE) with feature based approaches: SIFT and PDSIFT. Experiments on the ORL and AR databases show that the performance of PDSIFT is significantly better than the original SIFT approach. Moreover, PDSIFT can achieve comparable performance as the most successful holistic approach ERE and significantly outperforms FLDA and NLDA.

255 citations


Journal ArticleDOI
TL;DR: Experimental results for multidate, multispectral, and multisensor remote images indicate that the proposed scale-orientation joint restriction criteria improve match performance compared to intensity- and SIFT-based methods in terms of correct-match rate and aligning accuracy.
Abstract: When the scale-invariant feature transform (SIFT) is adopted in the registration of remote sensing images, a lot of incorrect matches of keypoints will appear owing to the significant difference in the image intensity between remote sensing images compared to visible images. Scale-orientation joint restriction criteria are proposed to achieve robust feature matching for keypoints in remote sensing images. Moreover, the feature descriptor of each keypoint is also refined to overcome the difference in the gradient intensity and orientation between remote image pairs. Experimental results for multidate, multispectral, and multisensor remote images indicate that the proposed method improves the match performance compared to intensity- and SIFT-based methods in terms of correct-match rate and aligning accuracy.

230 citations
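One way to read the scale-orientation joint restriction above is as a consistency filter on putative keypoint matches: a correct match set should agree on one global scale ratio and one global rotation. The sketch below keeps only matches near the median values; the median voting and the tolerances are our illustrative choices, not the paper's exact criteria.

```python
import numpy as np

def joint_restriction(matches, log_scale_tol=0.2, ang_tol=10.0):
    """Keep keypoint matches whose scale ratio and orientation difference
    agree with the dominant (median) values over all putative matches.
    `matches` holds (scale1, ori1, scale2, ori2) tuples, orientations
    in degrees; tolerances are illustrative."""
    m = np.array(matches, float)
    log_ratio = np.log(m[:, 2] / m[:, 0])                    # scale consistency
    d_theta = (m[:, 3] - m[:, 1] + 180.0) % 360.0 - 180.0    # wrapped orientation diff
    r0, t0 = np.median(log_ratio), np.median(d_theta)
    keep = (np.abs(log_ratio - r0) < log_scale_tol) & (np.abs(d_theta - t0) < ang_tol)
    return [mm for mm, k in zip(matches, keep) if k]
```

This kind of filter is useful precisely in the remote-sensing setting above, where intensity differences between modalities produce many descriptor-level false matches that are geometrically inconsistent.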


Journal ArticleDOI
TL;DR: The methods depend strongly on the amount of face and background information included in the face images, and the performance of all methods decreases sharply under outdoor illumination, but LBP-based methods are an excellent choice when real-time operation as well as high recognition rates are needed.
Abstract: The aim of this work is to carry out a comparative study of face recognition methods that are suitable to work in unconstrained environments. The analyzed methods are selected by considering their performance in former comparative studies, in addition to being real-time, requiring just one image per person, and being fully online. In the study, two local-matching methods (histograms of LBP features and Gabor jet descriptors), one holistic method (generalized PCA), and two image-matching methods (SIFT-based and ERCF-based) are analyzed. The methods are compared using the FERET, LFW, UCHFaceHRI, and FRGC databases, which allows evaluating them in real-world conditions that include variations in scale, pose, lighting, focus, resolution, facial expression, accessories, makeup, occlusions, background and photographic quality. The main conclusions of this study are: the methods depend strongly on the amount of face and background information included in the face images, and the performance of all methods decreases sharply under outdoor illumination. The analyzed methods are robust, to a large degree, to inaccurate alignment, face occlusions, and variations in expression. LBP-based methods are an excellent choice when real-time operation as well as high recognition rates are needed.

185 citations


01 Jan 2009
TL;DR: In this paper, the authors address the problem of outdoor, appearance-based topological localization, particularly over long periods of time where seasonal changes alter the appearance of the environment, using local image features to compare single image pairs.
Abstract: In this paper, we address the problem of outdoor, appearance-based topological localization, particularly over long periods of time where seasonal changes alter the appearance of the environment. We investigate a straightforward method that relies on local image features to compare single image pairs. We first look into which of the dominating image feature algorithms, SIFT or the more recent SURF, is most suitable for this task. We then fine-tune our localization algorithm in terms of accuracy, and also introduce the epipolar constraint to further improve the result. The final localization algorithm is applied on multiple data sets, each consisting of a large number of panoramic images, which have been acquired over a period of nine months with large seasonal changes. The final localization rate in the single-image matching, cross-seasonal case is between 80 and 95%. Key words: Localization, Scene Recognition, Outdoor Environments

172 citations


Journal ArticleDOI
TL;DR: This work proposes a region-based SIFT approach to iris recognition that does not require polar transformation, affine transformation, or highly accurate segmentation, and is scale invariant.

160 citations


Proceedings ArticleDOI
19 Apr 2009
TL;DR: Affine-SIFT (ASIFT), a fully affine invariant image comparison method, is introduced; it reliably identifies features that have undergone very large affine distortions, measured by a new parameter, the transition tilt.
Abstract: A fully affine invariant image comparison method, Affine-SIFT (ASIFT), is introduced. While SIFT is fully invariant with respect to only four parameters, namely zoom, rotation and translation, the new method treats the two remaining parameters: the angles defining the camera axis orientation. Against any prognosis, simulating all views depending on these two parameters is feasible. The method reliably identifies features that have undergone very large affine distortions, measured by a new parameter, the transition tilt. State-of-the-art methods hardly exceed transition tilts of 2 (SIFT), 2.5 (Harris-Affine and Hessian-Affine) and 10 (MSER). ASIFT can handle transition tilts of 36 and higher (see Fig. 1).
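The "simulating all views is feasible" claim rests on a sparse sampling of the two camera-axis angles. A sketch of that enumeration, following the sampling rule the ASIFT paper describes (tilts in a geometric series, rotation step shrinking with tilt); the constants b = 72° and the tilt range are assumptions for illustration:

```python
import math

def asift_views(max_k=5, b=72.0):
    """Enumerate the (tilt, rotation) view pairs ASIFT simulates: tilts
    t = sqrt(2)^k and, per tilt, camera-axis rotations phi = 0, b/t,
    2b/t, ... strictly below 180 degrees."""
    views = [(1.0, 0.0)]                                 # the unwarped image itself
    for k in range(1, max_k + 1):
        t = math.sqrt(2.0) ** k
        step = b / t
        n = int(math.floor(180.0 / step - 1e-9)) + 1     # rotations below 180 deg
        views += [(t, i * step) for i in range(n)]
    return views
```

Because larger tilts compress the image more, finer rotation sampling is only needed at high tilt; the total number of simulated views therefore stays small enough for practical matching.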

Journal ArticleDOI
TL;DR: The proposed features extend the concepts used for 2-D scalar images in the computer vision SIFT technique for extracting and matching distinctive scale invariant features to images of arbitrary dimensionality through the use of hyperspherical coordinates for gradients and multidimensional histograms to create the feature vectors.
Abstract: We propose the n-dimensional scale invariant feature transform (n-SIFT) method for extracting and matching salient features from scalar images of arbitrary dimensionality, and compare this method's performance to that of related feature descriptors. The proposed features extend the concepts used for 2-D scalar images in the computer vision SIFT technique for extracting and matching distinctive scale invariant features. We apply the features to images of arbitrary dimensionality through the use of hyperspherical coordinates for gradients and multidimensional histograms to create the feature vectors. We analyze the performance of a fully automated multimodal medical image matching technique based on these features, and successfully apply the technique to determine accurate feature point correspondence between pairs of 3-D MRI images and dynamic 3D + time CT data.
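The hyperspherical-coordinate device mentioned above replaces SIFT's single gradient angle with one magnitude plus (n-1) angles. A minimal sketch for the 3-D case (the angle convention is one common choice, not necessarily the paper's):

```python
import numpy as np

def hyperspherical_3d(gx, gy, gz):
    """Express a 3-D gradient in hyperspherical coordinates: magnitude,
    azimuth in the xy-plane, and elevation out of it. These angles are
    what an n-SIFT-style descriptor histograms instead of the single
    2-D orientation."""
    mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    azimuth = np.arctan2(gy, gx)                                   # [-pi, pi]
    elevation = np.arcsin(np.clip(gz / np.maximum(mag, 1e-12), -1.0, 1.0))
    return mag, azimuth, elevation
```

Binning (azimuth, elevation) jointly, weighted by magnitude, generalises the 2-D orientation histogram to volumes such as MRI and CT data.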

Journal ArticleDOI
18 May 2009-Sensors
TL;DR: The goal is to establish the suitability of the SIFT technique for automatic tie point extraction and approximate DSM (Digital Surface Model) generation, and to develop an auto- Adaptive SIFT operator, which has been validated on several aerial images, with particular attention to large scale aerial images acquired using mini-UAV systems.
Abstract: In the photogrammetry field, interest in region detectors, which are widely used in Computer Vision, is quickly increasing due to the availability of new techniques. Images acquired by Mobile Mapping Technology, Oblique Photogrammetric Cameras or Unmanned Aerial Vehicles do not observe normal acquisition conditions. Feature extraction and matching techniques, which are traditionally used in photogrammetry, are usually inefficient for these applications as they are unable to provide reliable results under extreme geometrical conditions (convergent taking geometry, strong affine transformations, etc.) and for bad-textured images. A performance analysis of the SIFT technique in aerial and close-range photogrammetric applications is presented in this paper. The goal is to establish the suitability of the SIFT technique for automatic tie point extraction and approximate DSM (Digital Surface Model) generation. First, the performances of the SIFT operator have been compared with those provided by feature extraction and matching techniques used in photogrammetry. All these techniques have been implemented by the authors and validated on aerial and terrestrial images. Moreover, an auto-adaptive version of the SIFT operator has been developed, in order to improve the performances of the SIFT detector in relation to the texture of the images. The Auto-Adaptive SIFT operator (A2 SIFT) has been validated on several aerial images, with particular attention to large scale aerial images acquired using mini-UAV systems.

Proceedings ArticleDOI
08 Jul 2009
TL;DR: An improved algorithm is proposed that performs as well as or better than the previous method for both articulated and rigid but geometrically detailed models, and extracts a much larger number of local visual features by sampling each depth image densely and randomly.
Abstract: Our previous shape-based 3D model retrieval algorithm compares 3D shapes by using thousands of local visual features per model. A 3D model is rendered into a set of depth images, and from each image, local visual features are extracted by using the Scale Invariant Feature Transform (SIFT) algorithm by Lowe. To efficiently compare large sets of local features, the algorithm employs a bag-of-features approach to integrate the local features into a feature vector per model. The algorithm outperformed other methods for a dataset containing highly articulated yet geometrically simple 3D models. For a dataset containing diverse and detailed models, the method did only as well as other methods. This paper proposes an improved algorithm that performs as well as or better than our previous method for both articulated and rigid but geometrically detailed models. The proposed algorithm extracts a much larger number of local visual features by sampling each depth image densely and randomly. To contain the computational cost, the method utilizes the GPU for SIFT feature extraction and an efficient randomized decision tree for encoding SIFT features into visual words. Empirical evaluation showed that the proposed method is very fast, yet significantly outperforms our previous method for rigid and geometrically detailed models. For the simple yet articulated models, the performance was virtually unchanged.
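The bag-of-features integration step used above (and in several other entries on this page) reduces to: assign each local descriptor to its nearest visual word, then histogram the word counts. A minimal sketch with a brute-force nearest-word assignment standing in for the paper's randomized decision tree:

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Integrate many local descriptors into one fixed-length vector:
    nearest-visual-word assignment followed by an L1-normalised word
    histogram. `descriptors` is (n, d), `vocabulary` is (k, d)."""
    d = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d.argmin(1)                       # visual-word index per descriptor
    h = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return h / h.sum()
```

Because the histogram discards feature locations, the resulting model signature is insensitive to articulation, which is the "location-free integration" advantage the related entries cite.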

Journal ArticleDOI
TL;DR: This paper presents a SIFT-inspired algorithm that extracts robust feature descriptors from 2.5D range images in order to provide accurate point-based correspondences between compared range surfaces.

Proceedings ArticleDOI
10 Oct 2009
TL;DR: A combination of the Harris corner detector and the SIFT descriptor, which computes features with high repeatability and very good matching properties within approximately 20 ms.
Abstract: In the recent past, the recognition and localization of objects based on local point features has become a widely accepted and utilized method. Among the most popular features are currently the SIFT features, the more recent SURF features, and region-based features such as the MSER. For time-critical application of object recognition and localization systems operating on such features, the SIFT features are too slow (500–600 ms for images of size 640×480 on a 3GHz CPU). The faster SURF achieve a computation time of 150–240 ms, which is still too slow for active tracking of objects or visual servoing applications. In this paper, we present a combination of the Harris corner detector and the SIFT descriptor, which computes features with a high repeatability and very good matching properties within approx. 20 ms. While just computing the SIFT descriptors for computed Harris interest points would lead to an approach that is not scale-invariant, we will show how scale-invariance can be achieved without a time-consuming scale space analysis. Furthermore, we will present results of successful application of the proposed features within our system for recognition and localization of textured objects. An extensive experimental evaluation proves the practical applicability of our approach.
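The Harris detector that the paper pairs with SIFT descriptors scores each pixel from the local structure tensor of image gradients. A pure-NumPy sketch of that response (window size and k are conventional defaults; the paper's scale-invariance trick on top of this is not reproduced):

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    """Harris corner response: box-smooth the gradient structure tensor
    over a win x win window and score with det(M) - k * trace(M)^2.
    Corners give strong positive peaks; straight edges go negative."""
    gy, gx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy

    def box(a):                       # naive box filter, truncated at borders
        out = np.zeros_like(a)
        r = win // 2
        H, W = a.shape
        for i in range(H):
            for j in range(W):
                out[i, j] = a[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].mean()
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

Computing a SIFT descriptor only at such corners is what removes the expensive scale-space extrema search from the 500-600 ms budget quoted above.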

Proceedings ArticleDOI
01 Jan 2009
TL;DR: Experimental results on the AR-Face and CMU-PIE database using manually aligned faces, unaligned faces, and partially occluded faces show that the proposed approach is robust and can outperform current generic approaches.
Abstract: We analyze the usage of Speeded Up Robust Features (SURF) as local descriptors for face recognition. The effect of different feature extraction and viewpoint-consistency-constrained matching approaches is analyzed. Furthermore, a RANSAC-based outlier removal for system combination is proposed. The proposed approach allows matching faces under partial occlusions, even if they are not perfectly aligned or illuminated. Current approaches are sensitive to registration errors and usually rely on a very good initial alignment and illumination of the faces to be recognized. A grid-based and dense extraction of local features in combination with a block-based matching accounting for different viewpoint constraints is proposed, as interest-point-based feature extraction approaches for face recognition often fail. The proposed SURF descriptors are compared to SIFT descriptors. Experimental results on the AR-Face and CMU-PIE databases using manually aligned faces, unaligned faces, and partially occluded faces show that the proposed approach is robust and can outperform current generic approaches.

Proceedings ArticleDOI
01 Dec 2009
TL;DR: Compared with the existing SIFT FPGA implementation, which requires 33 milliseconds for an image of 320×240 pixels, a significant improvement has been achieved for the proposed architecture.
Abstract: This paper proposes an optimised SIFT (Scale Invariant Feature Transform) feature-detection architecture for an FPGA implementation of an image matcher. In order for a SIFT-based image matcher to be implemented efficiently on an FPGA, in terms of speed and hardware resource usage, the original SIFT algorithm has been significantly optimised in the following aspects: 1) Upsampling has been replaced with downsampling to save the interpolation operation. 2) Only four scales with two octaves are needed for our image matcher, with moderate degradation of matching performance. 3) The total dimension of the feature descriptor has been reduced to 72 from the original SIFT's 128, which significantly simplifies the image matching operation. With the optimisations above, the proposed FPGA implementation is able to detect the features of a typical image of 640×480 pixels within 31 milliseconds. Compared with the existing SIFT FPGA implementation, which requires 33 milliseconds for an image of 320×240 pixels, this is a significant improvement.

Proceedings ArticleDOI
18 Jan 2009
TL;DR: It is shown that image and feature matching algorithms are robust to significantly compressed features, and a strong correlation between MSE and matching error for feature points and images is established.
Abstract: We investigate transform coding to efficiently store and transmit SIFT and SURF image descriptors. We show that image and feature matching algorithms are robust to significantly compressed features. We achieve near-perfect image matching and retrieval for both SIFT and SURF using ∼2 bits/dimension. When applied to SIFT and SURF, this provides a 16× compression relative to a conventional floating point representation. We establish a strong correlation between MSE and matching error for feature points and images. Feature compression enables many applications that may not otherwise be possible, especially on mobile devices.
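The 16× figure above is pure bit-budget arithmetic: 128 dimensions at 32-bit float versus 128 dimensions at 2 bits each. The sketch below shows that arithmetic plus a plain uniform quantiser; the paper's actual coder is transform coding, so this only illustrates the storage side, not their rate-distortion behaviour.

```python
import numpy as np

def quantize(desc, bits=2):
    """Uniformly quantize a descriptor to `bits` bits per dimension."""
    levels = 2 ** bits
    lo, hi = desc.min(), desc.max()
    q = np.round((desc - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(np.uint8)
    return q, (lo, hi)

def dequantize(q, lo_hi, bits=2):
    """Reconstruct approximate descriptor values from quantized codes."""
    lo, hi = lo_hi
    levels = 2 ** bits
    return lo + q.astype(float) / (levels - 1) * (hi - lo)

# bit budget: 128 dims x 32-bit float vs 128 dims x 2 bits
raw_bits = 128 * 32
coded_bits = 128 * 2
ratio = raw_bits / coded_bits     # 16x, matching the paper's figure
```

The paper's MSE-vs-matching-error correlation is what justifies pushing the rate this low: as long as reconstruction MSE stays small, matching performance barely moves.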

Proceedings ArticleDOI
30 Oct 2009
TL;DR: This paper proposes to exploit SURF, a scale and in-plane rotation invariant detector and descriptor with performance comparable to or even better than SIFT, for face recognition.
Abstract: The Scale Invariant Feature Transform (SIFT) proposed by David G. Lowe has been used in face recognition and proved to perform well. Recently, a new detector and descriptor named Speeded-Up Robust Features (SURF), suggested by Herbert Bay, has attracted attention. SURF is a scale and in-plane rotation invariant detector and descriptor with performance comparable to or even better than SIFT. Because each SURF feature generally has only 64 dimensions and an indexing scheme is built using the sign of the Laplacian, SURF is much faster than the 128-dimensional SIFT at the matching step. Based on these advantages of SURF, we propose to exploit SURF features in face recognition in this paper.
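The Laplacian-sign indexing mentioned above is a cheap pre-filter: a bright blob on a dark background can never match a dark blob on a bright background, so their descriptor distance need not be computed at all. A brute-force matcher sketch with that filter (toy shapes; real SURF descriptors are 64-D):

```python
import numpy as np

def match(query, database, signs_q, signs_db):
    """Nearest-neighbour matching that skips candidates whose stored
    Laplacian sign disagrees with the query's, SURF-style. Returns the
    best database index per query (-1 if no same-sign candidate)."""
    out = []
    for q, sq in zip(query, signs_q):
        best, best_d = -1, np.inf
        for j, (d, sd) in enumerate(zip(database, signs_db)):
            if sq != sd:              # cheap pre-filter: signs must agree
                continue
            dist = np.sum((q - d) ** 2)
            if dist < best_d:
                best, best_d = j, dist
        out.append(best)
    return out
```

On average the sign filter halves the number of distance computations, which compounds with the shorter 64-D descriptor to give SURF its matching-stage speed advantage.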

Proceedings ArticleDOI
08 Jul 2009
TL;DR: The Bag of Words pipeline forms the basis for comparing various fast alternatives for all of its components; a fast algorithm to densely sample SIFT and SURF is proposed, and several variants of these descriptors are compared.
Abstract: We start from the state-of-the-art Bag of Words pipeline that in the 2008 benchmarks of TRECvid and PASCAL yielded the best performance scores. We have contributed to that pipeline, which now forms the basis to compare various fast alternatives for all of its components: (i) For descriptor extraction we propose a fast algorithm to densely sample SIFT and SURF, and we compare several variants of these descriptors. (ii) For descriptor projection we compare a k-means visual vocabulary with a Random Forest. As a pre-projection step we experiment with PCA on the descriptors to decrease projection time. (iii) For classification we use Support Vector Machines and compare the χ² kernel with the RBF kernel. Our results lead to a 10-fold speed increase without any loss of accuracy and to a 30-fold speed increase with 17% loss of accuracy, where the latter system does real-time classification at 26 images per second.

Proceedings ArticleDOI
24 Aug 2009
TL;DR: This work uses an extension of the well-known SIFT descriptor, called Hue-SIFT, that adds color information to the original SIFT, and shows recognition rates similar to those achieved by other approaches in the literature.
Abstract: Most previous papers on the detection of nude or pornographic images start with the application of a skin detector followed by some kind of shape or geometric modeling. In this work, these two steps are avoided by a bag-of-features (BOF) approach, in which images are represented by histograms of sparse visual descriptors. BOF approaches have been applied successfully to object recognition tasks, but most descriptors used in that case are based on gray-level information. Our approach is based on an extension of the well-known SIFT descriptor, called Hue-SIFT, aimed at adding color information to the original SIFT. Experimental results show recognition rates similar to those achieved by other approaches in the literature, without the need for sophisticated skin or shape models.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A novel and robust feature descriptor called ordinal spatial intensity distribution (OSID) which is invariant to any monotonically increasing brightness changes and has far reaching implications for many applications in computer vision including motion estimation, object tracking/recognition, image classification/retrieval, 3D reconstruction, and stereo.
Abstract: We describe a novel and robust feature descriptor called ordinal spatial intensity distribution (OSID) which is invariant to any monotonically increasing brightness changes. Many traditional features are invariant to intensity shift or affine brightness changes but cannot handle more complex nonlinear brightness changes, which often occur due to the nonlinear camera response, variations in capture device parameters, temporal changes in the illumination, and viewpoint-dependent illumination and shadowing. A configuration of spatial patch sub-divisions is defined, and the descriptor is obtained by computing a 2-D histogram in the intensity ordering and spatial sub-division spaces. Extensive experiments show that the proposed descriptor significantly outperforms many state-of-the-art descriptors such as SIFT, GLOH, and PCA-SIFT under complex brightness changes. Moreover, the experiments demonstrate the proposed descriptor's superior performance even in the presence of image blur, viewpoint changes, and JPEG compression. The proposed descriptor has far reaching implications for many applications in computer vision including motion estimation, object tracking/recognition, image classification/retrieval, 3D reconstruction, and stereo.
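The invariance claim above follows from using intensity *order* instead of intensity *values*: any monotonically increasing brightness change permutes nothing. A minimal sketch of the OSID idea (rank binning crossed with a spatial grid); bin counts and grid size are illustrative, and the paper's exact patch sub-division scheme may differ.

```python
import numpy as np

def osid(patch, n_rank_bins=4, grid=2):
    """Sketch of OSID: pixels are ranked by intensity, ranks are
    quantised into bins, and a 2-D histogram is taken over
    (rank bin, spatial cell). Monotonically increasing brightness
    changes leave the ranks, hence the descriptor, untouched."""
    H, W = patch.shape
    ranks = patch.ravel().argsort(kind="stable").argsort()   # ordinal rank per pixel
    rank_bin = ranks * n_rank_bins // ranks.size
    ys, xs = np.divmod(np.arange(H * W), W)
    cell = (ys * grid // H) * grid + (xs * grid // W)        # spatial sub-division
    hist = np.zeros((n_rank_bins, grid * grid))
    for rb, c in zip(rank_bin, cell):
        hist[rb, c] += 1
    return hist / hist.sum()
```

The test below applies a nonlinear but monotonic brightness change and checks the descriptor is bit-for-bit unchanged, which is exactly the property SIFT's raw-gradient histograms lack.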

Journal ArticleDOI
TL;DR: The best classification results under a 10-fold cross-validation test were 4.57% and 5.95% using PCALC and SVM, respectively, indicating that the local-region-detector-based insect classification method could be an effective way to identify and classify insects.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: Combining the Scale Invariant Feature Transformation (SIFT) approach to iris recognition with a popular matching approach based on transformation to polar coordinates and Log-Gabor wavelets achieves significantly better performance than either individual scheme, with a performance improvement of 24% in the Equal Error Rate.
Abstract: Biometric methods based on iris images are believed to allow very high accuracy, and there has been an explosion of interest in iris biometrics in recent years. In this paper, we use the Scale Invariant Feature Transformation (SIFT) for recognition using iris images. Contrary to traditional iris recognition systems, the SIFT approach does not rely on the transformation of the iris pattern to polar coordinates or on highly accurate segmentation, allowing less constrained image acquisition conditions. We extract characteristic SIFT feature points in scale space and perform matching based on the texture information around the feature points using the SIFT operator. Experiments are done using the BioSec multimodal database, which includes 3,200 iris images from 200 individuals acquired in two different sessions. We contribute an analysis of the influence of different SIFT parameters on the recognition performance. We also show the complementarity between the SIFT approach and a popular matching approach based on transformation to polar coordinates and Log-Gabor wavelets. The combination of the two approaches achieves significantly better performance than either of the individual schemes, with a performance improvement of 24% in the Equal Error Rate.
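The combination step in the last sentence can be illustrated with a generic score-level fusion: min-max normalisation of each matcher's scores followed by the sum rule. This is a common default, assumed here for illustration; the abstract does not state which fusion rule the authors actually use.

```python
import numpy as np

def fuse(scores_a, scores_b):
    """Sum-rule fusion after min-max normalisation: map each matcher's
    scores to [0, 1] so neither dominates, then add them."""
    def norm(s):
        s = np.asarray(s, float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    return norm(scores_a) + norm(scores_b)
```

Fusion helps precisely because the two matchers fail differently: the SIFT matcher tolerates segmentation errors, while the polar Log-Gabor matcher exploits the full iris texture.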

Proceedings ArticleDOI
01 Sep 2009
TL;DR: The proposed 3D shape model retrieval method is capable of retrieving 3D models having diverse shape representations and is robust against articulation and global deformation of 3D shapes thanks to location-free integration of local visual features.
Abstract: This paper describes a 3D shape model retrieval method that accepts, as a query, a 3D mesh obtained by a range scan from a viewpoint. The proposed method visually compares a single depth map of the query with depth maps of a 3D model rendered from multiple viewpoints. Comparison of the depth maps employs a bag of local visual features extracted by using a modified version of Lowe's Scale-Invariant Feature Transform (SIFT). The method is capable of retrieving 3D models having diverse shape representations and is robust against articulation and global deformation of 3D shapes thanks to location-free integration of local visual features. Two modifications to the SIFT are made to avoid ill effects of range scanning artifacts, such as jagged edges and cracks, that exist in the query mesh. The two modifications are: (1) dense and random feature placement, and (2) importance sampling of low-frequency images in the SIFT's Gaussian image pyramid. Our experimental evaluation showed that the proposed method significantly outperforms previous methods.

Journal ArticleDOI
TL;DR: This paper presents a novel approach to digital video stabilization that uses an adaptive particle filter for global motion estimation and proposes a new cost function called SIFT-BMSE (SIFT Block Mean Square Error) to disregard foreground object pixels and reduce the computational cost.
Abstract: This paper presents a novel approach to digital video stabilization that uses an adaptive particle filter for global motion estimation. In this approach, the dimensionality of the feature space is first reduced by the principal component analysis (PCA) method using the features obtained from a scale invariant feature transform (SIFT); hence the resultant features may be termed PCA-SIFT features. The trajectory of these features extracted from video frames is used to estimate undesirable motion between frames. A new cost function called SIFT-BMSE (SIFT Block Mean Square Error) is proposed in the adaptive particle filter framework to disregard foreground object pixels and reduce the computational cost. Frame compensation based on these estimates yields stabilized full-frame video sequences. Experimental results show that the proposed algorithm is both accurate and efficient.
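The block-MSE ingredient of the SIFT-BMSE cost can be sketched as an MSE over a block centred at a feature location. This is only our reading of the building block; how SIFT-BMSE selects and weights blocks so that foreground-object pixels are disregarded is the paper's contribution and is not reproduced here. The block size is an illustrative choice.

```python
import numpy as np

def block_mse(frame_a, frame_b, center, size=8):
    """Mean square error between two frames over a size x size block
    centred at `center` (y, x) -- the basic cost a particle filter can
    evaluate for each candidate global-motion hypothesis."""
    y, x = center
    r = size // 2
    a = frame_a[y - r:y + r, x - r:x + r].astype(float)
    b = frame_b[y - r:y + r, x - r:x + r].astype(float)
    return ((a - b) ** 2).mean()
```

In a particle-filter stabiliser, each particle proposes a global motion, the candidate-warped frame is scored with costs like this, and the lowest-cost hypothesis drives the compensating warp.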

Proceedings ArticleDOI
01 Jan 2009
TL;DR: It is demonstrated that the detection of features at different image scales actually has an influence on the localisation accuracy, and a general framework to determine the uncertainty of multi-scale image features is introduced.
Abstract: Image feature points are the basis for numerous computer vision tasks, such as pose estimation or object detection. State-of-the-art algorithms detect features that are invariant to scale and orientation changes. While feature detectors and descriptors have been widely studied in terms of stability and repeatability, their localisation error has often been assumed to be uniform and insignificant. We argue that this assumption does not hold for scale-invariant feature detectors and demonstrate that the detection of features at different image scales actually has an influence on the localisation accuracy. A general framework to determine the uncertainty of multi-scale image features is introduced. This uncertainty is represented via anisotropic covariances with varying orientation and magnitude. We apply our framework to the well-known SIFT and SURF algorithms, detail its implementation, and make it available. Finally, the usefulness of such covariance estimates for bundle adjustment and homography computation is illustrated.
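A toy illustration of the claim that localisation uncertainty is anisotropic and depends on detection scale might look like the following. The base variances and the quadratic growth with scale are illustrative assumptions, not the paper's actual covariance model.

```python
import numpy as np

def feature_covariance(scale, orientation, base_var=(0.25, 0.1)):
    """Illustrative 2x2 localisation covariance for a scale-invariant
    feature: anisotropic principal variances (base_var, in px^2 at
    scale 1; assumed values) grow with the detection scale and are
    rotated to the feature's orientation."""
    c, s = np.cos(orientation), np.sin(orientation)
    R = np.array([[c, -s], [s, c]])
    D = np.diag(base_var) * scale ** 2   # error grows with detected scale
    return R @ D @ R.T
```

Such per-feature covariances could then weight residuals in bundle adjustment or homography estimation, which is the use case the abstract points to.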

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A robust feature matching scheme in which features can be matched in 2.3µs, with comparable robustness to SIFT and Ferns while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.
Abstract: In this paper we present a robust feature matching scheme in which features can be matched in 2.3µs. For a typical task involving 150 features per image, this results in a processing time of 500µs for feature extraction and matching. In order to achieve very fast matching we use simple features based on histograms of pixel intensities and an indexing scheme based on their joint distribution. The features are stored with a novel bit mask representation which requires only 44 bytes of memory per feature and allows computation of a dissimilarity score in 20ns. A training phase gives the patch-based features invariance to small viewpoint variations. Larger viewpoint variations are handled by training entirely independent sets of features from different viewpoints. A complete system is presented where a database of around 13,000 features is used to robustly localise a single planar target in just over a millisecond, including all steps from feature detection to model fitting. The resulting system shows comparable robustness to SIFT [8] and Ferns [14] while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.
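The bit-mask dissimilarity idea can be sketched as follows: each feature stores, per sampled pixel, a small mask of the intensity bins observed for that pixel during training, and the score counts runtime samples that fall in an unseen bin. This is a simplified Python sketch with assumed mask layout; the paper's implementation packs the masks into machine words and scores whole features with a handful of bitwise operations to reach its 20 ns figure.

```python
def dissimilarity(feature_masks, sample_bins):
    """Count sampled pixels whose runtime intensity bin was never
    observed for this feature during training. feature_masks[i] is a
    bitmask of allowed bins for sample i; sample_bins[i] is the
    quantized runtime intensity of that sample."""
    return sum(1 for mask, b in zip(feature_masks, sample_bins)
               if not (mask >> b) & 1)
```

A low score means most runtime samples land in intensity bins the training phase saw, so the candidate patch is likely the trained feature; thresholding this score gives the fast match test.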

Journal ArticleDOI
TL;DR: A novel Symmetric Scale Invariant Feature Transform (symmetric-SIFT) descriptor is proposed and developed, which works with near real-time performance, and can deal with the large non-overlapping and large initial misalignment situations.
Abstract: The purpose of image registration is to spatially align two or more single-modality images taken at different times, or several images acquired by multiple imaging modalities. Intensity-based registration usually requires optimization of a similarity metric between the images. However, global optimization techniques are too time-consuming, and local optimization techniques frequently fail to search the global transformation space because of the large initial misalignment of the two images. Moreover, for registration with large non-overlapping areas, the similarity metric cannot reach its optimum value even when the two images are properly registered. To solve these problems, we propose a novel Symmetric Scale Invariant Feature Transform (symmetric-SIFT) descriptor and develop a fast multi-modal image registration technique. The proposed technique automatically generates a large number of highly distinctive symmetric-SIFT descriptors for two images, and registration is performed by matching the corresponding descriptors across the two images. These descriptors are invariant to image scale and rotation, and are partially invariant to affine transformation. Moreover, they are symmetric to contrast, which makes them suitable for multi-modal image registration. The proposed technique abandons the optimization and similarity-metric strategy. It works with near real-time performance, and can deal with large non-overlapping areas and large initial misalignment. Test cases involving scale change, large non-overlapping areas, and large initial misalignment on computed tomography (CT) and magnetic resonance (MR) datasets show that it needs much less runtime and achieves better accuracy when compared to other algorithms. (C) 2009 National Natural Science Foundation of China and Chinese Academy of Sciences. Published by Elsevier Limited and Science in China Press. All rights reserved.
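The contrast symmetry at the heart of such descriptors can be illustrated by folding gradient orientations into [0, π): inverting image contrast rotates every gradient by π, so the folded orientation is unchanged. This shows only the core idea; the full symmetric-SIFT descriptor also symmetrises the orientation histograms themselves.

```python
import numpy as np

def symmetric_orientation(gx, gy):
    """Fold gradient orientations into [0, pi) so that a contrast
    inversion (which flips the gradient, i.e. rotates it by pi) yields
    the same orientation value -- the property that makes the
    descriptor usable across imaging modalities."""
    theta = np.arctan2(gy, gx)     # raw orientation in (-pi, pi]
    return np.mod(theta, np.pi)    # theta and theta + pi coincide
```

For example, an edge that appears bright-to-dark in CT but dark-to-bright in MR produces opposite gradients, yet both map to the same folded orientation.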