
Showing papers on "Scale-invariant feature transform published in 2012"


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work proposes a novel keypoint descriptor inspired by the human visual system and more precisely the retina, coined Fast Retina Keypoint (FREAK), which is in general faster to compute with lower memory load and also more robust than SIFT, SURF or BRISK.
Abstract: A large number of vision applications rely on matching keypoints across images. The last decade featured an arms-race towards faster and more robust keypoints and association algorithms: Scale Invariant Feature Transform (SIFT)[17], Speeded-Up Robust Features (SURF)[4], and more recently Binary Robust Invariant Scalable Keypoints (BRISK)[16] to name a few. These days, the deployment of vision algorithms on smart phones and embedded devices with low memory and computation complexity has even upped the ante: the goal is to make descriptors faster to compute, more compact while remaining robust to scale, rotation and noise. To best address the current requirements, we propose a novel keypoint descriptor inspired by the human visual system and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. Our experiments show that FREAKs are in general faster to compute with lower memory load and also more robust than SIFT, SURF or BRISK. They are thus competitive alternatives to existing keypoints in particular for embedded applications.

1,876 citations
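As a rough illustration of the retinal idea only (not the authors' actual sampling pattern, smoothing kernels, or learned pair selection), the sketch below builds concentric sampling rings whose radius doubles towards the periphery and turns pairwise intensity comparisons into a bit string; `retinal_pattern` and `cascade_bits` are hypothetical names introduced for this sketch.

```python
import numpy as np

def retinal_pattern(n_rings=4, pts_per_ring=6, r0=2.0):
    """Sampling points on concentric rings whose radius doubles towards
    the periphery, loosely mimicking retinal receptive fields. (The real
    FREAK also grows the smoothing kernel with the radius and learns
    which point pairs to compare.)"""
    pts = [(0.0, 0.0)]  # central "fovea" point
    for k in range(n_rings):
        r = r0 * 2 ** k
        for j in range(pts_per_ring):
            a = 2 * np.pi * j / pts_per_ring
            pts.append((r * np.cos(a), r * np.sin(a)))
    return np.array(pts)

def cascade_bits(intensities, pairs):
    """One binary string: each bit is an intensity comparison between two
    sampling points, evaluated coarse-to-fine in the real descriptor."""
    return (intensities[pairs[:, 0]] < intensities[pairs[:, 1]]).astype(np.uint8)

pattern = retinal_pattern()            # 1 + 4*6 = 25 sampling points
samples = np.random.default_rng(0).random(len(pattern))
bits = cascade_bits(samples, np.array([(0, 1), (1, 7), (7, 19)]))
```

Matching such bit strings then reduces to Hamming distance, which is what makes the descriptor attractive on embedded hardware.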


Book ChapterDOI
07 Oct 2012
TL;DR: KAZE features, a novel multiscale 2D feature detection and description algorithm in nonlinear scale spaces, can make blurring locally adaptive to the image data, reducing noise but retaining object boundaries, obtaining superior localization accuracy and distinctiveness.
Abstract: In this paper, we introduce KAZE features, a novel multiscale 2D feature detection and description algorithm in nonlinear scale spaces. Previous approaches detect and describe features at different scale levels by building or approximating the Gaussian scale space of an image. However, Gaussian blurring does not respect the natural boundaries of objects and smoothes details and noise to the same degree, reducing localization accuracy and distinctiveness. In contrast, we detect and describe 2D features in a nonlinear scale space by means of nonlinear diffusion filtering. In this way, we can make blurring locally adaptive to the image data, reducing noise but retaining object boundaries, obtaining superior localization accuracy and distinctiveness. The nonlinear scale space is built using efficient Additive Operator Splitting (AOS) techniques and variable conductance diffusion. We present an extensive evaluation on benchmark datasets and a practical matching application on deformable surfaces. Even though our features are somewhat more expensive to compute than SURF due to the construction of the nonlinear scale space (though comparable to SIFT), our results reveal a step forward in performance both in detection and description against previous state-of-the-art methods.

905 citations
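The paper builds its nonlinear scale space with AOS schemes and variable conductance diffusion; as a much simpler stand-in, the sketch below runs one explicit Perona-Malik diffusion step, which shows the key property that smoothing is suppressed across strong edges. `perona_malik_step` and its parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def perona_malik_step(img, k=0.02, dt=0.2):
    """One explicit Perona-Malik diffusion step: diffusion is scaled by a
    conductance g(|grad|) that vanishes where the gradient (likely an
    edge) is large, so flat regions smooth while edges survive."""
    # Intensity differences towards the four neighbours
    n = np.roll(img, -1, axis=0) - img
    s = np.roll(img, 1, axis=0) - img
    e = np.roll(img, -1, axis=1) - img
    w = np.roll(img, 1, axis=1) - img
    g = lambda d: np.exp(-(d / k) ** 2)   # conductance: ~1 flat, ~0 at edges
    return img + dt * (g(n) * n + g(s) * s + g(e) * e + g(w) * w)

# A step edge is barely touched by one diffusion step, unlike Gaussian blur
img = np.zeros((8, 8))
img[:, 4:] = 1.0
out = perona_malik_step(img)
```

Iterating such steps with increasing diffusion time gives a nonlinear analogue of the Gaussian scale-space octaves; AOS makes that iteration stable for large time steps.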


Journal ArticleDOI
TL;DR: This paper shows that one can directly compute a binary descriptor, called BRIEF, on the basis of simple intensity difference tests, and shows that it yields comparable recognition accuracy while running in an almost vanishing fraction of the time required by either SIFT or SURF.
Abstract: Binary descriptors are becoming increasingly popular as a means to compare feature points very fast while requiring comparatively small amounts of memory. The typical approach to creating them is to first compute floating-point ones, using an algorithm such as SIFT, and then to binarize them. In this paper, we show that we can directly compute a binary descriptor, which we call BRIEF, on the basis of simple intensity difference tests. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and SIFT on standard benchmarks and show that it yields comparable recognition accuracy, while running in an almost vanishing fraction of the time required by either.

872 citations
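A minimal sketch of the BRIEF idea, under the assumption of a fixed random test pattern shared across all patches: each bit records one intensity comparison, and the bits are packed into a compact byte string. The function names and the uniform pair sampling are illustrative; the original paper also studies other spatial sampling strategies.

```python
import numpy as np

def make_pairs(patch_size, n_bits=256, seed=0):
    """Sample the comparison pattern once; the same pairs must be reused
    for every patch so that descriptors are comparable."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, patch_size, size=(n_bits, 2, 2))

def brief_descriptor(patch, pairs):
    """Each bit encodes one intensity difference test I(p) < I(q)."""
    p, q = pairs[:, 0], pairs[:, 1]
    bits = patch[p[:, 0], p[:, 1]] < patch[q[:, 0], q[:, 1]]
    return np.packbits(bits)   # n_bits booleans -> n_bits/8 bytes

pairs = make_pairs(32)                          # fixed 256-test pattern
patch = np.random.default_rng(1).random((32, 32))
desc = brief_descriptor(patch, pairs)           # 32-byte binary descriptor
```

Because the output is a packed bit string, two descriptors are compared with a Hamming distance (XOR plus popcount) rather than a Euclidean norm, which is where the large speed-up over SIFT/SURF matching comes from.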


Proceedings ArticleDOI
14 May 2012
TL;DR: An approach to simultaneous localization and mapping (SLAM) for RGB-D cameras like the Microsoft Kinect that concurrently estimates the trajectory of a hand-held Kinect and generates a dense 3D model of the environment is presented.
Abstract: We present an approach to simultaneous localization and mapping (SLAM) for RGB-D cameras like the Microsoft Kinect. Our system concurrently estimates the trajectory of a hand-held Kinect and generates a dense 3D model of the environment. We present the key features of our approach and evaluate its performance thoroughly on a recently published dataset, including a large set of sequences of different scenes with varying camera speeds and illumination conditions. In particular, we evaluate the accuracy, robustness, and processing time for three different feature descriptors (SIFT, SURF, and ORB). The experiments demonstrate that our system can robustly deal with difficult data in common indoor scenarios while being fast enough for online operation. Our system is fully available as open-source.

765 citations


Journal ArticleDOI
TL;DR: This work reduces the size of the descriptors by representing them as short binary strings and learns descriptor invariance from examples; extensive experimental validation demonstrates the advantage of the proposed approach.
Abstract: SIFT-like local feature descriptors are ubiquitously employed in computer vision applications such as content-based retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Second, descriptors are usually high dimensional (e.g., SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.

654 citations


Journal ArticleDOI
TL;DR: This paper created a challenging real-world copy-move dataset and a software framework for systematic image manipulation, and examined the 15 most prominent feature sets, finding that the keypoint-based features SIFT and SURF as well as the block-based DCT, DWT, KPCA, PCA, and Zernike features perform very well.
Abstract: A copy-move forgery is created by copying and pasting content within the same image, and potentially postprocessing it. In recent years, the detection of copy-move forgeries has become one of the most actively researched topics in blind image forensics. A considerable number of different algorithms have been proposed focusing on different types of postprocessed copies. In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various postprocessing scenarios. The focus of our analysis is to evaluate the performance of previously proposed feature sets. We achieve this by casting existing algorithms in a common pipeline. In this paper, we examined the 15 most prominent feature sets. We analyzed the detection performance on a per-image basis and on a per-pixel basis. We created a challenging real-world copy-move dataset, and a software framework for systematic image manipulation. Experiments show that the keypoint-based features SIFT and SURF, as well as the block-based DCT, DWT, KPCA, PCA, and Zernike features perform very well. These feature sets exhibit the best robustness against various noise sources and downsampling, while reliably identifying the copied regions.

623 citations


01 Jan 2012
TL;DR: KNN (K-Nearest Neighbor) and Random Sample Consensus (RANSAC) are added to the three robust feature detection methods in order to analyze the results of the methods' application in recognition.
Abstract: This paper summarizes three robust feature detection methods: Scale Invariant Feature Transform (SIFT), Principal Component Analysis SIFT (PCA-SIFT) and Speeded Up Robust Features (SURF). This paper applies KNN (K-Nearest Neighbor) and Random Sample Consensus (RANSAC) to the three methods in order to analyze the results of the methods' application in recognition. KNN is used to find the matches, and RANSAC to reject inconsistent matches, so that the inliers can be taken as correct matches. The performance of the robust feature detection methods is compared for scale changes, rotation, and blur. All the experiments use repeatability measurement and the number of correct matches as the evaluation measurements. SIFT presents its stability in most situations although it is slow. SURF is the fastest, with performance comparable to SIFT. PCA-SIFT shows its advantages in rotation and illumination changes.

612 citations
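A toy version of the matching pipeline this paper evaluates, under simplifying assumptions: a 2-NN ratio test stands in for the KNN matching stage, and the RANSAC stage fits a translation-only model rather than the full geometric models typically estimated in practice. All function names are hypothetical.

```python
import numpy as np

def ratio_match(desc1, desc2, ratio=0.8):
    """Lowe-style matching: keep a match only if the nearest neighbour is
    clearly closer than the second nearest (2-NN ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

def ransac_translation(pts1, pts2, n_iter=200, tol=2.0, seed=0):
    """Toy RANSAC with a pure-translation model: hypothesise a shift from
    one random match, count inliers, keep the best hypothesis."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(pts1))
        t = pts2[i] - pts1[i]                       # hypothesised shift
        err = np.linalg.norm(pts1 + t - pts2, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

In the paper's setting the descriptors would come from SIFT, PCA-SIFT, or SURF, and the surviving RANSAC inliers are what gets counted as correct matches.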



Journal ArticleDOI
TL;DR: The SIFT descriptor has been proven to be very useful in practice for robust image matching and object recognition under real-world conditions and has also been extended from grey-level to colour images and from 2-D spatial images to 2+1-D spatio-temporal video.
Abstract: Scale Invariant Feature Transform (SIFT) is an image descriptor for image-based matching developed by David Lowe (1999, 2004). This descriptor as well as related image descriptors are used for a large number of purposes in computer vision related to point matching between different views of a 3-D scene and view-based object recognition. The SIFT descriptor is invariant to translations, rotations and scaling transformations in the image domain and robust to moderate perspective transformations and illumination variations. Experimentally, the SIFT descriptor has been proven to be very useful in practice for robust image matching and object recognition under real-world conditions. In its original formulation, the SIFT descriptor comprised a method for detecting interest points from a grey-level image at which statistics of local gradient directions of image intensities were accumulated to give a summarizing description of the local image structures in a local neighbourhood around each interest point, with the intention that this descriptor should be used for matching corresponding interest points between different images. Later, the SIFT descriptor has also been applied at dense grids (dense SIFT), which has been shown to lead to better performance for tasks such as object categorization and texture classification. The SIFT descriptor has also been extended from grey-level to colour images and from 2-D spatial images to 2+1-D spatio-temporal video.

356 citations
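As a sketch of the descriptor stage only (omitting interest-point detection, and the Gaussian weighting, trilinear interpolation, clipping at 0.2 and renormalisation that real SIFT uses), a local patch can be split into a 4x4 grid of cells, each voting gradient magnitudes into an 8-bin orientation histogram, yielding the familiar 4x4x8 = 128-dimensional vector. The function name is introduced for this sketch.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_bins=8):
    """Toy SIFT-style descriptor: per-cell gradient-orientation
    histograms, magnitude-weighted, concatenated and L2-normalised."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)        # orientation in [0, 2pi)
    h, w = patch.shape
    ch, cw = h // grid, w // grid
    desc = np.zeros((grid, grid, n_bins))
    for r in range(grid):
        for c in range(grid):
            m = mag[r*ch:(r+1)*ch, c*cw:(c+1)*cw].ravel()
            a = ang[r*ch:(r+1)*ch, c*cw:(c+1)*cw].ravel()
            bins = (a / (2 * np.pi) * n_bins).astype(int) % n_bins
            np.add.at(desc[r, c], bins, m)        # magnitude-weighted votes
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-12)        # L2-normalise

patch = np.random.default_rng(0).random((16, 16))
d = sift_like_descriptor(patch)                   # 128-d unit vector
```

Dense SIFT, mentioned in the abstract, amounts to evaluating such a descriptor at every node of a regular grid instead of only at detected interest points.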


Proceedings ArticleDOI
28 May 2012
TL;DR: This paper introduces a new algorithm for approximate matching of binary features, based on priority search of multiple hierarchical clustering trees, and shows that it performs well for large datasets, both in terms of speed and memory efficiency.
Abstract: There has been growing interest in the use of binary-valued features, such as BRIEF, ORB, and BRISK for efficient local feature matching. These binary features have several advantages over vector-based features as they can be faster to compute, more compact to store, and more efficient to compare. Although it is fast to compute the Hamming distance between pairs of binary features, particularly on modern architectures, it can still be too slow to use linear search in the case of large datasets. For vector-based features, such as SIFT and SURF, the solution has been to use approximate nearest-neighbor search, but these existing algorithms are not suitable for binary features. In this paper we introduce a new algorithm for approximate matching of binary features, based on priority search of multiple hierarchical clustering trees. We compare this to existing alternatives, and show that it performs well for large datasets, both in terms of speed and memory efficiency.

312 citations
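The linear-scan baseline this paper sets out to beat can be sketched in a few lines: XOR the packed binary descriptors and count set bits through a 256-entry lookup table. `hamming_matrix` and `linear_match` are illustrative names; the paper's contribution is replacing this brute-force scan with priority search over multiple hierarchical clustering trees.

```python
import numpy as np

def hamming_matrix(a, b):
    """Pairwise Hamming distances between two sets of packed binary
    descriptors (uint8 arrays): XOR, then count set bits per byte."""
    popcount = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None],
                             axis=1).sum(axis=1)   # bit count per byte value
    x = a[:, None, :] ^ b[None, :, :]              # broadcasted XOR
    return popcount[x].sum(axis=2)

def linear_match(a, b):
    """Brute-force nearest neighbour in Hamming space, O(|a|*|b|)."""
    return hamming_matrix(a, b).argmin(axis=1)

a = np.array([[0b11110000]], dtype=np.uint8)
b = np.array([[0b11110000], [0b00001111]], dtype=np.uint8)
nn = linear_match(a, b)    # index of the Hamming-nearest descriptor in b
```

Each XOR-plus-popcount is cheap (a few instructions on modern CPUs), but the quadratic number of comparisons is what makes linear search too slow for large datasets, motivating the approximate tree-based search.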


Proceedings ArticleDOI
09 Jul 2012
TL;DR: A feature-fusion-based food recognition method is applied to bounding boxes of the candidate regions, using various kinds of visual features including bag-of-features of SIFT and CSIFT with spatial pyramid, histogram of oriented gradients (HoG), and Gabor texture features.
Abstract: In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect several candidate regions by fusing outputs of several region detectors including Felzenszwalb's deformable part model (DPM) [1], a circle detector and the JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to bounding boxes of the candidate regions with various kinds of visual features including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradients (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for multiple-food images in the descending order of the confidence scores. As a result, we achieved a 55.8% classification rate on a multiple-food image data set, which improved the baseline result (using only DPM) by 14.3 points. This demonstrates that the proposed two-step method is effective for recognition of multiple-food images.

Journal ArticleDOI
TL;DR: This method can accurately recover both the intrinsic low-rank texture and the unknown transformation, and hence both the geometry and appearance of the associated planar region in 3D in the case of planar regions with significant affine or projective deformation.
Abstract: In this paper, we propose a new tool to efficiently extract a class of "low-rank textures" in a 3D scene from user-specified windows in 2D images despite significant corruptions and warping. The low-rank textures capture geometrically meaningful structures in an image, which encompass conventional local features such as edges and corners as well as many kinds of regular, symmetric patterns ubiquitous in urban environments and man-made objects. Our approach to finding these low-rank textures leverages the recent breakthroughs in convex optimization that enable robust recovery of a high-dimensional low-rank matrix despite gross sparse errors. In the case of planar regions with significant affine or projective deformation, our method can accurately recover both the intrinsic low-rank texture and the unknown transformation, and hence both the geometry and appearance of the associated planar region in 3D. Extensive experimental results demonstrate that this new technique works effectively for many regular and near-regular patterns or objects that are approximately low-rank, such as symmetrical patterns, building facades, printed text, and human faces.

Journal ArticleDOI
TL;DR: The sparse coding method for satellite scene classification is introduced, a two-stage linear support vector machine (SVM) classifier is designed and an improved rotation invariant texture descriptor based on LTPs is presented.
Abstract: This article presents a new method for high-resolution satellite scene classification. Specifically, we make three main contributions: (1) we introduce the sparse coding method for satellite scene classification; (2) we present local ternary pattern histogram Fourier (LTP-HF) features, an improved rotation invariant texture descriptor based on LTPs; (3) we effectively combine a set of diverse and complementary features to further improve the performance. A two-stage linear support vector machine (SVM) classifier is designed for this purpose. In the first stage, the SVM is used to generate probability images with a scale invariant feature transform (SIFT), LTP-HF and colour histogram features, respectively. The generated probability images with different features are fused in the second stage in order to obtain the final classification results. Experimental results show that the suggested classification method achieves very promising performance.

Journal ArticleDOI
TL;DR: Two descriptors are obtained which are rotation invariant without estimating a reference orientation, which appears to be a major error source for most of the existing methods, such as Scale Invariant Feature Transform (SIFT) and DAISY.
Abstract: This paper proposes a novel method for interest region description which pools local features based on their intensity orders in multiple support regions. Pooling by intensity orders is not only invariant to rotation and monotonic intensity changes, but also encodes ordinal information into a descriptor. Two kinds of local features are used in this paper, one based on gradients and the other on intensities; hence, two descriptors are obtained: the Multisupport Region Order-Based Gradient Histogram (MROGH) and the Multisupport Region Rotation and Intensity Monotonic Invariant Descriptor (MRRID). Thanks to the intensity order pooling scheme, the two descriptors are rotation invariant without estimating a reference orientation, which appears to be a major error source for most of the existing methods, such as Scale Invariant Feature Transform (SIFT), SURF, and DAISY. Promising experimental results on image matching and object recognition demonstrate the effectiveness of the proposed descriptors compared to state-of-the-art descriptors.

Journal ArticleDOI
TL;DR: A framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate compared to state-of-the-art descriptors is proposed and it is shown how to efficiently compute distances between descriptors in the compressed domain eliminating the need for decoding.
Abstract: Establishing visual correspondences is an essential component of many computer vision problems, and is often done with local feature descriptors. Transmission and storage of these descriptors are of critical importance in the context of mobile visual search applications. We propose a framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate compared to state-of-the-art descriptors. The framework offers low complexity and has significant speed-up in the matching stage. We show how to efficiently compute distances between descriptors in the compressed domain, eliminating the need for decoding. We perform a comprehensive performance comparison with SIFT, SURF, BRIEF, MPEG-7 image signatures and other low bit-rate descriptors, and show that our proposed CHoG descriptor outperforms existing schemes significantly over a wide range of bitrates. We implement the descriptor in a mobile image retrieval system and, for a database of 1 million CD, DVD and book covers, we achieve 96% retrieval accuracy using only 4 KB of data per query image.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A new technique for extracting local features from images of architectural scenes, based on detecting and representing local symmetries, which can improve matching performance for this difficult task of matching challenging pairs of photos of urban scenes.
Abstract: We present a new technique for extracting local features from images of architectural scenes, based on detecting and representing local symmetries. These new features are motivated by the fact that local symmetries, at different scales, are a fundamental characteristic of many urban images, and are potentially more invariant to large appearance changes than lower-level features such as SIFT. Hence, we apply these features to the problem of matching challenging pairs of photos of urban scenes. Our features are based on simple measures of local bilateral and rotational symmetries computed using local image operations. These measures are used both for feature detection and for computing descriptors. We demonstrate our method on a challenging new dataset containing image pairs exhibiting a range of dramatic variations in lighting, age, and rendering style, and show that our features can improve matching performance for this difficult task.

Journal ArticleDOI
TL;DR: The proposed shape-contexts-based image hashing approach using robust local feature points yields better identification performances under geometric attacks such as rotation attacks and brightness changes, and provides comparable performances under classical distortions such as additive noise, blurring, and compression.
Abstract: Local feature points have been widely investigated in solving problems in computer vision, such as robust matching and object detection. However, their investigation in the area of image hashing is still limited. In this paper, we propose a novel shape-contexts-based image hashing approach using robust local feature points. The contributions are twofold: 1) the robust SIFT-Harris detector is proposed to select the most stable SIFT keypoints under various content-preserving distortions; 2) compact and robust image hashes are generated by embedding the detected local features into shape-contexts-based descriptors. Experimental results show that the proposed image hashing is robust to a wide range of distortions and attacks, due to the benefits of robust salient keypoint detection and the shape-contexts-based feature descriptors. When compared with the current state-of-the-art schemes, the proposed scheme yields better identification performance under geometric attacks such as rotation attacks and brightness changes, and provides comparable performance under classical distortions such as additive noise, blurring, and compression. Also, we demonstrate that the proposed approach could be applied to image tampering detection.

Journal ArticleDOI
TL;DR: A segment buffer scheme is successfully developed that can not only feed data to the computing modules in a data-streaming manner, but also reduce the memory requirement by about 50% compared with a previous work.
Abstract: Feature extraction is an essential part in applications that require computer vision to recognize objects in a processed image. To extract the features robustly, feature extraction algorithms are often very demanding in computation, so that the performance achieved by pure software is far from real-time. Among those feature extraction algorithms, scale-invariant feature transform (SIFT) has gained a lot of popularity recently. In this paper, we propose an all-hardware SIFT accelerator, the fastest of its kind to our knowledge. It consists of two interactive hardware components, one for key point identification, and the other for feature descriptor generation. We successfully developed a segment buffer scheme that can not only feed data to the computing modules in a data-streaming manner, but also reduce the memory requirement by about 50% compared with a previous work. With a parallel architecture incorporating a three-stage pipeline, the processing time of the key point identification is only 3.4 ms for one video graphics array (VGA) image. Taking also into account the feature descriptor generation part, the overall SIFT processing time for a VGA image can be kept within 33 ms (to support real-time operation) when the number of feature points to be extracted is fewer than 890.

Journal ArticleDOI
TL;DR: A novel method based on bilateral filter (BF) scale-invariant feature transform (SIFT) (BFSIFT) to find feature matches for synthetic aperture radar (SAR) image registration, where more accurately located matches can be found in the anisotropic scale space.
Abstract: In this letter, we propose a novel method based on bilateral filter (BF) scale-invariant feature transform (SIFT) (BFSIFT) to find feature matches for synthetic aperture radar (SAR) image registration. First, the anisotropic scale space of the image is constructed using BFs. The constructing process is noniterative and fast. Compared with the Gaussian scale space used in SIFT, more accurately located matches can be found in the anisotropic one. Then, keypoints are detected and described in the coarser scales using SIFT. Finally, a dual-matching strategy and random sample consensus are used to establish matches. The probability of correct matching is significantly increased by skipping the finest scale and by the dual-matching strategy. Experiments on various slant range images demonstrate the applicability of BFSIFT to find feature matches for SAR image registration.

Journal ArticleDOI
TL;DR: The proposed algorithm makes full use of the affine-invariant advantage of ASIFT and the efficiency of SURF while avoiding their drawbacks, and experiments demonstrate the robustness and efficiency of the proposed algorithm.

Journal ArticleDOI
TL;DR: A planar to spherical mapping is introduced and an algorithm for its estimation is given, which makes it possible to extract objects from an omnidirectional image given their SIFT descriptors in a planar image.
Abstract: A SIFT algorithm in spherical coordinates for omnidirectional images is proposed. This algorithm can generate two types of local descriptors, Local Spherical Descriptors and Local Planar Descriptors. With the first ones, point matching between two omnidirectional images can be performed, and with the second ones, the same matching process can be done but between omnidirectional and planar images. Furthermore, a planar to spherical mapping is introduced and an algorithm for its estimation is given. This mapping makes it possible to extract objects from an omnidirectional image given their SIFT descriptors in a planar image. Several experiments, confirming the promising and accurate performance of the system, are conducted.

Journal ArticleDOI
TL;DR: Experiments showed that the proposed statistical approach to visual texture description outperforms existing static texture classification methods and is comparable to the top dynamic texture classification techniques.

Proceedings Article
03 Dec 2012
TL;DR: This paper proposes to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space and employs gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT.
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance.

Journal ArticleDOI
TL;DR: The main conclusions of this study are: all analyzed methods perform very well under the conditions in which they were evaluated, except for GJD, which has low performance in outdoor setups; the best tradeoff between high recognition rate and fast processing speed is obtained by WLD-based methods; and in experiments where the test images are acquired in an outdoor setup and the gallery images are acquired in an indoor setup, the performance of all evaluated methods is very low.

Journal ArticleDOI
TL;DR: Modifications to the SIFT algorithm are proposed that substantially improve the repeatability of detection and effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation.
Abstract: Keypoint detection and matching is of fundamental importance for many applications in computer and robot vision. The association of points across different views is problematic because image features can undergo significant changes in appearance. Unfortunately, state-of-the-art methods, like the scale-invariant feature transform (SIFT), are not resilient to the radial distortion that often arises in images acquired by cameras with microlenses and/or wide field-of-view. This paper proposes modifications to the SIFT algorithm that substantially improve the repeatability of detection and effectiveness of matching under radial distortion, while preserving the original invariance to scale and rotation. The scale-space representation of the image is obtained using adaptive filtering that compensates the local distortion, and the keypoint description is carried after implicit image gradient correction. Unlike competing methods, our approach avoids image resampling (the processing is carried out in the original image plane), it does not require accurate camera calibration (an approximate modeling of the distortion is sufficient), and it adds minimal computational overhead. Extensive experiments show the advantages of our method in establishing point correspondence across images with radial distortion.

Journal ArticleDOI
TL;DR: Exploiting vanishing points in query images and thus fully removing 3D rotation from the recognition problem then allows the feature invariance to be simplified to a purely homothetic problem, which is shown to enable more discriminative power in feature descriptors than classical SIFT.
Abstract: Given a cell phone image of a building, we address the problem of place-of-interest recognition in urban scenarios. Here, we go beyond what has been shown in earlier approaches by exploiting the nowadays often available 3D building information (e.g. from extruded floor plans) and massive street-level image data for database creation. Exploiting vanishing points in query images, and thus fully removing 3D rotation from the recognition problem, then allows us to simplify the feature invariance to a purely homothetic problem, which, as we show, enables more discriminative power in feature descriptors than classical SIFT. We rerank visual word based document queries using a fast stratified homothetic verification that in most cases boosts the correct document to top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real world coordinates, ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches in city scale experiments for different sources of street-level image data and a challenging set of cell phone images.
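To make the "purely homothetic problem" concrete: once rotation is removed, corresponding points are related by a single scale plus a translation, x' = s*x + t, which has a closed-form least-squares solution. The sketch below is a minimal illustration of that fit, not the paper's stratified verification scheme; the function name is hypothetical.

```python
def fit_homothety(src, dst):
    """Least-squares fit of the homothetic map x' = s*x + t (one shared
    scale s, 2D translation t) from point correspondences.
    src, dst: lists of (x, y) pairs. Returns (s, tx, ty)."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    ux = sum(q[0] for q in dst) / n
    uy = sum(q[1] for q in dst) / n
    # Scale is the ratio of cross- to auto-covariance of centered points.
    num = sum((p[0] - mx) * (q[0] - ux) + (p[1] - my) * (q[1] - uy)
              for p, q in zip(src, dst))
    den = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in src)
    s = num / den
    return s, ux - s * mx, uy - s * my
```

A two-parameter model like this is far cheaper to verify per candidate match than a full homography, which is what makes fast reranking of a short list feasible.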

Book ChapterDOI
07 Oct 2012
TL;DR: A hierarchical non-linear spatio-chromatic operator yields spatial and chromatic opponent channels that mimic processing in the primate visual cortex; the resulting approach is shown to outperform standard grayscale/shape-based descriptors as well as alternative color processing schemes on several datasets.
Abstract: We describe a novel framework for the joint processing of color and shape information in natural images. A hierarchical non-linear spatio-chromatic operator yields spatial and chromatic opponent channels, mimicking processing in the primate visual cortex. We extend two popular object recognition systems (i.e., the Hmax hierarchical model of visual processing and a sift-based bag-of-words approach) to incorporate color information along with shape information. We further use the framework in combination with the gist algorithm for scene categorization as well as the Berkeley segmentation algorithm. In all cases, the proposed approach is shown to outperform standard grayscale/shape-based descriptors as well as alternative color processing schemes on several datasets.
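For readers unfamiliar with opponent channels: the classical linear opponent transform below (red-green, yellow-blue, luminance) is a common textbook decomposition and serves only as a simple stand-in for the paper's non-linear spatio-chromatic operator.

```python
import numpy as np

def opponent_channels(rgb):
    """Convert an H x W x 3 RGB image into the classical opponent
    color space: red-green (O1), yellow-blue (O2), luminance (O3)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    o1 = (r - g) / np.sqrt(2.0)             # red-green opponency
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)   # yellow-blue opponency
    o3 = (r + g + b) / np.sqrt(3.0)         # achromatic / luminance
    return np.stack([o1, o2, o3], axis=-1)
```

On achromatic (gray) input both chromatic channels vanish and only the luminance channel carries signal, which is the defining property of an opponent decomposition.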

Journal ArticleDOI
TL;DR: A fast approach is proposed in this letter for large-size very high resolution image registration, which is accomplished based on coarse-to-fine strategy and blockwise scale-invariant feature transform (SIFT) matching.
Abstract: A fast approach is proposed in this letter for large-size very high resolution image registration, which is accomplished based on a coarse-to-fine strategy and blockwise scale-invariant feature transform (SIFT) matching. Coarse registration is implemented at a low resolution level, which provides a geometric constraint. The constraint makes the blockwise SIFT matching possible and is helpful for getting more matched keypoints in the subsequent refinement step. Refined registration is achieved by blockwise SIFT matching and global optimization over all matched keypoints based on iterative reweighted least squares. To improve efficiency, blockwise SIFT matching is implemented in a parallel manner. Experiments demonstrate the effectiveness of the proposed approach.
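The iterative reweighted least squares (IRLS) step can be illustrated on the simplest case, a global translation: residuals of each matched keypoint pair reweight the fit so that mismatches are progressively suppressed. This is a generic IRLS sketch with Huber weights, assuming a translation-only model; the paper's actual transformation model is not specified in this abstract.

```python
import numpy as np

def irls_translation(src, dst, iters=20, delta=1.0):
    """Estimate a 2D translation from matched keypoints with
    iteratively reweighted least squares (Huber weights), so that
    grossly mismatched pairs are progressively down-weighted.
    src, dst: (N, 2) arrays of corresponding points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    d = dst - src                       # per-match displacement
    t = d.mean(axis=0)                  # plain least-squares start
    for _ in range(iters):
        r = np.linalg.norm(d - t, axis=1)                 # residuals
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        t = (w[:, None] * d).sum(axis=0) / w.sum()        # reweighted fit
    return t
```

With one gross outlier among many consistent matches, the estimate converges close to the true displacement, whereas a plain mean would be pulled far off.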

Patent
30 Oct 2012
TL;DR: In this paper, an image and/or video of the scene is captured, and audio recorded at the scene is used to narrow down an object search of the captured scene; objects are then recognized using a scale-invariant feature transform (SIFT) analysis.
Abstract: Methods, systems and articles of manufacture for recognizing and locating one or more objects in a scene are disclosed. An image and/or video of the scene is captured. Using audio recorded at the scene, an object search of the captured scene is narrowed down. For example, the direction of arrival (DOA) of a sound can be determined and used to limit the search area in a captured image/video. In another example, keypoint signatures may be selected based on types of sounds identified in the recorded audio. A keypoint signature corresponds to a particular object that the system is configured to recognize. Objects in the scene may then be recognized using a scale-invariant feature transform (SIFT) analysis comparing keypoints identified in the captured scene to the selected keypoint signatures.
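The "comparing keypoints" step in SIFT-style pipelines is commonly done with Lowe's nearest-neighbor ratio test, sketched below on raw descriptor arrays. This is a standard technique, not code from the patent; the function name and the 0.75 threshold are illustrative assumptions.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match descriptors with Lowe's nearest-neighbor ratio test:
    accept a match only if the closest candidate in desc_b is markedly
    closer than the second closest (ambiguous matches are rejected).
    desc_a: (N, D), desc_b: (M, D). Returns a list of (i, j) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all of desc_b
        order = np.argsort(dists)
        if len(order) >= 2 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

Restricting desc_b to the keypoint signatures selected from the audio cues, as the patent describes, would simply shrink the candidate set before this comparison runs.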

Book ChapterDOI
05 Nov 2012
TL;DR: Experimental results show that the proposed face recognition algorithm outperforms two commercial state-of-the-art face recognition SDKs (FaceVACS and PittPatt) for long distance face recognition in both daytime and nighttime operations, highlighting the need for better data capture setup and robust face matching algorithms for cross spectral matching at distances greater than 100 meters.
Abstract: Automatic face recognition capability in surveillance systems is important for security applications. However, few studies have addressed the problem of outdoor face recognition at a long distance (over 100 meters) in both daytime and nighttime environments. In this paper, we first report on a system that we have designed to collect a face image database at a long distance, called the Long Distance Heterogeneous Face Database (LDHF-DB), to advance research on this topic. The LDHF-DB contains face images collected in an outdoor environment at distances of 60 meters, 100 meters, and 150 meters, with both visible light (VIS) face images captured in daytime and near infrared (NIR) face images captured in nighttime. Given this database, we have conducted two types of cross-distance face matching (matching long-distance probe to 1-meter gallery) experiments: (i) intra-spectral (VIS to VIS) face matching, and (ii) cross-spectral (NIR to VIS) face matching. The proposed face recognition algorithm consists of the following three major steps: (i) Gaussian filtering to remove high frequency noise, (ii) Scale Invariant Feature Transform (SIFT) in local image regions for feature representation, and (iii) a random subspace method to build discriminant subspaces for face recognition. Experimental results show that the proposed face recognition algorithm outperforms two commercial state-of-the-art face recognition SDKs (FaceVACS and PittPatt) for long distance face recognition in both daytime and nighttime operations. These results highlight the need for better data capture setup and robust face matching algorithms for cross-spectral matching at distances greater than 100 meters.
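The random subspace idea in step (iii) can be illustrated in a toy form: draw several random feature subsets, classify independently in each, and combine by majority vote. The sketch below uses 1-nearest-neighbor voters for simplicity; the paper builds discriminant subspaces rather than 1-NN voters, so this shows only the ensemble principle, and all names and parameters are illustrative.

```python
import numpy as np

def random_subspace_predict(train_x, train_y, test_x, n_subspaces=15,
                            subspace_dim=8, seed=0):
    """Toy random subspace classifier: draw random feature subsets,
    classify with 1-nearest-neighbor in each subspace, then take a
    majority vote over the per-subspace predictions."""
    rng = np.random.default_rng(seed)
    n_feats = train_x.shape[1]
    votes = []
    for _ in range(n_subspaces):
        idx = rng.choice(n_feats, size=subspace_dim, replace=False)
        # 1-NN on the selected feature subset only
        d = np.linalg.norm(train_x[:, idx] - test_x[idx], axis=1)
        votes.append(train_y[int(np.argmin(d))])
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```

Working in many low-dimensional random projections of a high-dimensional SIFT feature vector is what makes such ensembles robust when training data are scarce relative to feature dimensionality.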