Showing papers on "Scale-invariant feature transform" published in 2014


Journal ArticleDOI
Maoguo Gong, Shengmeng Zhao, Licheng Jiao, Dayong Tian, Shuang Wang
TL;DR: A novel coarse-to-fine scheme for automatic image registration: coarse registration is implemented by the scale-invariant feature transform approach equipped with a reliable outlier removal procedure, and fine registration by the maximization of mutual information using a modified Marquardt-Levenberg search strategy in a multiresolution framework.
Abstract: Automatic image registration is a vital yet challenging task, particularly for remote sensing images. A fully automatic registration approach which is accurate, robust, and fast is required. For this purpose, a novel coarse-to-fine scheme for automatic image registration is proposed in this paper. This scheme consists of a preregistration process (coarse registration) and a fine-tuning process (fine registration). To begin with, the preregistration process is implemented by the scale-invariant feature transform approach equipped with a reliable outlier removal procedure. The coarse results provide a near-optimal initial solution for the optimizer in the fine-tuning process. Next, the fine-tuning process is implemented by the maximization of mutual information using a modified Marquardt-Levenberg search strategy in a multiresolution framework. The proposed algorithm is tested on various remote sensing optical and synthetic aperture radar images taken under different conditions (multispectral, multisensor, and multitemporal) with the affine transformation model. The experimental results demonstrate the accuracy, robustness, and efficiency of the proposed algorithm.
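
A minimal Python/OpenCV sketch of the coarse stage under stated assumptions (the paper publishes no code; RANSAC stands in for its outlier-removal procedure, and the mutual-information fine-tuning stage is not reproduced):

```python
import cv2
import numpy as np

def coarse_register(ref_gray, sen_gray):
    """SIFT matching plus RANSAC: the coarse-registration stage only."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(sen_gray, None)
    # Lowe's ratio test prunes ambiguous correspondences.
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    # RANSAC stands in for the paper's reliable outlier-removal procedure.
    A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    return A  # initial affine estimate handed to the MI-based fine stage
```

The returned affine matrix would serve as the near-optimal initial solution for the mutual-information optimizer.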

256 citations


Posted Content
TL;DR: A coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level, which improves the retrieval accuracy significantly, while consuming only half of the query time compared to the baseline, and is well complementary to many prior techniques.
Abstract: In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves retrieval accuracy significantly while consuming only half of the query time of the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.
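
The coupling idea can be shown with a toy inverted index keyed by pairs of word IDs; this is an illustrative sketch, not the authors' code, and the codebooks producing `sift_word` and `color_word` are assumed to exist:

```python
from collections import defaultdict

index = defaultdict(list)  # (sift_word, color_word) -> posting list

def add_feature(image_id, sift_word, color_word):
    index[(sift_word, color_word)].append(image_id)

def query(features):
    """features: iterable of (sift_word, color_word) pairs from the query."""
    votes = defaultdict(int)
    for sift_word, color_word in features:
        for image_id in index[(sift_word, color_word)]:
            votes[image_id] += 1  # a vote requires agreement in BOTH spaces
    return sorted(votes, key=votes.get, reverse=True)
```

A false SIFT match whose color word disagrees simply never lands in the same cell, which is the precision gain the abstract describes.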

206 citations


Journal ArticleDOI
TL;DR: A robust distance function based on the Gaussian Radial Basis Function (G-RBF) is proposed and evaluated on a new data set of 102k street view images; the experiments show it outperforms the state of the art by 10 percent.
Abstract: In this paper, we present a new framework for geo-locating an image utilizing a novel multiple nearest neighbor feature matching method using Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference data set. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. In this context, we argue that using a robust distance function for finding the similarity between the global features is essential for the cases where the query matches multiple reference images with dissimilar global features. Towards this end, we propose a robust distance function based on the Gaussian Radial Basis Function (G-RBF). We evaluated the proposed framework on a new data set of 102k street view images; the experiments show it outperforms the state of the art by 10 percent.
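
A minimal sketch of a Gaussian-RBF-based distance between global feature vectors (e.g., GIST); `sigma` is an assumed bandwidth parameter, and the exact functional form the paper uses may differ:

```python
import numpy as np

def grbf_distance(x, y, sigma=0.5):
    """Robust distance: saturates for far-apart vectors, unlike Euclidean."""
    d2 = np.sum((np.asarray(x) - np.asarray(y)) ** 2)
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))
```

The saturation is what makes the distance tolerant of query images that match several reference images with dissimilar global features.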

204 citations


Journal ArticleDOI
TL;DR: The proposed methodology for automatic food recognition, based on the bag-of-features (BoF) model, achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
Abstract: Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the bag-of-features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
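
A sketch of the BoF pipeline shape (descriptors, visual dictionary, histogram, linear SVM), assuming the inputs are dense SIFT descriptors computed on the HSV channels; scikit-learn's MiniBatchKMeans stands in for the paper's hierarchical k-means:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def train_bof(descriptor_sets, labels, n_words=10000):
    """descriptor_sets: one (n_i, d) array of local descriptors per image."""
    vocab = MiniBatchKMeans(n_clusters=n_words)
    vocab.fit(np.vstack(descriptor_sets))
    # One BoF histogram per image: count how often each visual word fires.
    hists = [np.bincount(vocab.predict(d), minlength=n_words)
             for d in descriptor_sets]
    clf = LinearSVC().fit(np.asarray(hists, dtype=float), labels)
    return vocab, clf
```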

198 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: Zhang et al. proposed a coupled multi-index (c-MI) framework to perform feature fusion at indexing level, in which complementary features are coupled into a multi-dimensional inverted index and the retrieval process votes for images similar in both SIFT and other feature spaces.
Abstract: In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves retrieval accuracy significantly while consuming only half of the query time of the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.

169 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed local descriptor based registration method can achieve a reliable registration outcome, and the LSS-based similarity metric is robust to non-linear intensity differences among multispectral remote sensing images.
Abstract: Image registration is a crucial step for remote sensing image processing. Automatic registration of multispectral remote sensing images could be challenging due to the significant non-linear intensity differences caused by radiometric variations among such images. To address this problem, this paper proposes a local descriptor based registration method for multispectral remote sensing images. The proposed method includes a two-stage process: pre-registration and fine registration. The pre-registration is achieved using the Scale Restriction Scale Invariant Feature Transform (SR-SIFT) to eliminate the obvious translation, rotation, and scale differences between the reference and the sensed image. In the fine registration stage, evenly distributed interest points are first extracted in the pre-registered image using the Harris corner detector. Then, we integrate the local self-similarity (LSS) descriptor as a new similarity metric to detect the tie points between the reference and the pre-registered image, followed by a global consistency check to remove matching blunders. Finally, image registration is achieved using a piecewise linear transform. The proposed method has been evaluated with three pairs of multispectral remote sensing images from TM, ETM+, ASTER, Worldview, and Quickbird sensors. The experimental results demonstrate that the proposed method can achieve a reliable registration outcome, and the LSS-based similarity metric is robust to non-linear intensity differences among multispectral remote sensing images.

163 citations


Journal ArticleDOI
TL;DR: The joint integration of the SIFT visual word and binary features greatly enhances the precision of visual matching, reducing the impact of false positive matches, and the proposed method significantly improves the baseline approach.
Abstract: Visual matching is a crucial step in image retrieval based on the bag-of-words (BoW) model. In the baseline method, two keypoints are considered a matching pair if their SIFT descriptors are quantized to the same visual word. However, the SIFT visual word has two limitations. First, it loses most of its discriminative power during quantization. Second, SIFT only describes the local texture feature. Both drawbacks impair the discriminative power of the BoW model and lead to false positive matches. To tackle this problem, this paper proposes to embed multiple binary features at indexing level. To model correlation between features, a multi-IDF scheme is introduced, through which different binary features are coupled into the inverted file. We show that matching verification methods based on binary features, such as Hamming embedding, can be effectively incorporated in our framework. As an extension, we explore the fusion of a binary color feature into image retrieval. The joint integration of the SIFT visual word and binary features greatly enhances the precision of visual matching, reducing the impact of false positive matches. Our method is evaluated through extensive experiments on four benchmark datasets (Ukbench, Holidays, DupImage, and MIR Flickr 1M). We show that our method significantly improves the baseline approach. In addition, large-scale experiments indicate that the proposed method requires acceptable memory usage and query time compared with other approaches. Further, when a global color feature is integrated, our method yields competitive performance with the state of the art.
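
A sketch of binary-signature verification inside a visual word, in the spirit of Hamming embedding; the 64-bit signature length and threshold of about 22 are typical HE settings assumed here, not values from this paper:

```python
import numpy as np

def hamming(a, b):
    """a, b: equal-length 0/1 numpy arrays (binary signatures)."""
    return np.count_nonzero(a != b)

def verified_match(word_a, sig_a, word_b, sig_b, threshold=22):
    # Same visual word is necessary but no longer sufficient:
    # the binary signatures must also agree in Hamming distance.
    return word_a == word_b and hamming(sig_a, sig_b) <= threshold
```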

146 citations


Journal ArticleDOI
TL;DR: A feature correlation hypergraph (FCH) is constructed to model the high-order relations among multimodal features and a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition.
Abstract: In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to well characterize a natural scene image, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally, since they are usually high-order correlated; e.g., the histogram of oriented gradients (HOG), bags of scale-invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture. Nevertheless, existing algorithms fail to capture this high-order correlation among multimodal features. To solve this problem, we present a new multimodal feature integration framework. In particular, we first define a new measure to capture the high-order correlation among the multimodal features, which can be deemed a direct extension of the previous binary correlation. We then construct a feature correlation hypergraph (FCH) to model the high-order relations among multimodal features. Finally, a clustering algorithm is performed on the FCH to group the original multimodal features into a set of partitions. Moreover, a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition. The experimental results on seven popular datasets show the effectiveness of our approach.

142 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: This paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video and adopts a sparse coding scheme to further process the selected MoSIFT descriptors to obtain a highly discriminative video feature.
Abstract: To detect violence in a video, a common video description method is to apply local spatio-temporal description on the query video. Then, the low-level description is further summarized into a high-level feature based on the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough. Moreover, the BoW model roughly assigns each feature vector to only one visual word, inevitably causing quantization error. To tackle these constraints, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptors. To obtain a highly discriminative video feature, this paper adopts a sparse coding scheme to further process the selected MoSIFT descriptors. Encouraging experimental results are obtained on two challenging datasets which record both crowded and non-crowded scenes.
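
A sketch of the sparse-coding step with scikit-learn; the dictionary `D` is assumed to be learned offline (e.g., with DictionaryLearning), and max pooling over codes is one common way to form the video-level feature:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def encode_video(descriptors, D):
    """descriptors: (n, d) selected MoSIFT descriptors; D: (k, d) dictionary."""
    coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                        transform_alpha=0.1)
    codes = coder.transform(descriptors)   # one k-dimensional sparse code each
    return np.abs(codes).max(axis=0)       # max pooling -> one video feature
```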

100 citations


Book ChapterDOI
01 Jan 2014
TL;DR: Interest points are the keypoints in each image and often provide the scale, rotational, and illumination invariance attributes for the descriptor; the descriptor adds more detail and more invariance attributes.
Abstract: Many algorithms for computer vision rely on locating interest points, or keypoints, in each image and calculating a feature description from the pixel region surrounding each interest point. This is in contrast to methods such as correlation, where a larger rectangular pattern is stepped over the image at pixel intervals and the correlation is measured at each location. The interest point often provides the scale, rotational, and illumination invariance attributes for the descriptor; the descriptor adds more detail and more invariance attributes. Groups of interest points and descriptors together describe the actual objects.
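
The contrast in one OpenCV sketch (file names are hypothetical): dense template correlation produces a score at every pixel offset, while the keypoint approach describes only the regions around detected interest points:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # hypothetical files
patch = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)

# Correlation: the patch is stepped over the image at pixel intervals.
scores = cv2.matchTemplate(img, patch, cv2.TM_CCOEFF_NORMED)

# Interest points: sparse keypoints plus scale/rotation-invariant descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
```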

91 citations


Journal ArticleDOI
TL;DR: A novel and powerful local image descriptor is introduced that extracts the histograms of second-order gradients (HSOGs) to capture the curvature-related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, and so on.
Abstract: Recent investigations of human vision reveal that the retinal image is a landscape or a geometric surface, consisting of features such as ridges and summits. However, most existing popular local image descriptors in the literature, e.g., scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), DAISY, local binary patterns (LBP), and gradient location and orientation histogram, employ only the first-order gradient information related to the slope and the elasticity (i.e., length, area, and so on) of a surface, and thereby only partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts the histograms of second-order gradients (HSOGs) to capture the curvature-related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, and so on. We conduct comprehensive experiments on three different applications: local image matching, visual object categorization, and scene classification. The experimental results clearly evidence the discriminative power of HSOG compared with its first-order gradient-based counterparts, e.g., SIFT, HOG, DAISY, and center-symmetric LBP, and its complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
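
A rough NumPy sketch of the second-order idea: differentiate the first-order gradient magnitude and histogram the resulting orientations. This is a simplification for illustration, not the paper's exact descriptor layout:

```python
import numpy as np

def second_order_orientation_hist(patch, bins=8):
    gy, gx = np.gradient(patch.astype(float))   # first-order: slope information
    mag = np.hypot(gx, gy)
    gyy, gxx = np.gradient(mag)                 # second-order: curvature-related
    ori = np.arctan2(gyy, gxx) % (2 * np.pi)
    hist, _ = np.histogram(ori, bins=bins, range=(0, 2 * np.pi),
                           weights=np.hypot(gxx, gyy))
    return hist / (np.linalg.norm(hist) + 1e-12)
```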

Journal ArticleDOI
27 May 2014-PLOS ONE
TL;DR: A novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information is presented.
Abstract: Contact-free palm-vein recognition is one of the most challenging and promising areas in hand biometrics. In view of the existing problems in contact-free palm-vein imaging, including projection transformation, uneven illumination and difficulty in extracting exact ROIs, this paper presents a novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information. First, a hierarchical enhancement algorithm, which combines a DOG filter and histogram equalization, is adopted to alleviate uneven illumination and to highlight vein textures. Second, RootSIFT, a more stable local invariant feature extraction method in comparison to SIFT, is adopted to overcome the projection transformation in contact-free mode. Subsequently, a novel hierarchical mismatching removal algorithm based on neighborhood searching and LBP histograms is adopted to improve the accuracy of feature matching. Finally, we rigorously evaluated the proposed approach using two different databases and obtained 0.996% and 3.112% Equal Error Rates (EERs), respectively, which demonstrate the effectiveness of the proposed approach.
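
RootSIFT itself is a small, well-known transform on standard SIFT descriptors (L1-normalize, then element-wise square root), so Euclidean distance on the result behaves like the Hellinger kernel on the original histograms; a minimal sketch:

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Map SIFT descriptors so Euclidean distance acts like the Hellinger kernel."""
    d = np.array(descriptors, dtype=float)
    d /= d.sum(axis=1, keepdims=True) + eps  # L1 normalize (SIFT is non-negative)
    return np.sqrt(d)
```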

Journal ArticleDOI
TL;DR: A framework of a complete image stitching system based on feature based approaches will be introduced and the current challenges of image stitching will be discussed.
Abstract: Image stitching (mosaicing) is considered an active research area in computer vision and computer graphics. Image stitching is concerned with combining two or more images of the same scene into one high-resolution image, called a panoramic image. Image stitching techniques can be categorized into two general approaches: direct and feature-based techniques. Direct techniques compare all the pixel intensities of the images with each other, whereas feature-based techniques aim to determine a relationship between the images through distinct features extracted from the processed images. The latter approach has the advantage of being more robust against scene movement, faster, and able to automatically discover the overlapping relationships among an unordered set of images. The purpose of this paper is to present a survey of feature-based image stitching. The main components of image stitching are described, a framework of a complete feature-based image stitching system is introduced, and the current challenges of image stitching are discussed. Keywords: stitching/mosaicing, panoramic image, feature-based detection, SIFT, SURF, image blending.
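
For reference, OpenCV's high-level Stitcher internally follows the feature-based pipeline the survey describes (detection, matching, homography estimation, warping, blending); a minimal usage sketch with hypothetical file names:

```python
import cv2

images = [cv2.imread(p) for p in ["left.jpg", "right.jpg"]]  # hypothetical files
stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
```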

Posted Content
TL;DR: Wang et al. proposed a sparse coding based Fisher vector coding (SCFVC) method for high-dimensional local features, where each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace.
Abstract: Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT, and typically good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have become popular recently. To improve the modeling capacity for high-dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximations, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed deep convolutional neural network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.

Journal ArticleDOI
TL;DR: The comparison results show that the GA-SIFT outperforms some previously reported SIFT algorithms in the feature extraction from a multispectral image, and it is comparable with its counterparts in thefeature extraction of color images, indicating good performance in various applications of image analysis.

Journal ArticleDOI
TL;DR: This article investigates the accuracy of similarity measures for thermal–visible image registration of human silhouettes, including Mutual Information (MI), Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC), Histograms of Oriented Gradients (HOG), Local Self-Similarity (LSS), Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Census, and Binary Robust Independent Elementary Features (BRIEF).

Journal ArticleDOI
Oliver Woodford, Minh-Tri Pham, Atsuto Maki, Frank Perbet, Bjorn Stenger
TL;DR: Two new and powerful improvements to this popular inference method are developed: intrinsic Hough, which solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space, and minimum-entropy Hough, which explains away incorrect votes.
Abstract: In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other tasks that already use the standard Hough transform.
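
The intrinsic-Hough memory idea in miniature (an illustrative sketch, not the paper's implementation): store only the cells that actually receive votes, so memory grows with occupied cells rather than with the full pose-space grid:

```python
from collections import defaultdict

def sparse_hough(votes):
    """votes: iterable of quantized (class_id, pose_bin) tuples, assumed non-empty."""
    accumulator = defaultdict(float)
    for cell in votes:
        accumulator[cell] += 1.0  # only occupied cells consume memory
    return max(accumulator, key=accumulator.get)  # the winning class/pose cell
```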

Journal ArticleDOI
TL;DR: A Block-SIFT method is designed to overcome the memory limitation of SIFT for extracting and matching features from large photogrammetric images, and it is demonstrated that more than 33 million features can be extracted and matched from the Taian dataset with 737 images within 21 h using the L2-SIFT algorithm.
Abstract: The primary contribution of this paper is an efficient feature extraction and matching implementation for large images in large-scale aerial photogrammetry experiments. First, a Block-SIFT method is designed to overcome the memory limitation of SIFT for extracting and matching features from large photogrammetric images. For each pair of images, the original large image is split into blocks and the possible corresponding blocks in the other image are determined by pre-estimating the relative transformation between the two images. Because of the reduced memory requirement, features can be extracted and matched from the original images without down-sampling. Next, a red-black tree data structure is applied to create a feature relationship to reduce the search complexity when matching tie points. Meanwhile, tree key exchange and segment matching methods are proposed to match the tie points along-track and across-track. Finally, to evaluate the accuracy of the features extracted and matched by the proposed L2-SIFT algorithm, a bundle adjustment with parallax angle feature parametrization (ParallaxBA) is applied to obtain the Mean Square Error (MSE) of the feature reprojections, where the feature extraction and matching result is the only information used in the nonlinear optimisation system. Seven different experimental aerial photogrammetric datasets are used to demonstrate the efficiency and validity of the proposed algorithm. It is demonstrated that more than 33 million features can be extracted and matched from the Taian dataset with 737 images within 21 h using the L2-SIFT algorithm. In addition, the ParallaxBA involving more than 2.7 million features and 6 million image points can easily converge to an MSE of 0.03874. The C/C++ source code for the proposed algorithm is available at http://services.eng.uts.edu.au/sdhuang/research.htm .
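
A sketch of the block-wise extraction idea in Python/OpenCV (the released code is C/C++): run SIFT per tile so the full image never needs to be processed at once, then shift keypoint coordinates back to the global frame. The block-prediction step via a pre-estimated transform is omitted, and the block/overlap sizes are assumptions:

```python
import cv2

def block_sift(image_gray, block=2048, overlap=128):
    """Extract SIFT features tile by tile; overlap avoids losing border features."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = [], []
    h, w = image_gray.shape[:2]
    for y in range(0, h, block - overlap):
        for x in range(0, w, block - overlap):
            tile = image_gray[y:y + block, x:x + block]
            kps, des = sift.detectAndCompute(tile, None)
            for kp in kps:
                kp.pt = (kp.pt[0] + x, kp.pt[1] + y)  # back to global coordinates
            keypoints.extend(kps)
            if des is not None:
                descriptors.append(des)
    return keypoints, descriptors
```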

Journal ArticleDOI
TL;DR: Experiments based on UAV images with five-centimeter ground resolution demonstrate the effectiveness of the proposed object-based hierarchical method, leading to the conclusion that this method is practically applicable for frequent monitoring.
Abstract: There have been increasing demands for automatically monitoring urban areas in very high detail, and the Unmanned Aerial Vehicle (UAV) with auto-navigation (AUNA) system offers such capability. This study proposes an object-based hierarchical method to detect changes from UAV images taken at different times. It consists of several steps. In the first step, an octocopter with AUNA capability is used to acquire images at different dates. These images are registered automatically, based on SIFT (Scale-Invariant Feature Transform) feature points, via the general bundle adjustment framework. Thus, the Digital Surface Models (DSMs) and orthophotos can be generated for raster-based change analysis. In the next step, a multi-primitive segmentation method combining the spectral and geometric information is proposed for object-based analysis. In the final step, a multi-criteria decision analysis is carried out concerning the height, spectral and geometric coherence, and shape regularity for change determination. Experiments based on UAV images with five-centimeter ground resolution demonstrate the effectiveness of the proposed method, leading to the conclusion that this method is practically applicable for frequent monitoring.

Journal ArticleDOI
TL;DR: The combinations of ORB with ORB and MSER with SIFT are preferable in almost all situations when the precision and recall results are considered, and the speed of FAST with BRIEF is superior to the others.
Abstract: Comparing feature detectors and descriptors and assessing their performance is very important in computer vision. In this study, we evaluate the performance of seven combinations of well-known detectors and descriptors: SIFT with SIFT, SURF with SURF, MSER with SIFT, BRISK with FREAK, BRISK with BRISK, ORB with ORB, and FAST with BRIEF. The popular Oxford dataset is used in the test stage. To compare the performance of each combination objectively, the effects of JPEG compression, zoom and rotation, blur, and viewpoint and illumination variation have been investigated in terms of precision and recall values. Upon inspecting the obtained results, it is observed that the combinations of ORB with ORB and MSER with SIFT are preferable in almost all situations when the precision and recall results are considered. Moreover, the speed of FAST with BRIEF is superior to the others.
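
A small harness in the spirit of such an evaluation: run two detector/descriptor combinations on the same image pair and count ratio-test matches. Computing the paper's precision/recall would additionally require the Oxford dataset's ground-truth homographies, omitted here:

```python
import cv2

def count_matches(detector, img1, img2, norm):
    """Ratio-test match count for one detector/descriptor combination."""
    kp1, d1 = detector.detectAndCompute(img1, None)
    kp2, d2 = detector.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher(norm).knnMatch(d1, d2, k=2)
    return sum(len(p) == 2 and p[0].distance < 0.75 * p[1].distance
               for p in pairs)

# SIFT-with-SIFT vs ORB-with-ORB on grayscale images a, b:
# count_matches(cv2.SIFT_create(), a, b, cv2.NORM_L2)
# count_matches(cv2.ORB_create(), a, b, cv2.NORM_HAMMING)
```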

Journal ArticleDOI
TL;DR: A unique method for copy-move forgery detection which can sustain various pre-processing attacks using a combination of Dyadic Wavelet Transform (DyWT) and Scale Invariant Feature Transform (SIFT).

Journal ArticleDOI
TL;DR: An optimized descriptor generation algorithm is proposed with square subregions arranged in 16 directions, in which the descriptors are generated by reordering the histogram instead of rotating the window; this not only improves the parallelism of the algorithm but also avoids floating-point calculation to save hardware consumption.
Abstract: This paper introduces a high-speed all-hardware scale-invariant feature transform (SIFT) architecture with parallel and pipeline technology for real-time extraction of image features. Task-level parallel and pipeline structures are exploited between the hardware blocks, and data-level parallel and pipeline architectures are exploited inside each block. Two identical random access memories are adopted with ping-pong operation to execute the key point detection module and the descriptor generation module in task-level parallelism. After speeding up the key point detection module of SIFT, the descriptor generation module becomes the bottleneck of the system's performance; therefore, this paper proposes an optimized descriptor generation algorithm. A novel window-dividing method is proposed with square subregions arranged in 16 directions, and the descriptors are generated by reordering the histogram instead of rotating the window. Therefore, the main orientation detection block and the descriptor generation block run in parallel instead of interactively. With the optimized algorithm cooperating with the pipeline structure inside each block, we not only improve the parallelism of the algorithm but also avoid floating-point calculation to save hardware consumption. Thus, the descriptor generation module runs almost 15 times faster than a recent solution. The proposed system was implemented on a field programmable gate array, and the overall time to extract SIFT features for an image of 512×512 pixels is only 6.55 ms (sufficient for real-time applications); the number of feature points can reach up to 2900.

Journal ArticleDOI
TL;DR: This letter presents a novel method that uses normalized gradients as local image features for the description of keypoints in order to achieve robustness against non-linear intensity changes between multispectral images.
Abstract: This letter presents a novel method for the description of multispectral image keypoints. The proposed method is based on a modified SIFT algorithm. It uses normalized gradients as local image features for the description of keypoints in order to achieve robustness against non-linear intensity changes between multispectral images. The experimental results show that the proposed method achieves better matching performance and outperforms the SIFT algorithm.
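
One plausible reading of "normalized gradients", sketched under that assumption (the letter may define the normalization differently): divide the gradient components by the local gradient magnitude so descriptor entries depend on gradient direction rather than on band-dependent intensity scale:

```python
import numpy as np

def normalized_gradients(band, eps=1e-7):
    """band: one spectral channel as a 2D array."""
    gy, gx = np.gradient(band.astype(float))
    mag = np.hypot(gx, gy) + eps
    return gx / mag, gy / mag  # unit-magnitude gradient fields
```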

Journal ArticleDOI
TL;DR: This paper studies an alternative to current local descriptors and the BoWs model by extracting an ultrashort binary descriptor (USB) and a compact auxiliary spatial feature from each keypoint detected in images, and demonstrates the competitive accuracy, memory consumption, and significantly better efficiency of this approach.
Abstract: Currently, many local descriptors have been proposed to tackle a basic issue in computer vision: duplicate visual content matching. These descriptors are either high-dimensional vectors, which are relatively expensive to extract and compare, or binary codes, which are limited in robustness. The bag-of-visual words (BoWs) model compresses local features into a compact representation that allows for fast matching and scalable indexing. However, the codebook training, high-dimensional feature extraction, and quantization significantly degrade the flexibility and efficiency of the BoWs model. In this paper, we study an alternative to current local descriptors and the BoWs model by extracting an ultrashort binary descriptor (USB) and a compact auxiliary spatial feature from each keypoint detected in images. A typical USB is a 24-bit binary descriptor, hence it directly quantizes visual clues of image keypoints to about 16 million unique IDs. USB allows fast image matching and indexing and avoids the expensive codebook training and feature quantization of the BoWs model. The spatial feature complementarily captures the spatial configuration in the neighborhood of each keypoint and is thus used to filter mismatched USBs in a cascade verification. In the image matching task, USB shows promising accuracy and nearly an order of magnitude faster speed than SIFT. We also test USB in retrieval tasks on UKbench, Oxford5K, and 1.2 million distractor images. Comparisons with recent retrieval methods demonstrate the competitive accuracy, memory consumption, and significantly better efficiency of our approach.
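
The indexing trick can be illustrated in a few lines (a sketch, not the authors' code): a 24-bit binary descriptor packs into a plain integer (2^24 ≈ 16 million IDs), so "quantization" is bit packing and lookup is a dictionary access, with no codebook training or nearest-centroid search:

```python
from collections import defaultdict

def usb_id(bits):
    """bits: sequence of 24 zeros/ones -> an integer ID in [0, 2**24)."""
    out = 0
    for b in bits:
        out = (out << 1) | int(b)
    return out

index = defaultdict(list)  # usb_id -> list of (image_id, keypoint) postings
```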

Journal ArticleDOI
01 Feb 2014
TL;DR: The results indicate that the proposed MIFT method can detect duplicated regions in copy–move image forgery with higher accuracy, especially when the size of the duplicated region is small.
Abstract: Copy-move image forgery detection has recently become a very active research topic in blind image forensics. In copy-move image forgery, a region from some image location is copied and pasted to a different location of the same image. Typically, post-processing is applied to better hide the forgery. Using keypoint-based features, such as SIFT features, for detecting copy-move image forgeries has produced promising results. The main idea is detecting duplicated regions in an image by exploiting the similarity between keypoint-based features in these regions. In this paper, we have adopted keypoint-based features for copy-move image forgery detection; however, our emphasis is on accurate and robust localization of duplicated regions. In this context, we are interested in estimating the transformation (e.g., affine) between the copied and pasted regions more accurately as well as extracting these regions as robustly by reducing the number of false positives and negatives. To address these issues, we propose using a more powerful set of keypoint-based features, called MIFT, which shares the properties of SIFT features but also is invariant to mirror reflection transformations. Moreover, we propose refining the affine transformation using an iterative scheme which improves the estimation of the affine transformation parameters by incrementally finding additional keypoint matches. To reduce false positives and negatives when extracting the copied and pasted regions, we propose using "dense" MIFT features, instead of standard pixel correlation, along with hysteresis thresholding and morphological operations. The proposed approach has been evaluated and compared with competitive approaches through a comprehensive set of experiments using a large dataset of real images (i.e., CASIA v2.0). Our results indicate that our method can detect duplicated regions in copy-move image forgery with higher accuracy, especially when the size of the duplicated region is small.
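
A sketch of keypoint-based copy-move detection: match an image's descriptors against themselves, discard trivial self-matches, and estimate the affine transform between copied and pasted regions. Plain SIFT stands in for MIFT here, so mirrored copies would be missed; the iterative refinement is also omitted:

```python
import cv2
import numpy as np

def detect_copy_move(image_gray, min_dist=10):
    sift = cv2.SIFT_create()
    kps, des = sift.detectAndCompute(image_gray, None)
    matches = cv2.BFMatcher().knnMatch(des, des, k=3)
    src, dst = [], []
    for m in matches:
        for cand in m[1:]:  # m[0] is the keypoint matched to itself
            p, q = kps[cand.queryIdx].pt, kps[cand.trainIdx].pt
            if np.hypot(p[0] - q[0], p[1] - q[1]) > min_dist:
                src.append(p)
                dst.append(q)
    if len(src) < 3:
        return None  # not enough evidence of duplication
    A, _ = cv2.estimateAffine2D(np.float32(src), np.float32(dst),
                                method=cv2.RANSAC)
    return A  # estimated copy-to-paste affine transform
```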

Proceedings Article
08 Dec 2014
TL;DR: A model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace, termed Sparse Coding based Fisher Vector Coding (SCFVC), which significantly outperforms the traditional GMM based Fisher vector encoding and achieves the state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
Abstract: Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT, and typically good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have become popular recently. To improve the modeling capacity for high-dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximations, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed deep convolutional neural network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
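
A sketch in the spirit of SCFVC, hedged: sparse-code each high-dimensional local feature over a dictionary `B`, then form a Fisher-vector-style encoding from the outer product of sparse code and reconstruction residual, sum-pooled over the image. This gradient form is our reading of a dictionary-gradient encoding, not necessarily the paper's exact derivation:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def scfvc_encode(features, B, alpha=0.1):
    """features: (n, d) local features (e.g., CNN descriptors); B: (k, d) dictionary."""
    coder = SparseCoder(dictionary=B, transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    U = coder.transform(features)        # one k-dimensional sparse code per feature
    fv = np.zeros(B.size)
    for x, u in zip(features, U):
        residual = x - u @ B             # reconstruction residual in feature space
        fv += np.outer(u, residual).ravel()  # gradient-style encoding, sum-pooled
    return fv / (np.linalg.norm(fv) + 1e-12)
```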

Journal ArticleDOI
TL;DR: A new feature extraction framework is proposed to determine and classify breast cancer cases, achieving a remarkable increase in recognition performance for the three-class study.

Journal ArticleDOI
TL;DR: A new FPGA-based embedded system architecture, consisting of scale-invariant feature transform (SIFT) feature detection together with binary robust independent elementary features (BRIEF) feature description and matching, achieves feature detection and matching at 60 frames/s for 720p video.
Abstract: Detecting and matching image features is a fundamental task in video analytics and computer vision systems. It establishes the correspondences between two images taken at different time instants or from different viewpoints. However, its large computational complexity has been a challenge for most embedded systems. This paper proposes a new FPGA-based embedded system architecture for feature detection and matching. It consists of scale-invariant feature transform (SIFT) feature detection, as well as binary robust independent elementary features (BRIEF) feature description and matching. It is able to establish accurate correspondences between consecutive frames of 720p (1280×720) video. It optimizes the FPGA architecture for SIFT feature detection to reduce the utilization of FPGA resources. Moreover, it implements BRIEF feature description and matching on FPGA. Owing to these contributions, the proposed system achieves feature detection and matching at 60 frames/s for 720p video. Its processing speed can meet, and even exceed, the demand of most real-life real-time video analytics applications. Extensive experiments have demonstrated its efficiency and effectiveness.
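
A software analogue of the same pipeline for reference (the paper's contribution is the FPGA design, not this code): SIFT keypoint detection followed by BRIEF description and Hamming matching. BriefDescriptorExtractor requires an opencv-contrib build:

```python
import cv2

detector = cv2.SIFT_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()  # opencv-contrib
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_frames(prev_gray, curr_gray):
    """Correspondences between consecutive frames: SIFT keypoints, BRIEF codes."""
    kp1 = detector.detect(prev_gray, None)
    kp2 = detector.detect(curr_gray, None)
    kp1, d1 = brief.compute(prev_gray, kp1)
    kp2, d2 = brief.compute(curr_gray, kp2)
    return matcher.match(d1, d2)
```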

Journal ArticleDOI
TL;DR: A comparison between two popular feature extraction methods, the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), is presented.

Journal ArticleDOI
TL;DR: The paper, for the first time, analyzes and proposes an optimal finger shape model; results obtained from different fingerprint feature correspondences are analyzed and compared to show which features are more suitable for 3D fingerprint image generation.