Showing papers on "Scale-invariant feature transform" published in 2014


Journal ArticleDOI
Maoguo Gong, Shengmeng Zhao, Licheng Jiao, Dayong Tian, Shuang Wang
TL;DR: A novel coarse-to-fine scheme for automatic image registration: coarse registration is implemented by the scale-invariant feature transform approach equipped with a reliable outlier removal procedure, and fine registration by the maximization of mutual information using a modified Marquardt-Levenberg search strategy in a multiresolution framework.
Abstract: Automatic image registration is a vital yet challenging task, particularly for remote sensing images. A fully automatic registration approach which is accurate, robust, and fast is required. For this purpose, a novel coarse-to-fine scheme for automatic image registration is proposed in this paper. This scheme consists of a preregistration process (coarse registration) and a fine-tuning process (fine registration). To begin with, the preregistration process is implemented by the scale-invariant feature transform approach equipped with a reliable outlier removal procedure. The coarse results provide a near-optimal initial solution for the optimizer in the fine-tuning process. Next, the fine-tuning process is implemented by the maximization of mutual information using a modified Marquardt-Levenberg search strategy in a multiresolution framework. The proposed algorithm is tested on various remote sensing optical and synthetic aperture radar images taken under different conditions (multispectral, multisensor, and multitemporal) with the affine transformation model. The experimental results demonstrate the accuracy, robustness, and efficiency of the proposed algorithm.
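
A minimal Python/OpenCV sketch of the coarse stage under stated assumptions (the paper publishes no code; RANSAC stands in for its outlier-removal procedure, and the mutual-information fine-tuning stage is not reproduced):

```python
import cv2
import numpy as np

def coarse_register(ref_gray, sen_gray):
    """SIFT matching plus RANSAC: the coarse-registration stage only."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(sen_gray, None)
    # Lowe's ratio test prunes ambiguous correspondences.
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    # RANSAC stands in for the paper's reliable outlier-removal procedure.
    A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    return A  # initial affine estimate handed to the MI-based fine stage
```

The returned affine matrix would serve as the near-optimal initial solution for the mutual-information optimizer.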

256 citations


Posted Content
TL;DR: A coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level, which improves the retrieval accuracy significantly, while consuming only half of the query time compared to the baseline, and is well complementary to many prior techniques.
Abstract: In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves retrieval accuracy significantly while consuming only half of the query time of the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.
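
The coupling idea can be shown with a toy inverted index keyed by pairs of word IDs; this is an illustrative sketch, not the authors' code, and the codebooks producing `sift_word` and `color_word` are assumed to exist:

```python
from collections import defaultdict

index = defaultdict(list)  # (sift_word, color_word) -> posting list

def add_feature(image_id, sift_word, color_word):
    index[(sift_word, color_word)].append(image_id)

def query(features):
    """features: iterable of (sift_word, color_word) pairs from the query."""
    votes = defaultdict(int)
    for sift_word, color_word in features:
        for image_id in index[(sift_word, color_word)]:
            votes[image_id] += 1  # a vote requires agreement in BOTH spaces
    return sorted(votes, key=votes.get, reverse=True)
```

A false SIFT match whose color word disagrees simply never lands in the same cell, which is the precision gain the abstract describes.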

206 citations


Journal ArticleDOI
TL;DR: A robust distance function based on the Gaussian Radial Basis Function (G-RBF) is proposed and evaluated on a new data set of 102k street view images; the experiments show it outperforms the state of the art by 10 percent.
Abstract: In this paper, we present a new framework for geo-locating an image utilizing a novel multiple nearest neighbor feature matching method using Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference data set. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. In this context, we argue that using a robust distance function for finding the similarity between the global features is essential for the cases where the query matches multiple reference images with dissimilar global features. Towards this end, we propose a robust distance function based on the Gaussian Radial Basis Function (G-RBF). We evaluated the proposed framework on a new data set of 102k street view images; the experiments show it outperforms the state of the art by 10 percent.
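
A minimal sketch of a Gaussian-RBF-based distance between global feature vectors (e.g., GIST); `sigma` is an assumed bandwidth parameter, and the exact functional form the paper uses may differ:

```python
import numpy as np

def grbf_distance(x, y, sigma=0.5):
    """Robust distance: saturates for far-apart vectors, unlike Euclidean."""
    d2 = np.sum((np.asarray(x) - np.asarray(y)) ** 2)
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))
```

The saturation is what makes the distance tolerant of query images that match several reference images with dissimilar global features.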

204 citations


Journal ArticleDOI
TL;DR: The proposed methodology for automatic food recognition, based on the bag-of-features (BoF) model, achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
Abstract: Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the bag-of-features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
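
A sketch of the BoF pipeline shape (descriptors, visual dictionary, histogram, linear SVM), assuming the inputs are dense SIFT descriptors computed on the HSV channels; scikit-learn's MiniBatchKMeans stands in for the paper's hierarchical k-means:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def train_bof(descriptor_sets, labels, n_words=10000):
    """descriptor_sets: one (n_i, d) array of local descriptors per image."""
    vocab = MiniBatchKMeans(n_clusters=n_words)
    vocab.fit(np.vstack(descriptor_sets))
    # One BoF histogram per image: count how often each visual word fires.
    hists = [np.bincount(vocab.predict(d), minlength=n_words)
             for d in descriptor_sets]
    clf = LinearSVC().fit(np.asarray(hists, dtype=float), labels)
    return vocab, clf
```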

198 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: Zhang et al. proposed a coupled multi-index (c-MI) framework to perform feature fusion at indexing level, in which complementary features are coupled into a multi-dimensional inverted index and the retrieval process votes for images similar in both SIFT and other feature spaces.
Abstract: In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur prevalently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves retrieval accuracy significantly while consuming only half of the query time of the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which compare favorably with the state of the art.

169 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed local descriptor based registration method can achieve a reliable registration outcome, and the LSS-based similarity metric is robust to non-linear intensity differences among multispectral remote sensing images.
Abstract: Image registration is a crucial step for remote sensing image processing. Automatic registration of multispectral remote sensing images could be challenging due to the significant non-linear intensity differences caused by radiometric variations among such images. To address this problem, this paper proposes a local descriptor based registration method for multispectral remote sensing images. The proposed method includes a two-stage process: pre-registration and fine registration. The pre-registration is achieved using the Scale Restriction Scale Invariant Feature Transform (SR-SIFT) to eliminate the obvious translation, rotation, and scale differences between the reference and the sensed image. In the fine registration stage, evenly distributed interest points are first extracted in the pre-registered image using the Harris corner detector. Then, we integrate the local self-similarity (LSS) descriptor as a new similarity metric to detect the tie points between the reference and the pre-registered image, followed by a global consistency check to remove matching blunders. Finally, image registration is achieved using a piecewise linear transform. The proposed method has been evaluated with three pairs of multispectral remote sensing images from TM, ETM+, ASTER, Worldview, and Quickbird sensors. The experimental results demonstrate that the proposed method can achieve a reliable registration outcome, and the LSS-based similarity metric is robust to non-linear intensity differences among multispectral remote sensing images.

163 citations


Journal ArticleDOI
TL;DR: The joint integration of the SIFT visual word and binary features greatly enhances the precision of visual matching, reducing the impact of false positive matches, and the proposed method significantly improves the baseline approach.
Abstract: Visual matching is a crucial step in image retrieval based on the bag-of-words (BoW) model. In the baseline method, two keypoints are considered a matching pair if their SIFT descriptors are quantized to the same visual word. However, the SIFT visual word has two limitations. First, it loses most of its discriminative power during quantization. Second, SIFT only describes the local texture feature. Both drawbacks impair the discriminative power of the BoW model and lead to false positive matches. To tackle this problem, this paper proposes to embed multiple binary features at indexing level. To model correlation between features, a multi-IDF scheme is introduced, through which different binary features are coupled into the inverted file. We show that matching verification methods based on binary features, such as Hamming embedding, can be effectively incorporated in our framework. As an extension, we explore the fusion of a binary color feature into image retrieval. The joint integration of the SIFT visual word and binary features greatly enhances the precision of visual matching, reducing the impact of false positive matches. Our method is evaluated through extensive experiments on four benchmark datasets (Ukbench, Holidays, DupImage, and MIR Flickr 1M). We show that our method significantly improves the baseline approach. In addition, large-scale experiments indicate that the proposed method requires acceptable memory usage and query time compared with other approaches. Further, when a global color feature is integrated, our method yields competitive performance with the state of the art.
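
A sketch of binary-signature verification inside a visual word, in the spirit of Hamming embedding; the 64-bit signature length and threshold of about 22 are typical HE settings assumed here, not values from this paper:

```python
import numpy as np

def hamming(a, b):
    """a, b: equal-length 0/1 numpy arrays (binary signatures)."""
    return np.count_nonzero(a != b)

def verified_match(word_a, sig_a, word_b, sig_b, threshold=22):
    # Same visual word is necessary but no longer sufficient:
    # the binary signatures must also agree in Hamming distance.
    return word_a == word_b and hamming(sig_a, sig_b) <= threshold
```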

146 citations


Journal ArticleDOI
TL;DR: A feature correlation hypergraph (FCH) is constructed to model the high-order relations among multimodal features and a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition.
Abstract: In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to well characterize a natural scene image, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally, since they are usually high-order correlated; e.g., the histogram of oriented gradients (HOG), bags of scale-invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture. Nevertheless, existing algorithms fail to capture this high-order correlation among multimodal features. To solve this problem, we present a new multimodal feature integration framework. In particular, we first define a new measure to capture the high-order correlation among the multimodal features, which can be deemed a direct extension of the previous binary correlation. We then construct a feature correlation hypergraph (FCH) to model the high-order relations among multimodal features. Finally, a clustering algorithm is performed on the FCH to group the original multimodal features into a set of partitions. Moreover, a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition. The experimental results on seven popular datasets show the effectiveness of our approach.

142 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: This paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video and adopts a sparse coding scheme to further process the selected MoSIFT descriptors to obtain a highly discriminative video feature.
Abstract: To detect violence in a video, a common video description method is to apply local spatio-temporal description on the query video. Then, the low-level description is further summarized into a high-level feature based on the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough. Moreover, the BoW model roughly assigns each feature vector to only one visual word, inevitably causing quantization error. To tackle these constraints, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptors. To obtain a highly discriminative video feature, this paper adopts a sparse coding scheme to further process the selected MoSIFT descriptors. Encouraging experimental results are obtained on two challenging datasets which record both crowded and non-crowded scenes.
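
A sketch of the sparse-coding step with scikit-learn; the dictionary `D` is assumed to be learned offline (e.g., with DictionaryLearning), and max pooling over codes is one common way to form the video-level feature:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def encode_video(descriptors, D):
    """descriptors: (n, d) selected MoSIFT descriptors; D: (k, d) dictionary."""
    coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                        transform_alpha=0.1)
    codes = coder.transform(descriptors)   # one k-dimensional sparse code each
    return np.abs(codes).max(axis=0)       # max pooling -> one video feature
```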

100 citations


Book ChapterDOI
01 Jan 2014
TL;DR: Interest points are the keypoints in each image and often provide the scale, rotational, and illumination invariance attributes for the descriptor; the descriptor adds more detail and more invariance attributes.
Abstract: Many algorithms for computer vision rely on locating interest points, or keypoints, in each image and calculating a feature description from the pixel region surrounding each interest point. This is in contrast to methods such as correlation, where a larger rectangular pattern is stepped over the image at pixel intervals and the correlation is measured at each location. The interest point often provides the scale, rotational, and illumination invariance attributes for the descriptor; the descriptor adds more detail and more invariance attributes. Groups of interest points and descriptors together describe the actual objects.
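
The contrast in one OpenCV sketch (file names are hypothetical): dense template correlation produces a score at every pixel offset, while the keypoint approach describes only the regions around detected interest points:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # hypothetical files
patch = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)

# Correlation: the patch is stepped over the image at pixel intervals.
scores = cv2.matchTemplate(img, patch, cv2.TM_CCOEFF_NORMED)

# Interest points: sparse keypoints plus scale/rotation-invariant descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
```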

91 citations


Journal ArticleDOI
TL;DR: A novel and powerful local image descriptor is introduced that extracts the histograms of second-order gradients (HSOGs) to capture the curvature-related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, and so on.
Abstract: Recent investigations of human vision reveal that the retinal image is a landscape or a geometric surface, consisting of features such as ridges and summits. However, most existing popular local image descriptors in the literature, e.g., scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), DAISY, local binary patterns (LBP), and gradient location and orientation histogram, employ only the first-order gradient information related to the slope and the elasticity (i.e., length, area, and so on) of a surface, and thereby only partially characterize the geometric properties of a landscape. In this paper, we introduce a novel and powerful local image descriptor that extracts the histograms of second-order gradients (HSOGs) to capture the curvature-related geometric properties of the neural landscape, i.e., cliffs, ridges, summits, valleys, basins, and so on. We conduct comprehensive experiments on three different applications: local image matching, visual object categorization, and scene classification. The experimental results clearly evidence the discriminative power of HSOG compared with its first-order gradient-based counterparts, e.g., SIFT, HOG, DAISY, and center-symmetric LBP, and its complementarity in terms of image representation, demonstrating the effectiveness of the proposed local descriptor.
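
A rough NumPy sketch of the second-order idea: differentiate the first-order gradient magnitude and histogram the resulting orientations. This is a simplification for illustration, not the paper's exact descriptor layout:

```python
import numpy as np

def second_order_orientation_hist(patch, bins=8):
    gy, gx = np.gradient(patch.astype(float))   # first-order: slope information
    mag = np.hypot(gx, gy)
    gyy, gxx = np.gradient(mag)                 # second-order: curvature-related
    ori = np.arctan2(gyy, gxx) % (2 * np.pi)
    hist, _ = np.histogram(ori, bins=bins, range=(0, 2 * np.pi),
                           weights=np.hypot(gxx, gyy))
    return hist / (np.linalg.norm(hist) + 1e-12)
```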

Journal ArticleDOI
27 May 2014-PLOS ONE
TL;DR: A novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information is presented.
Abstract: Contact-free palm-vein recognition is one of the most challenging and promising areas in hand biometrics. In view of the existing problems in contact-free palm-vein imaging, including projection transformation, uneven illumination and difficulty in extracting exact ROIs, this paper presents a novel recognition approach for contact-free palm-vein recognition that performs feature extraction and matching on all vein textures distributed over the palm surface, including finger veins and palm veins, to minimize the loss of feature information. First, a hierarchical enhancement algorithm, which combines a DOG filter and histogram equalization, is adopted to alleviate uneven illumination and to highlight vein textures. Second, RootSIFT, a more stable local invariant feature extraction method in comparison to SIFT, is adopted to overcome the projection transformation in contact-free mode. Subsequently, a novel hierarchical mismatching removal algorithm based on neighborhood searching and LBP histograms is adopted to improve the accuracy of feature matching. Finally, we rigorously evaluated the proposed approach using two different databases and obtained 0.996% and 3.112% Equal Error Rates (EERs), respectively, which demonstrate the effectiveness of the proposed approach.
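
RootSIFT itself is a small, well-known transform on standard SIFT descriptors (L1-normalize, then element-wise square root), so Euclidean distance on the result behaves like the Hellinger kernel on the original histograms; a minimal sketch:

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Map SIFT descriptors so Euclidean distance acts like the Hellinger kernel."""
    d = np.array(descriptors, dtype=float)
    d /= d.sum(axis=1, keepdims=True) + eps  # L1 normalize (SIFT is non-negative)
    return np.sqrt(d)
```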

Journal ArticleDOI
TL;DR: A framework of a complete image stitching system based on feature based approaches will be introduced and the current challenges of image stitching will be discussed.
Abstract: Image stitching (mosaicing) is considered an active research area in computer vision and computer graphics. Image stitching is concerned with combining two or more images of the same scene into one high-resolution image, called a panoramic image. Image stitching techniques can be categorized into two general approaches: direct and feature-based techniques. Direct techniques compare all the pixel intensities of the images with each other, whereas feature-based techniques aim to determine a relationship between the images through distinct features extracted from the processed images. The latter approach has the advantage of being more robust against scene movement, faster, and able to automatically discover the overlapping relationships among an unordered set of images. The purpose of this paper is to present a survey of feature-based image stitching. The main components of image stitching are described, a framework of a complete feature-based image stitching system is introduced, and the current challenges of image stitching are discussed. Keywords: stitching/mosaicing, panoramic image, feature-based detection, SIFT, SURF, image blending.
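
For reference, OpenCV's high-level Stitcher internally follows the feature-based pipeline the survey describes (detection, matching, homography estimation, warping, blending); a minimal usage sketch with hypothetical file names:

```python
import cv2

images = [cv2.imread(p) for p in ["left.jpg", "right.jpg"]]  # hypothetical files
stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
```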

Posted Content
TL;DR: Wang et al. proposed a sparse coding based Fisher vector coding (SCFVC) method for high-dimensional local features, where each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace.
Abstract: Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT, and typically good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have become popular recently. To improve the modeling capacity for high-dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximations, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed deep convolutional neural network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.

Journal ArticleDOI
TL;DR: The comparison results show that the GA-SIFT outperforms some previously reported SIFT algorithms in the feature extraction from a multispectral image, and it is comparable with its counterparts in thefeature extraction of color images, indicating good performance in various applications of image analysis.

Journal ArticleDOI
TL;DR: This article investigates the accuracy of similarity measures for thermal–visible image registration of human silhouettes, including Mutual Information (MI), Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC), Histograms of Oriented Gradients (HOG), Local Self-Similarity (LSS), Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Census, and Binary Robust Independent Elementary Features (BRIEF).

Journal ArticleDOI
Oliver Woodford, Minh-Tri Pham, Atsuto Maki, Frank Perbet, Bjorn Stenger
TL;DR: Two new and powerful improvements to this popular inference method are developed: intrinsic Hough, which solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space, and minimum-entropy Hough, which explains away incorrect votes.
Abstract: In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other tasks that already use the standard Hough transform.
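
The intrinsic-Hough memory idea in miniature (an illustrative sketch, not the paper's implementation): store only the cells that actually receive votes, so memory grows with occupied cells rather than with the full pose-space grid:

```python
from collections import defaultdict

def sparse_hough(votes):
    """votes: iterable of quantized (class_id, pose_bin) tuples, assumed non-empty."""
    accumulator = defaultdict(float)
    for cell in votes:
        accumulator[cell] += 1.0  # only occupied cells consume memory
    return max(accumulator, key=accumulator.get)  # the winning class/pose cell
```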

Journal ArticleDOI
TL;DR: A Block-SIFT method is designed to overcome the memory limitation of SIFT for extracting and matching features from large photogrammetric images, and it is demonstrated that more than 33 million features can be extracted and matched from the Taian dataset with 737 images within 21 h using the L2-SIFT algorithm.
Abstract: The primary contribution of this paper is an efficient feature extraction and matching implementation for large images in large-scale aerial photogrammetry experiments. First, a Block-SIFT method is designed to overcome the memory limitation of SIFT for extracting and matching features from large photogrammetric images. For each pair of images, the original large image is split into blocks and the possible corresponding blocks in the other image are determined by pre-estimating the relative transformation between the two images. Because of the reduced memory requirement, features can be extracted and matched from the original images without down-sampling. Next, a red-black tree data structure is applied to create a feature relationship to reduce the search complexity when matching tie points. Meanwhile, tree key exchange and segment matching methods are proposed to match the tie points along-track and across-track. Finally, to evaluate the accuracy of the features extracted and matched by the proposed L2-SIFT algorithm, a bundle adjustment with parallax angle feature parametrization (ParallaxBA) is applied to obtain the Mean Square Error (MSE) of the feature reprojections, where the feature extraction and matching result is the only information used in the nonlinear optimisation system. Seven different experimental aerial photogrammetric datasets are used to demonstrate the efficiency and validity of the proposed algorithm. It is demonstrated that more than 33 million features can be extracted and matched from the Taian dataset with 737 images within 21 h using the L2-SIFT algorithm. In addition, the ParallaxBA involving more than 2.7 million features and 6 million image points can easily converge to an MSE of 0.03874. The C/C++ source code for the proposed algorithm is available at http://services.eng.uts.edu.au/sdhuang/research.htm .
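
A sketch of the block-wise extraction idea in Python/OpenCV (the released code is C/C++): run SIFT per tile so the full image never needs to be processed at once, then shift keypoint coordinates back to the global frame. The block-prediction step via a pre-estimated transform is omitted, and the block/overlap sizes are assumptions:

```python
import cv2

def block_sift(image_gray, block=2048, overlap=128):
    """Extract SIFT features tile by tile; overlap avoids losing border features."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = [], []
    h, w = image_gray.shape[:2]
    for y in range(0, h, block - overlap):
        for x in range(0, w, block - overlap):
            tile = image_gray[y:y + block, x:x + block]
            kps, des = sift.detectAndCompute(tile, None)
            for kp in kps:
                kp.pt = (kp.pt[0] + x, kp.pt[1] + y)  # back to global coordinates
            keypoints.extend(kps)
            if des is not None:
                descriptors.append(des)
    return keypoints, descriptors
```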

Journal ArticleDOI
TL;DR: Experiments based on UAV images with five-centimeter ground resolution demonstrate the effectiveness of the proposed object-based hierarchical method, leading to the conclusion that this method is practically applicable for frequent monitoring.
Abstract: There have been increasing demands for automatically monitoring urban areas in very high detail, and the Unmanned Aerial Vehicle (UAV) with auto-navigation (AUNA) system offers such capability. This study proposes an object-based hierarchical method to detect changes from UAV images taken at different times. It consists of several steps. In the first step, an octocopter with AUNA capability is used to acquire images at different dates. These images are registered automatically, based on SIFT (Scale-Invariant Feature Transform) feature points, via the general bundle adjustment framework. Thus, the Digital Surface Models (DSMs) and orthophotos can be generated for raster-based change analysis. In the next step, a multi-primitive segmentation method combining the spectral and geometric information is proposed for object-based analysis. In the final step, a multi-criteria decision analysis is carried out concerning the height, spectral and geometric coherence, and shape regularity for change determination. Experiments based on UAV images with five-centimeter ground resolution demonstrate the effectiveness of the proposed method, leading to the conclusion that this method is practically applicable for frequent monitoring.

Journal ArticleDOI
TL;DR: The combinations of ORB with ORB and MSER with SIFT are preferable in almost all situations when the precision and recall results are considered, and the speed of FAST with BRIEF is superior to the others.
Abstract: Comparing feature detectors and descriptors and assessing their performance is very important in computer vision. In this study, we evaluate the performance of seven combinations of well-known detectors and descriptors: SIFT with SIFT, SURF with SURF, MSER with SIFT, BRISK with FREAK, BRISK with BRISK, ORB with ORB, and FAST with BRIEF. The popular Oxford dataset is used in the test stage. To compare the performance of each combination objectively, the effects of JPEG compression, zoom and rotation, blur, and viewpoint and illumination variation have been investigated in terms of precision and recall values. Upon inspecting the obtained results, it is observed that the combinations of ORB with ORB and MSER with SIFT are preferable in almost all situations when the precision and recall results are considered. Moreover, the speed of FAST with BRIEF is superior to the others.
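
A small harness in the spirit of such an evaluation: run two detector/descriptor combinations on the same image pair and count ratio-test matches. Computing the paper's precision/recall would additionally require the Oxford dataset's ground-truth homographies, omitted here:

```python
import cv2

def count_matches(detector, img1, img2, norm):
    """Ratio-test match count for one detector/descriptor combination."""
    kp1, d1 = detector.detectAndCompute(img1, None)
    kp2, d2 = detector.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher(norm).knnMatch(d1, d2, k=2)
    return sum(len(p) == 2 and p[0].distance < 0.75 * p[1].distance
               for p in pairs)

# SIFT-with-SIFT vs ORB-with-ORB on grayscale images a, b:
# count_matches(cv2.SIFT_create(), a, b, cv2.NORM_L2)
# count_matches(cv2.ORB_create(), a, b, cv2.NORM_HAMMING)
```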

Journal ArticleDOI
TL;DR: A unique method for copy-move forgery detection which can sustain various pre-processing attacks using a combination of Dyadic Wavelet Transform (DyWT) and Scale Invariant Feature Transform (SIFT).

Journal ArticleDOI
TL;DR: An optimized descriptor generation algorithm is proposed with square subregions arranged in 16 directions, in which the descriptors are generated by reordering the histogram instead of rotating the window; this not only improves the parallelism of the algorithm but also avoids floating-point calculation to save hardware consumption.
Abstract: This paper introduces a high-speed all-hardware scale-invariant feature transform (SIFT) architecture with parallel and pipeline technology for real-time extraction of image features. Task-level parallel and pipeline structures are exploited between the hardware blocks, and data-level parallel and pipeline architectures are exploited inside each block. Two identical random access memories are adopted with ping-pong operation to execute the key point detection module and the descriptor generation module in task-level parallelism. After speeding up the key point detection module of SIFT, the descriptor generation module becomes the bottleneck of the system's performance; therefore, this paper proposes an optimized descriptor generation algorithm. A novel window-dividing method is proposed with square subregions arranged in 16 directions, and the descriptors are generated by reordering the histogram instead of rotating the window. Therefore, the main orientation detection block and the descriptor generation block run in parallel instead of interactively. With the optimized algorithm cooperating with the pipeline structure inside each block, we not only improve the parallelism of the algorithm but also avoid floating-point calculation to save hardware consumption. Thus, the descriptor generation module runs almost 15 times faster than a recent solution. The proposed system was implemented on a field programmable gate array, and the overall time to extract SIFT features for an image of 512×512 pixels is only 6.55 ms (sufficient for real-time applications); the number of feature points can reach up to 2900.

Journal ArticleDOI
TL;DR: This letter presents a novel method that uses normalized gradients as local image features for the description of keypoints in order to achieve robustness against non-linear intensity changes between multispectral images.
Abstract: This letter presents a novel method for the description of multispectral image keypoints. The proposed method is based on a modified SIFT algorithm. It uses normalized gradients as local image features for the description of keypoints in order to achieve robustness against non-linear intensity changes between multispectral images. The experimental results show that the proposed method achieves better matching performance and outperforms the SIFT algorithm.
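
One plausible reading of "normalized gradients", sketched under that assumption (the letter may define the normalization differently): divide the gradient components by the local gradient magnitude so descriptor entries depend on gradient direction rather than on band-dependent intensity scale:

```python
import numpy as np

def normalized_gradients(band, eps=1e-7):
    """band: one spectral channel as a 2D array."""
    gy, gx = np.gradient(band.astype(float))
    mag = np.hypot(gx, gy) + eps
    return gx / mag, gy / mag  # unit-magnitude gradient fields
```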

Journal ArticleDOI
TL;DR: This paper studies an alternative to current local descriptors and the BoWs model by extracting an ultrashort binary descriptor (USB) and a compact auxiliary spatial feature from each keypoint detected in images, and demonstrates the competitive accuracy, memory consumption, and significantly better efficiency of this approach.
Abstract: Currently, many local descriptors have been proposed to tackle a basic issue in computer vision: duplicate visual content matching. These descriptors are either high-dimensional vectors, which are relatively expensive to extract and compare, or binary codes, which are limited in robustness. The bag-of-visual words (BoWs) model compresses local features into a compact representation that allows for fast matching and scalable indexing. However, the codebook training, high-dimensional feature extraction, and quantization significantly degrade the flexibility and efficiency of the BoWs model. In this paper, we study an alternative to current local descriptors and the BoWs model by extracting an ultrashort binary descriptor (USB) and a compact auxiliary spatial feature from each keypoint detected in images. A typical USB is a 24-bit binary descriptor, hence it directly quantizes visual clues of image keypoints to about 16 million unique IDs. USB allows fast image matching and indexing and avoids the expensive codebook training and feature quantization of the BoWs model. The spatial feature complementarily captures the spatial configuration in the neighborhood of each keypoint and is thus used to filter mismatched USBs in a cascade verification. In the image matching task, USB shows promising accuracy and nearly an order of magnitude faster speed than SIFT. We also test USB in retrieval tasks on UKbench, Oxford5K, and 1.2 million distractor images. Comparisons with recent retrieval methods demonstrate the competitive accuracy, memory consumption, and significantly better efficiency of our approach.
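
The indexing trick can be illustrated in a few lines (a sketch, not the authors' code): a 24-bit binary descriptor packs into a plain integer (2^24 ≈ 16 million IDs), so "quantization" is bit packing and lookup is a dictionary access, with no codebook training or nearest-centroid search:

```python
from collections import defaultdict

def usb_id(bits):
    """bits: sequence of 24 zeros/ones -> an integer ID in [0, 2**24)."""
    out = 0
    for b in bits:
        out = (out << 1) | int(b)
    return out

index = defaultdict(list)  # usb_id -> list of (image_id, keypoint) postings
```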

Journal ArticleDOI
01 Feb 2014
TL;DR: The results indicate that the proposed MIFT method can detect duplicated regions in copy–move image forgery with higher accuracy, especially when the size of the duplicated region is small.
Abstract: Copy-move image forgery detection has recently become a very active research topic in blind image forensics. In copy-move image forgery, a region from some image location is copied and pasted to a different location of the same image. Typically, post-processing is applied to better hide the forgery. Using keypoint-based features, such as SIFT features, for detecting copy-move image forgeries has produced promising results. The main idea is detecting duplicated regions in an image by exploiting the similarity between keypoint-based features in these regions. In this paper, we have adopted keypoint-based features for copy-move image forgery detection; however, our emphasis is on accurate and robust localization of duplicated regions. In this context, we are interested in estimating the transformation (e.g., affine) between the copied and pasted regions more accurately as well as extracting these regions as robustly by reducing the number of false positives and negatives. To address these issues, we propose using a more powerful set of keypoint-based features, called MIFT, which shares the properties of SIFT features but also is invariant to mirror reflection transformations. Moreover, we propose refining the affine transformation using an iterative scheme which improves the estimation of the affine transformation parameters by incrementally finding additional keypoint matches. To reduce false positives and negatives when extracting the copied and pasted regions, we propose using "dense" MIFT features, instead of standard pixel correlation, along with hysteresis thresholding and morphological operations. The proposed approach has been evaluated and compared with competitive approaches through a comprehensive set of experiments using a large dataset of real images (i.e., CASIA v2.0). Our results indicate that our method can detect duplicated regions in copy-move image forgery with higher accuracy, especially when the size of the duplicated region is small.
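
A sketch of keypoint-based copy-move detection: match an image's descriptors against themselves, discard trivial self-matches, and estimate the affine transform between copied and pasted regions. Plain SIFT stands in for MIFT here, so mirrored copies would be missed; the iterative refinement is also omitted:

```python
import cv2
import numpy as np

def detect_copy_move(image_gray, min_dist=10):
    sift = cv2.SIFT_create()
    kps, des = sift.detectAndCompute(image_gray, None)
    matches = cv2.BFMatcher().knnMatch(des, des, k=3)
    src, dst = [], []
    for m in matches:
        for cand in m[1:]:  # m[0] is the keypoint matched to itself
            p, q = kps[cand.queryIdx].pt, kps[cand.trainIdx].pt
            if np.hypot(p[0] - q[0], p[1] - q[1]) > min_dist:
                src.append(p)
                dst.append(q)
    if len(src) < 3:
        return None  # not enough evidence of duplication
    A, _ = cv2.estimateAffine2D(np.float32(src), np.float32(dst),
                                method=cv2.RANSAC)
    return A  # estimated copy-to-paste affine transform
```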

Proceedings Article
08 Dec 2014
TL;DR: A model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace, termed Sparse Coding based Fisher Vector Coding (SCFVC), which significantly outperforms the traditional GMM based Fisher vector encoding and achieves the state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
Abstract: Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT, and typically good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have become popular recently. To improve the modeling capacity for high-dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximations, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed deep convolutional neural network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
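
A sketch in the spirit of SCFVC, hedged: sparse-code each high-dimensional local feature over a dictionary `B`, then form a Fisher-vector-style encoding from the outer product of sparse code and reconstruction residual, sum-pooled over the image. This gradient form is our reading of a dictionary-gradient encoding, not necessarily the paper's exact derivation:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def scfvc_encode(features, B, alpha=0.1):
    """features: (n, d) local features (e.g., CNN descriptors); B: (k, d) dictionary."""
    coder = SparseCoder(dictionary=B, transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    U = coder.transform(features)        # one k-dimensional sparse code per feature
    fv = np.zeros(B.size)
    for x, u in zip(features, U):
        residual = x - u @ B             # reconstruction residual in feature space
        fv += np.outer(u, residual).ravel()  # gradient-style encoding, sum-pooled
    return fv / (np.linalg.norm(fv) + 1e-12)
```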

Journal ArticleDOI
TL;DR: A new feature extraction framework is proposed to determine and classify breast cancer cases, achieving a remarkable increase in recognition performance for the three-class study.

Journal ArticleDOI
TL;DR: A new FPGA-based embedded system architecture, consisting of scale-invariant feature transform (SIFT) feature detection together with binary robust independent elementary features (BRIEF) feature description and matching, achieves feature detection and matching at 60 frames/s for 720p video.
Abstract: Detecting and matching image features is a fundamental task in video analytics and computer vision systems. It establishes the correspondences between two images taken at different time instants or from different viewpoints. However, its large computational complexity has been a challenge for most embedded systems. This paper proposes a new FPGA-based embedded system architecture for feature detection and matching. It consists of scale-invariant feature transform (SIFT) feature detection, as well as binary robust independent elementary features (BRIEF) feature description and matching. It is able to establish accurate correspondences between consecutive frames of 720p (1280×720) video. It optimizes the FPGA architecture for SIFT feature detection to reduce the utilization of FPGA resources. Moreover, it implements BRIEF feature description and matching on FPGA. Owing to these contributions, the proposed system achieves feature detection and matching at 60 frames/s for 720p video. Its processing speed can meet, and even exceed, the demand of most real-life real-time video analytics applications. Extensive experiments have demonstrated its efficiency and effectiveness.
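
A software analogue of the same pipeline for reference (the paper's contribution is the FPGA design, not this code): SIFT keypoint detection followed by BRIEF description and Hamming matching. BriefDescriptorExtractor requires an opencv-contrib build:

```python
import cv2

detector = cv2.SIFT_create()
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()  # opencv-contrib
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_frames(prev_gray, curr_gray):
    """Correspondences between consecutive frames: SIFT keypoints, BRIEF codes."""
    kp1 = detector.detect(prev_gray, None)
    kp2 = detector.detect(curr_gray, None)
    kp1, d1 = brief.compute(prev_gray, kp1)
    kp2, d2 = brief.compute(curr_gray, kp2)
    return matcher.match(d1, d2)
```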

Journal ArticleDOI
TL;DR: A comparison between two popular feature extraction methods, the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), is presented.

Journal ArticleDOI
TL;DR: The paper, for the first time, analyzes and proposes an optimal finger shape model; results obtained from different fingerprint feature correspondences are analyzed and compared to show which features are more suitable for 3D fingerprint image generation.