
Showing papers on "Scale-invariant feature transform" published in 2017


Posted Content
TL;DR: This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision and introduces Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation.
Abstract: This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.

641 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: L2-Net is a CNN-learned patch descriptor that can be matched in Euclidean space by L2 distance; it is trained with a progressive sampling strategy that lets the network access billions of training samples in a few epochs, and its good generalization in experiments indicates that it can directly substitute for existing handcrafted descriptors.
Abstract: The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn a high-performance descriptor in Euclidean space via a Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training samples in a few epochs. (ii) Derived from the basic concept of the local patch matching problem, we emphasize the relative distance between descriptors. (iii) Extra supervision is imposed on the intermediate feature maps. (iv) Compactness of the descriptor is taken into account. The proposed network is named L2-Net since the output descriptor can be matched in Euclidean space by L2 distance. L2-Net achieves state-of-the-art performance on the Brown datasets [16], Oxford dataset [18] and the newly proposed HPatches dataset [11]. The good generalization ability shown by experiments indicates that L2-Net can serve as a direct substitution of the existing handcrafted descriptors. The pre-trained L2-Net is publicly available.

457 citations
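
The abstract above notes that L2-Net descriptors are matched in Euclidean space by L2 distance. As a minimal sketch of what that matching step can look like (not the authors' code), the following pairs each query descriptor with its nearest reference descriptor. The function names and the toy 4-dimensional vectors are assumptions made for illustration; real descriptors are much longer.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, reference):
    """Return, for each query descriptor, the index of the nearest reference descriptor."""
    matches = []
    for q in query:
        distances = [l2_distance(q, r) for r in reference]
        matches.append(distances.index(min(distances)))
    return matches

# Toy descriptors, invented for this example.
reference = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
query = [[0.1, 0.9, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0]]
print(match_descriptors(query, reference))  # [1, 0]
```

Brute-force search is used here only for clarity; with many descriptors an approximate nearest-neighbour index would usually be preferred.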


Journal ArticleDOI
TL;DR: A fast image similarity measurement based on random verification is proposed to efficiently implement copy detection and the proposed method achieves higher accuracy than the state-of-the-art methods, and has comparable efficiency to the baseline method based on the BOW quantization.
Abstract: To detect illegal copies of copyrighted images, recent copy detection methods mostly rely on the bag-of-visual-words (BOW) model, in which local features are quantized into visual words for image matching. However, both the limited discriminability of local features and the BOW quantization errors will lead to many false local matches, which make it hard to distinguish similar images from copies. Geometric consistency verification is a popular technology for reducing the false matches, but it neglects global context information of local features and thus cannot solve this problem well. To address this problem, this paper proposes a global context verification scheme to filter false matches for copy detection. More specifically, after obtaining initial scale invariant feature transform (SIFT) matches between images based on the BOW quantization, the overlapping region-based global context descriptor (OR-GCD) is proposed for the verification of these matches to filter false matches. The OR-GCD not only encodes relatively rich global context information of SIFT features but also has good robustness and efficiency. Thus, it allows an effective and efficient verification. Furthermore, a fast image similarity measurement based on random verification is proposed to efficiently implement copy detection. In addition, we also extend the proposed method for partial-duplicate image detection. Extensive experiments demonstrate that our method achieves higher accuracy than the state-of-the-art methods, and has comparable efficiency to the baseline method based on the BOW quantization.

332 citations


Posted Content
TL;DR: This paper compares the performance of three image matching techniques, SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish eye distortion, and shearing, and shows which algorithm is most robust against each kind of distortion.
Abstract: Fast and robust image matching is a very important task with various applications in computer vision and robotics. In this paper, we compare the performance of three different image matching techniques, i.e., SIFT, SURF, and ORB, against different kinds of transformations and deformations such as scaling, rotation, noise, fish eye distortion, and shearing. For this purpose, we manually apply different types of transformations to the original images and compute matching evaluation parameters such as the number of key points in the images, the matching rate, and the execution time required for each algorithm, and we show which algorithm is most robust against each kind of distortion. Index Terms: image matching, scale invariant feature transform (SIFT), speeded-up robust features (SURF), binary robust independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB).

261 citations


Journal ArticleDOI
Wenping Ma, Wen Zelian, Yue Wu, Licheng Jiao, Maoguo Gong, Yafei Zheng, Liang Liu
TL;DR: A new gradient definition is introduced to overcome the difference in image intensity between remote image pairs, and an enhanced feature matching method that combines the position, scale, and orientation of each keypoint is introduced to increase the number of correct correspondences.
Abstract: The scale-invariant feature transform algorithm and its many variants are widely used in feature-based remote sensing image registration. However, it may be difficult to find enough correct correspondences for remote image pairs in some cases that exhibit a significant difference in intensity mapping. In this letter, a new gradient definition is introduced to overcome the difference of image intensity between the remote image pairs. Then, an enhanced feature matching method by combining the position, scale, and orientation of each keypoint is introduced to increase the number of correct correspondences. The proposed algorithm is tested on multispectral and multisensor remote sensing images. The experimental results show that the proposed method improves the matching performance compared with several state-of-the-art methods in terms of the number of correct correspondences and aligning accuracy.

243 citations


Journal ArticleDOI
TL;DR: The proposed approach is extensively evaluated on three challenging benchmark scene datasets and the experimental results show that the proposed approach leads to superior classification performance compared with the state-of-the-art classification methods.
Abstract: In this paper, a fused global saliency-based multiscale multiresolution multistructure local binary pattern (salM3LBP) feature and local codebookless model (CLM) feature is proposed for high-resolution image scene classification. First, two different but complementary types of descriptors (pixel intensities and differences) are developed to extract global features, characterizing the dominant spatial features at multiple scales, multiple resolutions, and multiple structures. The micro/macrostructure information and rotation invariance are guaranteed in the global feature extraction process. For dense local feature extraction, CLM is utilized to model local enrichment scale invariant feature transform descriptor and dimension reduction is conducted via joint low-rank learning with support vector machine. Finally, a fused feature representation between salM3LBP and CLM as the scene descriptor to train a kernel-based extreme learning machine for scene classification is presented. The proposed approach is extensively evaluated on three challenging benchmark scene datasets (the 21-class land-use scene, 19-class satellite scene, and a newly available 30-class aerial scene), and the experimental results show that the proposed approach leads to superior classification performance compared with the state-of-the-art classification methods.

156 citations


Journal ArticleDOI
TL;DR: Three improved false-match elimination algorithms are proposed and compared; the iterative one, based on the distance from the epipole to the epipolar line, can also solve the real-time calibration of the shrink-amplify center for zooming images.
Abstract: Feature matching is one of the most important steps in the location technology of zooming images. According to the scale-invariant feature transform matching algorithm, several improved false matches elimination algorithms are proposed and compared in this article. First, features of zooming images and ranging models are introduced in detail in the theory framework of the scale-invariant feature transform feature detection and matching algorithm. The key role of the feature matching algorithm and false matches elimination in the ranging technology of zooming images is discussed and addressed. Second, false matches are eliminated by the proposed approach based on geometry constraint in zooming images with a higher accuracy. Third, false matches are removed by an elimination algorithm based on properties of the scale-invariant feature transform features. Finally, an iterative false matches elimination algorithm based on distance from epipole to epipolar line is proposed and this algorithm can also solve the...

139 citations


Journal ArticleDOI
TL;DR: A novel copy-move forgery detection method based on hybrid features that can precisely detect duplicated regions even after distortions such as rotation, scaling, JPEG compression and adding noise is proposed.

124 citations


Journal ArticleDOI
TL;DR: A multi-viewpoint remote sensing image registration method is proposed that introduces a geometric constraint term into the L2E-based energy function to better constrain the non-rigid transformation; it is compared with five state-of-the-art methods.
Abstract: Remote sensing image registration plays an important role in military and civilian fields, such as natural disaster damage assessment, military damage assessment and ground targets identification, etc. However, due to the ground relief variations and imaging viewpoint changes, non-rigid geometric distortion occurs between remote sensing images with different viewpoint, which further increases the difficulty of remote sensing image registration. To address the problem, we propose a multi-viewpoint remote sensing image registration method which contains the following contributions. (i) A multiple features based finite mixture model is constructed for dealing with different types of image features. (ii) Three features are combined and substituted into the mixture model to form a feature complementation, i.e., the Euclidean distance and shape context are used to measure the similarity of geometric structure, and the SIFT (scale-invariant feature transform) distance which is endowed with the intensity information is used to measure the scale space extrema. (iii) To prevent the ill-posed problem, a geometric constraint term is introduced into the L2E-based energy function for better behaving the non-rigid transformation. We evaluated the performances of the proposed method by three series of remote sensing images obtained from the unmanned aerial vehicle (UAV) and Google Earth, and compared with five state-of-the-art methods where our method shows the best alignments in most cases.

110 citations


Journal ArticleDOI
TL;DR: In this paper, two new methods for automated counting of fruit in images of mango tree canopies, one using texture-based dense segmentation and one using shape-based fruit detection, were proposed.
Abstract: Machine vision technologies hold the promise of enabling rapid and accurate fruit crop yield predictions in the field. The key to fulfilling this promise is accurate segmentation and detection of fruit in images of tree canopies. This paper proposes two new methods for automated counting of fruit in images of mango tree canopies, one using texture-based dense segmentation and one using shape-based fruit detection, and compares the use of these methods relative to existing techniques: (i) a method based on K-nearest neighbour pixel classification and contour segmentation, and (ii) a method based on super-pixel over-segmentation and classification using support vector machines. The robustness of each algorithm was tested on multiple sets of images of mango trees acquired over a period of 3 years. These image sets were acquired under varying conditions (light and exposure), distance to the tree, average number of fruit on the tree, orchard and season. For images collected under the same conditions as the calibration images, estimated fruit numbers were within 16% of actual fruit numbers, and the F1 measure of detection performance was above 0.68 for these methods. Results were poorer when models were used for estimating fruit numbers in trees of different canopy shape and when different imaging conditions were used. For fruit-background segmentation, K-nearest neighbour pixel classification based on colour and smoothness or pixel classification based on super-pixel over-segmentation, clustering of dense scale invariant feature transform features into visual words and bag-of-visual-word super-pixel classification using support vector machines was more effective than simple contrast and colour based segmentation. Pixel classification was best followed by fruit detection using an elliptical shape model or blob detection using colour filtering and morphological image processing techniques. Method results were also compared using precision–recall plots. Imaging at night under artificial illumination with careful attention to maintaining constant illumination conditions is highly recommended.

93 citations


Book ChapterDOI
07 Aug 2017
TL;DR: A coverless steganography method based on robust image hashing is proposed; because it does not modify the content of the image itself, it can effectively resist steganalysis tools, and experimental results show that it has higher capacity, robustness and security than the state-of-the-art image-hash-based coverless method proposed in [15].
Abstract: Traditional image steganography modifies the content of the image to some degree, which makes it hard to resist detection by image steganalysis tools. A new kind of steganography, coverless steganography, has recently attracted research attention because it does not modify the content of the stego image at all. In this paper, we propose a new coverless steganography method based on robust image hashing. First, we design an effective and stable image hash using the orientation information of the SIFT feature points. A local image database is then created and the hash values of the images in the database are computed. Second, the secret message is divided into segments with the same length as the hash sequences, and a series of images is chosen from the image database by matching the secret information segments against the hash sequences of all the images. Finally, these images are transmitted as the carriers of the secret information. When the receiver receives these images, the secret information is extracted using the shared hash method. Because SIFT features can resist common image attacks to a certain extent, the secret information corresponding to the hash is strongly robust. To improve the retrieval and matching efficiency of the hashing system, an inverted index with a quadtree structure is designed. Compared with traditional image steganography, this method does not modify the content of the image itself and can therefore effectively resist steganalysis tools. Furthermore, we compare the proposed method with a state-of-the-art coverless steganography method that is also based on image hashing, and experimental results show that our method has higher capacity, robustness and security than the method proposed in [15].

Journal ArticleDOI
TL;DR: A novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines is proposed for high-resolution broad-area remote-sensing images and produces more robust results in complex scenes.
Abstract: Efficient airport detection and aircraft recognition are essential due to the strategic importance of these regions and targets in economic and military construction. In this paper, a novel airport detection and aircraft recognition method that is based on the two-layer visual saliency analysis model and support vector machines (SVMs) is proposed for high-resolution broad-area remote-sensing images. In the first layer saliency (FLS) model, we introduce a spatial-frequency visual saliency analysis algorithm that is based on a CIE Lab color space to reduce the interference of backgrounds and efficiently detect well-defined airport regions in broad-area remote-sensing images. In the second layer saliency model, we propose a saliency analysis strategy that is based on an edge feature preserving wavelet transform and high-frequency wavelet coefficient reconstruction to complete the pre-extraction of aircraft candidates from airport regions that are detected by the FLS and crudely extract as many aircraft candidates as possible for additional classification in detected airport regions. Then, we utilize feature descriptors that are based on a dense SIFT and Hu moment to accurately describe these features of the aircraft candidates. Finally, these object features are inputted to the SVM, and the aircraft are recognized. The experimental results indicate that the proposed method not only reliably and effectively detects targets in high-resolution broad-area remote-sensing images but also produces more robust results in complex scenes.

Journal ArticleDOI
Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Yu Qiao
TL;DR: A hybrid representation for image recognition, with a focus on scene recognition, is proposed; it leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schema, and achieves an excellent performance on two standard benchmarks.
Abstract: Traditional feature encoding scheme (e.g., Fisher vector) with local descriptors (e.g., SIFT) and recent convolutional neural networks (CNNs) are two classes of successful methods for image recognition. In this paper, we propose a hybrid representation, which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schema for image recognition, with a focus on scene recognition. To this end, we make three main contributions from the following aspects. First, we propose a patch-level and end-to-end architecture to model the appearance of local patches, called PatchNet. PatchNet is essentially a customized network trained in a weakly supervised manner, which uses the image-level supervision to guide the patch-level feature extraction. Second, we present a hybrid visual representation, called VSAD, by utilizing the robust feature representations of PatchNet to describe local patches and exploiting the semantic probabilities of PatchNet to aggregate these local patches into a global representation. Third, based on the proposed VSAD representation, we propose a new state-of-the-art scene recognition approach, which achieves an excellent performance on two standard benchmarks: MIT Indoor67 (86.2%) and SUN397 (73.0%).

Journal ArticleDOI
TL;DR: The method extends the scale invariant feature transform (SIFT) to arbitrary dimensions by making key modifications to orientation assignment and gradient histograms and rotation invariance is proven mathematically.
Abstract: We present a method for image registration based on 3D scale- and rotation-invariant keypoints. The method extends the scale invariant feature transform (SIFT) to arbitrary dimensions by making key modifications to orientation assignment and gradient histograms. Rotation invariance is proven mathematically. Additional modifications are made to extrema detection and keypoint matching based on the demands of image registration. Our experiments suggest that the choice of neighborhood in discrete extrema detection has a strong impact on image registration accuracy. In head MR images, the brain is registered to a labeled atlas with an average Dice coefficient of 92%, outperforming registration from mutual information as well as an existing 3D SIFT implementation. In abdominal CT images, the spine is registered with an average error of 4.82 mm. Furthermore, keypoints are matched with high precision in simulated head MR images exhibiting lesions from multiple sclerosis. These results were achieved using only affine transforms, and with no change in parameters across a wide variety of medical images. This paper is freely available as a cross-platform software library.
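
The abstract above reports registration quality as an average Dice coefficient. As a minimal sketch of how that overlap measure is defined, 2|A ∩ B| / (|A| + |B|), the following compares two label masks stored as sets of voxel coordinates. The function name, the set-based representation and the toy voxels are assumptions for illustration, not the paper's implementation.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks overlap perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy masks of four voxels each, sharing three voxels.
registered = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
atlas = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(dice_coefficient(registered, atlas))  # 0.75
```

A value of 1 means the registered structure and the atlas label coincide exactly, and 0 means they do not overlap at all.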

Journal ArticleDOI
TL;DR: It is shown, for the first time to the authors' knowledge, that the space of Gaussians can be equipped with a Lie group structure by defining a multiplication operation on this manifold, and that it is isomorphic to a subgroup of the upper triangular matrix group.
Abstract: This paper presents a novel image descriptor to effectively characterize the local, high-order image statistics. Our work is inspired by the Diffusion Tensor Imaging and the structure tensor method (or covariance descriptor), and motivated by popular distribution-based descriptors such as SIFT and HoG. Our idea is to associate one pixel with a multivariate Gaussian distribution estimated in the neighborhood. The challenge lies in that the space of Gaussians is not a linear space but a Riemannian manifold. We show, for the first time to our knowledge, that the space of Gaussians can be equipped with a Lie group structure by defining a multiplication operation on this manifold, and that it is isomorphic to a subgroup of the upper triangular matrix group. Furthermore, we propose methods to embed this matrix group in the linear space, which enables us to handle Gaussians with Euclidean operations rather than complicated Riemannian operations. The resulting descriptor, called Local Log-Euclidean Multivariate Gaussian (L$^2$EMG) descriptor, works well with low-dimensional and high-dimensional raw features. Moreover, our descriptor is a continuous function of features without quantization, which can model the first- and second-order statistics. Extensive experiments were conducted to evaluate thoroughly L$^2$EMG, and the results showed that L$^2$EMG is very competitive with state-of-the-art descriptors in image classification.

Journal ArticleDOI
TL;DR: Simulation results confirm the superiority of the proposed methods in comparison with classic ones in terms of True Positive rate, mismatches ratio, total number of matching, and two newly proposed evaluation criteria.

Journal ArticleDOI
TL;DR: The most successful approach in the CBIR framework is to use LLC for Coil20 data set and FBSR for Corel1000 data set, and three methods recently proposed in literature (Online Dictionary Learning, Locality-constrained Linear Coding and Feature-based Sparse Representation) are tested and compared with the framework results.

Journal ArticleDOI
01 Feb 2017-Optik
TL;DR: A new image descriptor using SIFT and LDP is introduced that is able to find similarities and matches between images and produces highly discriminative features for describing image content.

Journal ArticleDOI
TL;DR: A recently proposed method for Hough parameter space regularization is used for parameter quantization, and each line is estimated with an orthogonal least squares fit among the candidate points returned from the Hough transform.
Abstract: The Hough transform is a voting scheme for locating geometric objects in point clouds. This paper describes its application for detecting lines in three dimensional point clouds. For parameter quantization, a recently proposed method for Hough parameter space regularization is used. The voting process is done in an iterative way by selecting the line with the most votes and removing the corresponding points in each step. To overcome the inherent inaccuracies of the parameter space discretization, each line is estimated with an orthogonal least squares fit among the candidate points returned from the Hough transform.
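
The last step in the abstract above, an orthogonal least squares fit among the candidate points, can be sketched as follows. The best-fit 3D line passes through the centroid of the points and follows the dominant eigenvector of their scatter matrix. Finding that eigenvector by power iteration is a choice made here to keep the example dependency-free; the function name and the sample points are hypothetical, and the Hough voting stage itself is not shown.

```python
import math

def fit_line_orthogonal(points, iterations=50):
    """Fit a 3D line minimizing squared perpendicular distances.

    Returns (centroid, unit_direction); the sign of the direction is arbitrary.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # 3x3 scatter matrix of the centered points.
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
               for i in range(3)]
    # Start from the offset of the point farthest from the centroid
    # (an assumption that works well for roughly collinear candidates).
    direction = max(centered, key=lambda c: sum(v * v for v in c))
    if not any(direction):
        raise ValueError("all points coincide; no line direction is defined")
    # Power iteration converges to the dominant eigenvector of the scatter matrix.
    for _ in range(iterations):
        product = [sum(scatter[i][j] * direction[j] for j in range(3))
                   for i in range(3)]
        norm = math.sqrt(sum(v * v for v in product))
        direction = [v / norm for v in product]
    return centroid, direction

# Candidate points lying exactly on the line (1, 2, 3) + t * (1, -1, 0).
candidates = [(1 + t, 2 - t, 3) for t in range(5)]
centroid, direction = fit_line_orthogonal(candidates)
print(centroid, direction)
```

In the iterative scheme described above, a fit like this would be applied to the points returned for the line with the most votes, before those points are removed and voting is repeated.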

Journal ArticleDOI
TL;DR: In this article, the authors proposed a registration scheme, including an Accelerated Binary Robust Invariant Scalable Keypoints (ABRISK) algorithm and spatial analysis of corresponding control points for image registration.
Abstract: Using an Unmanned Aerial Vehicle (UAV) drone with an attached non-metric camera has become a popular low-cost approach for collecting geospatial data. A well-georeferenced orthoimage is a fundamental product for geomatics professionals. To achieve high positioning accuracy of orthoimages, precise sensor position and orientation data, or a number of ground control points (GCPs), are often required. Alternatively, image registration is a solution for improving the accuracy of a UAV orthoimage, as long as a historical reference image is available. This study proposes a registration scheme, including an Accelerated Binary Robust Invariant Scalable Keypoints (ABRISK) algorithm and spatial analysis of corresponding control points for image registration. To determine a match between two input images, feature descriptors from one image are compared with those from another image. A “Sorting Ring” is used to filter out uncorrected feature pairs as early as possible in the stage of matching feature points, to speed up the matching process. The results demonstrate that the proposed ABRISK approach outperforms the vector-based Scale Invariant Feature Transform (SIFT) approach where radiometric variations exist. ABRISK is 19.2 times and 312 times faster than SIFT for image sizes of 1000 × 1000 pixels and 4000 × 4000 pixels, respectively. ABRISK is 4.7 times faster than Binary Robust Invariant Scalable Keypoints (BRISK). Furthermore, the positional accuracy of the UAV orthoimage after applying the proposed image registration scheme is improved by an average of root mean square error (RMSE) of 2.58 m for six test orthoimages whose spatial resolutions vary from 6.7 cm to 10.7 cm.

Journal ArticleDOI
TL;DR: The proposed matched pair grouping algorithm can significantly improve the detection performance in smooth tampered regions, and considerably reduce the clustering time in the case of a mass of matched pairs, compared with the state-of-the-art methods.
Abstract: In looking to improve the detection performance of the keypoint-based method involving smooth tampered regions, there are three problems to be addressed, namely the nonuniform distribution of the keypoints, the discriminative power of low contrast keypoints, and the high computational cost of clustering. In this study, the classical implementation framework of the keypoint-based method is improved by introducing new techniques and algorithms in order to overcome these problems. First, to acquire uniformly distributed keypoints in the test image, we propose a new solution of selecting the keypoints by region instead of contrast. To this end, we first separate the keypoint detection and selection processes. After obtaining all discernible keypoints, we adapt the non-maximum value suppression algorithm to select keypoints by combining the contrast and density of each keypoint. Second, we apply the opponent scale-invariant feature transform descriptor to enhance the discriminative power of keypoints by adding color information. Finally, to alleviate the computational cost of clustering, we optimize the J-Linkage algorithm by altering the method of computing initial clusters and affine transformation hypotheses. For this purpose, we propose the matched pair grouping algorithm that can obtain a smaller number of initial clusters by utilizing the correspondence between the superpixels in the original and duplicated regions. Experiments performed on three representative datasets confirm that the proposed method can significantly improve the detection performance in smooth tampered regions, and considerably reduce the clustering time in the case of a mass of matched pairs, compared with the state-of-the-art methods.

Journal ArticleDOI
TL;DR: A robust feature matching method based on support-line voting and affine-invariant ratios is proposed; its local affine model is more robust to distortions caused by elevation differences than a global affine transformation, especially for high-resolution remote sensing images and UAV images.
Abstract: Robust image matching is crucial for many applications of remote sensing and photogrammetry, such as image fusion, image registration, and change detection. In this paper, we propose a robust feature matching method based on support-line voting and affine-invariant ratios. We first use popular feature matching algorithms, such as SIFT, to obtain a set of initial matches. A support-line descriptor based on multiple adaptive binning gradient histograms is subsequently applied in the support-line voting stage to filter outliers. In addition, we use affine-invariant ratios computed by a two-line structure to refine the matching results and estimate the local affine transformation. The local affine model is more robust to distortions caused by elevation differences than the global affine transformation, especially for high-resolution remote sensing images and UAV images. Thus, the proposed method is suitable for both rigid and non-rigid image matching problems. Finally, we extract as many high-precision correspondences as possible based on the local affine extension and build a grid-wise affine model for remote sensing image registration. We compare the proposed method with six state-of-the-art algorithms on several data sets and show that our method significantly outperforms the other methods. The proposed method achieves 94.46% average precision on 15 challenging remote sensing image pairs, while the second-best method, RANSAC, only achieves 70.3%. In addition, the number of detected correct matches of the proposed method is approximately four times the number of initial SIFT matches.

Journal ArticleDOI
TL;DR: This is the first attempt, to the best of the authors' knowledge, to recognize both coins and paper banknotes on a smartphone using the SIFT algorithm, which has been developed to be the most robust and efficient local invariant feature descriptor.

Journal ArticleDOI
TL;DR: An image stitching algorithm based on histogram matching and the scale-invariant feature transform (SIFT) algorithm is proposed to solve the problem of stitching images that have significant illumination changes, and results show that the algorithm is effective.
Abstract: Stitching images that have significant illumination changes leads to an unnatural mosaic image. This paper proposes an image stitching algorithm based on histogram matching and the scale-invariant feature transform (SIFT) algorithm to solve the problem. First, histogram matching is used for image adjustment, so that the images to be stitched are at the same level of illumination. The SIFT algorithm is then used to extract the key points of the images and perform rough matching, followed by the RANSAC algorithm for fine matching. Finally, the appropriate mathematical mapping model between the two images is calculated and, according to the mapping relationship, a simple weighted average algorithm is used for image blending. The experimental results show that the algorithm is effective.
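
The first stage described above, histogram matching, can be sketched as follows. This is a textbook formulation based on cumulative distributions, assumed here for illustration and not taken from the paper: each source intensity is mapped to the smallest reference intensity whose cumulative frequency reaches the source's. Images are represented as flat lists of 8-bit grey levels, and the function names are hypothetical.

```python
def cumulative_distribution(pixels, levels=256):
    """Normalized cumulative histogram of a flat list of integer intensities."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    return cdf

def match_histogram(source, reference, levels=256):
    """Remap source intensities so their distribution follows the reference."""
    src_cdf = cumulative_distribution(source, levels)
    ref_cdf = cumulative_distribution(reference, levels)
    mapping, j = [], 0
    for i in range(levels):
        # Advance to the smallest reference level whose CDF reaches src_cdf[i];
        # both CDFs are non-decreasing, so j never needs to move backwards.
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        mapping.append(j)
    return [mapping[p] for p in source]

dark = [10, 20, 30, 40]          # toy under-exposed image
bright = [100, 150, 200, 250]    # toy reference image
print(match_histogram(dark, bright))  # [100, 150, 200, 250]
```

After this adjustment the two images share a similar intensity distribution, which is the precondition the abstract describes for the subsequent SIFT key point extraction.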

Journal ArticleDOI
20 Mar 2017-Sensors
TL;DR: This research proposes a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN, and proves the superiority of the proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.
Abstract: Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images.

Journal ArticleDOI
TL;DR: A classification method for detecting grapevine buds in images taken in natural field conditions, using well-known computer vision technologies: Scale-Invariant Feature Transform for calculating low-level features, Bag of Features for building an image descriptor, and Support Vector Machines for training a classifier.
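The Bag of Features step named here can be sketched as follows: each local descriptor is assigned to its nearest codeword and the image is summarized by the normalized histogram of assignments. This is a minimal sketch under assumptions: the 2-D toy descriptors and the two-word `codebook` are hypothetical (real SIFT descriptors are 128-dimensional and a codebook is typically learned by clustering), and the SVM training step is not shown.

```python
def bag_of_features(descriptors, codebook):
    """Return a normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every codeword.
        distances = [sum((x - c) ** 2 for x, c in zip(d, word)) for word in codebook]
        counts[distances.index(min(distances))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

codebook = [[0, 0], [10, 10]]                      # hypothetical visual words
descriptors = [[1, 0], [9, 9], [0, 1], [11, 10]]   # hypothetical local features
histogram = bag_of_features(descriptors, codebook)
```

The resulting fixed-length histogram is what a classifier such as an SVM would consume, regardless of how many key points each image produced.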

Journal ArticleDOI
TL;DR: Feature tracking using nonlinear scale spaces is found to be preferable to existing feature-tracking alternatives that use Gaussian or linear scale spaces, because of its high efficiency against noise in image features.
Abstract: In this paper, we propose a feature-tracking algorithm for sea ice drift retrieval from a pair of sequential satellite synthetic aperture radar (SAR) images. The method is based on feature tracking comprising feature detection, description, and matching steps. The approach exploits the benefits of nonlinear multiscale image representations using accelerated-KAZE (A-KAZE) features, a method that detects and describes image features in an anisotropic scale space. We evaluated several state-of-the-art feature-based algorithms, including A-KAZE, Scale Invariant Feature Transform (SIFT), and Oriented FAST and Rotated BRIEF (ORB), a very fast feature extractor that computes binary descriptors, on dual-polarized Sentinel-1A C-SAR extra wide swath mode data over the Arctic. The A-KAZE approach outperforms both ORB and SIFT by up to an order of magnitude in ice drift retrieval. The experimental results showed high relevance of the proposed algorithm for retrieval of ice drift at subkilometre resolution from a pair of SAR images with 100-m pixel size. We found that feature tracking using nonlinear scale spaces is preferable to existing feature-tracking alternatives that use Gaussian or linear scale spaces, because of its high efficiency against noise in image features.
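Once features have been matched between the two SAR images, each match gives a displacement in pixels that can be scaled to metres using the pixel size. The sketch below assumes hypothetical matched coordinates and uses the 100-m pixel size quoted in the abstract as a default; georeferencing, outlier filtering, and conversion to velocity (which needs the acquisition time difference) are left out.

```python
import math

def drift_vectors(matches, pixel_size_m=100.0):
    """Convert matched key-point pairs into drift vectors in metres.

    `matches` is a list of ((x1, y1), (x2, y2)) pixel coordinates of the same
    feature in the first and second image.
    Returns a list of (dx_m, dy_m, magnitude_m) tuples.
    """
    vectors = []
    for (x1, y1), (x2, y2) in matches:
        dx = (x2 - x1) * pixel_size_m
        dy = (y2 - y1) * pixel_size_m
        vectors.append((dx, dy, math.hypot(dx, dy)))
    return vectors

# One hypothetical match: the feature moved 3 pixels in x and 4 pixels in y.
vectors = drift_vectors([((10, 10), (13, 14))])
```

For this single match the feature has drifted 500 m between the two acquisitions.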

Journal ArticleDOI
TL;DR: A novel and efficient method for dorsal hand vein recognition that improves two major steps of the SIFT-like framework, i.e., key-point detection and matching; a fine-grained matching step is introduced that makes use of a Multi-task Sparse Representation Classifier.

Journal ArticleDOI
TL;DR: These experiments show that the proposed scheme provides a short hash length that is robust to most common content-preserving image manipulations, such as large-angle rotations, and allows forged image regions to be correctly located and the type of forgery to be detected.
Abstract: A novel robust image hashing scheme based on quaternion Zernike moments (QZMs) and the scale invariant feature transform (SIFT) is proposed for image authentication. The proposed method can locate tampered regions and detect the nature of the modification, including object insertion, removal, replacement, copy-move and cut-to-paste operations. QZMs, considered as global features, are used for image authentication, while SIFT key-point features provide image forgery localization and classification. The performance of the proposed approach was evaluated on the UCID color image database and compared with several recent and efficient methods. These experiments show that the proposed scheme provides a short hash length that is robust to most common content-preserving image manipulations, such as large-angle rotations, and allows forged image regions to be correctly located and the type of forgery to be detected.

Journal ArticleDOI
TL;DR: This work presents texture operators encoding class-specific local organizations of image directions (LOIDs) in a rotation-invariant fashion, and experimentally demonstrates the effectiveness of the proposed operators for classifying natural textures.
Abstract: We present texture operators encoding class-specific local organizations of image directions (LOIDs) in a rotation-invariant fashion. The LOIDs are key for visual understanding, and are at the origin of the success of popular approaches such as local binary patterns (LBPs) and the scale-invariant feature transform (SIFT). Whereas LBPs and SIFT yield hand-crafted image representations, we propose to learn data-specific representations of the LOIDs in a rotation-invariant fashion. The image operators are based on steerable circular harmonic wavelets (CHWs), offering a rich and yet compact initial representation for characterizing natural textures. The joint location and orientation required to encode the LOIDs is preserved by using moving frames (MFs) texture representations built from locally-steered image gradients that are invariant to rigid motions. In a second step, we use support vector machines to learn a multi-class shaping matrix for the initial CHW representation, yielding data-driven MFs called steerable wavelet machines (SWMs). The SWM forward function is composed of linear operations (i.e., convolution and weighted combinations) interleaved with non-linear steermax operations. We experimentally demonstrate the effectiveness of the proposed operators for classifying natural textures. Our scheme outperforms recent approaches on several test suites of the Outex and the CUReT databases.