Proceedings ArticleDOI

On Effectiveness of Histogram of Oriented Gradient Features for Visible to Near Infrared Face Matching

TL;DR: The results demonstrate that DSIFT with subspace LDA outperforms a commercial matcher and other HOG variants by at least 15%, and that histogram of oriented gradient features can encode similar facial features across spectra.
Abstract: The advent of near infrared imagery and its applications in face recognition has instigated research in cross-spectral (visible to near infrared) matching. Existing research has focused on extracting textural features, including variants of histogram of oriented gradients. This paper studies the effectiveness of these features for cross-spectral face recognition. On the CASIA NIR-VIS 2.0 cross-spectral face database, three HOG variants are analyzed along with dimensionality reduction approaches and linear discriminant analysis. The results demonstrate that DSIFT with subspace LDA outperforms a commercial matcher and other HOG variants by at least 15%. We also observe that histogram of oriented gradient features are able to encode similar facial features across spectra.
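As a minimal illustration of the directed-gradient histogram at the core of DSIFT and the HOG variants the paper compares, the sketch below bins central-difference gradients of one cell; the cell size and 9-bin signed binning are illustrative defaults, not the paper's exact configuration:

```python
import math

def orientation_histogram(patch, n_bins=9):
    """Histogram of oriented gradients for one cell of a grayscale patch.

    Gradients are taken with central differences; each interior pixel votes
    its gradient magnitude into one of `n_bins` signed-orientation bins
    (0..360 degrees), mirroring the "directed gradient" used by DSIFT.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(ang / (360.0 / n_bins)) % n_bins] += mag
    return hist

# A patch with a pure horizontal intensity ramp: all gradient energy
# falls into the bin containing 0 degrees.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
hist = orientation_histogram(ramp)
print(hist[0] > 0 and sum(hist[1:]) == 0)  # True
```

A full descriptor concatenates such cell histograms over a grid before dimensionality reduction and LDA.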
Citations
Journal ArticleDOI
TL;DR: The Wasserstein convolutional neural network (WCNN) is proposed to learn invariant features between near-infrared (NIR) and visual (VIS) face images; the Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions.
Abstract: Heterogeneous face recognition (HFR) aims at matching facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR presents more challenging issues than traditional face recognition because of the large intra-class variation among heterogeneous face images and the limited availability of training samples of cross-modality face image pairs. This paper proposes the novel Wasserstein convolutional neural network (WCNN) approach for learning invariant features between near-infrared (NIR) and visual (VIS) face images (i.e., NIR-VIS face recognition). The low-level layers of the WCNN are trained with widely available face images in the VIS spectrum, and the high-level layer is divided into three parts: the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim at learning modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning is performed to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representations of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected WCNN layers to reduce the size of the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments using three challenging NIR-VIS face recognition databases demonstrate the superiority of the WCNN method over state-of-the-art methods.
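The one-dimensional case gives intuition for what the NIR-VIS shared layer minimizes: for two equally sized samples, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values. This toy sketch is only that closed-form special case, not the WCNN's high-dimensional computation:

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions
    with equally many samples: mean absolute difference after sorting."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

nir_feats = [0.0, 1.0, 2.0]   # toy "NIR" feature values
vis_feats = [0.5, 1.5, 2.5]   # toy "VIS" feature values
print(wasserstein_1d(nir_feats, vis_feats))  # 0.5
```

Driving this quantity toward zero is what makes the shared-layer features distribution-invariant across modalities.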

231 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper develops a method to reconstruct VIS images in the NIR domain and vice-versa using a cross-spectral joint ℓ0 minimization based dictionary learning approach to learn a mapping function between the two domains.
Abstract: A lot of real-world data is spread across multiple domains. Handling such data has been a challenging task. Heterogeneous face biometrics has begun to receive attention in recent years. In real-world scenarios, many surveillance cameras capture data in the NIR (near infrared) spectrum. However, most datasets accessible to law enforcement have been collected in the VIS (visible light) domain. Thus, there exists a need to match NIR to VIS face images. In this paper, we approach the problem by developing a method to reconstruct VIS images in the NIR domain and vice-versa. This approach is more applicable to real-world scenarios since it does not involve having to project millions of VIS database images into a learned common subspace for subsequent matching. We present a cross-spectral joint ℓ0 minimization based dictionary learning approach to learn a mapping function between the two domains. One can then use the function to reconstruct facial images between the domains. Our method is open set and can reconstruct any face not present in the training data. We present results on the CASIA NIR-VIS v2.0 database and report state-of-the-art results.
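A drastically simplified sketch of the coupled-dictionary idea: with an ℓ0 sparsity of one, coding a NIR vector amounts to picking its nearest NIR atom and reconstructing with the paired VIS atom. The dictionaries below are hypothetical toys, not learned as in the paper:

```python
def reconstruct_vis(nir_vec, d_nir, d_vis):
    """Toy coupled-dictionary reconstruction with sparsity 1 (extreme l0):
    code the input against the NIR dictionary by picking the closest atom,
    then reconstruct with the VIS atom that shares the same code index."""
    best = min(range(len(d_nir)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nir_vec, d_nir[i])))
    return d_vis[best]

d_nir = [[1.0, 0.0], [0.0, 1.0]]    # toy NIR dictionary atoms
d_vis = [[10.0, 0.0], [0.0, 10.0]]  # paired VIS atoms (shared codes)
print(reconstruct_vis([0.9, 0.1], d_nir, d_vis))  # [10.0, 0.0]
```

The actual method solves a joint ℓ0 minimization over both dictionaries with more than one active atom; the shared sparse code is what carries identity across the spectral gap.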

150 citations


Cites background or result from "On Effectiveness of Histogram of Or..."

  • ...[4] study how the histogram of oriented gradients (HOG) feature and its variants can help cross-spectral face recognition tasks....

  • ...We are able to significantly outperform the baseline [24] as well as some good results reported in [4] (73....

Journal ArticleDOI
TL;DR: Experimental results on six widely used face datasets including the LFW, YouTube Face, FERET, PaSC, CASIA VIS-NIR 2.0, and Multi-PIE datasets are presented to demonstrate the effectiveness of the proposed methods.
Abstract: In this paper, we propose a simultaneous local binary feature learning and encoding (SLBFLE) approach for both homogeneous and heterogeneous face recognition. Unlike existing hand-crafted face descriptors such as local binary pattern (LBP) and Gabor features which usually require strong prior knowledge, our SLBFLE is an unsupervised feature learning approach which automatically learns face representation from raw pixels. Unlike existing binary face descriptors such as the LBP, discriminant face descriptor (DFD), and compact binary face descriptor (CBFD) which use a two-stage feature extraction procedure, our SLBFLE jointly learns binary codes and the codebook for local face patches so that discriminative information from raw pixels from face images of different identities can be obtained by using a one-stage feature learning and encoding procedure. Moreover, we propose a coupled simultaneous local binary feature learning and encoding (C-SLBFLE) method to make the proposed approach suitable for heterogenous face matching. Unlike most existing coupled feature learning methods which learn a pair of transformation matrices for each modality, we exploit both the common and specific information from heterogeneous face samples to characterize their underlying correlations. Experimental results on six widely used face datasets including the LFW, YouTube Face (YTF), FERET, PaSC, CASIA VIS-NIR 2.0, and Multi-PIE datasets are presented to demonstrate the effectiveness of the proposed methods.
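For contrast with the binary codes that SLBFLE learns from raw pixels, the hand-crafted LBP descriptor it is compared against fits in a few lines (3x3 neighborhood; reading the bits clockwise from the top-left is one common convention):

```python
def lbp_code(patch):
    """Classic 3x3 local binary pattern: threshold the 8 neighbors at the
    center pixel's value and pack the results into an 8-bit code."""
    c = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neighbors):
        if v >= c:
            code |= 1 << bit
    return code

patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_code(patch))  # 7: only the three top neighbors reach the center value
```

SLBFLE's point is that such fixed thresholding rules encode strong prior knowledge; it instead learns the binary encoding and the codebook jointly from data.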

146 citations

Proceedings Article
13 Feb 2017
TL;DR: A deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space and achieves 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset.
Abstract: Visual versus near infrared (VIS-NIR) face recognition is still a challenging heterogeneous task due to large appearance difference between VIS and NIR modalities. This paper presents a deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space. The low-level layers of this network are trained only on large-scale VIS data. Each convolutional layer is implemented by the simplest case of maxout operator. The high-level layer is divided into two orthogonal subspaces that contain modality-invariant identity information and modality-variant spectrum information respectively. Our joint formulation leads to an alternating minimization approach for deep representation at the training time and an efficient computation for heterogeneous data at the testing time. Experimental evaluations show that our method achieves 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset. Compared with state-of-the-art methods, it reduces the error rate by 58% only with a compact 64-D representation.
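The reported operating point (94% verification rate at FAR=0.1%) is computed from genuine and impostor score lists. A sketch of one common convention, where the threshold is set so that at most the tolerated fraction of impostor scores is accepted; the evaluation protocol of the paper may differ in detail:

```python
def vr_at_far(genuine, impostor, far=0.001):
    """Verification rate at a fixed false accept rate.

    Choose the threshold as the impostor score ranked at the tolerated
    number of false accepts, then report the fraction of genuine scores
    strictly above that threshold.
    """
    imp = sorted(impostor, reverse=True)
    k = int(round(far * len(imp)))     # number of tolerated false accepts
    thresh = imp[k] if k < len(imp) else imp[-1]
    return sum(1 for g in genuine if g > thresh) / len(genuine)

impostor = [i / 1000 for i in range(1000)]  # toy impostor scores
genuine = [0.9995, 0.999, 0.5, 0.4]         # toy genuine scores
print(vr_at_far(genuine, impostor))  # 0.5
```

At FAR=0.1% the threshold sits deep in the impostor tail, which is why cross-spectral methods are stressed far harder here than at equal error rate.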

143 citations


Cites background from "On Effectiveness of Histogram of Or..."


  • ...To verify the performance of IDR, we compare our method with state-of-the-art NIR-VIS recognition methods, including PCA+Sym+HCA (Li et al. 2013), learning coupled feature spaces (LCFS) method (Wang et al. 2013; Jin, Lu, and Ruan 2015), coupled discriminant face descriptor (C-DFD) (Lei, Pietikainen, and Li 2014; Jin, Lu, and Ruan 2015), DSIFT+PCA+LDA (Dhamecha et al. 2014), coupled discriminant feature learning (CDFL) (Jin, Lu, and Ruan 2015), Gabor+RBM+Remove 11PCs (Yi et al. 2015), VIS+NIR reconstruction+UDP (Juefei-Xu, Pal, and Savvides 2015)....

  • ...The methods can be ordered in ascending rank-1 accuracy as PCA+Sym+HCA, LCFS, C-DFD, CDFL, DSIFT+PCA+LDA, VIS+NIR reconstruction+UDP, Gabor+RBM+Remove 11PCs, IDR....

Proceedings ArticleDOI
13 Jun 2016
TL;DR: A deep TransfeR NIR-VIS heterogeneous facE recognition neTwork (TRIVET) combines a deep convolutional neural network with ordinal measures to learn discriminative models and achieves state-of-the-art recognition performance on the most challenging CASIA NIR-VIS 2.0 Face Database.
Abstract: One task of heterogeneous face recognition is to match a near infrared (NIR) face image to a visible light (VIS) image. In practice, only a few paired NIR-VIS face images are available, but it is easy to collect lots of VIS face images. Therefore, how to use these unpaired VIS images to improve the NIR-VIS recognition accuracy is an ongoing issue. This paper presents a deep TransfeR NIR-VIS heterogeneous facE recognition neTwork (TRIVET) for NIR-VIS face recognition. First, to utilize large numbers of unpaired VIS face images, we employ the deep convolutional neural network (CNN) with ordinal measures to learn discriminative models. The ordinal activation function (Max-Feature-Map) is used to select discriminative features and make the models robust and lightweight. Second, we transfer these models to the NIR-VIS domain by fine-tuning with two types of NIR-VIS triplet loss. The triplet loss not only reduces intra-class NIR-VIS variations but also augments the number of positive training sample pairs. It makes fine-tuning deep models on a small dataset possible. The proposed method achieves state-of-the-art recognition performance on the most challenging CASIA NIR-VIS 2.0 Face Database. It achieves a new record on rank-1 accuracy of 95.74% and verification rate of 91.03% at FAR=0.001. It cuts the error rate in comparison with the best accuracy [27] by 69%.
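The NIR-VIS triplet loss described above can be sketched generically; the squared-distance metric and margin value are illustrative assumptions, and the paper's two triplet variants differ in how the triplets are sampled, not in this basic form:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the anchor toward the positive and
    push it at least `margin` farther from the negative."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

nir_anchor = [0.1, 0.2]  # toy NIR embedding
vis_pos    = [0.1, 0.3]  # VIS image, same identity
vis_neg    = [0.9, 0.8]  # VIS image, different identity
print(triplet_loss(nir_anchor, vis_pos, vis_neg))  # 0.0 (constraint already satisfied)
```

Because every genuine NIR-VIS pair can anchor many triplets against different negatives, the loss effectively multiplies the small number of paired training samples.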

128 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
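The recognition pipeline's first step, matching individual features against a database, is usually paired with Lowe's distance-ratio test: a match is accepted only when the nearest neighbor is clearly closer than the second nearest. A brute-force sketch (in practice a k-d tree or other approximate nearest-neighbor search replaces the linear scan):

```python
def ratio_test_match(desc, database, ratio=0.8):
    """Lowe-style nearest-neighbor matching: accept a match only when the
    closest database descriptor beats the second closest by a clear margin."""
    dists = sorted(sum((a - b) ** 2 for a, b in zip(desc, d)) for d in database)
    best, second = dists[0], dists[1]
    return best < (ratio ** 2) * second  # squared distances, so square the ratio

db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(ratio_test_match([0.05, 0.0], db))  # True: unambiguous nearest neighbor
print(ratio_test_match([0.5, 0.5], db))   # False: two candidates equally close
```

Rejecting ambiguous matches this way is what keeps the subsequent Hough clustering and least-squares pose verification from being swamped by clutter correspondences.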

46,906 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
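Among the stages the abstract singles out, local contrast normalization over overlapping blocks is the easiest to isolate: the cell histograms within a block are concatenated and L2-normalized. A minimal sketch (the block geometry and epsilon are illustrative, not the paper's tuned values):

```python
import math

def l2_normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize,
    in the spirit of the Dalal-Triggs contrast-normalization step."""
    v = [x for hist in cell_hists for x in hist]
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]

block = [[3.0, 4.0], [0.0, 0.0]]  # two toy 2-bin cell histograms
desc = l2_normalize_block(block)
print(round(sum(x * x for x in desc), 3))  # 1.0 (unit L2 norm)
```

Because blocks overlap, each cell contributes to several differently normalized copies, which is what buys invariance to local illumination changes.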

31,952 citations


"On Effectiveness of Histogram of Or..." refers background or methods in this paper

  • ...Note that, DSIFT and HOG-UoCTTI utilize directed gradient whereas HOG-DT utilizes undirected gradient....

  • ...The key contributions of this work are: • Evaluation and performance analysis of three HOG variants, namely ◦ Dense Scale Invariant Feature Transform (DSIFT) [16], ◦ Dalal-Triggs HOG (HOG-DT) [17], and ◦ HOG-UoCTTI [18], along with classification by LDA and direct feature matching....

  • ...• The performances of DSIFT, HOG-UoCTTI, and HOG-DT reveal that the later lags in the comparison of formers....

  • ...For classical HOG d = 128, for HOGDT d = 36, and for HOG-UoCTTI d = 31 (no is set as 8, 9, and 9, respectively and m = 8 leads to a total of 64 key points located on a uniform grid)....

  • ...There are many variants of HOG descriptors such as DSIFT [16], HOG-DT [17], and HOG-UoCTTI [18]....

Reference EntryDOI
15 Oct 2005
TL;DR: Principal component analysis (PCA) replaces the p original variables with a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables.
Abstract: When large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality. Principal component analysis is one technique for doing this. It replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with q very much smaller than p. Despite its apparent simplicity, principal component analysis has a number of subtleties, and it has many uses and extensions. A number of choices associated with the technique are briefly discussed, namely, covariance or correlation, how many components, and different normalization constraints, as well as confusion with factor analysis. Various uses and extensions are outlined. Keywords: dimension reduction; factor analysis; multivariate analysis; variance maximization
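In the two-variable case the whole procedure fits in closed form: diagonalize the 2x2 covariance matrix and read off how much variance each principal component retains. A sketch on toy data:

```python
import math

def pca_2d(points):
    """Eigenvalues of the 2x2 covariance matrix of 2-D data, i.e. the
    variances retained by the two principal components (closed form)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # eigenvalues of [[sxx, sxy], [sxy, syy]] via trace and determinant
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 + root, tr / 2 - root

# Points spread along the line y = x: the first component captures
# nearly all of the variance, so q = 1 suffices here.
pts = [(0, 0), (1, 1), (2, 2), (3, 3.1)]
lam1, lam2 = pca_2d(pts)
print(lam1 / (lam1 + lam2) > 0.99)  # True
```

The same trace-of-retained-eigenvalues argument is how one chooses q for the high-dimensional HOG descriptors discussed elsewhere on this page.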

14,773 citations

Journal ArticleDOI
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
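The latent-SVM scoring rule the abstract refers to can be stated compactly: an example's score is the maximum of w·φ(x, z) over latent choices z (e.g. part placements). A toy sketch with hypothetical two-dimensional features:

```python
def dot(w, phi):
    """Linear score w . phi for one latent choice."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def latent_svm_score(w, latent_features):
    """Latent SVM scoring: f_w(x) = max over latent z of w . phi(x, z).

    During training this max is what makes the objective semiconvex:
    fixing z for positive examples turns the problem into a convex SVM.
    """
    return max(dot(w, phi) for phi in latent_features)

w = [1.0, 0.5]
placements = [[2.0, 0.0],  # phi(x, z1): one candidate part placement
              [1.0, 4.0]]  # phi(x, z2): another candidate placement
print(latent_svm_score(w, placements))  # 3.0 (z2 wins: 1.0 + 2.0)
```

The alternating training loop in the abstract simply re-runs this argmax to fix z on the positives, then solves the resulting convex problem for w.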

10,501 citations


"On Effectiveness of Histogram of Or..." refers background or methods in this paper

  • ...Note that, DSIFT and HOG-UoCTTI utilize directed gradient whereas HOG-DT utilizes undirected gradient....

  • ...The key contributions of this work are: • Evaluation and performance analysis of three HOG variants, namely ◦ Dense Scale Invariant Feature Transform (DSIFT) [16], ◦ Dalal-Triggs HOG (HOG-DT) [17], and ◦ HOG-UoCTTI [18], along with classification by LDA and direct feature matching....

  • ...• The performances of DSIFT, HOG-UoCTTI, and HOG-DT reveal that the later lags in the comparison of formers....

  • ...For classical HOG d = 128, for HOGDT d = 36, and for HOG-UoCTTI d = 31 (no is set as 8, 9, and 9, respectively and m = 8 leads to a total of 64 key points located on a uniform grid)....

  • ...Dimensionality of the feature descriptor f is (m×m)×d. Generally, the size of this descriptor is 2,000-10,000 which is a significantly large feature vector size with the training set of approximately 2000 samples....

Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still- and video-based face recognition research and offer some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations