Proceedings ArticleDOI

On Effectiveness of Histogram of Oriented Gradient Features for Visible to Near Infrared Face Matching

TL;DR: The results demonstrate that DSIFT with subspace LDA outperforms a commercial matcher and other HOG variants by at least 15%, and that histogram of oriented gradient features can encode similar facial features across spectra.
Abstract: The advent of near infrared imagery and its applications in face recognition has instigated research in cross-spectral (visible to near infrared) matching. Existing research has focused on extracting textural features, including variants of histogram of oriented gradients. This paper studies the effectiveness of these features for cross-spectral face recognition. On the CASIA NIR-VIS 2.0 cross-spectral face database, three HOG variants are analyzed along with dimensionality reduction approaches and linear discriminant analysis. The results demonstrate that DSIFT with subspace LDA outperforms a commercial matcher and other HOG variants by at least 15%. We also observe that histogram of oriented gradient features are able to encode similar facial features across spectra.
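As a minimal illustration of the directed-gradient histogram at the core of DSIFT and the HOG variants the paper compares, the sketch below bins central-difference gradients of one cell; the cell size and 9-bin signed binning are illustrative defaults, not the paper's exact configuration:

```python
import math

def orientation_histogram(patch, n_bins=9):
    """Histogram of oriented gradients for one cell of a grayscale patch.

    Gradients are taken with central differences; each interior pixel votes
    its gradient magnitude into one of `n_bins` signed-orientation bins
    (0..360 degrees), mirroring the "directed gradient" used by DSIFT.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(ang / (360.0 / n_bins)) % n_bins] += mag
    return hist

# A patch with a pure horizontal intensity ramp: all gradient energy
# falls into the bin containing 0 degrees.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
hist = orientation_histogram(ramp)
print(hist[0] > 0 and sum(hist[1:]) == 0)  # True
```

A full descriptor concatenates such cell histograms over a grid before dimensionality reduction and LDA.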
Citations
Journal ArticleDOI
TL;DR: The Wasserstein convolutional neural network (WCNN) is proposed to learn invariant features between near-infrared (NIR) and visual (VIS) face images; the Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions.
Abstract: Heterogeneous face recognition (HFR) aims at matching facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR presents more challenging issues than traditional face recognition because of the large intra-class variation among heterogeneous face images and the limited availability of training samples of cross-modality face image pairs. This paper proposes the novel Wasserstein convolutional neural network (WCNN) approach for learning invariant features between near-infrared (NIR) and visual (VIS) face images (i.e., NIR-VIS face recognition). The low-level layers of the WCNN are trained with widely available face images in the VIS spectrum, and the high-level layer is divided into three parts: the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim at learning modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning is performed to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representations of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected WCNN layers to reduce the size of the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments using three challenging NIR-VIS face recognition databases demonstrate the superiority of the WCNN method over state-of-the-art methods.
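The one-dimensional case gives intuition for what the NIR-VIS shared layer minimizes: for two equally sized samples, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values. This toy sketch is only that closed-form special case, not the WCNN's high-dimensional computation:

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions
    with equally many samples: mean absolute difference after sorting."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

nir_feats = [0.0, 1.0, 2.0]   # toy "NIR" feature values
vis_feats = [0.5, 1.5, 2.5]   # toy "VIS" feature values
print(wasserstein_1d(nir_feats, vis_feats))  # 0.5
```

Driving this quantity toward zero is what makes the shared-layer features distribution-invariant across modalities.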

231 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper develops a method to reconstruct VIS images in the NIR domain and vice-versa using a cross-spectral joint ℓ0 minimization based dictionary learning approach to learn a mapping function between the two domains.
Abstract: A lot of real-world data is spread across multiple domains. Handling such data has been a challenging task. Heterogeneous face biometrics has begun to receive attention in recent years. In real-world scenarios, many surveillance cameras capture data in the NIR (near infrared) spectrum. However, most datasets accessible to law enforcement have been collected in the VIS (visible light) domain. Thus, there exists a need to match NIR to VIS face images. In this paper, we approach the problem by developing a method to reconstruct VIS images in the NIR domain and vice-versa. This approach is more applicable to real-world scenarios since it does not involve having to project millions of VIS database images into a learned common subspace for subsequent matching. We present a cross-spectral joint ℓ0 minimization based dictionary learning approach to learn a mapping function between the two domains. One can then use the function to reconstruct facial images between the domains. Our method is open set and can reconstruct any face not present in the training data. We present results on the CASIA NIR-VIS v2.0 database and report state-of-the-art results.
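A drastically simplified sketch of the coupled-dictionary idea: with an ℓ0 sparsity of one, coding a NIR vector amounts to picking its nearest NIR atom and reconstructing with the paired VIS atom. The dictionaries below are hypothetical toys, not learned as in the paper:

```python
def reconstruct_vis(nir_vec, d_nir, d_vis):
    """Toy coupled-dictionary reconstruction with sparsity 1 (extreme l0):
    code the input against the NIR dictionary by picking the closest atom,
    then reconstruct with the VIS atom that shares the same code index."""
    best = min(range(len(d_nir)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nir_vec, d_nir[i])))
    return d_vis[best]

d_nir = [[1.0, 0.0], [0.0, 1.0]]    # toy NIR dictionary atoms
d_vis = [[10.0, 0.0], [0.0, 10.0]]  # paired VIS atoms (shared codes)
print(reconstruct_vis([0.9, 0.1], d_nir, d_vis))  # [10.0, 0.0]
```

The actual method solves a joint ℓ0 minimization over both dictionaries with more than one active atom; the shared sparse code is what carries identity across the spectral gap.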

150 citations


Cites background or result from "On Effectiveness of Histogram of Or..."

  • ...[4] study how the histogram of oriented gradients (HOG) feature and its variants can help cross-spectral face recognition tasks....

  • ...We are able to significantly outperform the baseline [24] as well as some good results reported in [4] (73....

Journal ArticleDOI
TL;DR: Experimental results on six widely used face datasets including the LFW, YouTube Face, FERET, PaSC, CASIA VIS-NIR 2.0, and Multi-PIE datasets are presented to demonstrate the effectiveness of the proposed methods.
Abstract: In this paper, we propose a simultaneous local binary feature learning and encoding (SLBFLE) approach for both homogeneous and heterogeneous face recognition. Unlike existing hand-crafted face descriptors such as local binary pattern (LBP) and Gabor features which usually require strong prior knowledge, our SLBFLE is an unsupervised feature learning approach which automatically learns face representation from raw pixels. Unlike existing binary face descriptors such as the LBP, discriminant face descriptor (DFD), and compact binary face descriptor (CBFD) which use a two-stage feature extraction procedure, our SLBFLE jointly learns binary codes and the codebook for local face patches so that discriminative information from raw pixels from face images of different identities can be obtained by using a one-stage feature learning and encoding procedure. Moreover, we propose a coupled simultaneous local binary feature learning and encoding (C-SLBFLE) method to make the proposed approach suitable for heterogenous face matching. Unlike most existing coupled feature learning methods which learn a pair of transformation matrices for each modality, we exploit both the common and specific information from heterogeneous face samples to characterize their underlying correlations. Experimental results on six widely used face datasets including the LFW, YouTube Face (YTF), FERET, PaSC, CASIA VIS-NIR 2.0, and Multi-PIE datasets are presented to demonstrate the effectiveness of the proposed methods.
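For contrast with the binary codes that SLBFLE learns from raw pixels, the hand-crafted LBP descriptor it is compared against fits in a few lines (3x3 neighborhood; reading the bits clockwise from the top-left is one common convention):

```python
def lbp_code(patch):
    """Classic 3x3 local binary pattern: threshold the 8 neighbors at the
    center pixel's value and pack the results into an 8-bit code."""
    c = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neighbors):
        if v >= c:
            code |= 1 << bit
    return code

patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_code(patch))  # 7: only the three top neighbors reach the center value
```

SLBFLE's point is that such fixed thresholding rules encode strong prior knowledge; it instead learns the binary encoding and the codebook jointly from data.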

146 citations

Proceedings Article
13 Feb 2017
TL;DR: A deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space and achieves 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset.
Abstract: Visual versus near infrared (VIS-NIR) face recognition is still a challenging heterogeneous task due to large appearance difference between VIS and NIR modalities. This paper presents a deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space. The low-level layers of this network are trained only on large-scale VIS data. Each convolutional layer is implemented by the simplest case of maxout operator. The high-level layer is divided into two orthogonal subspaces that contain modality-invariant identity information and modality-variant spectrum information respectively. Our joint formulation leads to an alternating minimization approach for deep representation at the training time and an efficient computation for heterogeneous data at the testing time. Experimental evaluations show that our method achieves 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset. Compared with state-of-the-art methods, it reduces the error rate by 58% only with a compact 64-D representation.
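The reported operating point (94% verification rate at FAR=0.1%) is computed from genuine and impostor score lists. A sketch of one common convention, where the threshold is set so that at most the tolerated fraction of impostor scores is accepted; the evaluation protocol of the paper may differ in detail:

```python
def vr_at_far(genuine, impostor, far=0.001):
    """Verification rate at a fixed false accept rate.

    Choose the threshold as the impostor score ranked at the tolerated
    number of false accepts, then report the fraction of genuine scores
    strictly above that threshold.
    """
    imp = sorted(impostor, reverse=True)
    k = int(round(far * len(imp)))     # number of tolerated false accepts
    thresh = imp[k] if k < len(imp) else imp[-1]
    return sum(1 for g in genuine if g > thresh) / len(genuine)

impostor = [i / 1000 for i in range(1000)]  # toy impostor scores
genuine = [0.9995, 0.999, 0.5, 0.4]         # toy genuine scores
print(vr_at_far(genuine, impostor))  # 0.5
```

At FAR=0.1% the threshold sits deep in the impostor tail, which is why cross-spectral methods are stressed far harder here than at equal error rate.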

143 citations


Cites background from "On Effectiveness of Histogram of Or..."


  • ...To verify the performance of IDR, we compare our method with state-of-the-art NIR-VIS recognition methods, including PCA+Sym+HCA (Li et al. 2013), learning coupled feature spaces (LCFS) method (Wang et al. 2013; Jin, Lu, and Ruan 2015), coupled discriminant face descriptor (C-DFD) (Lei, Pietikainen, and Li 2014; Jin, Lu, and Ruan 2015), DSIFT+PCA+LDA (Dhamecha et al. 2014), coupled discriminant feature learning (CDFL) (Jin, Lu, and Ruan 2015), Gabor+RBM+Remove 11PCs (Yi et al. 2015), VIS+NIR reconstruction+UDP (Juefei-Xu, Pal, and Savvides 2015)....

  • ...The methods can be ordered in ascending rank-1 accuracy as PCA+Sym+HCA, LCFS, C-DFD, CDFL, DSIFT+PCA+LDA, VIS+NIR reconstruction+UDP, Gabor+RBM+Remove 11PCs, IDR....

Proceedings ArticleDOI
13 Jun 2016
TL;DR: A deep TransfeR NIR-VIS heterogeneous facE recognition neTwork (TRIVET) combines a deep convolutional neural network with ordinal measures to learn discriminative models and achieves state-of-the-art recognition performance on the most challenging CASIA NIR-VIS 2.0 Face Database.
Abstract: One task of heterogeneous face recognition is to match a near infrared (NIR) face image to a visible light (VIS) image. In practice, only a few paired NIR-VIS face images are available, but it is easy to collect lots of VIS face images. Therefore, how to use these unpaired VIS images to improve the NIR-VIS recognition accuracy is an ongoing issue. This paper presents a deep TransfeR NIR-VIS heterogeneous facE recognition neTwork (TRIVET) for NIR-VIS face recognition. First, to utilize large numbers of unpaired VIS face images, we employ the deep convolutional neural network (CNN) with ordinal measures to learn discriminative models. The ordinal activation function (Max-Feature-Map) is used to select discriminative features and make the models robust and lightweight. Second, we transfer these models to the NIR-VIS domain by fine-tuning with two types of NIR-VIS triplet loss. The triplet loss not only reduces intra-class NIR-VIS variations but also augments the number of positive training sample pairs. It makes fine-tuning deep models on a small dataset possible. The proposed method achieves state-of-the-art recognition performance on the most challenging CASIA NIR-VIS 2.0 Face Database. It achieves a new record on rank-1 accuracy of 95.74% and verification rate of 91.03% at FAR=0.001. It cuts the error rate in comparison with the best accuracy [27] by 69%.
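The NIR-VIS triplet loss described above can be sketched generically; the squared-distance metric and margin value are illustrative assumptions, and the paper's two triplet variants differ in how the triplets are sampled, not in this basic form:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the anchor toward the positive and
    push it at least `margin` farther from the negative."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

nir_anchor = [0.1, 0.2]  # toy NIR embedding
vis_pos    = [0.1, 0.3]  # VIS image, same identity
vis_neg    = [0.9, 0.8]  # VIS image, different identity
print(triplet_loss(nir_anchor, vis_pos, vis_neg))  # 0.0 (constraint already satisfied)
```

Because every genuine NIR-VIS pair can anchor many triplets against different negatives, the loss effectively multiplies the small number of paired training samples.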

128 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
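The recognition pipeline's first step, matching individual features against a database, is usually paired with Lowe's distance-ratio test: a match is accepted only when the nearest neighbor is clearly closer than the second nearest. A brute-force sketch (in practice a k-d tree or other approximate nearest-neighbor search replaces the linear scan):

```python
def ratio_test_match(desc, database, ratio=0.8):
    """Lowe-style nearest-neighbor matching: accept a match only when the
    closest database descriptor beats the second closest by a clear margin."""
    dists = sorted(sum((a - b) ** 2 for a, b in zip(desc, d)) for d in database)
    best, second = dists[0], dists[1]
    return best < (ratio ** 2) * second  # squared distances, so square the ratio

db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(ratio_test_match([0.05, 0.0], db))  # True: unambiguous nearest neighbor
print(ratio_test_match([0.5, 0.5], db))   # False: two candidates equally close
```

Rejecting ambiguous matches this way is what keeps the subsequent Hough clustering and least-squares pose verification from being swamped by clutter correspondences.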

46,906 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
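Among the stages the abstract singles out, local contrast normalization over overlapping blocks is the easiest to isolate: the cell histograms within a block are concatenated and L2-normalized. A minimal sketch (the block geometry and epsilon are illustrative, not the paper's tuned values):

```python
import math

def l2_normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize,
    in the spirit of the Dalal-Triggs contrast-normalization step."""
    v = [x for hist in cell_hists for x in hist]
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]

block = [[3.0, 4.0], [0.0, 0.0]]  # two toy 2-bin cell histograms
desc = l2_normalize_block(block)
print(round(sum(x * x for x in desc), 3))  # 1.0 (unit L2 norm)
```

Because blocks overlap, each cell contributes to several differently normalized copies, which is what buys invariance to local illumination changes.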

31,952 citations


"On Effectiveness of Histogram of Or..." refers background or methods in this paper

  • ...Note that, DSIFT and HOG-UoCTTI utilize directed gradient whereas HOG-DT utilizes undirected gradient....

  • ...The key contributions of this work are: • Evaluation and performance analysis of three HOG variants, namely ◦ Dense Scale Invariant Feature Transform (DSIFT) [16], ◦ Dalal-Triggs HOG (HOG-DT) [17], and ◦ HOG-UoCTTI [18], along with classification by LDA and direct feature matching....

  • ...• The performances of DSIFT, HOG-UoCTTI, and HOG-DT reveal that the later lags in the comparison of formers....

  • ...For classical HOG d = 128, for HOGDT d = 36, and for HOG-UoCTTI d = 31 (no is set as 8, 9, and 9, respectively and m = 8 leads to a total of 64 key points located on a uniform grid)....

  • ...There are many variants of HOG descriptors such as DSIFT [16], HOG-DT [17], and HOG-UoCTTI [18]....

Reference EntryDOI
15 Oct 2005
TL;DR: Principal component analysis (PCA) replaces the p original variables with a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables.
Abstract: When large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality. Principal component analysis is one technique for doing this. It replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with q very much smaller than p. Despite its apparent simplicity, principal component analysis has a number of subtleties, and it has many uses and extensions. A number of choices associated with the technique are briefly discussed, namely, covariance or correlation, how many components, and different normalization constraints, as well as confusion with factor analysis. Various uses and extensions are outlined. Keywords: dimension reduction; factor analysis; multivariate analysis; variance maximization
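In the two-variable case the whole procedure fits in closed form: diagonalize the 2x2 covariance matrix and read off how much variance each principal component retains. A sketch on toy data:

```python
import math

def pca_2d(points):
    """Eigenvalues of the 2x2 covariance matrix of 2-D data, i.e. the
    variances retained by the two principal components (closed form)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # eigenvalues of [[sxx, sxy], [sxy, syy]] via trace and determinant
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 + root, tr / 2 - root

# Points spread along the line y = x: the first component captures
# nearly all of the variance, so q = 1 suffices here.
pts = [(0, 0), (1, 1), (2, 2), (3, 3.1)]
lam1, lam2 = pca_2d(pts)
print(lam1 / (lam1 + lam2) > 0.99)  # True
```

The same trace-of-retained-eigenvalues argument is how one chooses q for the high-dimensional HOG descriptors discussed elsewhere on this page.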

14,773 citations

Journal ArticleDOI
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
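The latent-SVM scoring rule the abstract refers to can be stated compactly: an example's score is the maximum of w·φ(x, z) over latent choices z (e.g. part placements). A toy sketch with hypothetical two-dimensional features:

```python
def dot(w, phi):
    """Linear score w . phi for one latent choice."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def latent_svm_score(w, latent_features):
    """Latent SVM scoring: f_w(x) = max over latent z of w . phi(x, z).

    During training this max is what makes the objective semiconvex:
    fixing z for positive examples turns the problem into a convex SVM.
    """
    return max(dot(w, phi) for phi in latent_features)

w = [1.0, 0.5]
placements = [[2.0, 0.0],  # phi(x, z1): one candidate part placement
              [1.0, 4.0]]  # phi(x, z2): another candidate placement
print(latent_svm_score(w, placements))  # 3.0 (z2 wins: 1.0 + 2.0)
```

The alternating training loop in the abstract simply re-runs this argmax to fix z on the positives, then solves the resulting convex problem for w.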

10,501 citations


"On Effectiveness of Histogram of Or..." refers background or methods in this paper

  • ...Note that, DSIFT and HOG-UoCTTI utilize directed gradient whereas HOG-DT utilizes undirected gradient....

  • ...The key contributions of this work are: • Evaluation and performance analysis of three HOG variants, namely ◦ Dense Scale Invariant Feature Transform (DSIFT) [16], ◦ Dalal-Triggs HOG (HOG-DT) [17], and ◦ HOG-UoCTTI [18], along with classification by LDA and direct feature matching....

  • ...• The performances of DSIFT, HOG-UoCTTI, and HOG-DT reveal that the later lags in the comparison of formers....

  • ...For classical HOG d = 128, for HOGDT d = 36, and for HOG-UoCTTI d = 31 (no is set as 8, 9, and 9, respectively and m = 8 leads to a total of 64 key points located on a uniform grid)....

  • ...Dimensionality of the feature descriptor f is (m×m)×d. Generally, the size of this descriptor is 2,000-10,000 which is a significantly large feature vector size with the training set of approximately 2000 samples....

Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still- and video-based face recognition research and offer some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations