Unraveling representations for face recognition : from handcrafted to deep learning

Open Access Dissertation

TLDR

This dissertation proposes novel feature extraction and fusion paradigms along with improvements to existing methodologies in order to address the challenge of unconstrained face recognition and presents a novel methodology to improve the robustness of such algorithms in a generalizable manner.

Abstract:

Automatic face recognition in unconstrained environments is a popular and challenging research problem. With the improvements in recognition algorithms, focus has shifted from addressing various covariates individually to performing face recognition in truly unconstrained scenarios. Face databases such as YouTube Faces and the Point and Shoot Challenge capture a wide array of challenges such as pose, expression, illumination, resolution, and occlusion simultaneously. In general, every face recognition algorithm relies on some form of feature extraction mechanism to succinctly represent the most important characteristics of face images so that machine learning techniques can successfully distinguish face images of one individual from those of others. This dissertation proposes novel feature extraction and fusion paradigms along with improvements to existing methodologies in order to address the challenge of unconstrained face recognition. It also presents a novel methodology to improve the robustness of such algorithms in a generalizable manner. We begin by addressing the challenge of utilizing face data captured from consumer-level RGB-D devices to improve face recognition performance without increasing the operational cost. The images captured using such devices are of poor quality compared to those from specialized 3D sensors. To address this, we propose a novel feature descriptor based on the entropy of RGB-D faces along with the saliency feature obtained from a 2D face. Geometric facial attributes are also extracted from the depth image, and face recognition is performed by fusing the descriptor and attribute match scores. While score-level fusion does increase the robustness of the overall framework, it cannot exploit the additional information present at the feature level.
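The score-level fusion mentioned above can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the min-max normalization, the weighted sum, the function names, and the example scores below are all assumptions chosen for illustration, not the dissertation's method.

```python
def min_max_normalize(scores):
    """Rescale match scores to [0, 1] so two matchers become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information to preserve.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(descriptor_scores, attribute_scores, weight=0.5):
    """Weighted-sum fusion (an assumed rule) of two matchers' scores.

    Both lists hold similarity scores of one probe against the same
    gallery identities, in the same order.
    """
    d = min_max_normalize(descriptor_scores)
    a = min_max_normalize(attribute_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(d, a)]


# Hypothetical scores for a probe against three gallery identities.
descriptor_scores = [0.2, 0.9, 0.4]
attribute_scores = [10.0, 30.0, 50.0]

fused = fuse_scores(descriptor_scores, attribute_scores, weight=0.5)
best_match = max(range(len(fused)), key=lambda i: fused[i])
```

Because only the final scores are combined, any complementary structure in the underlying feature vectors is already lost by the time fusion happens, which is the limitation the text points out.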
To address this challenge, we need a better feature-level fusion algorithm that can combine multiple features while preserving as much of this information as possible before the score computation stage. To accomplish this, we propose the Group Sparse Representation based Classifier (GSRC), which removes the requirement for a separate feature-level fusion mechanism and integrates multiple features seamlessly into classification. We also propose a kernelization-based extension to the GSRC that further improves its ability to separate classes that have high inter-class similarity. We next address the problem of efficiently using large amounts of video data to perform face recognition. A single video contains hundreds of frames; however, not all frames contain useful features for face recognition, and some might even deteriorate performance. Keeping this in mind, we propose a novel face verification algorithm that starts by selecting feature-rich frames from a video sequence using the discrete wavelet transform and entropy computation. Frame selection is followed by learning a joint representation with the proposed deep learning architecture, which combines a stacked denoising sparse autoencoder and a deep Boltzmann machine. A multilayer neural network is used as the classifier to obtain the verification decision. Currently, most highly accurate face recognition algorithms rely on deep-learning-based feature extraction. These networks have been shown in the literature to be vulnerable to engineered adversarial attacks. We find that non-learning-based image-level distortions can also adversely affect the performance of such algorithms. We capitalize on how some of these errors propagate through the network to devise detection and mitigation methodologies that can help improve the real-world robustness of deep-network-based face recognition.
The proposed algorithm does not require any re-training of the existing networks and is not specific to a particular type of network. We also evaluate the generalizability and efficacy of the approach by testing it with multiple networks and distortions. We observe favorable results that are consistently better than existing methodologies in all the test cases.
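As a rough illustration of the entropy-based frame selection described in the abstract, the sketch below ranks frames by the Shannon entropy of their pixel intensities and keeps the top k. It is a simplification under stated assumptions: the discrete wavelet transform step is omitted, frames are represented as flat lists of intensities, and the names `shannon_entropy` and `select_frames` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of intensity values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_frames(frames, k):
    """Return indices of the k highest-entropy frames, in temporal order."""
    ranked = sorted(
        range(len(frames)),
        key=lambda i: shannon_entropy(frames[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy "video": each frame is a flat list of grayscale intensities.
frames = [
    [0, 0, 0, 0],        # uniform frame, carries no information
    [0, 255, 0, 255],    # two equally likely intensity levels
    [0, 85, 170, 255],   # four equally likely intensity levels
]

selected = select_frames(frames, k=2)
```

Here the uniform frame is discarded and the two more varied frames are kept. A fuller version would presumably compute entropy on wavelet sub-bands rather than raw intensities, as the abstract indicates.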
