Journal ArticleDOI

Deep face recognition: A survey

14 Mar 2021 · Neurocomputing (Elsevier) · Vol. 429, pp. 215-244
TL;DR: A comprehensive review of the recent developments on deep face recognition can be found in this paper, covering broad topics on algorithm designs, databases, protocols, and application scenes, as well as the technical challenges and several promising directions.
About: This article was published in Neurocomputing on 2021-03-14 and is currently open access. It has received 353 citations to date. The article focuses on the topics: Deep learning & Feature extraction.
Citations

Journal ArticleDOI
TL;DR: This survey provides a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors, and lists the traditional and new applications.
Abstract: Object detection is one of the most important and challenging branches of computer vision. It is widely applied in everyday life, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a given class. With the rapid development of deep learning algorithms for detection tasks, the performance of object detectors has greatly improved. To convey a thorough and deep understanding of the current state of the object detection pipeline, this survey first analyzes the methods of existing typical detection models and describes the benchmark datasets. It then provides a comprehensive and systematic overview of a variety of object detection methods, covering one-stage and two-stage detectors, and lists both traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, it discusses how to exploit these object detection methods to build an effective and efficient system, and points out a set of development trends for following the state-of-the-art algorithms and guiding further research.

749 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a review of deep neural network concepts in background subtraction, for novices and experts alike, in order to analyze the success of these models and to point out further directions.

278 citations

01 Jan 2006
TL;DR: The efficacy of a standard face recognition algorithm is evaluated against the variables of gender and racial origin, and the paper concludes that the problem age-progression poses for face recognition (FR) is not unique to the algorithm used in this work.
Abstract: This paper details MORPH, a longitudinal face database developed for researchers investigating all facets of adult age-progression, e.g. face modeling, photo-realistic animation, face recognition, etc. This database contributes to several active research areas, most notably face recognition, by providing: the largest set of publicly available longitudinal images; longitudinal spans from a few months to over twenty years; and the inclusion of key physical parameters that affect aging appearance. The direct contribution of this data corpus to face recognition is highlighted in the evaluation of a standard face recognition algorithm, which illustrates the impact that age-progression has on recognition rates. The efficacy of this algorithm is assessed against the variables of gender and racial origin. This work further concludes that the problem of age-progression on face recognition (FR) is not unique to the algorithm used here.

139 citations

References
Journal ArticleDOI
TL;DR: The proposed coupled-mappings method significantly improves recognition performance, especially for very low resolution probe face images (11.4% improvement in recognition accuracy), and can reconstruct a high resolution image from its corresponding low resolution probe image that is comparable with state-of-the-art super-resolution methods in terms of visual quality.
Abstract: We propose a novel coupled mappings method for low resolution face recognition using deep convolutional neural networks (DCNNs). The proposed architecture consists of two branches of DCNNs that map the high and low resolution face images into a common space with nonlinear transformations. The branch corresponding to the transformation of high resolution images consists of 14 layers, and the other branch, which maps the low resolution face images to the common space, includes a 5-layer super-resolution network connected to a 14-layer network. The distance between the features of corresponding high and low resolution images is backpropagated to train the networks. Our proposed method is evaluated on the FERET, LFW, and MBGC datasets and compared with state-of-the-art competing methods. Our extensive experimental evaluations show that the proposed method significantly improves the recognition performance, especially for very low resolution probe face images (5% improvement in recognition accuracy). Furthermore, it can reconstruct a high resolution image from its corresponding low resolution probe image that is comparable with state-of-the-art super-resolution methods in terms of visual quality.
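The coupled-mappings idea lends itself to a compact sketch: two embedding branches trained so that paired high and low resolution images land close together in a shared feature space. The following is a minimal illustration, not the authors' implementation; the layer counts, the 256-dimensional common space, and the omission of the 5-layer super-resolution subnetwork are all simplifying assumptions.

```python
import torch
import torch.nn as nn

def make_branch():
    # Simplified stand-in for the paper's 14-layer branches; layer counts
    # and the 256-d common space are placeholder choices.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(64 * 4 * 4, 256),
    )

hr_branch = make_branch()  # maps high resolution faces to the common space
lr_branch = make_branch()  # in the paper, this branch is preceded by a
                           # 5-layer super-resolution subnetwork (omitted here)

def coupling_loss(hr_imgs, lr_imgs):
    # The distance between features of corresponding high/low resolution
    # images is what gets backpropagated to train both branches.
    z_hr, z_lr = hr_branch(hr_imgs), lr_branch(lr_imgs)
    return ((z_hr - z_lr) ** 2).sum(dim=1).mean()
```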

86 citations

Journal ArticleDOI
TL;DR: A transform-invariant PCA (TIPCA) technique that aims to accurately characterize the intrinsic structures of the human face that are invariant to in-plane transformations of the training images, with experiments suggesting that state-of-the-art invariant descriptors, such as local binary pattern, histogram of oriented gradient, and Gabor energy filter, can benefit from using TIPCA-aligned faces.
Abstract: We develop a transform-invariant PCA (TIPCA) technique which aims to accurately characterize the intrinsic structures of the human face that are invariant to the in-plane transformations of the training images. Specifically, TIPCA alternately aligns the image ensemble and creates the optimal eigenspace, with the objective of minimizing the mean square error between the aligned images and their reconstructions. Learning from the FERET facial image ensemble of 1,196 subjects validates the mutual promotion between image alignment and eigenspace representation, which eventually leads to optimized coding and recognition performance that surpasses handcrafted alignment based on facial landmarks. Experimental results also suggest that state-of-the-art invariant descriptors, such as local binary pattern (LBP), histogram of oriented gradient (HOG), and Gabor energy filter (GEF), and classification methods, such as sparse representation based classification (SRC) and support vector machine (SVM), can benefit from using the TIPCA-aligned faces instead of the manually eye-aligned faces that are widely regarded as the ground-truth alignment. Favorable accuracies against the state-of-the-art results on face coding and face recognition are reported.
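The alternation at the heart of TIPCA can be sketched as follows, assuming flattened grayscale images and a toy transform search restricted to integer translations. The paper optimizes over in-plane transformations more generally; the warp helper below is a hypothetical stand-in for that search.

```python
import numpy as np

def warp(img, t):
    # Toy in-plane transform: integer translation only (hypothetical helper;
    # the paper searches over richer in-plane transformations).
    return np.roll(img, shift=t, axis=(0, 1))

def pca_basis(X, k):
    # Eigenspace of the centered, currently-aligned ensemble (rows = images).
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def recon_error(x, mu, V):
    # Mean square error between an image and its eigenspace reconstruction.
    c = V @ (x - mu)
    return np.mean((mu + V.T @ c - x) ** 2)

def tipca(images, k=50, iters=10, shifts=range(-2, 3)):
    # Alternate (a) re-estimating the eigenspace from the aligned ensemble
    # and (b) re-aligning each image to best fit that eigenspace.
    X = np.stack([img.ravel() for img in images])
    for _ in range(iters):
        mu, V = pca_basis(X, k)
        X = np.stack([
            min((warp(img, (dy, dx)).ravel()
                 for dy in shifts for dx in shifts),
                key=lambda x: recon_error(x, mu, V))
            for img in images
        ])
    return mu, V
```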

85 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, an image-to-video feature-level domain adaptation approach is proposed to learn discriminative video frame representations by distilling knowledge from a pretrained image-domain network to a video adaptation network through feature matching, performing feature restoration through synthetic data augmentation, and learning a domain-invariant feature through a domain adversarial discriminator.
Abstract: Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of those challenges through an image-to-video feature-level domain adaptation approach that learns discriminative video frame representations. The framework utilizes large-scale unlabeled video data to reduce the gap between the domains while transferring discriminative knowledge from large-scale labeled still images. Given a face recognition network that is pretrained in the image domain, the adaptation is achieved by (i) distilling knowledge from the network to a video adaptation network through feature matching, (ii) performing feature restoration through synthetic data augmentation, and (iii) learning a domain-invariant feature through a domain adversarial discriminator. We further improve performance through a discriminator-guided feature fusion that boosts high-quality frames while eliminating those degraded by video domain-specific factors. Experiments on the YouTube Faces and IJB-A datasets demonstrate that each module contributes to our feature-level domain adaptation framework and substantially improves video face recognition performance, achieving state-of-the-art accuracy. We demonstrate qualitatively that the network learns to suppress diverse artifacts in videos, such as pose, illumination, or occlusion, without being explicitly trained for them.
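A loss-level sketch of the three adaptation terms (i)-(iii) may help make the framework concrete. The tiny linear networks, the degrade augmentation, and the equal loss weights below are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim = 128                                    # assumed embedding size
image_net = nn.Linear(3 * 32 * 32, feat_dim)      # stand-in frozen teacher
video_net = nn.Linear(3 * 32 * 32, feat_dim)      # video adaptation student
domain_disc = nn.Linear(feat_dim, 1)              # image feats -> 1, video -> 0

def degrade(x):
    # Placeholder for the paper's synthetic video-like degradations
    # (blur, compression, etc.); here just additive noise.
    return x + 0.1 * torch.randn_like(x)

def adaptation_losses(still_imgs, video_frames):
    flat = lambda x: x.flatten(1)
    with torch.no_grad():
        t_feat = image_net(flat(still_imgs))                 # teacher features
    # (i) distillation: student matches teacher features on still images
    l_distill = F.mse_loss(video_net(flat(still_imgs)), t_feat)
    # (ii) restoration: degraded inputs should map back to clean features
    l_restore = F.mse_loss(video_net(flat(degrade(still_imgs))), t_feat)
    # (iii) adversarial: student tries to make unlabeled video frames look
    # like image-domain features to the discriminator (trained separately)
    v_feat = video_net(flat(video_frames))
    l_adv = F.binary_cross_entropy_with_logits(
        domain_disc(v_feat), torch.ones(len(v_feat), 1))
    return l_distill + l_restore + l_adv          # equal weights assumed
```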

84 citations

Book ChapterDOI
08 Sep 2018
TL;DR: In this paper, the authors proposed Quantization Mimic, which first quantizes the large network and then trains a small quantized network to mimic it, since quantization helps the student network better match the feature maps of the teacher network.
Abstract: In this paper, we propose a simple and general framework for training very tiny CNNs (e.g. VGG with the number of channels reduced to 1/32) for object detection. Due to their limited representation ability, it is challenging to train very tiny networks for complicated tasks like detection. To the best of our knowledge, our method, called Quantization Mimic, is the first one focusing on very tiny networks. We utilize two types of acceleration methods: mimicking and quantization. Mimicking improves the performance of a student network by transferring knowledge from a teacher network. Quantization converts a full-precision network to a quantized one without large degradation of performance. If the teacher network is quantized, the search scope of the student network will be smaller. Using this property of quantization, we propose Quantization Mimic: it first quantizes the large network, then trains a small quantized network to mimic it. The quantization operation helps the student network better match the feature maps of the teacher network. To evaluate our approach, we carry out experiments on various popular CNNs, including VGG and ResNet, as well as different detection frameworks, including Faster R-CNN and R-FCN. Experiments on Pascal VOC and WIDER FACE verify that our Quantization Mimic algorithm can be applied in various settings and outperforms state-of-the-art model acceleration methods given limited computing resources.
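The core of Quantization Mimic, quantizing the teacher's feature maps and training the tiny student to regress them, can be sketched as below. The uniform quantizer with a straight-through gradient, the stand-in convolutional layers, and the 1x1 adapter that matches channel counts are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, step=0.5):
    # Uniform quantization with a straight-through gradient estimator,
    # so the mimic loss can still backpropagate through the student.
    q = torch.round(x / step) * step
    return x + (q - x).detach()

teacher = nn.Conv2d(3, 64, 3, padding=1)  # stand-in "large" feature extractor
student = nn.Conv2d(3, 2, 3, padding=1)   # very tiny network (1/32 channels)
adapter = nn.Conv2d(2, 64, 1)             # 1x1 conv to match channel counts
                                          # (assumed regression head)

def mimic_loss(images):
    with torch.no_grad():
        t = quantize(teacher(images))       # quantized teacher feature map
    s = quantize(adapter(student(images)))  # student mimics the quantized target
    return F.mse_loss(s, t)
```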

83 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A deep convolutional neural network based method that leverages a large visible image face dataset to prevent overfitting is described and experimental results on two benchmark datasets showing its effectiveness are presented.
Abstract: Heterogeneous face recognition is the problem of identifying a person from a face image acquired with a nontraditional sensor by matching it to a visible-light gallery. Most approaches to this problem involve modeling the relationship between corresponding images from the visible and sensing domains. This is typically done at the patch level and/or with shallow models, with the aim of preventing overfitting. In this work, rather than modeling local patches or using a simple model, we propose to use a complex, deep model to learn the relationship between the entirety of cross-modal face images. We describe a deep convolutional neural network based method that leverages a large visible-image face dataset to prevent overfitting, and we present experimental results on two benchmark datasets showing its effectiveness.
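The pretrain-then-adapt recipe the abstract describes can be sketched minimally: a network that would be pretrained on a large visible-image dataset is fine-tuned so that probe images from another sensor embed near their visible counterparts. The architecture and loss below are placeholder assumptions, not the authors' method.

```python
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(                          # stand-in for a deep CNN that
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),  # would first be pretrained on
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # a large visible-image face
    nn.Linear(16, 64),                          # dataset to avoid overfitting
)

def cross_modal_loss(visible, sensed):
    # Fine-tuning objective: pull features of the same subject together
    # across the visible and sensing domains (assumed pairing of inputs).
    return F.mse_loss(embed(visible), embed(sensed))
```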

80 citations