
Showing papers on "Facial recognition system" published in 2018


Proceedings ArticleDOI
Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman 
15 May 2018
TL;DR: VGGFace2 as discussed by the authors is a large-scale face dataset with 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject.
Abstract: In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB-A and IJB-B face recognition benchmarks, exceeding the previous state-of-the-art by a large margin. The dataset and models are publicly available.

2,365 citations
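As a rough illustration of the training setup described in the entry above, the sketch below trains a plain ResNet-50 identity classifier on a VGGFace2-style directory tree (one folder per subject). The paths, input size, and hyperparameters are illustrative assumptions, and the Squeeze-and-Excitation variant would require a separate model definition; this is not the authors' released code.

```python
# Minimal sketch: ResNet-50 identity classification on a VGGFace2-style layout.
# Assumed layout: vggface2/train/<subject_id>/<image>.jpg (an assumption).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("vggface2/train", transform=transform)
loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(num_classes=len(train_set.classes)).to(device)  # ~9131 subjects
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```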


Proceedings ArticleDOI
Hao Wang, Yitong Wang, Zhou Zheng, Ji Xing, Dihong Gong, Jingchao Zhou, Zhifeng Li, Wei Liu 
18 Jun 2018
TL;DR: In this article, the authors proposed a large margin cosine loss (LMCL), which normalizes both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space.
Abstract: Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by L2 normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, YouTube Faces (YTF), and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.

1,879 citations
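A minimal PyTorch sketch of the large margin cosine loss (LMCL) described in the entry above: features and class weights are L2-normalized, and a cosine margin is subtracted from the target-class logit before a scaled softmax. The scale s and margin m are the values reported in the paper, but this is an illustrative re-implementation, not the authors' release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMCL(nn.Module):
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, features, labels):
        # L2-normalize features and class weights to remove radial variations.
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        # Subtract the cosine margin m from the target-class logit only.
        margin = torch.zeros_like(cos)
        margin.scatter_(1, labels.unsqueeze(1), self.m)
        return F.cross_entropy(self.s * (cos - margin), labels)
```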


Posted Content
TL;DR: This article proposed an additive angular margin loss (ArcFace) to obtain highly discriminative features for face recognition, which has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that enhance discriminative power. Centre loss penalises the distance between the deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space and penalises the angles between the deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere. We present arguably the most extensive experimental evaluation of all the recent state-of-the-art face recognition methods on over 10 face recognition benchmarks including a new large-scale image database with trillion level of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state-of-the-art and can be easily implemented with negligible computational overhead. We release all refined training data, training codes, pre-trained models and training logs, which will help reproduce the results in this paper.

1,122 citations
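For comparison with the cosine margin above, here is a minimal sketch of the additive angular margin (ArcFace) idea: the margin m is added to the angle between a feature and its class weight rather than to the cosine, which is what gives the loss its geodesic-distance interpretation on the hypersphere. This is an illustrative re-implementation, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, features, labels):
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        cos = cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7)   # keep acos well-defined
        theta = torch.acos(cos)                    # angle = geodesic distance on the hypersphere
        one_hot = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```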


Proceedings ArticleDOI
15 May 2018
TL;DR: OpenFace 2.0 is an extension of the OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Abstract: Over the past few years, there has been an increased interest in automatic facial behavior analysis and understanding. We present OpenFace 2.0 - a tool intended for computer vision and machine learning researchers, the affective computing community, and people interested in building interactive applications based on facial behavior analysis. OpenFace 2.0 is an extension of the OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. The computer vision algorithms which represent the core of OpenFace 2.0 demonstrate state-of-the-art results in all of the above-mentioned tasks. Furthermore, our tool is capable of real-time performance and is able to run from a simple webcam without any specialist hardware. Finally, unlike a lot of modern approaches or toolkits, the OpenFace 2.0 source code for training models and running them is freely available for research purposes.

1,107 citations


Proceedings ArticleDOI
01 Feb 2018
TL;DR: The IARPA Janus Benchmark–C (IJB-C) face dataset advances the goal of robust unconstrained face recognition, improving upon the previous public domain IJB-B dataset, by increasing dataset size and variability, and by introducing end-to-end protocols that more closely model operational face recognition use cases.
Abstract: Although considerable work has been done in recent years to drive the state of the art in facial recognition towards operation on fully unconstrained imagery, research has always been restricted by a lack of datasets in the public domain. In addition, traditional biometrics experiments such as single image verification and closed set recognition do not adequately evaluate the ways in which unconstrained face recognition systems are used in practice. The IARPA Janus Benchmark–C (IJB-C) face dataset advances the goal of robust unconstrained face recognition, improving upon the previous public domain IJB-B dataset, by increasing dataset size and variability, and by introducing end-to-end protocols that more closely model operational face recognition use cases. IJB-C adds 1,661 new subjects to the 1,870 subjects released in IJB-B, with increased emphasis on occlusion and diversity of subject occupation and geographic origin with the goal of improving representation of the global population. Annotations on IJB-C imagery have been expanded to allow for further covariate analysis, including a spatial occlusion grid to standardize analysis of occlusion. Due to these enhancements, the IJB-C dataset is significantly more challenging than other datasets in the public domain and will advance the state of the art in unconstrained face recognition.

510 citations
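As a generic illustration of how 1:1 verification on template-based protocols such as IJB-C is commonly scored, the sketch below averages per-image features into template embeddings, compares them with cosine similarity, and reads off the true accept rate at a fixed false accept rate. The helper names and the random scores are assumptions for demonstration; this is not the official protocol implementation.

```python
import numpy as np

def template_embedding(face_features):
    """Average per-image features into one template vector and L2-normalize."""
    t = np.mean(face_features, axis=0)
    return t / (np.linalg.norm(t) + 1e-12)

def tar_at_far(genuine_scores, impostor_scores, far=1e-4):
    """True accept rate at the threshold giving the requested false accept rate."""
    thresh = np.quantile(impostor_scores, 1.0 - far)
    return float(np.mean(genuine_scores >= thresh))

# Example with random scores standing in for real template comparisons.
rng = np.random.default_rng(0)
genuine = rng.normal(0.6, 0.1, size=10_000)     # same-identity cosine scores
impostor = rng.normal(0.1, 0.1, size=100_000)   # different-identity cosine scores
print(tar_at_far(genuine, impostor, far=1e-4))
```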


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper argues for the importance of auxiliary supervision to guide the learning toward discriminative and generalizable cues, and introduces a new face anti-spoofing database that covers a large range of illumination, subject, and pose variations.
Abstract: Face anti-spoofing is crucial for protecting face recognition systems from security breaches. Previous deep learning approaches formulate face anti-spoofing as a binary classification problem. Many of them struggle to grasp adequate spoofing cues and generalize poorly. In this paper, we argue for the importance of auxiliary supervision to guide the learning toward discriminative and generalizable cues. A CNN-RNN model is learned to estimate the face depth with pixel-wise supervision, and to estimate rPPG signals with sequence-wise supervision. The estimated depth and rPPG are fused to distinguish live vs. spoof faces. Further, we introduce a new face anti-spoofing database that covers a large range of illumination, subject, and pose variations. Experiments show that our model achieves state-of-the-art results on both intra- and cross-database testing.

502 citations
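One plausible way to fuse the two auxiliary outputs described above into a single liveness score is sketched below: live faces should produce a plausible depth map and a strong rPPG signal, while flat spoofs should not. The norm-based score, the weight lam, and the threshold are illustrative assumptions, not the paper's exact fusion formula.

```python
import numpy as np

def liveness_score(depth_map, rppg_signal, lam=1.0):
    """Higher score -> more likely a live face (assumed convention)."""
    depth_energy = np.linalg.norm(depth_map) ** 2 / depth_map.size
    rppg_energy = np.linalg.norm(rppg_signal) ** 2 / rppg_signal.size
    return depth_energy + lam * rppg_energy

# depth_map: HxW output of the CNN branch; rppg_signal: T-length RNN output.
score = liveness_score(np.random.rand(32, 32), np.random.rand(100))
is_live = score > 0.5  # threshold would be chosen on a validation set
```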


Journal ArticleDOI
TL;DR: It is shown that faces contain much more information about sexual orientation than can be perceived or interpreted by the human brain, and that, given that companies and governments are increasingly using computer vision algorithms to detect people's intimate traits, these findings expose a threat to the privacy and safety of gay men and women.
Abstract: We show that faces contain much more information about sexual orientation than can be perceived or interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 71% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women. The accuracy of the algorithm increased to 91% and 83%, respectively, given five facial images per person. Facial features employed by the classifier included both fixed (e.g., nose shape) and transient facial features (e.g., grooming style). Consistent with the prenatal hormone theory of sexual orientation, gay men and women tended to have gender-atypical facial morphology, expression, and grooming styles. Prediction models aimed at gender alone allowed for detecting gay males with 57% accuracy and gay females with 58% accuracy. Those findings advance our understanding of the origins of sexual orientation and the limits of human perception. Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people's intimate traits, our findings expose a threat to the privacy and safety of gay men and women.

420 citations


Journal ArticleDOI
TL;DR: A Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components, achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces.
Abstract: Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.

392 citations
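The entry above mentions artificially blurring still images so the network sees video-like degradations. A small sketch of that kind of augmentation is below; the kernel sizes, blur types, and the placeholder file name are illustrative assumptions, not the paper's exact settings.

```python
import cv2
import numpy as np

def random_blur(image):
    """Degrade a clear still image with random Gaussian or horizontal motion blur."""
    if np.random.rand() < 0.5:
        k = int(np.random.choice([3, 5, 7, 9]))
        return cv2.GaussianBlur(image, (k, k), 0)
    k = int(np.random.choice([5, 9, 13]))
    kernel = np.zeros((k, k), dtype=np.float32)
    kernel[k // 2, :] = 1.0 / k        # simple linear motion-blur kernel
    return cv2.filter2D(image, -1, kernel)

# "face.jpg" is a placeholder path for a clear still training image.
blurred = random_blur(cv2.imread("face.jpg"))
```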


Posted Content
TL;DR: This paper presents the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database, and demonstrates that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods.
Abstract: It is becoming increasingly easy to automatically replace a face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help develop such methods, in this paper we present the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database. We used open source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We show that state-of-the-art face recognition systems based on VGG and FaceNet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates respectively, which means methods for detecting Deepfake videos are necessary. By considering several baseline approaches, we found that an audio-visual approach based on lip-sync inconsistency detection was not able to distinguish Deepfake videos. The best performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, resulted in an 8.97% equal error rate on high quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and the further development of face swapping technology will make it even more so.

369 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: The survey provides a clear, structured presentation of the principal, state-of-the-art (SOTA) face recognition techniques appearing within the past five years in top computer vision venues with some open issues currently overlooked by the community.
Abstract: Face recognition has made tremendous leaps in the last five years with a myriad of systems proposing novel techniques substantially backed by deep convolutional neural networks (DCNN). Although face recognition performance skyrocketed using deep learning on classic datasets like LFW, leading to the belief that this technique reached human performance, it still remains an open problem in unconstrained environments as demonstrated by the newly released IJB datasets. This survey aims to summarize the main advances in deep face recognition and, more generally, in learning face representations for verification and identification. The survey provides a clear, structured presentation of the principal, state-of-the-art (SOTA) face recognition techniques appearing within the past five years in top computer vision venues. The survey is broken down into multiple parts that follow a standard face recognition pipeline: (a) how SOTA systems are trained and which public datasets they have used; (b) the face preprocessing part (detection, alignment, etc.); (c) the architecture and loss functions used for transfer learning; and (d) face recognition for verification and identification. The survey concludes with an overview of the SOTA results at a glance along with some open issues currently overlooked by the community.

347 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: The DeRL method has been evaluated on five databases, CK+, Oulu-CASIA, MMI, BU-3DFE, and BP4D+.
Abstract: A facial expression is a combination of an expressive component and a neutral component of a person. In this paper, we propose to recognize facial expressions by extracting information of the expressive component through a de-expression learning procedure, called De-expression Residue Learning (DeRL). First, a generative model is trained by cGAN. This model generates the corresponding neutral face image for any input face image. We call this procedure de-expression because the expressive information is filtered out by the generative model; however, the expressive information is still recorded in the intermediate layers. Given the neutral face image, unlike previous works using pixel-level or feature-level difference for facial expression classification, our new method learns the deposition (or residue) that remains in the intermediate layers of the generative model. Such a residue is essential as it contains the expressive component deposited in the generative model from any input facial expression images. Seven public facial expression databases are employed in our experiments. With two databases (BU-4DFE and BP4D-spontaneous) for pre-training, the DeRL method has been evaluated on five databases, CK+, Oulu-CASIA, MMI, BU-3DFE, and BP4D+. The experimental results demonstrate the superior performance of the proposed method.

Proceedings ArticleDOI
15 Jun 2018
TL;DR: In this paper, a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs is presented. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer.
Abstract: We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objectives: a batch distribution loss that encourages the output distribution to match the distribution of the morphable model, a loopback loss that ensures the network can correctly reinterpret its own output, and a multi-view identity loss that compares the features of the predicted 3D face and the input photograph from multiple viewing angles. We train a regression network using these objectives, a set of unlabeled photographs, and the morphable model itself, and demonstrate state-of-the-art results.

Proceedings ArticleDOI
01 May 2018
TL;DR: Experimental results on four benchmark expression databases have demonstrated that the CNN with the proposed island loss (IL-CNN) outperforms the baseline CNN models with either traditional softmax loss or center loss and achieves comparable or better performance compared with the state-of-the-art methods for facial expression recognition.
Abstract: Over the past few years, Convolutional Neural Networks (CNNs) have shown promise on facial expression recognition. However, the performance degrades dramatically under real-world settings due to variations introduced by subtle facial appearance changes, head pose variations, illumination changes, and occlusions. In this paper, a novel island loss is proposed to enhance the discriminative power of deeply learned features. Specifically, the island loss is designed to reduce the intra-class variations while enlarging the inter-class differences simultaneously. Experimental results on four benchmark expression databases have demonstrated that the CNN with the proposed island loss (IL-CNN) outperforms the baseline CNN models with either traditional softmax loss or center loss and achieves comparable or better performance compared with the state-of-the-art methods for facial expression recognition.
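A minimal sketch of the island loss idea from the entry above: a center-loss term pulls features toward their class centers while a pairwise term pushes the (normalized) centers of different expression classes apart. The weighting lam and the exact normalization are illustrative assumptions; this is not the authors' implementation, and in practice the term is added to the network's softmax loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IslandLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, lam=10.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, features, labels):
        # Intra-class term: squared distance to the assigned class center.
        center_term = 0.5 * ((features - self.centers[labels]) ** 2).sum(dim=1).mean()
        # Inter-class term: cosine similarity (+1) averaged over distinct center pairs.
        c = F.normalize(self.centers)
        cos = c @ c.t() + 1.0
        n = cos.size(0)
        island_term = (cos.sum() - cos.diagonal().sum()) / (n * (n - 1))
        return center_term + self.lam * island_term
```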

Proceedings ArticleDOI
15 May 2018
TL;DR: It is shown that a standard fully convolutional network (FCN) can achieve remarkably fast and accurate segmentations, provided that it is trained on a rich enough example set.
Abstract: We show that even when face images are unconstrained and arbitrarily paired, face swapping between them is quite simple. To this end, we make the following contributions. (a) Instead of tailoring systems for face segmentation, as others previously proposed, we show that a standard fully convolutional network (FCN) can achieve remarkably fast and accurate segmentations, provided that it is trained on a rich enough example set. For this purpose, we describe novel data collection and generation routines which provide challenging segmented face examples. (b) We use our segmentations for robust face swapping under unprecedented conditions. (c) Unlike previous work, our swapping is robust enough to allow for extensive quantitative tests. To this end, we use the Labeled Faces in the Wild (LFW) benchmark and measure the effect of intra- and inter-subject face swapping on recognition. We show that our intra-subject swapped faces remain as recognizable as their sources, testifying to the effectiveness of our method. In line with established perceptual studies, we show that better face swapping produces less recognizable inter-subject results. This is the first time this effect was quantitatively demonstrated by machine vision systems.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: An end-to-end deep learning model that exploits different poses and expressions jointly for simultaneous facial image synthesis and pose-invariant facial expression recognition, based on a generative adversarial network (GAN).
Abstract: Facial expression recognition (FER) is a challenging task due to different expressions under arbitrary poses. Most conventional approaches either perform face frontalization on a non-frontal facial image or learn separate classifiers for each pose. Different from existing methods, in this paper, we propose an end-to-end deep learning model by exploiting different poses and expressions jointly for simultaneous facial image synthesis and pose-invariant facial expression recognition. The proposed model is based on generative adversarial network (GAN) and enjoys several merits. First, the encoder-decoder structure of the generator can learn a generative and discriminative identity representation for face images. Second, the identity representation is explicitly disentangled from both expression and pose variations through the expression and pose codes. Third, our model can automatically generate face images with different expressions under arbitrary poses to enlarge and enrich the training set for FER. Quantitative and qualitative evaluations on both controlled and in-the-wild datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.

Journal ArticleDOI
TL;DR: In a comprehensive comparison of face identification by humans and computers, it is found that forensic facial examiners, facial reviewers, and superrecognizers were more accurate than fingerprint examiners and students on a challenging face identification test.
Abstract: Achieving the upper limits of face identification accuracy in forensic applications can minimize errors that have profound social and personal consequences. Although forensic examiners identify faces in these applications, systematic tests of their accuracy are rare. How can we achieve the most accurate face identification: using people and/or machines working alone or in collaboration? In a comprehensive comparison of face identification by humans and computers, we found that forensic facial examiners, facial reviewers, and superrecognizers were more accurate than fingerprint examiners and students on a challenging face identification test. Individual performance on the test varied widely. On the same test, four deep convolutional neural networks (DCNNs), developed between 2015 and 2017, identified faces within the range of human accuracy. Accuracy of the algorithms increased steadily over time, with the most recent DCNN scoring above the median of the forensic facial examiners. Using crowd-sourcing methods, we fused the judgments of multiple forensic facial examiners by averaging their rating-based identity judgments. Accuracy was substantially better for fused judgments than for individuals working alone. Fusion also served to stabilize performance, boosting the scores of lower-performing individuals and decreasing variability. Single forensic facial examiners fused with the best algorithm were more accurate than the combination of two examiners. Therefore, collaboration among humans and between humans and machines offers tangible benefits to face identification accuracy in important applications. These results offer an evidence-based roadmap for achieving the most accurate face identification possible.

Proceedings ArticleDOI
01 Jun 2018
TL;DR: Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed Pose Invariant Model for face recognition in the wild over the state of the art.
Abstract: Pose variation is one key challenge in face recognition. As opposed to current techniques for pose invariant face recognition, which either directly extract pose invariant features for recognition, or first normalize profile face images to frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly to allow them to benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture, containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned from end to end. Second, FFN is a well-designed dual-path Generative Adversarial Network (GAN) which simultaneously perceives global structures and local details, incorporating unsupervised cross-domain adversarial training and a "learning to learn" strategy for high-fidelity and identity-preserving frontal view synthesis. Third, DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representation. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed model over the state of the art.

Journal ArticleDOI
TL;DR: A dynamic-weighting scheme to automatically assign the loss weights to each side task solves the crucial problem of balancing between different tasks in MTL and achieves comparable or better performance on LFW, CFP, and IJB-A datasets.
Abstract: This paper explores multi-task learning (MTL) for face recognition. First, we propose a multi-task convolutional neural network (CNN) for face recognition, where identity classification is the main task and pose, illumination, and expression (PIE) estimations are the side tasks. Second, we develop a dynamic-weighting scheme to automatically assign the loss weights to each side task, which solves the crucial problem of balancing between different tasks in MTL. Third, we propose a pose-directed multi-task CNN by grouping different poses to learn pose-specific identity features, simultaneously across all poses in a joint framework. Last but not least, we propose an energy-based weight analysis method to explore how CNN-based MTL works. We observe that the side tasks serve as regularizations to disentangle the PIE variations from the learnt identity features. Extensive experiments on the entire multi-PIE dataset demonstrate the effectiveness of the proposed approach. To the best of our knowledge, this is the first work using all data in multi-PIE for face recognition. Our approach is also applicable to in-the-wild data sets for pose-invariant face recognition and achieves comparable or better performance than state of the art on LFW, CFP, and IJB-A datasets.
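A hedged sketch of dynamically weighted multi-task training in the spirit of the entry above: the main (identity) loss keeps a fixed weight while the side-task weights are produced by a softmax over learnable parameters, so the network balances the pose, illumination, and expression losses itself. The exact scheme in the paper may differ; treat this as an illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicWeights(nn.Module):
    def __init__(self, num_side_tasks, side_budget=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_side_tasks))
        self.side_budget = side_budget   # total weight shared by the side tasks

    def forward(self, main_loss, side_losses):
        # Softmax keeps the side-task weights positive and summing to side_budget.
        w = F.softmax(self.logits, dim=0) * self.side_budget
        return main_loss + (w * torch.stack(side_losses)).sum()

combine = DynamicWeights(num_side_tasks=3)
total = combine(torch.tensor(2.0),
                [torch.tensor(0.5), torch.tensor(0.7), torch.tensor(0.3)])
```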

Journal ArticleDOI
TL;DR: A comprehensive survey of facial feature point detection with the assistance of abundant manually labeled images is presented in this article, where the authors categorize existing methods into two primary categories according to whether there is the need of a parametric shape model: Parametric Shape Model-based methods and Nonparametric Shape Models-Based methods.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The proposed method leverages a Fully Convolutional Network (FCN) to generate fixed-size spatial feature maps such that pixel-level features are consistent, decreasing the similarity of coupled images from different persons while increasing that from the same person.
Abstract: Partial person re-identification (re-id) is a challenging problem, where only several partial observations (images) of people are available for matching. However, few studies have provided flexible solutions to identifying a person in an image containing an arbitrary part of the body. In this paper, we propose a fast and accurate matching method to address this problem. The proposed method leverages a Fully Convolutional Network (FCN) to generate fixed-size spatial feature maps such that pixel-level features are consistent. To match a pair of person images of different sizes, a novel method called Deep Spatial feature Reconstruction (DSR) is further developed to avoid explicit alignment. Specifically, DSR exploits the reconstruction error from popular dictionary learning models to calculate the similarity between different spatial feature maps. In that way, we expect that the proposed FCN can decrease the similarity of coupled images from different persons and increase that from the same person. Experimental results on two partial person datasets demonstrate the efficiency and effectiveness of the proposed method in comparison with several state-of-the-art partial person re-id approaches. Additionally, DSR achieves competitive results on the benchmark person dataset Market1501 with 83.58% Rank-1 accuracy.
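A hedged sketch of the reconstruction-based matching idea behind DSR: the pixel-level features of a probe (partial) image are reconstructed from the gallery image's feature map, and the residual error serves as the dissimilarity. The paper uses sparse coding via dictionary learning; plain least squares is used here purely to keep the illustration short, and the function name is hypothetical.

```python
import torch

def dsr_distance(probe_feats, gallery_feats):
    """probe_feats: (N, d) pixel-level features; gallery_feats: (M, d)."""
    X = gallery_feats.t()                      # (d, M) dictionary of gallery features
    Y = probe_feats.t()                        # (d, N) probe features to reconstruct
    W = torch.linalg.lstsq(X, Y).solution      # reconstruction coefficients, (M, N)
    residual = Y - X @ W
    return residual.norm(dim=0).mean().item()  # average per-feature reconstruction error

# Random feature maps standing in for FCN outputs of two person images.
d = dsr_distance(torch.randn(48, 256), torch.randn(192, 256))
```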

Journal ArticleDOI
06 Aug 2018
TL;DR: This paper presents the applications of RPCA in video processing which utilize additional spatial and temporal information compared to image processing and provides perspectives on possible future research directions and algorithmic frameworks that are suitable for these applications.
Abstract: Robust principal component analysis (RPCA) via decomposition into low-rank plus sparse matrices offers a powerful framework for a large variety of applications such as image processing, video processing, and 3-D computer vision. Indeed, these applications typically require detecting sparse outliers in observed imagery data that can be approximated by a low-rank matrix. Moreover, experiments often show that RPCA with additional spatial and/or temporal constraints outperforms the state-of-the-art algorithms in these applications. Thus, the aim of this paper is to survey the applications of RPCA in computer vision. In the first part of this paper, we review representative image processing applications as follows: 1) low-level imaging such as image recovery and denoising, image composition, image colorization, image alignment and rectification, multifocus image, and face recognition; 2) medical imaging such as dynamic magnetic resonance imaging (MRI) for acceleration of data acquisition, background suppression, and learning of interframe motion fields; and 3) imaging for 3-D computer vision with additional depth information such as in structure from motion (SfM) and 3-D motion recovery. In the second part, we present the applications of RPCA in video processing, which utilize additional spatial and temporal information compared to image processing. Specifically, we investigate video denoising and restoration, hyperspectral video, and background/foreground separation. Finally, we provide perspectives on possible future research directions and algorithmic frameworks that are suitable for these applications.
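A compact sketch of the low-rank plus sparse decomposition itself, via principal component pursuit solved with a basic inexact augmented-Lagrangian loop (singular-value thresholding for the low-rank part, soft thresholding for the sparse part). The parameter choices follow common defaults from the RPCA literature and are not tied to any particular application in the survey above.

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_pcp(M, n_iter=200, tol=1e-7):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(M).sum() + 1e-12)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft(sig, 1.0 / mu)) @ Vt       # singular-value thresholding
        S = soft(M - L + Y / mu, lam / mu)       # sparse component
        Y += mu * (M - L - S)
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S

# E.g., for background/foreground separation, M stacks vectorized video frames
# as columns: L recovers the static background, S the moving foreground.
L, S = rpca_pcp(np.random.rand(100, 50))
```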

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the authors introduce CNN architectures for both binary and multi-way cross-modal face and audio matching, and compare dynamic and static testing with human testing as a baseline to calibrate the difficulty of the task.
Abstract: We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which of two face images is the speaker. In this paper we study this, and a number of related cross-modal tasks, aimed at answering the question: how much can we infer from the voice about the face and vice versa? We study this task "in the wild", employing the datasets that are now publicly available for face recognition from static images (VGGFace) and speaker identification from audio (VoxCeleb). These provide training and testing scenarios for both static and dynamic testing of cross-modal matching. We make the following contributions: (i) we introduce CNN architectures for both binary and multi-way cross-modal face and audio matching; (ii) we compare dynamic testing (where video information is available, but the audio is not from the same video) with static testing (where only a single still image is available); and (iii) we use human testing as a baseline to calibrate the difficulty of the task. We show that a CNN can indeed be trained to solve this task in both the static and dynamic scenarios, and is even well above chance on 10-way classification of the face given the voice. The CNN matches human performance on easy examples (e.g. different gender across faces) but exceeds human performance on more challenging examples (e.g. faces with the same gender, age and nationality).

Proceedings ArticleDOI
18 Jun 2018
TL;DR: Zhang et al. as mentioned in this paper proposed to incorporate global semantic priors as input and impose local structure losses to regularize the output within a multi-scale deep CNN to restore sharp images with more facial details.
Abstract: In this paper, we present an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks (CNNs). As face images are highly structured and share several key semantic components (e.g., eyes and mouths), the semantic information of a face provides a strong prior for restoration. As such, we propose to incorporate global semantic priors as input and impose local structure losses to regularize the output within a multi-scale deep CNN. We train the network with perceptual and adversarial losses to generate photo-realistic results and develop an incremental training strategy to handle random blur kernels in the wild. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm restores sharp images with more facial details and performs favorably against state-of-the-art methods in terms of restoration quality, face recognition and execution speed.

Book ChapterDOI
08 Sep 2018
TL;DR: Wang et al. as discussed by the authors proposed a new large-scale noise-controlled IMDb-Face dataset and analyzed label noise properties of MegaFace and MS-Celeb-1M datasets.
Abstract: The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: (1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. (2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve the same accuracy yielded by a clean subset. (3) We study the association between different types of noise, i.e., label flips and outliers, and the accuracy of face recognition models. (4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. The IMDb-Face dataset has been released on https://github.com/fwang91/IMDb-Face.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, a generative adversarial network based approach is proposed to model the constraints for the intrinsic subject-specific characteristics and the age-specific facial changes with respect to the elapsed time, ensuring that the generated faces present desired aging effects while simultaneously keeping personalized properties stable.
Abstract: The two underlying requirements of face age progression, i.e. aging accuracy and identity permanence, are not well studied in the literature. In this paper, we present a novel generative adversarial network based approach. It separately models the constraints for the intrinsic subject-specific characteristics and the age-specific facial changes with respect to the elapsed time, ensuring that the generated faces present desired aging effects while simultaneously keeping personalized properties stable. Further, to generate more lifelike facial details, high-level age-specific features conveyed by the synthesized face are estimated by a pyramidal adversarial discriminator at multiple scales, which simulates the aging effects in a finer manner. The proposed method is applicable to diverse face samples in the presence of variations in pose, expression, makeup, etc., and remarkably vivid aging effects are achieved. Both visual fidelity and quantitative evaluations show that the approach advances the state-of-the-art.

Journal ArticleDOI
TL;DR: The results show that accurate individual pig recognition is possible with accuracy rates of 96.7% on 1553 images, and Class Activation Mapping using Grad-CAM is used to show the regions that the network uses to discriminate between pigs.
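A hedged sketch of Grad-CAM, the visualization technique named in the entry above, applied to a generic torchvision classifier: gradients of the target score are global-average-pooled into channel weights, which re-weight the last convolutional activations into a heatmap. The model, layer choice, and input are illustrative assumptions; the pig-recognition study's own network and data are not reproduced here.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
acts, grads = {}, {}
layer = model.layer4  # last convolutional block (assumed target layer)

layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed image crop
score = model(x)[0].max()                  # score of the predicted class
score.backward()

weights = grads["v"].mean(dim=(2, 3), keepdim=True)   # global-average-pooled gradients
cam = F.relu((weights * acts["v"]).sum(dim=1))        # weighted sum of activation maps
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224), mode="bilinear")
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap normalized to [0, 1]
```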

Journal ArticleDOI
TL;DR: A context-aware local binary multi-scale feature learning method for face recognition that exploits the contextual information of adjacent bits by constraining the number of shifts from different binary bits, so that more robust information can be exploited for face representation.
Abstract: In this paper, we propose a context-aware local binary feature learning (CA-LBFL) method for face recognition. Unlike existing learning-based local face descriptors such as discriminant face descriptor (DFD) and compact binary face descriptor (CBFD), which learn each feature code individually, our CA-LBFL exploits the contextual information of adjacent bits by constraining the number of shifts from different binary bits, so that more robust information can be exploited for face representation. Given a face image, we first extract pixel difference vectors (PDV) in local patches, and learn a discriminative mapping in an unsupervised manner to project each pixel difference vector into a context-aware binary vector. Then, we perform clustering on the learned binary codes to construct a codebook, and extract a histogram feature for each face image with the learned codebook as the final representation. In order to exploit local information from different scales, we propose a context-aware local binary multi-scale feature learning (CA-LBMFL) method to jointly learn multiple projection matrices for face representation. To make the proposed methods applicable for heterogeneous face recognition, we present a coupled CA-LBFL (C-CA-LBFL) method and a coupled CA-LBMFL (C-CA-LBMFL) method to reduce the modality gap of corresponding heterogeneous faces at the feature level, respectively. Extensive experimental results on four widely used face datasets clearly show that our methods outperform most state-of-the-art face descriptors.

Journal ArticleDOI
TL;DR: An overview of deep-learning methods used for face recognition is provided and different modules involved in designing an automatic face recognition system are discussed and the role of deep learning for each of them is discussed.
Abstract: Recent developments in deep convolutional neural networks (DCNNs) have shown impressive performance improvements on various object detection/recognition problems. This has been made possible due to the availability of large annotated data and a better understanding of the nonlinear mapping between images and class labels, as well as the affordability of powerful graphics processing units (GPUs). These developments in deep learning have also improved the capabilities of machines in understanding faces and automatically executing the tasks of face detection, pose estimation, landmark localization, and face recognition from unconstrained images and videos. In this article, we provide an overview of deep-learning methods used for face recognition. We discuss different modules involved in designing an automatic face recognition system and the role of deep learning for each of them. Some open issues regarding DCNNs for face recognition problems are then discussed. This article should prove valuable to scientists, engineers, and end users working in the fields of face recognition, security, visual surveillance, and biometrics.

Proceedings ArticleDOI
Yibo Hu, Xiang Wu, Bing Yu, Ran He, Zhenan Sun 
01 Jun 2018
TL;DR: This work focuses on flexible face rotation of arbitrary head poses, including extreme profile views, with a novel Couple-Agent Pose-Guided Generative Adversarial Network (CAPG-GAN) to generate both neutral and profile head pose face images.
Abstract: Face rotation provides an effective and cheap way for data augmentation and representation learning of face recognition. It is a challenging generative learning problem due to the large pose discrepancy between two face images. This work focuses on flexible face rotation of arbitrary head poses, including extreme profile views. We propose a novel Couple-Agent Pose-Guided Generative Adversarial Network (CAPG-GAN) to generate both neutral and profile head pose face images. The head pose information is encoded by facial landmark heatmaps. It not only forms a mask image to guide the generator in learning process but also provides a flexible controllable condition during inference. A couple-agent discriminator is introduced to reinforce on the realism of synthetic arbitrary view faces. Besides the generator and conditional adversarial loss, CAPG-GAN further employs identity preserving loss and total variation regularization to preserve identity information and refine local textures respectively. Quantitative and qualitative experimental results on the Multi-PIE and LFW databases consistently show the superiority of our face rotation method over the state-of-the-art.
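The abstract above notes that head pose is encoded as facial landmark heatmaps that condition the generator. A small sketch of that encoding is below; the heatmap size, sigma, and the five-point landmark layout are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def landmark_heatmaps(landmarks, size=128, sigma=3.0):
    """landmarks: (K, 2) array of (x, y) pixel coordinates -> (K, size, size) maps."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = np.empty((len(landmarks), size, size), dtype=np.float32)
    for k, (x, y) in enumerate(landmarks):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return maps

# Hypothetical five-point layout (eyes, nose tip, mouth corners).
heatmaps = landmark_heatmaps(np.array([[44, 52], [84, 52], [64, 74], [50, 96], [78, 96]]))
# The heatmaps are stacked with the input image as extra channels for the generator.
```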

Journal ArticleDOI
TL;DR: A novel distance metric optimization driven learning approach that integrates these traditional steps via a deep convolutional neural network, which learns feature representations and the decision function in an end-to-end way, and can be optimized simultaneously by backward propagation.