
Showing papers on "Facial recognition system published in 2021"


Journal ArticleDOI
TL;DR: This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, evaluation criteria, and algorithm comparisons to highlight the limitations of existing approaches and point out some promising subsequent research directions.
Abstract: In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers to not only accurately classify the seen classes, but also effectively deal with unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, evaluation criteria, and algorithm comparisons. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also review the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.
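A common baseline that the surveyed OSR methods build on is thresholding a closed-set classifier's confidence: a test sample whose maximum class probability falls below a calibrated threshold is rejected as "unknown." The sketch below illustrates that idea with plain NumPy; the threshold value and the toy logits are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def open_set_predict(logits, threshold=0.7, unknown_label=-1):
    """Closed-set prediction with rejection: samples whose maximum
    softmax probability is below `threshold` are labelled unknown."""
    probs = softmax(logits)
    preds = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, preds, unknown_label)

# Toy example: 3 known classes, 2 test samples.
logits = np.array([[4.0, 0.5, 0.2],    # confidently class 0
                   [1.1, 1.0, 0.9]])   # ambiguous -> rejected as unknown
print(open_set_predict(logits))        # [ 0 -1]
```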

492 citations


Proceedings ArticleDOI
11 Mar 2021
TL;DR: MagFace as discussed by the authors introduces an adaptive mechanism to learn well-structured within-class feature distributions by pulling easy samples to class centers while pushing hard samples away, which prevents models from overfitting on noisy low-quality samples and improves face recognition in the wild.
Abstract: The performance of a face recognition system degrades when the variability of the acquired faces increases. Prior work alleviates this issue by either monitoring the face quality in pre-processing or predicting the data uncertainty along with the face feature. This paper proposes MagFace, a category of losses that learn a universal feature embedding whose magnitude can measure the quality of the given face. Under the new loss, it can be proven that the magnitude of the feature embedding monotonically increases if the subject is more likely to be recognized. In addition, MagFace introduces an adaptive mechanism to learn well-structured within-class feature distributions by pulling easy samples to class centers while pushing hard samples away. This prevents models from overfitting on noisy low-quality samples and improves face recognition in the wild. Extensive experiments conducted on face recognition, quality assessment, as well as clustering demonstrate its superiority over the state of the art. The code is available at https://github.com/IrvingMeng/MagFace.
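Because MagFace ties the L2 norm of the un-normalised embedding to recognisability, face quality can be read off a trained model simply by measuring that magnitude. The PyTorch sketch below shows the mechanism only; the backbone here is an untrained placeholder, so the scores are illustrative and do not reflect real MagFace quality values.

```python
import torch
import torch.nn as nn

# Placeholder embedding network standing in for a trained MagFace backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 512))

@torch.no_grad()
def magface_quality(images):
    """Return per-image quality scores as the L2 magnitude of the
    un-normalised embedding (larger magnitude = easier to recognise
    under the MagFace training objective)."""
    feats = backbone(images)          # (N, 512) raw embeddings
    return feats.norm(p=2, dim=1)     # (N,) magnitudes as quality scores

faces = torch.randn(4, 3, 112, 112)   # batch of aligned face crops
print(magface_quality(faces))
```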

268 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a comprehensive survey of body gesture recognition methods and discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition, and define a complete framework for automatic emotional body gesture recognition.
Abstract: Automatic emotion recognition has become a trending research topic in the past decade. While works based on facial expressions or speech abound, recognizing affect from body gestures remains a less explored topic. We present a new comprehensive survey hoping to boost research in the field. We first introduce emotional body gestures as a component of what is commonly known as "body language" and comment on general aspects such as gender differences and culture dependence. We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment on static and dynamic body pose estimation methods, both in RGB and 3D. We then review the recent literature related to representation learning and emotion recognition from images of emotionally expressive gestures. We also discuss multi-modal approaches that combine speech or face with body gestures for improved emotion recognition. While pre-processing methodologies (e.g., human detection and pose estimation) are nowadays mature technologies fully developed for robust large-scale analysis, we show that for emotion recognition the quantity of labelled data is scarce. There is no agreement on clearly defined output spaces, and the representations are shallow and largely based on naive geometrical representations.

256 citations


Proceedings ArticleDOI
18 Aug 2021
TL;DR: TediGAN as discussed by the authors uses a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization to produce diverse and high-quality images with an unprecedented resolution of 1024×1024.
Abstract: In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity module learns text-image matching by mapping the image and text into a common embedding space. The instance-level optimization is for identity preservation in manipulation. Our model can produce diverse and high-quality images with an unprecedented resolution of 1024×1024. Using a control mechanism based on style-mixing, our TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal synthesis, we propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.

212 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Patch-NetVLAD as discussed by the authors combines the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals, which enables aggregation and matching of deep-learned local features defined over the feature-space grid.
Abstract: Visual Place Recognition is a challenging task for robotics and autonomous systems, which must deal with the twin problems of appearance and viewpoint change in an always changing world. This paper introduces Patch-NetVLAD, which provides a novel formulation for combining the advantages of both local and global descriptor methods by deriving patch-level features from NetVLAD residuals. Unlike the fixed spatial neighborhood regime of existing local keypoint features, our method enables aggregation and matching of deep-learned local features defined over the feature-space grid. We further introduce a multi-scale fusion of patch features that have complementary scales (i.e. patch sizes) via an integral feature space and show that the fused features are highly invariant to both condition (season, structure, and illumination) and viewpoint (translation and rotation) changes. Patch-NetVLAD achieves state-of-the-art visual place recognition results in computationally limited scenarios, validated on a range of challenging real-world datasets, including winning the Facebook Mapillary Visual Place Recognition Challenge at ECCV2020. It is also adaptable to user requirements, with a speed-optimised version operating over an order of magnitude faster than the state-of-the-art. By combining superior performance with improved computational efficiency in a configurable framework, Patch-NetVLAD is well suited to enhance both stand-alone place recognition capabilities and the overall performance of SLAM systems.

199 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an approach using deep learning, TensorFlow, Keras, and OpenCV to detect face masks using Single Shot Multibox Detector as a face detector and MobilenetV2 architecture as a framework for the classifier.

193 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this article, the authors proposed a Deep Attentive Center Loss (DACL) method to adaptively select a subset of significant feature elements for enhanced discrimination, which integrates an attention mechanism to estimate attention weights correlated with feature importance.
Abstract: Learning discriminative features for Facial Expression Recognition (FER) in the wild using Convolutional Neural Networks (CNNs) is a non-trivial task due to the significant intra-class variations and inter-class similarities. Deep Metric Learning (DML) approaches such as center loss and its variants jointly optimized with softmax loss have been adopted in many FER methods to enhance the discriminative power of learned features in the embedding space. However, equally supervising all features with the metric learning method might include irrelevant features and ultimately degrade the generalization ability of the learning algorithm. We propose a Deep Attentive Center Loss (DACL) method to adaptively select a subset of significant feature elements for enhanced discrimination. The proposed DACL integrates an attention mechanism to estimate attention weights correlated with feature importance using the intermediate spatial feature maps in CNN as context. The estimated weights accommodate the sparse formulation of center loss to selectively achieve intra-class compactness and inter-class separation for the relevant information in the embedding space. An extensive study on two widely used wild FER datasets demonstrates the superiority of the proposed DACL method compared to state-of-the-art methods.
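The core of DACL is a center loss in which learned attention weights modulate how strongly each embedding dimension is pulled to its class center. Below is a minimal PyTorch sketch of that loss term under stated assumptions: the attention weights are taken as given (in the paper they come from an attention network over intermediate CNN feature maps), so only the loss side of the method is shown.

```python
import torch
import torch.nn as nn

class AttentiveCenterLoss(nn.Module):
    """Center loss in which a per-dimension attention weight modulates how
    strongly that dimension is pulled to its class center
    (a simplified sketch of the DACL idea)."""

    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels, attention):
        # features, attention: (N, D); labels: (N,)
        centers = self.centers[labels]            # (N, D) class centers
        diff = (features - centers) ** 2          # per-dimension squared distance
        return 0.5 * (attention * diff).sum(dim=1).mean()

# Toy usage: 8 classes, 64-d embeddings, attention e.g. from a sigmoid head.
loss_fn = AttentiveCenterLoss(num_classes=8, feat_dim=64)
feats = torch.randn(16, 64)
labels = torch.randint(0, 8, (16,))
attn = torch.sigmoid(torch.randn(16, 64))
print(loss_fn(feats, labels, attn))
```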

137 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose an additive angular margin loss (ArcFace), which not only has a clear geometric interpretation, but also significantly enhances the discriminative power.
Abstract: Recently, a popular line of research in face recognition is adopting margins in the well-established softmax loss function to maximize class separability. In this paper, we first introduce an Additive Angular Margin Loss (ArcFace), which not only has a clear geometric interpretation but also significantly enhances the discriminative power. Since ArcFace is susceptible to massive label noise, we further propose sub-center ArcFace, in which each class contains K sub-centers and training samples only need to be close to any of the K positive sub-centers. Sub-center ArcFace encourages one dominant sub-class that contains the majority of clean faces and non-dominant sub-classes that include hard or noisy faces. Based on this self-propelled isolation, we boost the performance through automatically purifying raw web faces under massive real-world noise. Besides discriminative feature embedding, we also explore the inverse problem, mapping feature vectors to face images. Without training any additional generator or discriminator, the pre-trained ArcFace model can generate identity-preserved face images for both subjects inside and outside the training data, only by using the network gradient and Batch Normalization (BN) priors. Extensive experiments demonstrate that ArcFace can enhance the discriminative feature embedding as well as strengthen the generative face synthesis.
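The additive angular margin described here has a compact implementation: normalise embeddings and class weights, add the margin m to the angle of the target class, and rescale by s before the usual softmax cross-entropy. A minimal PyTorch sketch (without the sub-center extension) follows; s=64 and m=0.5 are the values commonly used with ArcFace, and the layer is a simplified illustration rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginProduct(nn.Module):
    """Additive angular margin logits: cos(theta + m) for the target class,
    cos(theta) for the others, scaled by s (simplified ArcFace head)."""

    def __init__(self, in_features, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between normalised embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return self.s * logits

# Usage: the margin logits feed a standard cross-entropy loss.
head = ArcMarginProduct(in_features=512, num_classes=1000)
emb = torch.randn(8, 512)
labels = torch.randint(0, 1000, (8,))
loss = F.cross_entropy(head(emb, labels), labels)
print(loss)
```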

136 citations


Proceedings ArticleDOI
16 Mar 2021
TL;DR: In this paper, the authors propose a frequency-aware discriminative feature learning framework with a single-center loss that only compresses intra-class variations of natural faces while boosting inter-class differences in the embedding space.
Abstract: Face forgery detection is raising ever-increasing interest in computer vision since facial manipulation technologies cause serious worries. Though recent works have reached sound achievements, there are still unignorable problems: a) learned features supervised by softmax loss are separable but not discriminative enough, since softmax loss does not explicitly encourage intra-class compactness and inter-class separability; and b) fixed filter banks and hand-crafted features are insufficient to capture forgery patterns of frequency from diverse inputs. To compensate for such limitations, a novel frequency-aware discriminative feature learning framework is proposed in this paper. Specifically, we design a novel single-center loss (SCL) that only compresses intra-class variations of natural faces while boosting inter-class differences in the embedding space. In such a case, the network can learn more discriminative features with less optimization difficulty. Besides, an adaptive frequency feature generation module is developed to mine frequency clues in a completely data-driven fashion. With the above two modules, the whole framework can learn more discriminative features in an end-to-end manner. Extensive experiments demonstrate the effectiveness and superiority of our framework on three versions of the FF++ dataset.

111 citations


Journal ArticleDOI
08 May 2021-Sensors
TL;DR: In this article, an improved CSPDarkNet53 is introduced into the backbone feature extraction network, which reduces the computing cost of the network and improves the learning ability of the model.
Abstract: To solve the problems of low accuracy, poor real-time performance, and poor robustness caused by complex environments, this paper proposes a face mask recognition and standard-wear detection algorithm based on an improved YOLO-v4. Firstly, an improved CSPDarkNet53 is introduced into the backbone feature extraction network, which reduces the computing cost of the network and improves the learning ability of the model. Secondly, an adaptive image scaling algorithm reduces computation and redundancy effectively. Thirdly, an improved PANet structure is introduced so that the network has more semantic information in the feature layer. Finally, a face mask detection dataset is constructed according to the standard wearing of masks. Based on the deep-learning object detection algorithm, a variety of evaluation indexes are compared to evaluate the effectiveness of the model. The comparison results show that the mAP of face mask recognition reaches 98.3% and the frame rate reaches 54.57 FPS, which is more accurate compared with existing algorithms.
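In the YOLO family, "adaptive image scaling" typically means a letterbox resize that preserves aspect ratio and adds only as much padding as needed to reach a stride-aligned size, which cuts wasted computation on padding. The NumPy/OpenCV sketch below illustrates that operation under this assumption; it is not the authors' exact implementation.

```python
import cv2
import numpy as np

def adaptive_letterbox(img, target=416, stride=32, pad_value=114):
    """Resize keeping aspect ratio, then pad minimally so both sides are
    multiples of `stride` (less redundant padding than a fixed square canvas)."""
    h, w = img.shape[:2]
    scale = min(target / h, target / w)
    new_h, new_w = round(h * scale), round(w * scale)
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    pad_h = (stride - new_h % stride) % stride
    pad_w = (stride - new_w % stride) % stride
    top, bottom = pad_h // 2, pad_h - pad_h // 2
    left, right = pad_w // 2, pad_w - pad_w // 2
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # dummy camera frame
print(adaptive_letterbox(frame).shape)              # (256, 416, 3)
```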

111 citations


Journal ArticleDOI
TL;DR: The authors present a feature-based method for 2D face images, which uses speeded-up robust features (SURF) and scale-invariant feature transform (SIFT) for feature extraction and achieves a maximum recognition accuracy of 99.7%.
Abstract: Face recognition is the process of identifying people through facial images. It has become vital for security and surveillance applications and is required everywhere, including institutions, organizations, offices, and social places. There are a number of challenges faced in face recognition, which include face pose, age, gender, illumination, and other variable conditions. Another challenge is that the database size for these applications is usually small, so training and recognition become difficult. Face recognition methods can be divided into two major categories: appearance-based methods and feature-based methods. In this paper, the authors present a feature-based method for 2D face images. Speeded-up robust features (SURF) and scale-invariant feature transform (SIFT) are used for feature extraction. Five public datasets, namely Yale2B, Face 94, M2VTS, ORL, and FERET, are used for the experimental work. Various combinations of SIFT and SURF features with two classification techniques, namely decision tree and random forest, have been experimented with in this work. A maximum recognition accuracy of 99.7% has been reported by the authors with a combination of SIFT (64 components) and SURF (32 components).
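A minimal sketch of this feature-based pipeline is shown below using OpenCV's SIFT detector and scikit-learn's random forest. To keep the example self-contained, each face is summarised by the average of its SIFT descriptors (a simplification of the paper's 64/32-component SIFT+SURF combinations), and SURF is omitted because it lives in OpenCV's non-free contrib module.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

sift = cv2.SIFT_create()

def face_descriptor(gray_face, dim=128):
    """Average the 128-d SIFT descriptors of a grayscale face image into a
    single fixed-length vector (zeros if no keypoints are found)."""
    _, desc = sift.detectAndCompute(gray_face, None)
    if desc is None:
        return np.zeros(dim, dtype=np.float32)
    return desc.mean(axis=0)

def train_face_classifier(face_images, labels):
    """Fit a random forest on per-face SIFT descriptors."""
    X = np.stack([face_descriptor(img) for img in face_images])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf

# Toy usage with random "faces"; real use would load ORL/Yale2B/FERET crops.
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, (112, 92), dtype=np.uint8) for _ in range(20)]
ids = [i % 5 for i in range(20)]
clf = train_face_classifier(faces, ids)
print(clf.predict([face_descriptor(faces[0])]))
```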

Proceedings ArticleDOI
06 Mar 2021
TL;DR: In this paper, the authors contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) as training data, as well as an elaborately designed time-constrained evaluation protocol.
Abstract: In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) as training data, as well as an elaborately designed time-constrained evaluation protocol. Firstly, we collect a 4M name list and download 260M faces from the Internet. Then, a Cleaning Automatically utilizing Self-Training (CAST) pipeline is devised to purify the tremendous WebFace260M, which is efficient and scalable. To the best of our knowledge, the cleaned WebFace42M is the largest public face recognition training set, and we expect it to close the data gap between academia and industry. Referring to practical scenarios, a Face Recognition Under Inference Time conStraint (FRUITS) protocol and a test set are constructed to comprehensively evaluate face matchers. Equipped with this benchmark, we delve into million-scale face recognition problems. A distributed framework is developed to train face recognition models efficiently without compromising performance. Empowered by WebFace42M, we reduce the relative failure rate by 40% on the challenging IJB-C set, and rank 3rd among 430 entries on NIST-FRVT. Even 10% of the data (WebFace4M) shows superior performance compared with public training sets. Furthermore, comprehensive baselines are established on our rich-attribute test set under the FRUITS-100ms/500ms/1000ms protocols, including the MobileNet, EfficientNet, AttentionNet, ResNet, SENet, ResNeXt, and RegNet families. The benchmark website is https://www.face-benchmark.org.

Posted ContentDOI
TL;DR: This paper proposes a reliable method based on discarding the masked region and deep-learning-based features in order to address the masked face recognition problem; experimental results show high recognition performance.
Abstract: The coronavirus disease (COVID-19) is an unparalleled crisis leading to a huge number of casualties and security problems. In order to reduce the spread of coronavirus, people often wear masks to protect themselves. This makes face recognition a very difficult task, since certain parts of the face are hidden. A primary focus of researchers during the ongoing coronavirus pandemic is to come up with suggestions to handle this problem through rapid and efficient solutions. In this paper, we propose a reliable method based on occlusion removal and deep-learning-based features in order to address the masked face recognition problem. The first step is to remove the masked face region. Next, we apply three pre-trained deep Convolutional Neural Networks (CNNs), namely VGG-16, AlexNet, and ResNet-50, and use them to extract deep features from the obtained regions (mostly the eye and forehead regions). The bag-of-features paradigm is then applied to the feature maps of the last convolutional layer in order to quantize them and to obtain a more compact representation compared to the fully connected layer of a classical CNN. Finally, a Multilayer Perceptron (MLP) is applied for the classification process. Experimental results on the Real-World Masked Face Dataset show high recognition performance compared to other state-of-the-art methods.
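The pipeline described here is: discard the masked lower face, run a pretrained CNN over the remaining eye/forehead region, and classify the resulting deep features. The sketch below shows the feature-extraction step with torchvision's pretrained ResNet-50; the fixed "keep the top 45% of the image" crop stands in for the paper's mask-removal step and is an illustrative assumption, as is the omission of the bag-of-features quantisation.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained ResNet-50 with its classification layer removed -> 2048-d features.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ConvertImageDtype(torch.float32),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def unmasked_region_features(face_uint8, keep_ratio=0.45):
    """Keep only the top part of the face (eyes/forehead), then extract
    deep features with the pretrained backbone."""
    h = face_uint8.shape[1]
    upper = face_uint8[:, : int(h * keep_ratio), :]   # (C, h', W) crop
    batch = preprocess(upper).unsqueeze(0)             # (1, 3, 224, 224)
    return resnet(batch).squeeze(0)                    # (2048,) feature vector

face = torch.randint(0, 256, (3, 160, 160), dtype=torch.uint8)
print(unmasked_region_features(face).shape)
```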

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a large in-the-wild high-resolution audio-visual dataset is built and a novel flow-guided talking face generation framework is proposed to synthesize high-definition videos.
Abstract: One-shot talking face generation should synthesize high visual quality facial videos with reasonable animations of expression and head pose, using only arbitrary driving audio and an arbitrary single face image as the source. Current works fail to generate realistic-looking videos over 256×256 resolution due to the lack of an appropriate high-resolution audio-visual dataset, and the limitation of sparse facial landmarks, which provide poor expression details. To synthesize high-definition videos, we build a large in-the-wild high-resolution audio-visual dataset and propose a novel flow-guided talking face generation framework. The new dataset is collected from YouTube and consists of about 16 hours of 720P or 1080P videos. We leverage the facial 3D morphable model (3DMM) to split the framework into two cascaded modules instead of learning a direct mapping from audio to video. In the first module, we propose a novel animation generator to produce the movements of the mouth, eyebrows, and head pose simultaneously. In the second module, we transform the animation into dense flow to provide more expression details and carefully design a novel flow-guided video generator to synthesize videos. Our method is able to produce high-definition videos and outperforms state-of-the-art works in objective and subjective comparisons.

Proceedings ArticleDOI
Jiahui She, Yibo Hu, Hailin Shi, Jun Wang, Qiu Shen, Tao Mei
01 Apr 2021
TL;DR: DMUE as mentioned in this paper proposes an auxiliary multi-branch learning framework to better mine and describe the latent distribution in the label space, and the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space.
Abstract: Due to the subjective annotation and the inherent inter-class similarity of facial expressions, one of the key challenges in Facial Expression Recognition (FER) is annotation ambiguity. In this paper, we propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives: latent Distribution Mining and pairwise Uncertainty Estimation. For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space. For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space. The proposed method is independent of the backbone architecture and brings no extra burden for inference. The experiments are conducted on popular real-world benchmarks and synthetic noisy datasets. In both settings, the proposed DMUE stably achieves leading performance.

Journal ArticleDOI
TL;DR: In this paper, the authors present a survey of micro-expression analysis in a cascaded structure, including neuropsychological basis, datasets, features, detection/spotting algorithms, recognition algorithms, applications, and evaluation of the state of the art.
Abstract: Different from conventional facial expressions, a micro-expression is an involuntary and transient facial expression, which can reveal a genuine emotion that people attempt to hide. The detection and recognition of micro-expressions are difficult and rely heavily on expert experience, since micro-expressions are transient and of low intensity. Due to its intrinsic particularity and complexity, micro-expression analysis is attractive but challenging, and has recently become an active area of research. Although there are many developments in this area, a comprehensive survey that can help researchers systematically review them is still lacking. In this survey paper, we highlight the key differences between macro- and micro-expressions, and use these differences to guide the research survey of micro-expression analysis in a cascaded structure, including neuropsychological basis, datasets, features, detection/spotting algorithms, recognition algorithms, applications, and evaluation of the state of the art. In each aspect, basic techniques, advanced developments, and major challenges are addressed and discussed. Furthermore, considering the limitations of existing micro-expression datasets, we present and release a new dataset called MMEW that has more video samples and more labeled emotion types, and perform a unified comparison of representative recognition methods on MMEW. Finally, some potential research directions are explored and outlined.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new method for masked face recognition by integrating a cropping-based approach with the Convolutional Block Attention Module (CBAM), where the optimal cropping is explored for each case, while the CBAM module is adopted to focus on the regions around the eyes.
Abstract: The global epidemic of COVID-19 makes people realize that wearing a mask is one of the most effective ways to protect ourselves from virus infections, which poses serious challenges for existing face recognition systems. To tackle these difficulties, a new method for masked face recognition is proposed by integrating a cropping-based approach with the Convolutional Block Attention Module (CBAM). The optimal cropping is explored for each case, while the CBAM module is adopted to focus on the regions around the eyes. Two special application scenarios, using faces without masks for training to recognize masked faces, and using masked faces for training to recognize faces without masks, have also been studied. Comprehensive experiments on the SMFRD, CASIA-WebFace, AR, and Extended Yale B datasets show that the proposed approach can significantly improve the performance of masked face recognition compared with other state-of-the-art approaches.
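CBAM itself is a small, well-documented module: channel attention computed from pooled descriptors, followed by spatial attention computed from channel-pooled maps, each applied multiplicatively. A compact PyTorch sketch of the module (independent of the paper's cropping strategy) is given below.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, each applied multiplicatively to the feature map."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        n, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(2, 64, 28, 28)
print(CBAM(64)(feat).shape)    # torch.Size([2, 64, 28, 28])
```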

Proceedings ArticleDOI
Yuhao Zhu, Qi Li, Jian Wang, Chengzhong Xu, Zhenan Sun
11 May 2021
TL;DR: Zhu et al. as mentioned in this paper proposed the first megapixel-level method for one-shot face swapping (MegaFS), which organizes face representation hierarchically with the proposed Hierarchical Representation Face Encoder (HieRFE) in an extended latent space to maintain more facial details, rather than the compressed representation used in previous face swapping methods.
Abstract: Face swapping has both positive applications such as entertainment, human-computer interaction, etc., and negative applications such as DeepFake threats to politics, economics, etc. Nevertheless, it is necessary to understand the scheme of advanced methods for high-quality face swapping and generate enough and representative face swapping images to train DeepFake detection algorithms. This paper proposes the first Megapixel level method for one shot Face Swapping (or MegaFS for short). Firstly, MegaFS organizes face representation hierarchically by the proposed Hierarchical Representation Face Encoder (HieRFE) in an extended latent space to maintain more facial details, rather than compressed representation in previous face swapping methods. Secondly, a carefully designed Face Transfer Module (FTM) is proposed to transfer the identity from a source image to the target by a non-linear trajectory without explicit feature disentanglement. Finally, the swapped faces can be synthesized by StyleGAN2 with the benefits of its training stability and powerful generative capability. Each part of MegaFS can be trained separately so the requirement of our model for GPU memory can be satisfied for megapixel face swapping. In summary, complete face representation, stable training, and limited memory usage are the three novel contributions to the success of our method. Extensive experiments demonstrate the superiority of MegaFS and the first megapixel level face swapping database is released for research on DeepFake detection and face image editing in the public domain.

Proceedings ArticleDOI
Xintao Wang, Yu Li, Honglun Zhang, Ying Shan
20 Jun 2021
TL;DR: GFP-GAN as discussed by the authors incorporates a generative facial prior into the face restoration process via spatial feature transform layers, which allows the method to achieve a good balance of realness and fidelity.
Abstract: Blind face restoration usually relies on facial priors, such as facial geometry prior or reference prior, to restore realistic and faithful details. However, very low-quality inputs cannot offer accurate geometric prior while high-quality references are inaccessible, limiting the applicability in real-world scenarios. In this work, we propose GFP-GAN that leverages rich and diverse priors encapsulated in a pretrained face GAN for blind face restoration. This Generative Facial Prior (GFP) is incorporated into the face restoration process via spatial feature transform layers, which allow our method to achieve a good balance of realness and fidelity. Thanks to the powerful generative facial prior and delicate designs, our GFP-GAN could jointly restore facial details and enhance colors with just a single forward pass, while GAN inversion methods require image-specific optimization at inference. Extensive experiments show that our method achieves superior performance to prior art on both synthetic and real-world datasets.
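The spatial feature transform (SFT) layers mentioned here modulate decoder features with a per-pixel scale and shift predicted from condition features. A minimal PyTorch sketch of such a layer follows; the channel sizes and the two-conv condition branch are illustrative assumptions rather than GFP-GAN's exact configuration.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial Feature Transform: predict per-pixel (scale, shift) from a
    condition feature map and apply them to the input features."""

    def __init__(self, feat_channels, cond_channels):
        super().__init__()
        self.scale = nn.Sequential(
            nn.Conv2d(cond_channels, feat_channels, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1))
        self.shift = nn.Sequential(
            nn.Conv2d(cond_channels, feat_channels, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1))

    def forward(self, feat, cond):
        # Affine modulation of the restoration features by the prior features.
        return feat * (1 + self.scale(cond)) + self.shift(cond)

# Toy usage: modulate 64-channel decoder features with 32-channel prior features.
sft = SFTLayer(feat_channels=64, cond_channels=32)
feat = torch.randn(1, 64, 32, 32)
cond = torch.randn(1, 32, 32, 32)
print(sft(feat, cond).shape)    # torch.Size([1, 64, 32, 32])
```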

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, the authors propose a Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition, which consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN).
Abstract: In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

Journal ArticleDOI
TL;DR: This work proposes an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD to new scenarios, and shows promising generalization capability on several public-domain face PAD databases.
Abstract: Face presentation attack detection (PAD) is essential for securing the widely used face recognition systems. Most of the existing PAD methods do not generalize well to unseen scenarios because labeled training data of the new domain is usually not available. In light of this, we propose an unsupervised domain adaptation with disentangled representation (DR-UDA) approach to improve the generalization capability of PAD into new scenarios. DR-UDA consists of three modules, i.e., ML-Net, UDA-Net and DR-Net. ML-Net aims to learn a discriminative feature representation using the labeled source domain face images via metric learning. UDA-Net performs unsupervised adversarial domain adaptation in order to optimize the source domain and target domain encoders jointly, and obtain a common feature space shared by both domains. As a result, the source domain PAD model can be effectively transferred to the unlabeled target domain for PAD. DR-Net further disentangles the features irrelevant to specific domains by reconstructing the source and target domain face images from the common feature space. Therefore, DR-UDA can learn a disentangled representation space which is generative for face images in both domains and discriminative for live vs. spoof classification. The proposed approach shows promising generalization capability in several public-domain face PAD databases.

Journal ArticleDOI
TL;DR: In this article, a deep learning-based scheme is proposed for identifying the facial expression of a person, which consists of two parts: the former one finds out local features from face images using a local gravitational force descriptor, while, in the latter part, the descriptor is fed into a novel deep convolution neural network (DCNN) model.
Abstract: An image is worth a thousand words; hence, a face image illustrates extensive details about the specification, gender, age, and emotional states of mind. Facial expressions play an important role in community-based interactions and are often used in the behavioral analysis of emotions. Recognition of automatic facial expressions from a facial image is a challenging task in the computer vision community and admits a large set of applications, such as driver safety, human–computer interactions, health care, behavioral science, video conferencing, cognitive science, and others. In this work, a deep-learning-based scheme is proposed for identifying the facial expression of a person. The proposed method consists of two parts. The former one finds out local features from face images using a local gravitational force descriptor, while, in the latter part, the descriptor is fed into a novel deep convolution neural network (DCNN) model. The proposed DCNN has two branches. The first branch explores geometric features, such as edges, curves, and lines, whereas holistic features are extracted by the second branch. Finally, the score-level fusion technique is adopted to compute the final classification score. The proposed method along with 25 state-of-the-art methods is implemented on five benchmark available databases, namely, Facial Expression Recognition 2013, Japanese Female Facial Expressions, Extended Cohn-Kanade, Karolinska Directed Emotional Faces, and Real-world Affective Faces. The databases consist of seven basic emotions: neutral, happiness, anger, sadness, fear, disgust, and surprise. The proposed method is compared with existing approaches using four evaluation metrics, namely, accuracy, precision, recall, and f1-score. The obtained results demonstrate that the proposed method outperforms all state-of-the-art methods on all the databases.

Journal ArticleDOI
14 Apr 2021
TL;DR: The proposed MIPGAN is derived from StyleGAN with a newly formulated loss function exploiting perceptual quality and an identity factor to generate high-quality, high-resolution morphed facial images with minimal artefacts.
Abstract: Face morphing attacks aim to circumvent Face Recognition Systems (FRS) by employing face images derived from multiple data subjects (e.g., accomplices and malicious actors). Morphed images can be verified against contributing data subjects with a reasonable success rate, given that they have a high degree of facial resemblance. The success of morphing attacks is directly dependent on the quality of the generated morph images. We present a new approach for generating strong attacks, extending our earlier framework for generating face morphs, using an Identity Prior Driven Generative Adversarial Network, which we refer to as MIPGAN (Morphing through Identity Prior driven GAN). The proposed MIPGAN is derived from StyleGAN with a newly formulated loss function exploiting perceptual quality and an identity factor to generate high-quality morphed facial images with minimal artefacts and high resolution. We demonstrate the proposed approach's applicability for generating strong morphing attacks by evaluating the vulnerability of both commercial and deep-learning-based Face Recognition Systems (FRS) and report the success rate of the attacks. Extensive experiments are carried out to assess the FRS's vulnerability against the proposed morph generation technique on three types of data: digital images, re-digitized (printed and scanned) images, and compressed images after re-digitization, from the newly generated MIPGAN Face Morph Dataset. The obtained results demonstrate that the proposed approach of morph generation poses a high threat to FRS.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a global multi-scale and local attention network (MA-Net) for facial expression recognition in the wild, which consists of three main components: a feature pre-extractor, a multi-scale module, and a local attention module.
Abstract: Facial expression recognition (FER) in the wild has received broad attention, and occlusion and pose variation are two of its key issues. This paper proposes a global multi-scale and local attention network (MA-Net) for FER in the wild. Specifically, the proposed network consists of three main components: a feature pre-extractor, a multi-scale module, and a local attention module. The feature pre-extractor is utilized to pre-extract middle-level features, the multi-scale module fuses features with different receptive fields, which reduces the susceptibility of deeper convolutions to occlusion and pose variation, while the local attention module guides the network to focus on local salient features, which relieves the interference of occlusion and non-frontal poses on FER in the wild. Extensive experiments demonstrate that the proposed MA-Net achieves state-of-the-art results on several in-the-wild FER benchmarks: CAER-S, AffectNet-7, AffectNet-8, RAF-DB, and SFEW, with accuracies of 88.42%, 64.53%, 60.29%, 88.40%, and 59.40%, respectively. The codes and training logs are publicly available at https://github.com/zengqunzhao/MA-Net .

Journal ArticleDOI
TL;DR: SPARNet as mentioned in this paper introduces a spatial attention mechanism to the vanilla residual blocks to adaptively bootstrap features related to the key face structures and pay less attention to those less feature-rich regions.
Abstract: General image super-resolution techniques have difficulties in recovering detailed face structures when applied to low-resolution face images. Recent deep-learning-based methods tailored for face images have achieved improved performance by being jointly trained with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides, most of the existing works can only generate relatively low-resolution face images (e.g., 128×128), and their applications are therefore limited. In this paper, we introduce a novel SPatial Attention Residual Network (SPARNet) built on our newly proposed Face Attention Units (FAUs) for face super-resolution. Specifically, we introduce a spatial attention mechanism to the vanilla residual blocks. This enables the convolutional layers to adaptively bootstrap features related to the key face structures and pay less attention to those less feature-rich regions. This makes the training more effective and efficient, as the key face structures only account for a very small portion of the face image. Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., 16×16). Quantitative comparisons on various kinds of metrics (including PSNR, SSIM, identity similarity, and landmark detection) demonstrate the superiority of our method over current state-of-the-art approaches. We further extend SPARNet with multi-scale discriminators, named SPARNetHD, to produce high-resolution results (i.e., 512×512). We show that SPARNetHD trained with synthetic data can not only produce high-quality and high-resolution outputs for synthetically degraded face images, but also shows good generalization ability to real-world low-quality face images. Codes are available at https://github.com/chaofengc/Face-SPARNet .
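The Face Attention Unit described here augments a residual block with a spatial attention branch so that feature-rich face structures dominate the update. Below is a simplified PyTorch sketch of that pattern; it follows the stated idea (residual block plus per-pixel attention on the residual) rather than the exact FAU design in the released code.

```python
import torch
import torch.nn as nn

class SpatialAttentionResBlock(nn.Module):
    """Residual block whose residual branch is reweighted by a per-pixel
    attention map (a simplified Face Attention Unit)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attention = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 3, padding=1),
            nn.Sigmoid())

    def forward(self, x):
        res = self.body(x)
        attn = self.attention(res)     # (N, 1, H, W) map over face structures
        return x + res * attn          # attended residual connection

block = SpatialAttentionResBlock(64)
print(block(torch.randn(1, 64, 16, 16)).shape)   # torch.Size([1, 64, 16, 16])
```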

Journal ArticleDOI
01 Jan 2021
TL;DR: In this paper, the authors present the possible underlying factors (data-driven and scenario modeling) and methodological considerations for assessing race bias in face recognition algorithms, conclude that race bias needs to be measured for individual applications, and provide a checklist for measuring this bias.
Abstract: Previous generations of face recognition algorithms differ in accuracy for images of different races (race bias). Here, we present the possible underlying factors (data-driven and scenario modeling) and methodological considerations for assessing race bias in algorithms. We discuss data-driven factors (e.g., image quality, image population statistics, and algorithm architecture), and scenario modeling factors that consider the role of the “user” of the algorithm (e.g., threshold decisions and demographic constraints). To illustrate how these issues apply, we present data from four face recognition algorithms (a previous-generation algorithm and three deep convolutional neural networks, DCNNs) for East Asian and Caucasian faces. First, dataset difficulty affected both overall recognition accuracy and race bias, such that race bias increased with item difficulty. Second, for all four algorithms, the degree of bias varied depending on the identification decision threshold. To achieve equal false accept rates (FARs), East Asian faces required higher identification thresholds than Caucasian faces, for all algorithms. Third, demographic constraints on the formulation of the distributions used in the test impacted estimates of algorithm accuracy. We conclude that race bias needs to be measured for individual applications, and we provide a checklist for measuring this bias in face recognition algorithms.
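The threshold analysis described here compares, for each demographic group, the verification threshold needed to reach a fixed false accept rate on that group's impostor (different-identity) score distribution. The NumPy sketch below shows that computation; the impostor scores and group means are synthetic placeholders, used only to illustrate the per-group threshold calculation.

```python
import numpy as np

def threshold_for_far(impostor_scores, target_far=1e-3):
    """Smallest similarity threshold at which the false accept rate on the
    impostor distribution does not exceed `target_far`."""
    scores = np.sort(impostor_scores)
    k = int(np.ceil(len(scores) * (1.0 - target_far)))
    return scores[min(k, len(scores) - 1)]

rng = np.random.default_rng(0)
# Synthetic impostor similarity scores for two demographic groups.
group_a = rng.normal(0.10, 0.05, 100_000)
group_b = rng.normal(0.15, 0.05, 100_000)   # higher impostor similarity

for name, scores in [("group A", group_a), ("group B", group_b)]:
    t = threshold_for_far(scores, target_far=1e-3)
    far = np.mean(scores >= t)
    print(f"{name}: threshold for FAR<=0.1% is {t:.3f} (achieved FAR {far:.4f})")
```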

Journal ArticleDOI
TL;DR: An identity and emotion joint learning approach with deep convolutional neural networks (CNNs) is proposed to enhance the performance of facial expression recognition (FER) tasks; it outperforms the residual network baseline as well as many other state-of-the-art methods.
Abstract: Different subjects may express a specific expression in different ways due to inter-subject variabilities. In this work, besides training deep-learned facial expression feature (emotional feature), we also consider the influence of latent face identity feature such as the shape or appearance of face. We propose an identity and emotion joint learning approach with deep convolutional neural networks (CNNs) to enhance the performance of facial expression recognition (FER) tasks. First, we learn the emotion and identity features separately using two different CNNs with their corresponding training data. Second, we concatenate these two features together as a deep-learned Tandem Facial Expression (TFE) Feature and feed it to the subsequent fully connected layers to form a new model. Finally, we perform joint learning on the newly merged network using only the facial expression training data. Experimental results show that our proposed approach achieves 99.31 and 84.29 percent accuracy on the CK+ and the FER+ database, respectively, which outperforms the residual network baseline as well as many other state-of-the-art methods.
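The Tandem Facial Expression (TFE) feature described here is the concatenation of the identity network's embedding and the emotion network's embedding, followed by new fully connected layers fine-tuned on expression data only. A minimal PyTorch sketch of that fusion head follows; the embedding dimensions and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TandemExpressionHead(nn.Module):
    """Concatenate identity and emotion embeddings (the TFE feature) and
    classify expressions with fully connected layers."""

    def __init__(self, id_dim=512, emo_dim=512, num_expressions=7):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(id_dim + emo_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_expressions))

    def forward(self, id_feat, emo_feat):
        tfe = torch.cat([id_feat, emo_feat], dim=1)   # tandem feature
        return self.classifier(tfe)

head = TandemExpressionHead()
logits = head(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)    # torch.Size([4, 7])
```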

Journal ArticleDOI
TL;DR: In this paper, the authors propose a loss function named sigmoid-constrained hypersphere loss (SFace), which can make a better balance between decreasing the intra-class distances for clean examples and preventing overfitting to label noise.
Abstract: Deep face recognition has achieved great success due to large-scale training databases and rapidly developing loss functions. Existing algorithms are devoted to realizing an ideal objective: minimizing the intra-class distance and maximizing the inter-class distance. However, they may neglect that there are also low-quality training images which should not be optimized in this strict way. Considering the imperfection of training databases, we propose that intra-class and inter-class objectives can be optimized in a moderate way to mitigate the overfitting problem, and further propose a novel loss function, named sigmoid-constrained hypersphere loss (SFace). Specifically, SFace imposes intra-class and inter-class constraints on a hypersphere manifold, which are controlled by two sigmoid gradient re-scale functions, respectively. The sigmoid curves precisely re-scale the intra-class and inter-class gradients so that training samples can be optimized to an appropriate degree. Therefore, SFace can make a better balance between decreasing the intra-class distances for clean examples and preventing overfitting to label noise, and contributes to more robust deep face recognition models. Extensive experiments with models trained on the CASIA-WebFace, VGGFace2, and MS-Celeb-1M databases, and evaluated on several face recognition benchmarks, such as the LFW, MegaFace, and IJB-C databases, have demonstrated the superiority of SFace.

Proceedings ArticleDOI
16 Sep 2021
TL;DR: In this article, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins.
Abstract: In this paper, the multi-task learning of lightweight convolutional neural networks is studied for face identification and classification of facial attributes (age, gender, ethnicity) trained on cropped faces without margins. The necessity to fine-tune these networks to predict facial expressions is highlighted. Several models are presented based on lightweight architectures, such as MobileNet, EfficientNet and RexNet. It was experimentally demonstrated that they lead to near state-of-the-art results in age, gender and race recognition on the UTKFace dataset and emotion classification on the AffectNet dataset. Moreover, it is shown that the usage of the trained models as feature extractors of facial regions in video frames leads to 4.5% higher accuracy than the previously known state-of-the-art single models for the AFEW and the VGAF datasets from the EmotiW challenges.

Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this paper, the authors combine the ubiquitous Deep Residual Network and a Unet-like architecture to produce a Residual Masking Network, which holds state-of-the-art accuracy on the well-known FER2013 and private VEMO datasets.
Abstract: Automatic facial expression recognition (FER) has gained much attention due to its applications in human-computer interaction. Among the approaches to improving FER tasks, this paper focuses on deep architectures with an attention mechanism. We propose a novel Masking Idea to boost the performance of CNNs on the facial expression task. It uses a segmentation network to refine feature maps, enabling the network to focus on relevant information to make correct decisions. In experiments, we combine the ubiquitous Deep Residual Network and a Unet-like architecture to produce a Residual Masking Network. The proposed method holds state-of-the-art (SOTA) accuracy on the well-known FER2013 and private VEMO datasets.
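The masking idea described here can be sketched compactly: a segmentation-style branch predicts a soft mask from a residual block's features, and the mask gates those features before they continue through the network. The PyTorch sketch below illustrates that mechanism; the single-conv "segmentation" branch is a deliberate simplification of the paper's Unet-like masking blocks.

```python
import torch
import torch.nn as nn

class MaskingBlock(nn.Module):
    """Refine residual features with a predicted soft mask so the network
    attends to expression-relevant regions (simplified masking idea)."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True))
        # Stand-in for the Unet-like segmentation branch of the paper.
        self.mask_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid())

    def forward(self, x):
        feat = self.residual(x)
        mask = self.mask_branch(feat)      # soft mask with values in (0, 1)
        return x + feat * mask             # masked residual connection

block = MaskingBlock(32)
print(block(torch.randn(1, 32, 48, 48)).shape)   # torch.Size([1, 32, 48, 48])
```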