Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019, pp. 4690–4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
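
For concreteness, here is a minimal PyTorch sketch of an additive angular margin head as described in the abstract; this is an illustrative reimplementation, not the authors' released code, with the feature scale s = 64 and margin m = 0.5 set to the commonly reported ArcFace defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin logits: cos(theta + m) for the target class."""
    def __init__(self, in_features, num_classes, s=64.0, m=0.5):
        super().__init__()
        # Rows of this matrix act as class centres in the angular space.
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine between L2-normalised features and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to the target-class angle.
        target = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return self.s * logits  # feed into a standard cross-entropy loss
```

The head replaces the last fully connected layer during training, e.g. `loss = F.cross_entropy(head(emb, y), y)`; at test time only the normalised embeddings are kept, so the margin adds no inference cost.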


Citations
Journal ArticleDOI
TL;DR: A novel framework based on Generative Adversarial Networks for pose face augmentation (PFA-GAN) enables a controlled pose synthesis of a new face image from a source face given a driving one while preserving the identity of the source face.
Abstract: In this work, we propose a novel framework based on Generative Adversarial Networks for pose face augmentation (PFA-GAN). It enables controlled pose synthesis of a new face image from a source face given a driving one, while preserving the identity of the source face. We introduce a method for training the framework in a fully self-supervised mode using a large-scale dataset of unconstrained face images. In addition, some augmentation strategies are presented to expand the training set. The face verification experimental results demonstrate the effectiveness of the presented augmentation strategies, as all augmented datasets outperform the baseline.

6 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...Both classifiers use ArcFace (Deng et al., 2019) and Focal loss function....


Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this article, the authors propose an efficient face template protection system based on homomorphic encryption, developing a message packing method that allows the squared Euclidean distance between facial features to be computed with a single homomorphic multiplication.
Abstract: This paper proposes an efficient face template protection system based on homomorphic encryption. By developing a message packing method suitable for the calculation of the squared Euclidean distance, the proposed system computes the squared Euclidean distance between facial features by a single homomorphic multiplication. Our experimental results show the transaction time of the proposed system is about 14 times faster than that of the existing face template protection system based on homomorphic encryption presented in BIOSIG2020.
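
The packing trick can be illustrated without an encryption library: encoding one vector in reverse order makes a single polynomial multiplication (a convolution over coefficient vectors) deposit the inner product, and hence the squared Euclidean distance, in one known coefficient. Below is a plain-NumPy sketch of that identity; the actual system performs the same multiplication on RLWE ciphertexts.

```python
import numpy as np

x = np.array([0.1, 0.4, 0.2])   # toy facial feature vectors
y = np.array([0.3, 0.1, 0.5])
n = len(x)

# Pack y in reverse so that coefficient n-1 of the polynomial product
# (a plain convolution here) equals the inner product <x, y>.
prod = np.convolve(x, y[::-1])
inner = prod[n - 1]
assert np.isclose(inner, x @ y)

# ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>; the squared norms can be
# precomputed in the clear or packed alongside the features.
sq_dist = x @ x + y @ y - 2 * inner
assert np.isclose(sq_dist, np.sum((x - y) ** 2))
```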

6 citations

Journal ArticleDOI
TL;DR: In this paper, a comprehensive study on the recognition of masked faces was conducted by using facial landmarks to synthesize a facial mask for each face in several benchmark databases with different challenging factors.
Abstract: As we have been seriously hit by the COVID-19 pandemic, wearing a facial mask is a crucial action that we can take for our protection. This paper reports a comprehensive study on the recognition of masked faces. Using facial landmarks, we synthesize a facial mask for each face in several benchmark databases with different challenging factors. The IJB-B and IJB-C databases are selected for evaluating performance against variation across pose, illumination and expression (PIE). The FG-Net database is selected for evaluating performance across age. The SCface database is chosen for evaluating performance on low-resolution images. MS-1MV2 is used as the base training set. We use ResNet-100 as the feature embedding network, connected to state-of-the-art loss functions designed for face recognition. The loss functions considered include the Center Loss, the Marginal Loss, the Angular Softmax Loss, the Large Margin Cosine Loss and the Additive Angular Margin Loss. Both verification and identification are conducted in our evaluation. Performance for recognizing faces with and without synthetic masks is evaluated for a complete comparison. The network with the best loss function for recognizing synthetic masked faces is then assessed on a real masked face database, the cleaned RMFRD (c-RMFRD) dataset. Compared with a human user test on c-RMFRD, the network trained on synthetic masked faces outperforms human vision by a large margin. Our contributions are fourfold. The first is a comprehensive study of masked face recognition using state-of-the-art loss functions against various compounding factors. For comparison purposes, the second is another comprehensive study on the recognition of faces without masks using the same loss functions against the same challenging factors. The third is the verification that a network trained on synthetic masked faces can tackle real masked face recognition with performance better than human inspectors. The fourth is a highlight of the challenges of masked face recognition and directions for future research. Our code, trained models and dataset are available via the project GitHub site.
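
The margin-based losses compared in this study differ only in where the margin enters the target-class logit. Here is a sketch of the combined form s*(cos(m1*theta + m2) - m3), following the unified formulation given in the ArcFace paper; tensor names are illustrative.

```python
import torch

def combined_margin_logits(cos_theta, target_mask, s=64.0, m1=1.0, m2=0.5, m3=0.0):
    """Unified margin softmax logit s*(cos(m1*theta + m2) - m3) on target classes.

    Roughly: m1 > 1 gives a SphereFace-style multiplicative angular margin
    (the original also needs a piecewise extension to keep cos monotonic),
    (1, m2, 0) gives the ArcFace additive angular margin, and
    (1, 0, m3) gives the CosFace / Large Margin Cosine additive cosine margin.
    """
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    with_margin = torch.cos(m1 * theta + m2) - m3
    return s * torch.where(target_mask, with_margin, cos_theta)
```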

6 citations

Book ChapterDOI
TL;DR: In this article, the personalization of video-based frame-level facial expression recognition is studied for multi-user systems in which a small number of short videos is available for each user.
Abstract: In this paper, the personalization of video-based frame-level facial expression recognition is studied for multi-user systems in which a small number of short videos is available for each user. First, embeddings of each video frame are computed using a deep convolutional neural network pre-trained on a large emotional dataset of static images. Next, a dataset of videos is used to train a subject-independent emotion classifier, such as a feed-forward neural network or a frame attention network. Finally, it is proposed to fine-tune this neural classifier on the videos of each user of interest. As a result, every user is associated with his or her own emotional model. The classifier in a multi-user system is chosen by an appropriate video-based face recognition method. An experimental study with the RAMAS dataset demonstrates a significant (up to 25%) increase in the accuracy of the proposed approach compared to subject-independent facial expression recognition.
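
A minimal sketch of the per-user fine-tuning step described above, under assumed shapes (512-dimensional frame embeddings, 7 emotion classes); the classifier architecture and hyperparameters are illustrative, not the paper's exact configuration.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Subject-independent classifier over frame embeddings (assumed pre-trained).
base_clf = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 7))

def personalize(base, user_frames, user_labels, epochs=5, lr=1e-4):
    """Fine-tune a copy of the shared classifier on one user's short videos."""
    clf = copy.deepcopy(base)          # every user gets an own emotional model
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(clf(user_frames), user_labels)
        loss.backward()
        opt.step()
    return clf

# At inference, a video-based face recognition method selects which
# user-specific classifier to apply to the incoming frames.
```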

6 citations

Posted Content
TL;DR: The results indicate that the proposed global context guided transformation modules can efficiently improve the learned speaker representations by achieving time-frequency and channel-wise feature recalibration.
Abstract: In this study, we propose global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations. We use the global context information to enhance important channels and recalibrate salient time-frequency locations by computing the similarity between the global context and local features. The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset, a large-scale speaker verification corpus collected in the wild. This lightweight block can be easily incorporated into a CNN model with little additional computational cost and improves speaker verification performance by a large margin compared to the baseline ResNet-LDE model and the Squeeze&Excitation block. Detailed ablation studies are also performed to analyze various factors that may impact the performance of the proposed modules. We find that by employing the proposed L2-tf-GTFC transformation block, the Equal Error Rate decreases from 4.56% to 3.07%, a relative 32.68% reduction, with a relative 27.28% improvement in terms of the DCF score. The results indicate that our proposed global context guided transformation modules can efficiently improve the learned speaker representations by achieving time-frequency and channel-wise feature recalibration.
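
A simplified sketch of channel recalibration from pooled global context, in the spirit of the Squeeze&Excitation baseline mentioned in the abstract; the proposed L2-tf-GTFC block additionally recalibrates time-frequency locations by comparing the global context with local features, which this sketch omits.

```python
import torch
import torch.nn as nn

class GlobalContextChannelGate(nn.Module):
    """Rescale each channel by a gate computed from the global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                # x: (batch, C, freq, time)
        context = x.mean(dim=(2, 3))     # global context via average pooling
        gate = self.fc(context)          # per-channel importance in [0, 1]
        return x * gate[:, :, None, None]
```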

6 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...Margin based softmax loss functions like Angular Softmax [15], Additive Margin Softmax [16], and recently proposed Additive Angular Margin loss [17] were useful to learn a more discriminative speaker embedding....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting residual nets won first place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
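
A minimal residual block in PyTorch illustrating the reformulation: the stacked layers learn the residual F(x) = H(x) - x, and the identity shortcut adds x back. This is a sketch, not the paper's exact bottleneck design.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """output = F(x) + x: layers fit the residual, the shortcut carries x."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity shortcut
```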

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
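
A sketch of the core mechanism; note this is the "inverted" dropout variant that rescales activations during training, whereas the paper's formulation instead scales the weights at test time (the two agree in expectation).

```python
import torch

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p in training and rescale by 1/(1-p),
    so the single unthinned network can be used unchanged at test time."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()  # samples one "thinned" network
    return x * mask / (1.0 - p)
```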

33,597 citations

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
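
A minimal sketch of the training-time normalization for a mini-batch of feature vectors, omitting the running statistics that replace batch statistics at inference.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then scale and shift
    with the learned parameters gamma and beta."""
    mean = x.mean(dim=0)                    # per-feature batch mean
    var = x.var(dim=0, unbiased=False)      # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```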

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
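
A tiny example of the define-by-run differentiation the module provides: the graph is recorded while ordinary imperative code executes, then traversed by backward().

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x + 3 * x      # plain Python control flow could appear here too
y.backward()           # reverse-mode AD over the recorded operations
print(x.grad)          # dy/dx = 2x + 3 = 7
```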

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
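
A minimal example of expressing a differentiable computation in TensorFlow. This uses the modern eager GradientTape API rather than the original graph-construction interface the paper describes, but the portability is the same: the program runs unchanged on CPU, GPU, or TPU backends.

```python
import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:          # records operations on variables
    loss = (w * 2.0 - 1.0) ** 2
print(tape.gradient(loss, w))            # d(loss)/dw = 2*(2w-1)*2 = 20.0
```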

10,447 citations