scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
05 May 2022
TL;DR: Surprisingly, the experiments demonstrate that, although visually more appealing, morphs based on StyleGAN 2 do not pose a significant threat to the state to face recognition systems, as these morphs were outmatched by the simple morphs that are based facial landmarks.
Abstract: Morphing attacks are a threat to biometric systems where the biometric reference in an identity document can be altered. This form of attack presents an important issue in applications relying on identity documents such as border security or access control. Research in generation of face morphs and their detection is developing rapidly, however very few datasets with morphing attacks and open-source detection toolkits are publicly available. This paper bridges this gap by providing two datasets and the corresponding code for four types of morphing attacks: two that rely on facial landmarks based on OpenCV and FaceMorpher, and two that use StyleGAN 2 to generate synthetic morphs. We also conduct extensive experiments to assess the vulnerability of four state-of-the-art face recognition systems, including FaceNet, VGG-Face, ArcFace, and ISV. Surprisingly, the experiments demonstrate that, although visually more appealing, morphs based on StyleGAN 2 do not pose a significant threat to the state to face recognition systems, as these morphs were outmatched by the simple morphs that are based facial landmarks.

8 citations

Proceedings ArticleDOI
Guoli Wang1, Jiaqi Ma2, Qian Zhang, Jiwen Lu1, Jie Zhou1 
20 Jun 2021
TL;DR: Zhang et al. as mentioned in this paper proposed a novel lightweight pseudo facial generation method to relieve the problem of extreme poses without generating any frontal facial image, which can depict the facial contour information and make appropriate modifications to preserve the critical identity information.
Abstract: Face recognition has achieved a great success in recent years, it is still challenging to recognize those facial images with extreme poses. Traditional methods consider it as a domain gap problem. Many of them settle it by generating fake frontal faces from extreme ones, whereas they are tough to maintain the identity information with high computational consumption and uncontrolled disturbances. Our experimental analysis shows a dramatic precision drop with extreme poses. Meanwhile, those extreme poses just exist minor visual differences after small rotations. Derived from this insight, we attempt to relieve such a huge precision drop by making minor changes to the input images without modifying existing discriminators. A novel lightweight pseudo facial generation is proposed to relieve the problem of extreme poses without generating any frontal facial image. It can depict the facial contour information and make appropriate modifications to preserve the critical identity information. Specifically, the proposed method reconstructs pseudo profile faces by minimizing the pixel-wise differences with original profile faces and maintaining the identity consistent information from their corresponding frontal faces simultaneously. The proposed framework can improve existing discriminators and obtain a great promotion on several benchmark datasets.

8 citations

Proceedings ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed the race adaptive margin based face recognition (RamFace) model, which designed under the multi-task learning framework with the race classification as the auxiliary task.
Abstract: Recent studies show that there exist significant racial bias among state-of-the-art (SOTA) face recognition algorithms, i.e., the accuracy for Caucasian is consistently higher than that for other races like African and Asian. To mitigate racial bias, we propose the race adaptive margin based face recognition (RamFace) model, designed under the multi-task learning framework with the race classification as the auxiliary task. The experiments show that the race classification task can enforce the model to learn the racial features and thus improve the discriminability of the extracted feature representations. In addition, a racial bias robust loss function, i.e., race adaptive margin loss, is proposed such that different optimal margins can be automatically derived for different races in training the model, which further mitigates the racial bias. The experimental results show that on RFW dataset, our model not only achieves SOTA face recognition accuracy but also mitigates the racial bias problem. Besides, RamFace is also tested on several public face recognition evaluation benchmarks, i.e., LFW, CPLFW and CALFW, and achieves better performance than the commonly used face recognition methods, which justifies the generalization capability of RamFace.

8 citations

Journal ArticleDOI
TL;DR: This is the first demonstration of the feasibility of deploying camera recording systems and developing machine learning-based workflow analysis solutions for open surgery, particularly in orthopaedics.
Abstract: Safe and efficient surgical training and workflow management play a critical role in clinical competency and ultimately, patient outcomes. Video data in minimally invasive surgery (MIS) have enable...

8 citations


Cites background from "ArcFace: Additive Angular Margin Lo..."

  • ...2019); however, even low-resolution faces can still be recognised when video sequences are available (Deng et al. 2019)....

    [...]

  • ...Solutions to anonymise OR video include reducing image resolutions severely to preserve privacy (Srivastav et al. 2019); however, even low-resolution faces can still be recognised when video sequences are available (Deng et al. 2019)....

    [...]

Proceedings ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a discriminative speaker embedding based on convolutional neural networks (DCNNs), which decomposes the high-dimensional frame-level features into a group of low-dimensional vectors before applying VLAD aggregation, and a mutually complementary assembling loss function is proposed to train the model, which consists of a prototypical loss and a marginal-based softmax loss.
Abstract: The embedding-based speaker verification (SV) technology has witnessed significant progress due to the advances of deep convolutional neural networks (DCNN). However, how to improve the discrimination of speaker embedding in the open world SV task is still the focus of current research in the community. In this paper, we improve the discriminative power of speaker embedding from three-fold: (1) NeXtVLAD is introduced to aggregate frame-level features, which decomposes the high-dimensional frame-level features into a group of low-dimensional vectors before applying VLAD aggregation. (2) A multi-scale aggregation strategy (MSA) assembled with NeXtVLAD is designed with the purpose of fully extract speaker information from the frame-level feature in different hidden layers of DCNN. (3) A mutually complementary assembling loss function is proposed to train the model, which consists of a prototypical loss and a marginal-based softmax loss. Extensive experiments have been conducted on the VoxCeleb-1 dataset, and the experimental results show that our proposed system can obtain significant performance improvements compared with the baseline, and obtains new state-of-the-art results. The source code of this paper is available at https://github.com/LCF2764/Discriminative-Speaker-Embedding.

8 citations

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations