scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
01 Aug 2020
TL;DR: This paper trains neural networks for face morphing attack detection using proposed training schemes, which are based on different alternations of the training data to increase robustness and generality and shows that the improved training forces the networks to develop and use reliable models for all regions of the analyzed image.
Abstract: Artificial neural networks tend to use only what they need for a task. For example, to recognize a rooster, a network might only considers the rooster’s red comb and wattle and ignores the rest of the animal. This makes them vulnerable to attacks on their decision making process and can worsen their generality. Thus, this phenomenon has to be considered during the training of networks, especially in safety and security related applications. In this paper, we propose neural network training schemes, which are based on different alternations of the training data, to increase robustness and generality. Precisely, we limit the amount and position of information available to the neural network for the decision making process and study their effects on the accuracy, generality, and robustness against semantic and black box attacks for the particular example of face morphing attacks. In addition, we exploit layer-wise relevance propagation (LRP) to analyze the differences in the decision making process of the differently trained neural networks. A face morphing attack is an attack on a biometric facial recognition system, where the system is fooled to match two different individuals with the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals that should be matched with this image. We train neural networks for face morphing attack detection using our proposed training schemes and show that they lead to an improvement of robustness against attacks on neural networks. Using LRP, we show that the improved training forces the networks to develop and use reliable models for all regions of the analyzed image. This redundancy in representation is of crucial importance to security related applications.

35 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a comprehensive survey emphasizing on the demands of speech and vision systems with the view of both hardware and software systems and highlight the efforts taken for future focus and upcoming demands of the intelligent use of speech-vision processes.
Abstract: This is a survey paper that aims to give reviews about that finest architectures of machine learning, the use of algorithms and the applications of the system and speech and vision processes. The current technology poses vast areas of research in the field of architectures and algorithms which are used in machine learning, they can further be used for the planning new ideas and recreation of speech system and vision systems intelligently. The level of personal computing and its commercialization is at an all-time high. The machine learning can used for learning and training using large sets of sensor data and computing through clouds, also not to forget the mobile and embedded technology which is even more sophisticated. The survey is presented by giving a detailed background and the evolving nature of the most used models of deep learning which are used effectively in the vision and speech systems. The survey aims to give a perspective into the large scale research at the industrial level, it also highlights the efforts taken for future focus and upcoming demands of the intelligent use of speech and vision processes. The demand in a strong, robust and high intelligent system is that of lower latency and fidelity to be high. This is mostly seen in hardware devices with limited resources like automobiles, robots and mobile phones. These points are kept in mind and a detail of the major challenges and success rates especially in machine learning with platforms that have limited resources. The restrictions are in memory, life of the battery and the capacity of processing. The conclusion is based on the applications which are fast emerging based on the usage of speech and vision systems. Examples include effective evaluation, smart and quick system transportation and correct medicine prescriptions. This paper promise to deliver a comprehensive survey emphasizing on the demands of speech and vision systems with the view of both hardware and software systems. The technologies which are discussed in machine learning are fast gaining access and aim to revolutionize the areas of research and development in speech and vision systems.

35 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: ElasticFace as mentioned in this paper relaxes the fixed penalty margin constrain by using random margin values drawn from a normal distribution in each training iteration to give the decision boundary chances to extract and retract to allow space for flexible class separability learning.
Abstract: Learning discriminative face features plays a major role in building high-performing face recognition models. The recent state-of-the-art face recognition solutions proposed to incorporate a fixed penalty margin on commonly used classification loss function, softmax loss, in the normalized hypersphere to increase the discriminative power of face recognition models, by minimizing the intra-class variation and maximizing the inter-class variation. Marginal penalty softmax losses, such as ArcFace and CosFace, assume that the geodesic distance between and within the different identities can be equally learned using a fixed penalty margin. However, such a learning objective is not realistic for real data with inconsistent inter-and intra-class variation, which might limit the discriminative and generalizability of the face recognition model. In this paper, we relax the fixed penalty margin constrain by proposing elastic penalty margin loss (ElasticFace) that allows flexibility in the push for class separability. The main idea is to utilize random margin values drawn from a normal distribution in each training iteration. This aims at giving the decision boundary chances to extract and retract to allow space for flexible class separability learning. We demonstrate the superiority of our ElasticFace loss over ArcFace and CosFace losses, using the same geometric transformation, on a large set of mainstream benchmarks. From a wider perspective, our ElasticFace has advanced the state-of-the-art face recognition performance on seven out of nine mainstream benchmarks. All training codes, pre-trained models, training logs will be publicly released 1.

35 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: In this article, an approach for synthesizing automatically age-progressed facial images in video sequences using deep reinforcement learning is presented. But this method is limited to a single image and cannot synthesize an aged likeness of a face from a single input image.
Abstract: This paper presents a novel approach for synthesizing automatically age-progressed facial images in video sequences using Deep Reinforcement Learning. The proposed method models facial structures and the longitudinal face-aging process of given subjects coherently across video frames. The approach is optimized using a long-term reward, Reinforcement Learning function with deep feature extraction from Deep Convolutional Neural Network. Unlike previous age-progression methods that are only able to synthesize an aged likeness of a face from a single input image, the proposed approach is capable of age-progressing facial likenesses in videos with consistently synthesized facial features across frames. In addition, the deep reinforcement learning method guarantees preservation of the visual identity of input faces after age-progression. Results on videos of our new collected aging face AGFW-v2 database demonstrate the advantages of the proposed solution in terms of both quality of age-progressed faces, temporal smoothness, and cross-age face verification.

34 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper examines security of one of the best public face recognition systems, LResNet100E-IR with ArcFace loss, and proposes a simple method to attack it in the physical world, and suggests creating an adversarial patch that can be printed, added as a face attribute and photographed.
Abstract: Recent works showed the vulnerability of image classifiers to adversarial attacks in the digital domain. However, the majority of attacks involve adding small perturbation to an image to fool the classifier. Unfortunately, such procedures can not be used to conduct a real-world attack, where adding an adversarial attribute to the photo is a more practical approach. In this paper, we study the problem of real-world attacks on face recognition systems. We examine security of one of the best public face recognition systems, LResNet100E-IR with ArcFace loss, and propose a simple method to attack it in the physical world. The method suggests creating an adversarial patch that can be printed, added as a face attribute and photographed; the photo of a person with such attribute is then passed to the classifier such that the classifier's recognized class changes from correct to the desired one. Proposed generating procedure allows projecting adversarial patches not only on different areas of the face, such as nose or forehead but also on some wearable accessory, such as eyeglasses.

34 citations


Cites background from "ArcFace: Additive Angular Margin Lo..."

  • ...In the original paper [1] it is observed that the accuracy in classification problems of this network is comparable to state-of-the-art models....

    [...]

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations