scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Posted Content
TL;DR: A novel mobile network named SeesawFaceNets, a simple but effective model, is proposed for productively deploying face recognition for mobile devices and is eventually competitive against large-scale deep-networks face recognition on all 5 listed public validation datasets.
Abstract: Deep Convolutional Neural Network (DCNNs) come to be the most widely used solution for most computer vision related tasks, and one of the most important application scenes is face verification. Due to its high-accuracy performance, deep face verification models of which the inference stage occurs on cloud platform through internet plays the key role on most prectical scenes. However, two critical issues exist: First, individual privacy may not be well protected since they have to upload their personal photo and other private information to the online cloud backend. Secondly, either training or inference stage is time-comsuming and the latency may affect customer experience, especially when the internet link speed is not so stable or in remote areas where mobile reception is not so good, but also in cities where building and other construction may block mobile signals. Therefore, designing lightweight networks with low memory requirement and computational cost is one of the most practical solutions for face verification on mobile platform. In this paper, a novel mobile network named SeesawFaceNets, a simple but effective model, is proposed for productively deploying face recognition for mobile devices. Dense experimental results have shown that our proposed model SeesawFaceNets outperforms the baseline MobilefaceNets, with only {\bf66\%}(146M VS 221M MAdds) computational cost, smaller batch size and less training steps, and SeesawFaceNets achieve comparable performance with other SOTA model e.g. mobiface with only {\bf54.2\%}(1.3M VS 2.4M) parameters and {\bf31.6\%}(146M VS 462M MAdds) computational cost, It is also eventually competitive against large-scale deep-networks face recognition on all 5 listed public validation datasets, with {\bf6.5\%}(4.2M VS 65M) parameters and {\bf4.35\%}(526M VS 12G MAdds) computational cost.

6 citations


Cites background from "ArcFace: Additive Angular Margin Lo..."

  • ...With the squeeze-and-excitation optimization[7], [4, 48, 6] achieves slightly better performance on face recognition and image classification tasks....

    [...]

  • ...35% computational cost compared to large-scale deep face recognition network[4]....

    [...]

Journal ArticleDOI
01 Sep 2022-Sensors
TL;DR: This ensemble model could precisely predict the acne severity, number, and position simultaneously, and could be an effective tool to help the patient self-test and assist the doctor in the diagnosis.
Abstract: Acne detection, utilizing prior knowledge to diagnose acne severity, number or position through facial images, plays a very important role in medical diagnoses and treatment for patients with skin problems. Recently, deep learning algorithms were introduced in acne detection to improve detection precision. However, it remains challenging to diagnose acne based on the facial images of patients due to the complex context and special application scenarios. Here, we provide an ensemble neural network composed of two modules: (1) a classification module aiming to calculate the acne severity and number; (2) a localization module aiming to calculate the detection boxes. This ensemble model could precisely predict the acne severity, number, and position simultaneously, and could be an effective tool to help the patient self-test and assist the doctor in the diagnosis.

6 citations

Journal ArticleDOI
TL;DR: DFANet as mentioned in this paper is a dropout-based method used in convolutional layers, which can increase the diversity of surrogate models and obtain ensemble-like effects to improve the transferability of feature-level adversarial examples.
Abstract: Face recognition has achieved great success in the last five years due to the development of deep learning methods. However, deep convolutional neural networks (DCNNs) have been found to be vulnerable to adversarial examples. In particular, the existence of transferable adversarial examples can severely hinder the robustness of DCNNs since this type of attacks can be applied in a fully black-box manner without queries on the target system. In this work, we first investigate the characteristics of transferable adversarial attacks in face recognition by showing the superiority of feature-level methods over label-level methods. Then, to further improve transferability of feature-level adversarial examples, we propose DFANet, a dropout-based method used in convolutional layers, which can increase the diversity of surrogate models and obtain ensemble-like effects. Extensive experiments on state-of-the-art face models with various training databases, loss functions and network architectures show that the proposed method can significantly enhance the transferability of existing attack methods. Finally, by applying DFANet to the LFW database, we generate a new set of adversarial face pairs that can successfully attack four commercial APIs without any queries. This TALFW database is available to facilitate research on the robustness and defense of deep face recognition.

6 citations

Journal ArticleDOI
TL;DR: This work proposes to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including Dynamic Reweighting Progressive Learning module (DRPL) and Global Structure-Aligned Memory bank (GSAM), which reduces the learning difficulty of hard samples by easy-to-hard progressive learning.
Abstract: Gait recognition aims at identifying the pedestrians at a long distance by their biometric gait patterns. It is inherently challenging due to the various covariates and the properties of silhouettes (textureless and colorless), which result in two kinds of pair-wise hard samples: the same pedestrian could have distinct silhouettes (intra-class diversity) and different pedestrians could have similar silhouettes (inter-class similarity). In this work, we propose to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including Dynamic Reweighting Progressive Learning module (DRPL) and Global Structure-Aligned Memory bank (GSAM). Specifically, DRPL reduces the learning difficulty of hard samples by easy-to-hard progressive learning. GSAM further augments DRPL with a structure-aligned memory mechanism, which maintains and models the feature distribution of each ID. Experiments on two commonly used datasets, CASIA-B and OU-MVLP, demonstrate the effectiveness of GaitMPL. On CASIA-B, we achieve the state-of-the-art performance, i.e., 88.0% on the most challenging condition (Clothing) and 93.3% on the average condition, which outperforms the other methods by at least 3.8% and 1.4%, respectively.

6 citations

Journal ArticleDOI
TL;DR: Analysis of images of humpback whale flukes provided by the Kaggle Challenge and spectral cluster analysis of heatmaps shows that characteristics automatically determined by a neural network correspond to those that are considered important by a whale expert.
Abstract: . Interpretable and explainable machine learning have proven to be promising approaches to verify the quality of a data-driven model in general as well as to obtain more information about the quality of certain observations in practise. In this paper, we use these approaches for an application in the marine sciences to support the monitoring of whales. Whale population monitoring is an important element of whale conservation, where the identification of whales plays an important role in this process, for example to trace the migration of whales over time and space. Classical approaches use photographs and a manual mapping with special focus on the shape of the whale flukes and their unique pigmentation. However, this is not feasible for comprehensive monitoring. Machine learning methods, especially deep neural networks, have shown that they can efficiently solve the automatic observation of a large number of whales. Despite their success for many different tasks such as identification, further potentials such as interpretability and their benefits have not yet been exploited. Our main contribution is an analysis of interpretation tools, especially occlusion sensitivity maps, and the question of how the gained insights can help a whale researcher. For our analysis, we use images of humpback whale flukes provided by the Kaggle Challenge ”Humpback Whale Identification”. By means of spectral cluster analysis of heatmaps, which indicate which parts of the image are important for a decision, we can show that the they can be grouped in a meaningful way. Moreover, it appears that characteristics automatically determined by a neural network correspond to those that are considered important by a whale expert.

6 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...The optimization of the parameters is done by minimizing a weighted sum of three losses: triplet loss (Weinberger, Saul, 2009), ArcFace loss (Deng et al., 2019), and focal loss (Lin et al., 2017)....

    [...]

  • ...The optimization of the parameters is done by minimizing a weighted sum of three losses: triplet loss (Weinberger, Saul, 2009), ArcFace loss (Deng et al., 2019), and focal loss (Lin et al....

    [...]

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations