scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: In this paper, the authors proposed a conceptually simple and intuitive learning objective function, i.e., additive margin softmax, for face verification, which is more intuitive and interpretable.
Abstract: In this letter, we propose a conceptually simple and intuitive learning objective function, i.e., additive margin softmax, for face verification. In general, face verification tasks can be viewed as metric learning problems, even though lots of face verification models are trained in classification schemes. It is possible when a large-margin strategy is introduced into the classification model to encourage intraclass variance minimization. As one alternative, angular softmax has been proposed to incorporate the margin. In this letter, we introduce another kind of margin to the softmax loss function, which is more intuitive and interpretable. Experiments on LFW and MegaFace show that our algorithm performs better when the evaluation criteria are designed for very low false alarm rate.

936 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: An integrated OLTR algorithm is developed that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world.
Abstract: Real world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set which include head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and deliver poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state-of-the-art. Our code, datasets, and models enable future OLTR research and are publicly available at \url{https://liuziwei7.github.io/projects/LongTail.html}.

780 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...MS1M-LT: To create a long-tailed version of the MS1MArcFace dataset [14, 8], we sample images for each identity with a probability proportional to the image numbers of each identity....

    [...]

Proceedings ArticleDOI
14 Jun 2020
TL;DR: MaskGAN as mentioned in this paper proposes MaskGAN to enable diverse and interactive face manipulation by learning style mapping between a free-form user modified mask and a target image, enabling diverse generation results.
Abstract: Facial image manipulation has achieved great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation with fidelity preservation. MaskGAN has two main components: 1) Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns style mapping between a free-form user modified mask and a target image, enabling diverse generation results. EBST models the user editing behavior on the source mask, making the overall framework more robust to various manipulated inputs. Specifically, it introduces dual-editing consistency as the auxiliary supervision signal. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. MaskGAN is comprehensively evaluated on two challenging tasks: attribute transfer and style copy, demonstrating superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.

727 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane.
Abstract: Though tremendous strides have been made in uncontrolled face detection, accurate and efficient 2D face alignment and 3D face reconstruction in-the-wild remain an open challenge. In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane. To fill the data gap, we manually annotated five facial landmarks on the WIDER FACE dataset and employed a semi-automatic annotation pipeline to generate 3D vertices for face images from the WIDER FACE, AFLW and FDDB datasets. Based on extra annotations, we propose a mutually beneficial regression target for 3D face reconstruction, that is predicting 3D vertices projected on the image plane constrained by a common 3D topology. The proposed 3D face reconstruction branch can be easily incorporated, without any optimisation difficulty, in parallel with the existing box and 2D landmark regression branches during joint training. Extensive experimental results show that RetinaFace can simultaneously achieve stable face detection, accurate 2D face alignment and robust 3D face reconstruction while being efficient through single-shot inference.

683 citations


Cites background from "ArcFace: Additive Angular Margin Lo..."

  • ...expression [64] and age [41, 39]) and facial identity recognition [18, 12, 56]....

    [...]

Proceedings ArticleDOI
14 May 2020
TL;DR: The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the Voxceleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.
Abstract: Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the related fields of face verification and computer vision. Firstly, the initial frame layers can be restructured into 1-dimensional Res2Net modules with impactful skip connections. Similarly to SE-ResNet, we introduce Squeeze-and-Excitation blocks in these modules to explicitly model channel interdependencies. The SE block expands the temporal context of the frame layer by rescaling the channels according to global properties of the recording. Secondly, neural networks are known to learn hierarchical features, with each layer operating on a different level of complexity. To leverage this complementary information, we aggregate and propagate features of different hierarchical levels. Finally, we improve the statistics pooling module with channel-dependent frame attention. This enables the network to focus on different subsets of frames during each of the channel’s statistics estimation. The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.

617 citations


Cites background or methods from "ArcFace: Additive Angular Margin Lo..."

  • ...All systems are trained using AAM softmax [3, 23] with a margin of 0....

    [...]

  • ...The rising popularity of the x-vector system has resulted in significant architectural improvements and optimized training procedures [3] over the original approach....

    [...]

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: The MegaFace dataset is assembled, both for identification and verification performance, and performance with respect to pose and a persons age is evaluated, as a function of training data size (#photos and #people).
Abstract: Recent face recognition experiments on a major benchmark (LFW [15]) show stunning performance–a number of algorithms achieve near to perfect score, surpassing human recognition rates. In this paper, we advocate evaluations at the million scale (LFW includes only 13K photos of 5K people). To this end, we have assembled the MegaFace dataset and created the first MegaFace challenge. Our dataset includes One Million photos that capture more than 690K different individuals. The challenge evaluates performance of algorithms with increasing numbers of "distractors" (going from 10 to 1M) in the gallery set. We present both identification and verification performance, evaluate performance with respect to pose and a persons age, and compare as a function of training data size (#photos and #people). We report results of state of the art and baseline algorithms. The MegaFace dataset, baseline code, and evaluation scripts, are all publicly released for further experimentations1.

841 citations

Proceedings Article
19 Jun 2016
TL;DR: A generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features and which not only can adjust the desired margin but also can avoid overfitting is proposed.
Abstract: Cross-entropy loss together with softmax is arguably one of the most common used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

769 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: An approach to building face datasets that starts with detecting faces in images returned from searches for public figures on the Internet, followed by discarding those not belonging to each queried person, and is releasing the FaceScrub dataset.
Abstract: Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we describe an approach to building face datasets that starts with detecting faces in images returned from searches for public figures on the Internet, followed by discarding those not belonging to each queried person. We formulate the problem of identifying the faces to be removed as a quadratic programming problem, which exploits the observations that faces of the same person should look similar, have the same gender, and normally appear at most once per image. Our results show that this method can reliably clean a large dataset, leading to a considerable reduction in the work needed to build it. Finally, we are releasing the FaceScrub dataset that was created using this approach. It consists of 141,130 faces of 695 public figures and can be obtained from http://vintage.winklerbros.net/facescrub.html.

622 citations


"ArcFace: Additive Angular Margin Lo..." refers methods in this paper

  • ...The MegaFace dataset [12] includes 1M images of 690K different individuals as the gallery set and 100K photos of 530 unique individuals from FaceScrub [21] as the probe set....

    [...]

Proceedings ArticleDOI
07 Mar 2016
TL;DR: The aim of this data set is to isolate the factor of pose variation in terms of extreme poses like profile, where many features are occluded, along with other `in the wild' variations to suggest that there is a gap between human performance and automatic face recognition methods for large pose variations in unconstrained images.
Abstract: We have collected a new face data set that will facilitate research in the problem of frontal to profile face verification ‘in the wild’. The aim of this data set is to isolate the factor of pose variation in terms of extreme poses like profile, where many features are occluded, along with other ‘in the wild’ variations. We call this data set the Celebrities in Frontal-Profile (CFP) data set. We find that human performance on Frontal-Profile verification in this data set is only slightly worse (94.57% accuracy) than that on Frontal-Frontal verification (96.24% accuracy). However we evaluated many state-of-the-art algorithms, including Fisher Vector, Sub-SML and a Deep learning algorithm. We observe that all of them degrade more than 10% from Frontal-Frontal to Frontal-Profile verification. The Deep learning implementation, which performs comparable to humans on Frontal-Frontal, performs significantly worse (84.91% accuracy) on Frontal-Profile. This suggests that there is a gap between human performance and automatic face recognition methods for large pose variation in unconstrained images.

618 citations


"ArcFace: Additive Angular Margin Lo..." refers background in this paper

  • ...During training, we explore efficient face verification datasets (e.g. LFW [10], CFP-FP [28], AgeDB-30 [19]) to check the improvement from different settings....

    [...]

  • ...LFW [10] 5,749 13,233 CFP-FP [28] 500 7,000 AgeDB-30 [19] 568 16,488 CPLFW [44] 5,749 11,652 CALFW [45] 5,749 12,174 YTF [38] 1,595 3,425...

    [...]

  • ...In Figure 7, we illustrate the angle distributions (predicted by ArcFace model trained on MS1MV2 with ResNet100) of both positive and negative pairs on LFW, CFP-FP, AgeDB-30, YTF, CPLFW and CALFW....

    [...]

  • ...As the baseline we have chosen the softmax loss and we have observed performance drop on CFP-FP and AgeDB-30 after weight and feature normalisation....

    [...]

  • ...pose variations [28, 44] and age gaps [19, 45]) and large-scale test scenarios (e....

    [...]

Proceedings Article
15 Feb 2017
TL;DR: It is found that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.
Abstract: We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as a strong regularizer in supervised learning. Furthermore, we connect a maximum entropy based confidence penalty to label smoothing through the direction of the KL divergence. We exhaustively evaluate the proposed confidence penalty and label smoothing on 6 common benchmarks: image classification (MNIST and Cifar-10), language modeling (Penn Treebank), machine translation (WMT'14 English-to-German), and speech recognition (TIMIT and WSJ). We find that both label smoothing and the confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters, suggesting the wide applicability of these regularizers.

617 citations