Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
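The additive angular margin the abstract describes has a compact form: normalise both the features and the last-layer class-centre weights, add the margin m to the angle of the ground-truth class only, and rescale by s before the softmax. A minimal NumPy sketch of this idea (not the authors' released code; s=64 and m=0.5 are the commonly used defaults from the paper):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin (sketch).
    embeddings: (N, d) deep features; weights: (C, d) last-layer weights
    acting as class centres; labels: (N,) ground-truth class indices."""
    # L2-normalise features and class centres so logits become pure cosines.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cosine = e @ w.T
    theta = np.arccos(np.clip(cosine, -1.0 + 1e-7, 1.0 - 1e-7))
    logits = cosine.copy()
    rows = np.arange(len(labels))
    # Add the margin m to the *angle* of the target class, then rescale by s.
    logits[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * logits  # feed these into a standard softmax cross-entropy
```

Because the margin is added to the angle rather than to the cosine, it corresponds exactly to a geodesic distance on the unit hypersphere, which is the geometric interpretation the abstract refers to.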


Citations
Journal ArticleDOI
TL;DR: Wang et al. propose an efficient hybrid parallel deep learning model (HPM) for intrusion detection based on margin learning, which constructs two parallel CNN architectures and fuses the spatial features obtained through full convolution.
Abstract: With the rapid development of network technology, new malicious attacks continue to appear while attack methods are constantly updated. As attackers exploit the vulnerabilities of popular third-party components to invade target websites, further improving the classification accuracy of malicious network traffic is key to improving the performance of abnormal traffic detection. Existing intrusion detection systems may suffer from incomplete feature extraction and low classification accuracy. Thus, this paper proposes an efficient hybrid parallel deep learning model (HPM) for intrusion detection based on margin learning. First, HPM constructs two parallel CNN architectures and fuses the spatial features obtained through full convolution. Second, the temporal information of the fused features is parsed separately using two parallel LSTMs. Finally, the extracted spatial-temporal features are fed into the CosMargin classifier for classification after global convolution and global pooling. In addition, this paper proposes an improved traffic feature extraction method, which not only reduces redundant features but also speeds up the convergence of the network. In experiments, HPM achieves 99% detection accuracy on each malicious class, a 5%–10% improvement over other models, demonstrating the superiority of the proposed model.

16 citations

Journal ArticleDOI
TL;DR: This article proposes a novel iterative dynamic generic learning (IDGL) method in which the labeled enrolment database and the unlabeled query set are fed into a dynamic label feedback network for learning, and the accuracy of the estimated labels is improved iteratively by virtue of steadily enhanced prototypes.
Abstract: This article focuses on a new and practical problem in single-sample per person face recognition (SSPP FR), i.e., SSPP FR with a contaminated biometric enrolment database (SSPP-ce FR), where the SSPP-based enrolment database is contaminated by nuisance facial variations in the wild, such as poor lightings, expression change, and disguises (e.g., wearing sunglasses, hat, and scarf). In SSPP-ce FR, the most popular generic learning methods will suffer serious performance degradation because the prototype plus variation (P+V) model used in these methods is no longer suitable in such scenarios. The reasons are twofold. First, the contaminated enrolment samples could yield bad prototypes to represent the persons. Second, the generated variation dictionary is simply based on the subtraction of the average face from generic samples of the same person and cannot well depict the intrapersonal variations. To address the SSPP-ce FR problem, we propose a novel iterative dynamic generic learning (IDGL) method, where the labeled enrolment database and the unlabeled query set are fed into a dynamic label feedback network for learning. Specifically, IDGL first recovers the prototypes for the contaminated enrolment samples via a semisupervised low-rank representation (SSLRR) framework and learns a representative variation dictionary by extracting the “sample-specific” corruptions from an auxiliary generic set. Then, it puts them into the P+V model to estimate labels for query samples. Subsequently, the estimated labels will be used as feedback to modify the SSLRR, thus updating new prototypes for the next round of P+V-based label estimation. With the dynamic learning network, the accuracy of the estimated labels is improved iteratively by virtue of the steadily enhanced prototypes. Experiments on various benchmark face data sets have demonstrated the superiority of IDGL over state-of-the-art counterparts.

16 citations


Cites background or methods from "ArcFace: Additive Angular Margin Lo..."

  • ...We first compare our IDGL using the state-of-the-art Light CNN (CNN-29 model) [65] and InsightFace [37] features, i.e., IDGL+LightCNN-29 and IDGL+InsightFace, with four recent deep learning-based methods, including DeepID [32], VGG-face [31], center loss-based CNN [34], and joint and collaborative representation with local adaptive convolution feature (JCR-ACF) [35], on the unconstrained LFW data set....

  • ...The same situation applies to IDGL+InsightFace and NN+InsightFace....

  • ...For reference, we also present the results of the nearest neighbor classifier using the two deep learning-based features, i.e., NN+LightCNN-29 and NN+InsightFace....

  • ...We also leverage the NN+LightCNN-29 and NN+InsightFace as two baseline methods....

Proceedings ArticleDOI
TL;DR: In this article, the authors explore the tension between sighted and blind people with wearable cameras, taking into account camera visibility, in-person versus remote experience, and extracted visual information.
Abstract: Blind people have limited access to information about their surroundings, which is important for ensuring one's safety, managing social interactions, and identifying approaching pedestrians. With advances in computer vision, wearable cameras can provide equitable access to such information. However, the always-on nature of these assistive technologies poses privacy concerns for parties that may get recorded. We explore this tension from both perspectives, those of sighted passersby and blind users, taking into account camera visibility, in-person versus remote experience, and extracted visual information. We conduct two studies: an online survey with MTurkers (N=206) and an in-person experience study between pairs of blind (N=10) and sighted (N=40) participants, where blind participants wear a working prototype for pedestrian detection and pass by sighted participants. Our results suggest that both of the perspectives of users and bystanders and the several factors mentioned above need to be carefully considered to mitigate potential social tensions.

16 citations

Journal ArticleDOI
TL;DR: This paper formally analyze the quantization error and proposes a simple yet effective quantization system for heatmap regression that encodes the fractional part of numerical coordinates into the ground truth heatmap using a probabilistic approach during training and decodes the predicted numerical coordinates from a set of activation points during testing.
Abstract: Heatmap regression has become the mainstream methodology for deep learning-based semantic landmark localization. Though heatmap regression is robust to large variations in pose, illumination, and occlusion, it usually suffers from a sub-pixel localization problem. Specifically, considering that the activation point indices in heatmaps are always integers, quantization error thus appears when using heatmaps as the representation of numerical coordinates. Previous methods to overcome the sub-pixel localization problem usually rely on high-resolution heatmaps. As a result, there is always a trade-off between achieving localization accuracy and computational cost. In this paper, we formally analyze the quantization error and propose a simple yet effective quantization system. The proposed quantization system induced by the randomized rounding operation 1) encodes the fractional part of numerical coordinates into the ground truth heatmap using a probabilistic approach during training; and 2) decodes the predicted numerical coordinates from a set of activation points during testing. We prove that the proposed quantization system for heatmap regression is unbiased and lossless. Experimental results on popular facial landmark localization datasets (WFLW, 300W, COFW, and AFLW) and human pose estimation datasets (MPII and COCO) demonstrate the effectiveness of the proposed method for efficient and accurate semantic landmark localization.
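The unbiasedness of the randomized-rounding encoding is easy to see in one dimension: activating index ⌊x⌋+1 with probability equal to the fractional part of x makes the expected activation index exactly x. A toy sketch of the principle (illustrative only, not the paper's full heatmap encoder/decoder):

```python
import numpy as np

def encode_coordinate(x, rng):
    """Randomized rounding: activate floor(x) with probability 1 - frac,
    floor(x) + 1 with probability frac, where frac = x - floor(x).
    Then E[index] = x, so the encoding is unbiased."""
    base = int(np.floor(x))
    frac = x - base
    return base + int(rng.random() < frac)

def decode_coordinate(samples):
    # Averaging activation indices recovers the sub-pixel coordinate.
    return float(np.mean(samples))

rng = np.random.default_rng(0)
samples = [encode_coordinate(3.3, rng) for _ in range(100_000)]
# decode_coordinate(samples) ≈ 3.3, despite every index being an integer
```

This is why the activation indices can stay integers while the represented coordinate keeps its fractional part, sidestepping the need for high-resolution heatmaps.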

15 citations


Cites background from "ArcFace: Additive Angular Margin Lo..."

  • ...For example, semantic landmark localization can be used to register correspondences between spatial positions and semantics (semantic alignment), which is extremely useful in many visual recognition tasks such as face recognition [8], [11] and person re-identification [12], [13]....


Book ChapterDOI
23 Aug 2020
TL;DR: This paper proposes a fair face recognition system with low bias, reducing the influence of gender and skin colour and adding multiple preprocessing methods to improve the dual shot face detector for obtaining the target face from a given test image.
Abstract: Racial bias is an important issue in biometrics but has not been thoroughly studied in deep face recognition. By reducing the influence of gender and skin colour, this paper proposes a fair face recognition system with low bias. First, multiple preprocessing methods are added to improve the dual shot face detector for obtaining the target face from a given test image. Then, a data re-sampling approach is employed to balance the data distribution and reduce bias, based on an analysis of the training data. Moreover, multiple data enhancement methods are used to improve accuracy. Finally, a linear-combination strategy is adopted to benefit from multi-model fusion. The ChaLearn Looking at People Fair Face Recognition challenge was organised at ECCV 2020. Our team (ustc-nelslip) ranked 1st in the development stage and 2nd in the test stage of this challenge. The code is available at https://github.com/HaoSir/ECCV-2020-Fair-Face-Recognition-challenge_2nd_place_solution-ustc-nelslip-.

15 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of their residual nets won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
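The residual reformulation can be sketched with plain matrices standing in for convolutional layers (a hypothetical illustration, not the paper's architecture): the block learns a residual function F(x) and outputs relu(x + F(x)), so representing an identity mapping only requires driving F toward zero.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Basic residual unit (sketch). w1, w2 are hypothetical weight
    matrices standing in for the block's two conv layers."""
    out = relu(x @ w1)   # first transformation of the residual branch
    out = out @ w2       # second transformation: out is now F(x)
    return relu(out + x)  # skip connection adds the input back

# With zero weights the residual branch vanishes and the block reduces
# to the identity (after ReLU on a non-negative input):
x = np.array([[1.0, 2.0]])
w = np.zeros((2, 2))
# residual_block(x, w, w) → [[1., 2.]]
```

This is the sense in which residual networks are "easier to optimize": each block only has to learn a perturbation of the identity rather than an entirely unreferenced mapping.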

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
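The drop-and-rescale mechanism can be sketched in a few lines. Note this uses "inverted" dropout, the equivalent form most frameworks adopt: survivors are rescaled by 1/(1-p) at training time, so test time needs no adjustment (the paper's original formulation instead scales weights by 1-p at test time).

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout (sketch). p is the drop probability."""
    if not train:
        return x  # test time: use the single unthinned network unchanged
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1-p
    return x * mask / (1.0 - p)       # rescale so E[output] == input

rng = np.random.default_rng(0)
x = np.ones((4, 1000))
y = dropout(x, p=0.5, rng=rng)
# Roughly half the units are zeroed, but mean(y) ≈ 1.0 by construction.
```

Each training step thus samples one of an exponential number of "thinned" networks, while inference cheaply approximates their averaged prediction with a single forward pass.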

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
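The per-feature normalisation described above can be sketched as follows (training-time transform only; gamma and beta are the learned scale and shift, and inference would substitute running averages for the batch statistics):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization over a mini-batch (sketch).
    x: (N, D) activations; each of the D features is normalised to
    zero mean / unit variance across the batch, then re-scaled."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalise per feature
    return gamma * x_hat + beta              # learned scale and shift

x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
y = batch_norm(x, gamma=1.0, beta=0.0)
# Each column of y now has mean ≈ 0 and variance ≈ 1.
```

Making this normalisation part of the model is what allows the higher learning rates and looser initialisation the abstract describes, since each layer sees inputs with a stable distribution regardless of how earlier parameters move.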

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
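The define-by-run differentiation of purely imperative programs that the abstract describes can be illustrated with a toy reverse-mode autodiff class (a hypothetical sketch for intuition, not PyTorch's actual API or implementation):

```python
class Var:
    """Scalar variable that records the operations applied to it,
    so gradients can be propagated backwards through the trace."""
    def __init__(self, value):
        self.value = value
        self.parents = ()  # (parent, local_derivative) pairs
        self.grad = 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out.parents = ((self, other.value), (other, self.value))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        out.parents = ((self, 1.0), (other, 1.0))
        return out

    def backward(self, grad=1.0):
        self.grad += grad  # accumulate: a variable may be used twice
        for parent, local in self.parents:
            parent.backward(grad * local)

x = Var(3.0)
y = x * x + x   # the graph is built by simply running the program
y.backward()
# dy/dx = 2x + 1 = 7 at x = 3
```

Calling backward walks the recorded graph in reverse, multiplying local derivatives along each path and accumulating into .grad, which is the mechanism PyTorch's autograd module automates for arbitrary tensor programs on CPU and GPU.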

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations