Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, which include a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
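The additive angular margin described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released implementation; the defaults s=64 and m=0.5 follow the values reported in the paper.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin sketch: L2-normalise features and class
    weights so logits equal cosines on the hypersphere, add the margin m
    to each sample's ground-truth angle, then rescale by s."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = x @ w                                   # cos(theta) for every class
    theta = np.arccos(np.clip(cos, -1.0, 1.0))    # geodesic angle
    theta[np.arange(len(labels)), labels] += m    # margin on the target class only
    return s * np.cos(theta)

def softmax_ce(logits, labels):
    """Standard softmax cross-entropy over the margin-adjusted logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(labels)), labels]).mean()
```

Because the margin shrinks the target-class cosine, the loss with m > 0 is at least as large as the plain normalised-softmax loss on the same batch, which is what forces tighter intra-class clustering during training.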


Citations
Book ChapterDOI
TL;DR: In this article, the authors provide an overview of morphing attack detection algorithms and metrics to measure and compare their performance; state-of-the-art detection methods are evaluated in comprehensive cross-database experiments considering various realistic image post-processing.
Abstract: Morphing attacks pose a serious threat to face recognition systems, especially in the border control scenario. In order to guarantee the secure operation of face recognition algorithms in the future, it is necessary to reliably detect morphed facial images and thus be able to reject them during enrolment or verification. This chapter provides an overview of morphing attack detection algorithms and metrics to measure and compare their performance. Different concepts of morphing attack detection are introduced, and state-of-the-art detection methods are evaluated in comprehensive cross-database experiments considering various realistic image post-processing operations.

3 citations

Proceedings ArticleDOI
18 Sep 2022
TL;DR: A general and efficient norm-constrained score-level ensemble method is investigated, which jointly processes the scores extracted from the ASV and CM subsystems, improving robustness to both zero-effort impostors and spoofing attacks.
Abstract: In this paper, we present the Elevoc systems submitted to the Spoofing Aware Speaker Verification Challenge (SASVC) 2022. Our submissions focus on bridging the gap between automatic speaker verification (ASV) and countermeasure (CM) systems. We investigate a general and efficient norm-constrained score-level ensemble method which jointly processes the scores extracted from the ASV and CM subsystems, improving robustness to both zero-effort impostors and spoofing attacks. Furthermore, we show that the ensemble system provides better performance when both the ASV and CM subsystems are optimised. Experimental results show that our primary system yields 0.45% SV-EER, 0.26% SPF-EER and 0.37% SASV-EER, obtaining relative improvements of more than 96.08%, 66.67% and 94.19% over the best-performing baseline systems on the SASVC 2022 evaluation set. All of our code and pre-trained model weights are publicly available and reproducible.
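The general idea of score-level fusion under a normalisation constraint can be illustrated with a toy sketch. The abstract does not specify the exact norm constraint, so the min-max normalisation below is a hypothetical stand-in for illustration only, not the Elevoc method.

```python
import numpy as np

def fuse_scores(asv_scores, cm_scores):
    """Illustrative score-level fusion: map each subsystem's scores onto
    a common [0, 1] range before summing, so that neither the ASV nor the
    CM subsystem dominates purely because of its score scale."""
    def minmax(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min())
    return minmax(asv_scores) + minmax(cm_scores)
```

A trial is then accepted only if the fused score clears a single threshold, which is what lets one decision boundary cover both zero-effort impostors (low ASV score) and spoofing attacks (low CM score).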

3 citations

Proceedings ArticleDOI
30 May 2021
TL;DR: In this article, an ensemble of efficient single-class instance detectors capable of fast and incremental adaptation to new object sets is proposed, which can be obtained within less than 40 minutes on a consumer GPU while only a small percentage of the existing detection models need to be updated.
Abstract: Object instance detection is a highly relevant task for several robotic applications such as automated order picking, or household and hospital assistance robots. In these applications, a holistic scene labeling is often not required; it is sufficient to find a certain object type of interest, e.g. for picking it up. At the same time, large and continuously changing object sets are characteristic of such applications, requiring efficient model update capabilities from the object detector. Today's monolithic multi-class detectors do not fulfill this criterion for fast and flexible model updates. This paper introduces InstanceNet, an ensemble of efficient single-class instance detectors capable of fast and incremental adaptation to new object sets. Due to a dynamic sampling-based training strategy, accurate detection models for new objects can be obtained within less than 40 minutes on a consumer GPU, while only a small percentage of the existing detection models needs to be updated in a very efficient manner. The new detector has been thoroughly evaluated on the basis of a novel dataset of 100 grocery store objects.

3 citations

Proceedings ArticleDOI
01 Sep 2021
TL;DR: In this paper, the authors presented a new face database consisting of 400 pairs of doppelganger images and evaluated two state-of-the-art face recognition systems on said database and other public datasets, including the Disguised Faces in The Wild (DFW) database.
Abstract: Lookalikes, a.k.a. doppelgangers, increase the probability of false matches in a facial recognition system, in contrast to random face image pairs selected for non-mated comparison trials. In order to analyse and improve the robustness of automated face recognition, datasets of doppelganger face image pairs are needed. In this work, we present a new face database consisting of 400 pairs of doppelganger images. Subsequently, two state-of-the-art face recognition systems are evaluated on said database and other public datasets, including the Disguised Faces in the Wild (DFW) database. It is found that the collected image pairs yield very high similarity scores, resulting in a significant increase of false match rates. To facilitate reproducible research and future experiments in this field, the dataset is made available.

3 citations

Proceedings ArticleDOI
19 Jul 2020
TL;DR: This work proposes a method for reducing computational costs by enabling a single capsule to represent multiple object classes in CapsNet, and incorporates the ArcFace distance learning method into the error function.
Abstract: The Capsule Network (CapsNet) is a deep learning model proposed for image classification that is robust to pose changes of objects in images. A capsule is a vector representing the position, size and presence of an object. However, with CapsNet, the number of capsules increases with the number of classification classes, and learning is computationally expensive. Thus, we propose a method for reducing computational costs by enabling a single capsule to represent multiple object classes. To learn the distance between classes, we incorporate the ArcFace distance learning method into the error function. In a preliminary experiment, the distribution of capsules was visualised by principal component analysis to demonstrate the validity of the proposed method. Using the MNIST and CIFAR-10 datasets, as well as affine-transformed variants of them, we compare the accuracy and learning time of the original CapsNet and the proposed method. The results demonstrate that accuracy is improved by 2.74% on the CIFAR-10 dataset, and the learning time is reduced by more than 19% on both datasets.

3 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...Four experiments were conducted: a comparison experiment with various hyperparameters to observe the change in accuracy caused by varying the ArcFace hyperparameters s and m; a comparison experiment with conventional CapsNet to confirm the proposed method's validity; a visualisation experiment demonstrating that the super-capsule of the proposed method is distributed across the whole feature space; and an image reconstruction experiment showing that CapsNet is not suitable for image reconstruction. The experiments were conducted using the MNIST and CIFAR-10 datasets and their deformed variants....


  • ...TABLE I and TABLE II present the change in accuracy caused by varying the ArcFace hyperparameters s and m....


  • ...In the proposed method, we employ the ArcFace [7] metric learning loss function to calculate the similarity between a class representative vector and a single super-capsule....


  • ...With this change, the margin loss is not available; thus, we also propose a new loss function based on ArcFace, a type of metric learning....


  • ...9 presents an overview of ArcFace....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
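The residual reformulation above can be illustrated with a minimal fully-connected block. This is only a sketch: the paper's blocks are convolutional and include batch normalisation, which are omitted here for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Residual unit sketch: the weight layers learn the residual
    F(x) = H(x) - x, and the identity shortcut adds x back, so the
    block computes y = relu(F(x) + x)."""
    out = relu(x @ w1)    # first weight layer + nonlinearity
    out = out @ w2        # second weight layer
    return relu(out + x)  # identity shortcut, then final nonlinearity
```

The key property is that when the weights are near zero the block approximates the identity mapping, which is why stacking many such blocks does not degrade optimisation the way plain deep stacks do.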

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
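The mechanism described above can be sketched in a few lines. Note this uses the now-common "inverted" variant, which rescales surviving units at training time instead of shrinking weights at test time as the paper describes; the two are equivalent in expectation.

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout sketch: during training, drop each unit with
    probability p and rescale survivors by 1/(1-p), so the unmodified
    network can be used unchanged at test time."""
    if not train:
        return x                        # test time: full network, no scaling
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p     # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```

Each training step therefore samples one "thinned" network from the exponential ensemble, while the expected activation seen by the next layer stays the same as at test time.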

33,597 citations

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
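The per-feature normalisation the abstract describes can be sketched as follows. This covers training-time statistics only; a full implementation also maintains running averages of the mean and variance for use at inference time.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalisation sketch for a mini-batch of shape
    (batch, features): standardise each feature using the batch's own
    statistics, then apply the learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta
```

Because gamma and beta are learned, the layer can still represent the identity transform, so normalisation constrains the input distribution of the next layer without reducing model capacity.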

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
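The imperative, tape-free differentiation style the abstract describes can be illustrated with a toy reverse-mode sketch. This is not PyTorch's actual machinery, only a minimal demonstration of recording operations as they execute and backpropagating through the resulting graph.

```python
class Var:
    """Toy reverse-mode autodiff: each operation records its inputs and
    local gradients as it runs, so the graph is built imperatively rather
    than declared symbolically up front."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # tuples of (parent Var, local gradient)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        """Accumulate gradients by the chain rule along every path."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)
```

For example, with y = x*x + x at x = 3, calling y.backward() accumulates dy/dx = 2x + 1 = 7 into x.grad, with no separate graph-compilation step.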

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations