scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: In this article , a new method for masked face recognition is proposed that combines a cropping-based approach (upper half of the face) with an improved VGG-16 architecture.
Abstract: With the recent COVID-19 pandemic, wearing masks has become a necessity in our daily lives. People are encouraged to wear masks to protect themselves from the outside world and thus from infection with COVID-19. The presence of masks raised serious concerns about the accuracy of existing facial recognition systems since most of the facial features are obscured by the mask. To address these challenges, a new method for masked face recognition is proposed that combines a cropping-based approach (upper half of the face) with an improved VGG-16 architecture. The finest features from the un-occluded facial region are extracted using a transfer learned VGG-16 model (Forehead and eyes). The optimal cropping ratio is investigated to give an enhanced feature representation for recognition. To avoid the overhead of bias, the obtained feature vector is mapped into a lower-dimensional feature representation using a Random Fourier Feature extraction module. Comprehensive experiments on the Georgia Tech Face Dataset, Head Pose Image Dataset, and Face Dataset by Robotics Lab show that the proposed approach outperforms other state-of-the-art approaches for masked face recognition.

4 citations

Posted Content
TL;DR: The geographic performance differentials—differences in false acceptance and false rejection rates across different countries—when comparing selfies against photos from ID documents are studied and sampling strategies are used to mitigate these differentials.
Abstract: As face recognition algorithms become more accurate and get deployed more widely, it becomes increasingly important to ensure that the algorithms work equally well for everyone. We study the geographic performance differentials-differences in false acceptance and false rejection rates across different countries-when comparing selfies against photos from ID documents. We show how to mitigate geographic performance differentials using sampling strategies despite large imbalances in the dataset. Using vanilla domain adaptation strategies to fine-tune a face recognition CNN on domain-specific doc-selfie data improves the performance of the model on such data, but, in the presence of imbalanced training data, also significantly increases the demographic bias. We then show how to mitigate this effect by employing sampling strategies to balance the training procedure.

4 citations


Cites background or methods from "ArcFace: Additive Angular Margin Lo..."

  • ...Recently, large-margin classification losses such as ArcFace [4] and SphereFace [11] were shown to lead to better results on publicly available datasets than triplet loss, especially when combined with large-scale classification and weight-imprinting [14]....

    [...]

  • ...Following [4] we utilize for the embedding network a ResNet-100 architecture with a BN [8]-Dropout [19]-BN-FC structure after the last convolutional layer....

    [...]

  • ...The authors showed how to improve the performance of ArcFace by fine-tuning on their dataset....

    [...]

  • ...The baseline ResNet-100 model was trained on the publicly available MS1MV2 dataset and achieves 99.77% on the LFW benchmark [7] and 98.47% on the MegaFace [9] benchmark, but it performs poorly when evaluated on selfiedoc pairs....

    [...]

  • ...The network is first trained on MS1MV2 [4], a semiautomatically refined version of the MS-Celeb-1M dataset [6]....

    [...]

Proceedings ArticleDOI
07 Jan 2020
TL;DR: This work study different open-source implementations of the typical components of state-of-the-art face recognition pipelines, including face detection, feature extraction and classification, and propose a recognition system integrating the most suitable methods for their utilization in assistant robots.
Abstract: Assistive robots collaborating with people demand strong Human-Robot interaction capabilities. In this way, recognizing the person the robot has to interact with is paramount to provide a personalized service and reach a satisfactory end-user experience. To this end, face recognition: a non-intrusive, automatic mechanism of identification using biometric identifiers from an user's face, has gained relevance in the recent years, as the advances in machine learning and the creation of huge public datasets have considerably improved the state-of-the-art performance. In this work we study different open-source implementations of the typical components of state-of-the-art face recognition pipelines, including face detection, feature extraction and classification, and propose a recognition system integrating the most suitable methods for their utilization in assistant robots. Concretely, for face detection we have considered MTCNN, OpenCV's DNN, and OpenPose, while for feature extraction we have analyzed InsightFace and Facenet. We have made public an implementation of the proposed recognition framework, ready to be used by any robot running the Robot Operating System (ROS). The methods in the spotlight have been compared in terms of accuracy and performance in common benchmark datasets, namely FDDB and LFW, to aid the choice of the final system implementation, which has been tested in a real robotic platform.

4 citations


Cites background or methods from "ArcFace: Additive Angular Margin Lo..."

  • ...For selecting the components of the system we have carried out an analysis of the most prominent state-of-the-art methods with available open-source implementations, including MTCNN [32], OpenCV’s DNN [2], and OpenPose [4] for face detection, and InsightFace [6] and Facenet [3] for feature extraction....

    [...]

  • ...83% in just 3 years on the LFW repository [6, 12, 28]....

    [...]

  • ...InsightFace presents itself as the most accurate and obvious choice for unconstrained FR, as it provides additional margin between classes....

    [...]

  • ...The early achievements of ML, marked by the success of object recognition in the ImageNet challenge [5, 15], have encouraged multiple advances in the field of FR such as: the creation of huge public datasets [3, 10, 14], new loss functions applicable to FR [6, 17, 25, 30], or the use of multiple network architectures [11, 26, 27]....

    [...]

  • ...The system was tested using MTCNN as face detector and InsightFace as feature extractor....

    [...]

Proceedings ArticleDOI
31 Oct 2022
TL;DR: In this article , the authors propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept.
Abstract: We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.

4 citations

Proceedings ArticleDOI
05 May 2022
TL;DR: The AdaTriplet loss – an extension of TL whose gradients adapt to difficulty levels of negative samples, and the AutoMargin method – a technique to adjust hyperparameters of margin-based losses such as TL and the authors' proposed loss dynamically are introduced.
Abstract: This paper tackles the challenge of forensic medical image matching (FMIM) using deep neural networks (DNNs). FMIM is a particular case of content-based image retrieval (CBIR). The main challenge in FMIM compared to the general case of CBIR, is that the subject to whom a query image belongs may be affected by aging and progressive degenerative disorders, making it difficult to match data on a subject level. CBIR with DNNs is generally solved by minimizing a ranking loss, such as Triplet loss (TL), computed on image representations extracted by a DNN from the original data. TL, in particular, operates on triplets: anchor, positive (similar to anchor) and negative (dissimilar to anchor). Although TL has been shown to perform well in many CBIR tasks, it still has limitations, which we identify and analyze in this work. In this paper, we introduce (i) the AdaTriplet loss -- an extension of TL whose gradients adapt to different difficulty levels of negative samples, and (ii) the AutoMargin method -- a technique to adjust hyperparameters of margin-based losses such as TL and our proposed loss dynamically. Our results are evaluated on two large-scale benchmarks for FMIM based on the Osteoarthritis Initiative and Chest X-ray-14 datasets. The codes allowing replication of this study have been made publicly available at \url{https://github.com/Oulu-IMEDS/AdaTriplet}.

4 citations

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations