Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
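The additive angular margin idea can be sketched in a few lines of NumPy. This is a simplified, hypothetical re-implementation for illustration, not the authors' released code: features and class-weight vectors are L2-normalised so the logits become cosines, and the margin m is added to the target-class angle before rescaling by s (the paper's feature-scale and margin hyperparameters).

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Sketch of additive angular margin logits.

    embeddings: (N, d) features; weights: (C, d) class centres;
    labels: (N,) ground-truth class indices.
    """
    # L2-normalise features and class weights so logits are cos(theta).
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ W.T                                  # (N, C) cosines
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # angles on the hypersphere
    logits = np.copy(cos)
    rows = np.arange(len(labels))
    # Add the angular margin m only to the target-class angle.
    logits[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * logits                              # feed into softmax cross-entropy
```

Because the margin is added to the angle rather than to the cosine, the penalty corresponds exactly to a geodesic distance on the hypersphere, which is the geometric interpretation the abstract refers to.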


Citations
Book ChapterDOI
23 Aug 2020
TL;DR: In this article, the authors proposed a face recognition method that first trains a neural network on unblurred images and then trains it on blurred images, using the features of the network trained on unblurred images as the initial value.
Abstract: Face recognition and privacy protection are closely related. A high-quality facial image is required to achieve high accuracy in face recognition; however, this undermines the privacy of the person being photographed. From the perspective of confidentiality, storing facial images as raw data is a problem. If a low-quality facial image is used to protect user privacy, the accuracy of recognition decreases. In this paper, we propose a face recognition method that solves these problems. We first train a neural network on unblurred images, and then train it on blurred images, using the features of the network trained on unblurred images as the initial value. This makes it possible to learn features similar to those learned by a network trained on high-quality images, and thus to perform face recognition without compromising user privacy. Our method consists of a feature-extraction neural network, which extracts features suitable for face recognition from a blurred facial image, and a face recognition neural network. After pretraining both networks, we fine-tune them in an end-to-end manner. In experiments, the proposed method achieved accuracy comparable to that of conventional face recognition methods that take unblurred face images as input, both in simulations and on images captured by our camera system.

3 citations

Proceedings ArticleDOI
13 Apr 2021
TL;DR: Zhang et al. as discussed by the authors proposed a detection based method, DeepACC, to locate and fine classify chromosomes simultaneously based on the whole metaphase image, which makes full use of prior knowledge that chromosomes usually appear in pairs.
Abstract: Chromosome classification is an important but difficult and tedious task in karyotyping. Previous methods only classify manually segmented single chromosomes, which is far from clinical practice. In this work, we propose a detection-based method, DeepACC, to locate and finely classify chromosomes simultaneously based on the whole metaphase image. We first introduce the Additive Angular Margin Loss to enhance the discriminative power of the model. To alleviate batch effects, we transform the decision boundary of each class case-by-case through a siamese network, which makes full use of the prior knowledge that chromosomes usually appear in pairs. Furthermore, we take the clinical seven-group criteria as prior knowledge and design an additional Group Inner-Adjacency Loss to further reduce inter-class similarities. A private metaphase image dataset from a clinical laboratory was collected and labelled to evaluate the performance. Results show that the new design brings encouraging performance gains compared to the state-of-the-art baseline models.

3 citations

Journal ArticleDOI
TL;DR: In this paper , a state-of-the-art deep convolutional neural network for face recognition was used to measure the facial similarity of a large sample of people with the evaluators.
Abstract: The appraisal of trustworthiness from the facial appearance of a stranger is critical for successful social interaction. Although self-resemblance is considered a significant factor affecting the perception of trustworthiness, it has not yet been established whether this theory applies to natural unfamiliar faces in real life. We examined this question by using a state-of-the-art deep convolutional neural network for face recognition to measure the facial similarity of a large sample of people to the evaluators. We found that the more a stranger resembled the rater, the more trustworthy they were judged, provided they were of the same sex as the rater. Conversely, when the stranger was of the opposite sex, self-resemblance did not affect trustworthiness ratings. These results demonstrate that self-resemblance is an important factor in our social judgments, especially of same-sex people, in real life.

3 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: In this article , the generalized margin-based softmax loss function is decomposed into two computational graphs and a constant, and a general searching framework built upon the evolutionary algorithm is proposed to search for the loss function efficiently.
Abstract: Person re-identification is a hot topic in computer vision, and the loss function plays a vital role in improving the discrimination of the learned features. However, most existing models use hand-crafted loss functions, which are usually sub-optimal and challenging to design. In this paper, we propose a novel method, AutoLoss-GMS, to automatically search for a better loss function in the space of generalized margin-based softmax loss functions for person re-identification. Specifically, the generalized margin-based softmax loss function is first decomposed into two computational graphs and a constant. Then a general searching framework built upon an evolutionary algorithm is proposed to search for the loss function efficiently. The computational graph is constructed with a forward method, which can construct much richer loss function forms than the backward method used in existing works. In addition to the basic in-graph mutation operations, a cross-graph mutation operation is designed to further improve the offspring's diversity. A loss-rejection protocol, an equivalence-check strategy and a predictor-based promising-loss chooser are developed to improve the search efficiency. Finally, experimental results demonstrate that the searched loss functions can achieve state-of-the-art performance and are transferable across different models and datasets in person re-identification.
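The "generalized margin-based softmax" family the abstract refers to can be illustrated with a small sketch. This is an illustrative caricature, not the paper's search space: the target-class cosine passes through one transform t(.) and the non-target cosines through another n(.), and choosing different forms of t and n recovers familiar losses (e.g. t(c) = c - m gives a CosFace-style additive cosine margin, t(c) = c gives plain normalised softmax). The names t, n and s here are assumptions, not the paper's notation.

```python
import numpy as np

def margin_softmax_loss(cos, label, t=lambda c: c - 0.35, n=lambda c: c, s=30.0):
    """Cross-entropy over margin-transformed, scaled cosine logits.

    cos: (C,) cosine similarities between one feature and C class centres;
    label: ground-truth class index; t/n: target / non-target transforms.
    """
    is_target = np.arange(len(cos)) == label
    z = s * np.where(is_target, t(cos), n(cos))   # transformed, scaled logits
    z = z - z.max()                               # numerical stability
    p = np.exp(z) / np.exp(z).sum()               # softmax probabilities
    return -np.log(p[label])                      # cross-entropy for the true class
```

A search method like the one described would explore the space of t and n (as computational graphs) instead of fixing them by hand.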

3 citations

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper reported an extensive study on latent variable evolution (LVE), a method commonly used to generate master faces and showed that simulated presentation attacks using generated master faces generally preserved the false matching ability of their original digital forms, thus demonstrating that the existence of master faces poses an actual threat.
Abstract: Face authentication is now widely used, especially on mobile devices, rather than authentication using a personal identification number or an unlock pattern, due to its convenience. It has thus become a tempting target for attackers using a presentation attack. Traditional presentation attacks use facial images or videos of the victim. Previous work has proven the existence of master faces, i.e., faces that match multiple enrolled templates in face recognition systems, and their existence extends the ability of presentation attacks. In this paper, we report an extensive study on latent variable evolution (LVE), a method commonly used to generate master faces. An LVE algorithm was run under various scenarios and with more than one database and/or face recognition system to identify the properties of master faces and to clarify under which conditions strong master faces can be generated. On the basis of analysis, we hypothesize that master faces originate in dense areas in the embedding spaces of face recognition systems. Last but not least, simulated presentation attacks using generated master faces generally preserved the false matching ability of their original digital forms, thus demonstrating that the existence of master faces poses an actual threat.
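The latent variable evolution procedure studied above can be caricatured as a simple evolution strategy over latent vectors. In the toy sketch below, the `fitness` callable is a stand-in for "decode the latent with a face generator and score it against enrolled templates", which this sketch does not implement; all names and parameters are illustrative assumptions.

```python
import numpy as np

def latent_variable_evolution(fitness, dim=8, pop=20, gens=30, sigma=0.3, seed=0):
    """Toy (1+lambda)-style LVE loop: hill-climb a latent vector that
    maximises a black-box score. A real master-face attack would use a
    face generator plus a face recognition system as the fitness."""
    rng = np.random.default_rng(seed)
    best = rng.normal(size=dim)                       # initial latent vector
    for _ in range(gens):
        # Mutate the current best latent into a population of children.
        children = best + sigma * rng.normal(size=(pop, dim))
        scores = np.array([fitness(c) for c in children])
        # Keep the best child only if it improves on the parent.
        if scores.max() > fitness(best):
            best = children[scores.argmax()]
    return best
```

The paper's hypothesis that master faces come from dense regions of the embedding space corresponds, in this caricature, to the fitness landscape having broad basins that such a hill-climber readily finds.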

3 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
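The residual reformulation is simple to sketch. Below is a minimal dense-layer stand-in for the paper's convolutional blocks (the layer shapes are illustrative assumptions): the block computes a residual function F(x) and adds the input back through an identity shortcut, so the layers only have to learn the deviation from identity.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = F(x) + x, where F is a two-layer transform (matmul, ReLU, matmul).

    x: (N, d) inputs; w1: (d, h) and w2: (h, d) weights, so F(x) has the
    same shape as x and the identity shortcut can be added directly.
    """
    h = np.maximum(x @ w1, 0.0)   # first layer + ReLU
    f = h @ w2                    # residual function F(x)
    return f + x                  # identity shortcut
```

One consequence visible even in this sketch: with zero weights the block is exactly the identity map, which is why very deep stacks of such blocks remain easy to optimise.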

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
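The mechanism can be sketched in a few lines. Note this uses the common "inverted dropout" variant, which scales the surviving activations up by 1/(1-p) during training so that no rescaling is needed at test time; the paper's formulation instead scales the weights down at test time, but the two are equivalent in expectation.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p at training
    time and scale survivors by 1/(1-p); at test time, pass through."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p        # keep each unit with prob. 1-p
    return x * mask / (1.0 - p)            # rescale so E[output] == x
```

Because the expected activation is preserved, a single unthinned network at test time approximates the average of the exponentially many thinned networks sampled during training.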

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
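The training-time forward pass of the normalisation described above can be sketched as follows (a minimal version without the running statistics a real implementation tracks for inference): each feature is standardised over the mini-batch, then a learned scale gamma and shift beta restore the layer's representational capacity.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Mini-batch normalisation, training-time forward pass.

    x: (N, d) batch of activations; gamma, beta: (d,) learned parameters.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardised activations
    return gamma * x_hat + beta            # learned scale and shift
```

Keeping the normalisation inside the model (rather than as preprocessing) is what lets gradients flow through mu and var, which the paper identifies as the source of its stability benefits.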

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
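The "differentiation of purely imperative programs" style can be illustrated without the library itself. The toy class below (not PyTorch's API) records the graph as ordinary arithmetic executes, then walks it backwards applying the chain rule, which is the define-by-run idea in miniature:

```python
class Var:
    """Tiny reverse-mode autodiff: the graph is built as code runs."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):
        # Record each parent with its local derivative d(out)/d(parent).
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        # Accumulate the incoming gradient, then chain it to the parents.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)
```

For z = x*y + x with x=3 and y=4, the backward pass yields dz/dx = y + 1 = 5 and dz/dy = x = 3, with no symbolic graph declared ahead of time.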

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations