Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins into well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, which include a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
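The additive angular margin itself is compact enough to sketch in a few lines. The following is a minimal PyTorch illustration of an ArcFace-style classification head, not the authors' released implementation; the class name and the values s=64 and m=0.5 are conventional choices assumed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginHead(nn.Module):
    """Minimal sketch of an additive angular margin (ArcFace-style) head."""
    def __init__(self, in_features, num_classes, s=64.0, m=0.50):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m  # feature scale and additive angular margin

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised features and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the margin m to the target-class angle only, then rescale.
        one_hot = F.one_hot(labels, cosine.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)
```

Because the margin is added to the angle theta rather than to the cosine, it corresponds directly to geodesic distance on the hypersphere, which is the geometric interpretation the abstract refers to.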


Citations
Proceedings ArticleDOI
21 Apr 2020
TL;DR: This work explores this tension from both perspectives, those of sighted passersby and blind users, taking into account camera visibility, in-person versus remote experience, and extracted visual information.
Abstract: Blind people have limited access to information about their surroundings, which is important for ensuring one's safety, managing social interactions, and identifying approaching pedestrians. With advances in computer vision, wearable cameras can provide equitable access to such information. However, the always-on nature of these assistive technologies poses privacy concerns for parties that may get recorded. We explore this tension from both perspectives, those of sighted passersby and blind users, taking into account camera visibility, in-person versus remote experience, and extracted visual information. We conduct two studies: an online survey with MTurkers (N=206) and an in-person experience study between pairs of blind (N=10) and sighted (N=40) participants, where blind participants wear a working prototype for pedestrian detection and pass by sighted participants. Our results suggest that the perspectives of both users and bystanders, together with the factors above, need to be carefully considered to mitigate potential social tensions.

26 citations


Cites methods from "ArcFace: Additive Angular Margin Loss for Deep Face Recognition"

  • ...ng.” Implementation The server application was written in Python with the Flask framework and run on a GPU-powered Ubuntu 18.04 Linux system. For face detection and alignment, we employed Insightface [8], built upon Multi-task Cascaded Convolutional Networks (MTCNN) [41], to detect faces. This model was composed of several models that not only extract face embeddings (512-dimensional vectors) but also...

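For readers who want to reproduce a comparable pipeline, a minimal sketch using the current insightface Python package follows. Treat it as an assumption-laden approximation: recent insightface releases expose a bundled FaceAnalysis pipeline rather than the MTCNN-based setup the quoted implementation describes, and the input file name is hypothetical.

```python
# Sketch: 512-d face embeddings with the insightface package.
# The cited work used an older MTCNN-based insightface pipeline; current
# releases bundle detection, alignment, and recognition in FaceAnalysis.
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis()
app.prepare(ctx_id=0)            # ctx_id=0: first GPU, -1: CPU
img = cv2.imread("face.jpg")     # hypothetical input image
for face in app.get(img):        # detect, align, and embed every face
    print(face.bbox, face.embedding.shape)  # embedding: 512-d vector
```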

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, a false positive rate penalty loss is proposed to increase the consistency of the instance False Positive Rate (FPR): an additional penalty term, proportional to the ratio of the instance FPR to the overall FPR, is added into the denominator of the softmax-based loss.
Abstract: Demographic bias is a significant challenge in practical face recognition systems. Existing methods heavily rely on accurate demographic annotations. However, such annotations are usually unavailable in real scenarios. Moreover, these methods are typically designed for a specific demographic group and are not general enough. In this paper, we propose a false positive rate penalty loss, which mitigates face recognition bias by increasing the consistency of the instance False Positive Rate (FPR). Specifically, we first define the instance FPR as the ratio between the number of non-target similarities above a unified threshold and the total number of non-target similarities. The unified threshold is estimated for a given total FPR. Then, an additional penalty term, proportional to the ratio of the instance FPR to the overall FPR, is introduced into the denominator of the softmax-based loss. The larger the instance FPR, the larger the penalty. Under such unequal penalties, the instance FPRs are pushed towards consistency. Compared with previous debiasing methods, our method requires no demographic annotations. It can therefore mitigate bias among demographic groups divided by various attributes, and these attributes need not be predefined before training. Extensive experimental results on popular benchmarks demonstrate the superiority of our method over state-of-the-art competitors. Code and pre-trained models are available at https://github.com/xkx0430/FairnessFR.
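The instance FPR statistic itself is straightforward to compute. Below is a minimal sketch of that statistic for a batch of cosine similarities; the function name is ours, and the full penalty weighting inside the softmax denominator follows the authors' released code, which this does not attempt to reproduce.

```python
import torch

def instance_fpr(cosine_sim, labels, threshold):
    """Per-sample instance FPR: the fraction of non-target similarities
    that exceed a unified threshold (sketch of the paper's statistic)."""
    # cosine_sim: [B, C] similarities, labels: [B] target class indices
    non_target = torch.ones_like(cosine_sim, dtype=torch.bool)
    non_target.scatter_(1, labels.unsqueeze(1), False)  # mask out targets
    false_pos = ((cosine_sim > threshold) & non_target).sum(dim=1).float()
    return false_pos / non_target.sum(dim=1).float()

# Hypothetical use: a penalty proportional to instance FPR / overall FPR.
# ifpr = instance_fpr(sims, labels, thr)         # thr set for a total FPR
# penalty = ifpr / ifpr.mean().clamp_min(1e-8)   # larger iFPR, larger penalty
```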

26 citations

Posted Content
TL;DR: The Google Landmarks Dataset v2 (GLDv2) as mentioned in this paper is a large-scale, fine-grained instance recognition and image retrieval dataset for the domain of human-made and natural landmarks.
Abstract: While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets that accurately measure their performance while posing novel challenges relevant to practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground-truth annotations for both the retrieval and recognition tasks. The ground-truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real-world applications that previous datasets did not consider: an extremely long-tailed class distribution, a large fraction of out-of-domain test photos, and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods, as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground truth, and metric scoring code are available at this https URL.

26 citations

Posted Content
TL;DR: JointsGait, a novel model-based gait recognition method, extracts gait information from human body joints using gait graph convolutional networks built on 18 2D joints, which are less affected by external factors.
Abstract: Gait, as a unique biometric feature, has the advantage of being recognizable from a long distance and can be widely used in public security. Considering that 3D pose estimation is more challenging than 2D pose estimation in practice, we investigate using 2D joints to recognize gait, and propose a new model-based gait recognition method, JointsGait, to extract gait information from 2D human body joints. Appearance-based gait recognition algorithms were previously prevalent; however, appearance features suffer from external factors that can cause drastic appearance variations, e.g. clothing. Unlike previous approaches, JointsGait first extracts spatio-temporal features from 2D joints using gait graph convolutional networks, which are less affected by external factors. Second, a Joints Relationship Pyramid Mapping (JRPM) is proposed to map the spatio-temporal gait features into a discriminative feature space, exploiting the relationships between human joints at various scales as people walk. Finally, we design a fusion loss strategy to make the joint features insensitive to view changes. Our method is evaluated on two large datasets, the Kinect Gait Biometry Dataset and CASIA-B. On the Kinect Gait Biometry Dataset, JointsGait uses only the corresponding 2D joint coordinates, yet achieves satisfactory recognition accuracy compared with model-based algorithms that use 3D joints. On CASIA-B, the proposed method greatly outperforms advanced model-based methods in all walking conditions, and even performs better than state-of-the-art appearance-based methods when clothing seriously affects people's appearance. The experimental results demonstrate that JointsGait achieves state-of-the-art performance despite its low-dimensional features (2D body joints) and is less affected by view and clothing variations.
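To make the gait-graph-convolution idea concrete, here is a minimal spatial graph convolution over a fixed 18-joint skeleton. This is an illustrative sketch only: the adjacency matrix, layer shapes, and class name are assumptions, not the JointsGait architecture.

```python
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """Sketch of one spatial graph convolution over an 18-joint skeleton."""
    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))        # add self-loops
        self.register_buffer("A_hat", A / A.sum(1, keepdim=True))  # normalise
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: [batch, frames, 18 joints, channels]; channels=2 for 2D coords.
        x = torch.einsum("ij,btjc->btic", self.A_hat, x)    # mix adjacent joints
        return torch.relu(self.proj(x))

# Usage with a hypothetical 18x18 binary skeleton adjacency matrix A:
# layer = SpatialGraphConv(2, 64, A)
# feats = layer(torch.randn(8, 30, 18, 2))  # 8 clips of 30 frames
```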

26 citations


Cites background from "ArcFace: Additive Angular Margin Loss for Deep Face Recognition"

  • ...ArcFace loss [36], an improvement of the softmax loss, is proposed for face recognition....


Journal ArticleDOI
TL;DR: This paper addresses face recognition with convolutional neural networks for daily attendance taking, using data augmentation through geometric transformations, image brightness changes, and different filter operations, and determines the best data augmentation method via orthogonal experiments.
Abstract: Class attendance is an important means in the management of university students. Using face recognition is one of the most effective techniques for taking daily class attendance. Recently, many face recognition algorithms via deep learning have achieved promising results with large-scale labeled samples. However, due to the difficulty of collecting samples, face recognition using convolutional neural networks (CNNs) for daily attendance taking remains a challenging problem. Data augmentation can enlarge the sample set and has been applied to small-sample learning. In this paper, we address this problem using data augmentation through geometric transformations, image brightness changes, and the application of different filter operations. In addition, we determine the best data augmentation method based on orthogonal experiments. Finally, the performance of our attendance method is demonstrated in a real class. Compared with PCA and LBPH methods with data augmentation and the VGG-16 network, our proposed method achieves an accuracy of 86.3%. Additionally, after a period of collecting more data, the accuracy improves to 98.1%.
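The three augmentation families named in the abstract (geometric transformations, brightness changes, and filter operations) can be sketched with Pillow; the specific parameters and file name below are illustrative assumptions, not the paper's chosen settings.

```python
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("student_face.jpg")  # hypothetical input image

rotated   = img.rotate(10)                                  # geometric
flipped   = img.transpose(Image.FLIP_LEFT_RIGHT)            # geometric
brighter  = ImageEnhance.Brightness(img).enhance(1.3)       # brightness
darker    = ImageEnhance.Brightness(img).enhance(0.7)       # brightness
blurred   = img.filter(ImageFilter.GaussianBlur(radius=1))  # filter op
sharpened = img.filter(ImageFilter.SHARPEN)                 # filter op
```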

26 citations


Cites background from "ArcFace: Additive Angular Margin Loss for Deep Face Recognition"

  • ...The authors of [17] propose an Additive Angular Margin Loss function that can effectively enhance the discriminative power of feature embeddings learned via CNNs for face recognition....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won 1st place in the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
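The residual reformulation is easy to see in code. A minimal sketch of a basic (non-bottleneck) residual block follows; the channel counts and class name are illustrative.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Sketch of a basic residual block: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(out + x)  # identity shortcut: learn the residual
```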

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
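The train-time thinning and test-time weight scaling described above fit in a few lines of NumPy; this is a sketch of the classic formulation, with p_keep as an assumed retention probability.

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.5  # probability of retaining a unit (assumed value)

def layer_train(x, W):
    # Randomly drop input units: each is kept with probability p_keep.
    mask = rng.random(x.shape) < p_keep
    return np.maximum(0, (x * mask) @ W)   # ReLU layer on the thinned input

def layer_test(x, W):
    # Single unthinned network with weights scaled by p_keep, approximating
    # the average prediction of the exponentially many thinned networks.
    return np.maximum(0, x @ (W * p_keep))
```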

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
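The per-mini-batch normalisation is simple to state. Below is a NumPy sketch of the training-time transform for a fully connected layer; gamma and beta are the learned scale and shift, and the running statistics used at inference are omitted.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Sketch of batch normalisation for a mini-batch x of shape [N, D]."""
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalise each feature
    return gamma * x_hat + beta             # learned scale and shift
```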

30,843 citations

28 Oct 2017
TL;DR: The automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, focusing on differentiation of purely imperative programs with an emphasis on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
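The imperative style the abstract describes looks like this in practice; a minimal sketch of reverse-mode differentiation with PyTorch's autograd:

```python
import torch

# An ordinary imperative program; PyTorch records operations as they run.
x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)
loss = (w * x).sum() ** 2

loss.backward()        # reverse-mode automatic differentiation
print(x.grad, w.grad)  # gradients of loss with respect to x and w
```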

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
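A minimal sketch of expressing and executing a computation with TensorFlow follows; note it uses the current tf.function API rather than the graph-and-session interface the paper describes.

```python
import tensorflow as tf

@tf.function  # stages the Python function into a TensorFlow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.random.normal([4, 3])
w = tf.Variable(tf.random.normal([3, 2]))
b = tf.Variable(tf.zeros([2]))
print(affine(x, w, b))  # the same code runs on CPU, GPU, or TPU
```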

10,447 citations