scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
19 Oct 2022
TL;DR: This work introduces a two-stage cluster and aggregate paradigm, Cluster and Aggregate, that can both scale to large N and maintain the ability to perform sequential inference with order invariance and shows the proposed CAFace outperforms baselines on unconstrained face datasets such as IJB-B andIJB-S.
Abstract: Feature fusion plays a crucial role in unconstrained face recognition where inputs (probes) comprise of a set of $N$ low quality images whose individual qualities vary. Advances in attention and recurrent modules have led to feature fusion that can model the relationship among the images in the input set. However, attention mechanisms cannot scale to large $N$ due to their quadratic complexity and recurrent modules suffer from input order sensitivity. We propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can both scale to large $N$ and maintain the ability to perform sequential inference with order invariance. Specifically, Cluster stage is a linear assignment of $N$ inputs to $M$ global cluster centers, and Aggregation stage is a fusion over $M$ clustered features. The clustered features play an integral role when the inputs are sequential as they can serve as a summarization of past features. By leveraging the order-invariance of incremental averaging operation, we design an update rule that achieves batch-order invariance, which guarantees that the contributions of early image in the sequence do not diminish as time steps increase. Experiments on IJB-B and IJB-S benchmark datasets show the superiority of the proposed two-stage paradigm in unconstrained face recognition. Code and pretrained models are available in https://github.com/mk-minchul/caface

5 citations

Posted Content
TL;DR: Zhang et al. as mentioned in this paper proposed the Unknown Identity Rejection (UIR) loss to utilize the unlabeled data to improve the discriminative performance of face recognition.
Abstract: Face recognition has advanced considerably with the availability of large-scale labeled datasets. However, how to further improve the performance with the easily accessible unlabeled dataset remains a challenge. In this paper, we propose the novel Unknown Identity Rejection (UIR) loss to utilize the unlabeled data. We categorize identities in unconstrained environment into the known set and the unknown set. The former corresponds to the identities that appear in the labeled training dataset while the latter is its complementary set. Besides training the model to accurately classify the known identities, we also force the model to reject unknown identities provided by the unlabeled dataset via our proposed UIR loss. In order to 'reject' faces of unknown identities, centers of the known identities are forced to keep enough margin from centers of unknown identities which are assumed to be approximated by the features of their samples. By this means, the discriminativeness of the face representations can be enhanced. Experimental results demonstrate that our approach can provide obvious performance improvement by utilizing the unlabeled data.

5 citations

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper proposes to use Instance-Batch-Normalization (IBN) block to improve cross-domain FR performance and shows that the proposed IBN-CNN models made state-of-the-art performance on MegaFace evaluation.
Abstract: Domain adaptation is one of the major challenges for face recognition (FR). Most large-scale FR training datasets are built from massive images crawled from the Internet, while in practical applications face images come from specific scenarios. Especially, for applications like ID card verification, registered face images are taken in controlled environments while probe face images are not. There are different distributions between source domain and target domain. In this paper, we propose to use Instance-Batch-Normalization (IBN) block to improve cross-domain FR performance. A million-scale cross-domain test set named IDCard-Scene-1M is used for evaluation. CNN models are trained with an improved loss function which we call L2-ASoftmax Loss. Without using any data from the target domain, IBN-block increased recall rates (@FPR = 10−6) on IDCard-Scene-1M by 1 percentage point for different CNN models. Besides, experiments show that the proposed IBN-CNN models trained with L2-ASoftmax Loss made state-of-the-art performance on MegaFace evaluation.

5 citations


Cites background or methods from "ArcFace: Additive Angular Margin Lo..."

  • ...ArcFace (also called Insightface) [6] improved A-Softmax by normalizing the features and choosing reasonable margin...

    [...]

  • ...For example, several studies ([17], [6], [32]) have achieved 99....

    [...]

  • ...Cosineface[32] and ArcFace (also called Insightface) [6] improved A-Softmax by normalizing the features and choosing reasonable margin to enhance discrimination....

    [...]

  • ...In this paper, we use AgeDB-30 provided by DeepInsight [6] as test data....

    [...]

  • ...In the MegaFace’s FaceScrub evaluation, we use the cleaned test list released by DeepInsight [6]....

    [...]

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a heterogeneous task decoupling (HTD) framework for object detection, which utilizes a Progressive Graph (PGraph) module and a Border-aware Adaptation (BA) module for task-decoupling.
Abstract: Decoupling the sibling head has recently shown great potential in relieving the inherent task-misalignment problem in two-stage object detectors. However, existing works design similar structures for the classification and regression, ignoring task-specific characteristics and feature demands. Besides, the shared knowledge that may benefit the two branches is neglected, leading to potential excessive decoupling and semantic inconsistency. To address these two issues, we propose Heterogeneous task decoupling (HTD) framework for object detection, which utilizes a Progressive Graph (PGraph) module and a Border-aware Adaptation (BA) module for task-decoupling. Specifically, we first devise a Semantic Feature Aggregation (SFA) module to aggregate global semantics with image-level supervision, serving as the shared knowledge for the task-decoupled framework. Then, the PGraph module performs progressive graph reasoning, including local spatial aggregation and global semantic interaction, to enhance semantic representations of region proposals for classification. The proposed BA module integrates multi-level features adaptively, focusing on the low-level border activation to obtain representations with spatial and border perception for regression. Finally, we utilize the aggregated knowledge from SFA to keep the instance-level semantic consistency (ISC) of decoupled frameworks. Extensive experiments demonstrate that HTD outperforms existing detection works by a large margin, and achieves single-model 50.4%AP and 33.2% AP s on COCO test-dev set using ResNet-101-DCN backbone, which is the best entry among state-of-the-arts under the same configuration. Our code is available at https://github.com/CityU-AIM-Group/HTD .

5 citations

Proceedings ArticleDOI
12 Oct 2020
TL;DR: A novel Privacy-sensitive Objects Pixelation (PsOP) framework for automatic personal privacy filtering during live video streaming and can significantly reduce the over-pixelation ratio in privacy-sensitive object pixelation.
Abstract: With the prevailing of live video streaming, establishing an online pixelation method for privacy-sensitive objects is an urgency. Caused by the inaccurate detection of privacy-sensitive objects, simply migrating the tracking-by-detection structure applied in offline pixelation into the online form will incur problems in target initialization, drifting, and over-pixelation. To cope with the inevitable but impacting detection issue, we propose a novel Privacy-sensitive Objects Pixelation (PsOP) framework for automatic personal privacy filtering during live video streaming. Leveraging pre-trained detection networks, our PsOP is extendable to any potential privacy-sensitive objects pixelation. Employing the embedding networks and the proposed Positioned Incremental Affinity Propagation (PIAP) clustering algorithm as the backbone, our PsOP unifies the pixelation of discriminating and indiscriminating pixelation objects through trajectories generation. In addition to the pixelation accuracy boosting, experiment results on the streaming video data we built show that the proposed PsOP can significantly reduce the over-pixelation ratio in privacy-sensitive object pixelation.

5 citations


Cites methods from "ArcFace: Additive Angular Margin Lo..."

  • ...Listed at the bottom line of Table 3, the boxed PsOP substitutes the detection and embedding network with more advanced PrimidBox [19], ArcFace [5]....

    [...]

  • ...Table 2: Pixelation results on collected live video streaming dataset Method SOPA↑ SOPA↑ SOPA↑ SOPA↑ SOPP↑ SOPP↑ SOPP↑ SOPP↑ Entire Dataset (𝐻𝑆) (𝐿𝑆) (𝐻𝑁 ) (𝐿𝑁 ) (𝐻𝑆) (𝐿𝑆) (𝐻𝑁 ) (𝐿𝑁 ) MP↑(frames) OPR↓ YouTube [9] 0.45 0.42 0.56 0.49 0.53 0.47 0.77 0.63 238 0.56 Azure [13] 0.43 0.47 0.54 0.53 0.50 0.53 0.70 0.68 203 0.54 KCF [10] 0.35 0.32 0.38 0.31 0.41 0.40 0.44 0.40 113 0.64 ECO [4] 0.27 0.28 0.34 0.31 0.37 0.39 0.41 0.40 148 0.59 PsOP (MTCNN+CosFace) 0.63 0.60 0.71 0.66 0.80 0.77 0.86 0.85 362 0.34 PsOP (Pyramid+ArcFace) 0.63 0.62 0.71 0.65 0.80 0.79 0.86 0.85 387 0.34 "↑"&"↓"s.tand for the higher the better and the lower the better respectively....

    [...]

References
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations

28 Oct 2017
TL;DR: An automatic differentiation module of PyTorch is described — a library designed to enable rapid research on machine learning models that focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content
TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

10,447 citations