Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

15 Jun 2019-pp 4690-4699
TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.
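
For intuition, here is a minimal PyTorch-style sketch of the additive angular margin idea: add the margin m to the angle between a normalised feature and its ground-truth class centre before rescaling by s (s=64 and m=0.5 follow the paper; the function name and shapes are illustrative).

```python
import torch
import torch.nn.functional as F

def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """Additive angular margin: penalise the angle between each feature and
    its ground-truth class centre by m, then rescale by s.
    features: (batch, dim); weights: (num_classes, dim); labels: (batch,)."""
    # Cosine similarity between L2-normalised features and class centres.
    cos = F.linear(F.normalize(features), F.normalize(weights))
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the margin only at the target-class position.
    target = F.one_hot(labels, num_classes=weights.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cos)
    return s * logits  # feed to standard softmax cross-entropy

# Usage: loss = F.cross_entropy(arcface_logits(f, W, y), y)
```

Training then reduces to ordinary softmax cross-entropy over the adjusted logits, which is why the paper describes the computational overhead as negligible.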


Citations
Posted Content
TL;DR: Barista is presented, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs with the popular deep learning framework Caffe, providing the necessary infrastructure for further research and development.
Abstract: As the complexity of deep learning (DL) models increases, their compute requirements increase accordingly. Deploying a Convolutional Neural Network (CNN) involves two phases: training and inference. With the inference task typically taking place on resource-constrained devices, a lot of research has explored the field of low-power inference on custom hardware accelerators. On the other hand, training is both more compute- and memory-intensive and is primarily performed on power-hungry GPUs in large-scale data centres. CNN training on FPGAs is a nascent field of research. This is primarily due to the lack of tools to easily prototype and deploy various hardware and/or algorithmic techniques for power-efficient CNN training. This work presents Barista, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs within the popular deep learning framework Caffe. To the best of our knowledge, this is the only tool that allows for such versatile and rapid deployment of hardware and algorithms for the FPGA-based training of CNNs, providing the necessary infrastructure for further research and development.

6 citations


Cites background from "ArcFace: Additive Angular Margin Loss for Deep Face Recognition"

  • ...Convolutional Neural Networks (CNNs) are one of the primary components across a wide variety of AI tasks, from face recognition [1] to drone navigation [2]....


Proceedings ArticleDOI
06 May 2021
TL;DR: In this paper, the authors present a new dataset of makeup presentation attacks aimed at impersonation and identity concealment, collected from online sources with a focus on seemingly highly skilled makeup artists, and conduct a vulnerability assessment of state-of-the-art open-source and commercial off-the-shelf face recognition systems against probe images from the dataset.
Abstract: Facial appearance can be substantially altered through the application of facial cosmetics. In addition to the widespread, socially acceptable, and in some cases even expected use for the purpose of beautification, facial cosmetics can be abused to launch so-called makeup presentation attacks. Thus far, the potential of such attack instruments has generally been claimed to be relatively low based on experimental evaluations on available datasets. This paper presents a new dataset of such attacks with the purpose of impersonation and identity concealment. The images have been collected from online sources, concentrating on seemingly highly skilled makeup artists. A vulnerability assessment of face recognition with respect to probe images contained in the collected dataset is conducted on state-of-the-art open source and commercial off-the-shelf facial recognition systems with a standardised methodology and metrics. The obtained results are especially striking for the impersonation attacks: the obtained attack success chance of almost 70% at a fixed decision threshold corresponding to 0.1% false match rate is significantly higher than results previously reported in the scientific literature.
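
As a sketch of the operating point used above (a fixed decision threshold corresponding to a 0.1% false match rate), the following assumes arrays of comparison scores where higher means a stronger match; the function and variable names are hypothetical, and the ~70% figure is the paper's reported result, not something this snippet reproduces.

```python
import numpy as np

def attack_success_rate(impostor_scores, attack_scores, target_fmr=0.001):
    """Fix the decision threshold so the false match rate on zero-effort
    impostor comparisons equals target_fmr (here 0.1%), then report the
    share of makeup-attack comparisons accepted at that threshold."""
    threshold = np.quantile(impostor_scores, 1.0 - target_fmr)
    return float(np.mean(attack_scores > threshold))
```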

6 citations

Book ChapterDOI
01 Jan 2022
TL;DR: Yin et al. propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by a driving video or audio, and flexible face editing.
Abstract: One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a solution from a novel perspective that differs from existing frameworks. We first investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Upon the observation, we propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing. Our framework elevates the resolution of the synthesized talking face to 1024 × 1024 for the first time, even though the training dataset has a lower resolution. Moreover, our framework allows two types of facial editing, i.e., global editing via GAN inversion and intuitive editing via 3D morphable models. Comprehensive experiments show superior video quality and flexible controllability over state-of-the-art methods. Code is available at https://github.com/FeiiYin/StyleHEAT.

6 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: This paper enhances face recognition with a bypass of self-supervised 3D reconstruction, introducing a 3D face reconstruction loss with two auxiliary networks that enforces the backbone FR network to focus on identity-related depth and albedo information while neglecting identity-irrelevant pose and illumination information.
Abstract: Attributed to both the development of deep networks and abundant data, automatic face recognition (FR) has quickly reached human-level capacity in the past few years. However, the FR problem is not perfectly solved in the case of uncontrolled illumination and pose. In this paper, we propose to enhance face recognition with a bypass of self-supervised 3D reconstruction, which enforces the neural backbone to focus on the identity-related depth and albedo information while neglecting the identity-irrelevant pose and illumination information. Specifically, inspired by the physical model of image formation, we improve the backbone FR network by introducing a 3D face reconstruction loss with two auxiliary networks. The first one estimates the pose and illumination from the input face image, while the second one decodes the canonical depth and albedo from the intermediate feature of the FR backbone network. The whole network is trained in an end-to-end manner with both the classic face identification loss and the 3D face reconstruction loss with the physical parameters. In this way, the self-supervised reconstruction acts as a regularization that enables the recognition network to understand faces in 3D, and the learnt features are forced to encode more information about canonical facial depth and albedo, which is more intrinsic and beneficial to face recognition. Extensive experimental results on various face recognition benchmarks show that, without any extra annotation or computation cost, our method outperforms state-of-the-art ones. Moreover, the learnt representations also generalize well to other face-related downstream tasks such as facial attribute recognition with limited labeled data.
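
A rough sketch of the training objective as described: the classic identification loss plus a reconstruction loss from the physically parameterised bypass. All module names, signatures, and the weighting factor are assumptions for illustration, not the authors' code.

```python
import torch

def joint_loss(backbone, pose_light_net, depth_albedo_net, renderer,
               images, labels, id_criterion, lam=0.1):
    # FR backbone yields identity features plus an intermediate feature map.
    feats, intermediate = backbone(images)
    loss_id = id_criterion(backbone.classifier(feats), labels)
    # Auxiliary nets: identity-irrelevant factors from the image,
    # identity-related depth/albedo from the backbone feature.
    pose, light = pose_light_net(images)
    depth, albedo = depth_albedo_net(intermediate)
    # Physical image-formation model renders the face back.
    recon = renderer(depth, albedo, pose, light)
    loss_recon = torch.mean(torch.abs(recon - images))  # photometric L1 (assumed)
    return loss_id + lam * loss_recon  # reconstruction acts as a regulariser
```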

6 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose applying technical indicators (TIs) to an LSTM-attention time series model for stock price prediction, reaching a maximum accuracy of 68.83% in stock trend prediction.
Abstract: With the development of the Internet, information on the stock market has gradually become transparent, and stock information is easy to obtain. For investors, investment performance depends on the amount of capital and effective trading strategies. The analysis tool commonly used by investors and securities analysts is technical analysis (TA): the study of past and current financial market information, in which a large amount of statistical data is used to predict price trends and determine trading strategies. Technical indicators (TIs) are a form of technical analysis that summarizes possible future trends of stock prices based on historical statistical data to assist investors in making decisions. The stock price trend is typical time series data with special characteristics such as trend, seasonality, and periodicity. In recent years, time series deep neural networks (DNNs) have demonstrated powerful performance in the machine translation, speech processing, and natural language processing fields. This research proposes attention-based BiLSTM (AttBiLSTM) applied to trading strategy design and verifies the effectiveness of a variety of TIs, including the stochastic oscillator, RSI, BIAS, W%R, and MACD. This research also proposes two trading strategies suitable for DNNs, combining them with TIs and verifying their effectiveness. The main contributions of this research are as follows: (1) To the best of our knowledge, this is the first research to apply TIs to an LSTM-attention time series model for stock price prediction. (2) This study introduces five well-known TIs and reaches a maximum accuracy of 68.83% in stock trend prediction. (3) This research introduces the concept of exporting the probability of the deep model to the trading strategy; on the TPE0050 backtest, the experimental results reach a highest return on investment of 42.74%. (4) This research concludes, from an empirical point of view, that technical analysis combined with a time series deep neural network has significant effects on stock price prediction and return on investment.
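
To make the indicator inputs concrete, here is a minimal sketch of one of the named TIs, RSI, computed over closing prices (this uses a simple moving average rather than Wilder's smoothing, and the 14-day window is a common default, not necessarily the paper's setting).

```python
import numpy as np

def rsi(close, period=14):
    """Relative Strength Index: 100 - 100 / (1 + RS), where RS is the ratio
    of average gain to average loss over the window."""
    delta = np.diff(close)
    gains = np.where(delta > 0, delta, 0.0)
    losses = np.where(delta < 0, -delta, 0.0)
    kernel = np.ones(period) / period
    avg_gain = np.convolve(gains, kernel, mode="valid")
    avg_loss = np.convolve(losses, kernel, mode="valid")
    return 100.0 - 100.0 / (1.0 + avg_gain / (avg_loss + 1e-12))
```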

6 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this paper, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; this framework won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
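
The core idea reads directly as code; below is a minimal sketch of a basic residual block (the identity-shortcut case with unchanged channel count; projection shortcuts for dimension changes are omitted).

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """The stacked layers learn a residual F(x) with reference to the input;
    the block outputs relu(F(x) + x) via an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual plus identity shortcut
```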

123,388 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
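
A minimal sketch of the mechanism: the "inverted" formulation below scales surviving units by 1/(1-p) during training, which is equivalent to the paper's test-time weight scaling but leaves inference unchanged.

```python
import torch

def dropout(x, p=0.5, training=True):
    """Randomly zero each unit with probability p during training; rescale the
    survivors so the expected activation matches the unthinned network."""
    if not training or p == 0.0:
        return x  # test time: the single unthinned network is used as-is
    mask = (torch.rand_like(x) >= p).float()
    return x * mask / (1.0 - p)
```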

33,597 citations

Proceedings Article
Sergey Ioffe1, Christian Szegedy1
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
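
A minimal sketch of the per-mini-batch normalisation for a fully connected layer (training mode only; the running statistics used at inference are omitted, and gamma/beta are the learned scale and shift).

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch to zero mean and unit
    variance, then apply the learned affine transform."""
    mean = x.mean(dim=0)                  # per-feature batch mean
    var = x.var(dim=0, unbiased=False)    # per-feature batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```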

30,843 citations

28 Oct 2017
TL;DR: This paper describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it focuses on differentiation of purely imperative programs, with an emphasis on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
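
The imperative style the preprint describes looks like this in current PyTorch (the modern tensor API shown here postdates the 2017 draft, which still used Variable wrappers):

```python
import torch

# Gradients are recorded as the imperative program executes; no symbolic graph.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # reverse-mode automatic differentiation
print(x.grad)        # tensor([4., 6.]), i.e. dy/dx = 2x
```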

13,268 citations

Posted Content
TL;DR: This paper describes the TensorFlow interface and an implementation of that interface built at Google, which have been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
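
For flavour, here is a tiny computation expressed with TensorFlow (using the eager TF 2.x API, which postdates the graph-building interface the whitepaper describes; the numbers are illustrative):

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.ones([2, 1]))
with tf.GradientTape() as tape:          # records ops for differentiation
    y = tf.reduce_sum(tf.matmul(x, w))
print(tape.gradient(y, w).numpy())       # [[4.], [6.]]: dy/dw = column sums of x
```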

10,447 citations