Proceedings Article

Crafting A Panoptic Face Presentation Attack Detector

TL;DR: This paper designs a deep learning-based panoptic algorithm for detection of both digital and physical presentation attacks using a Cross Asymmetric Loss Function (CALF) and shows superior performance in three scenarios: ubiquitous environment, individual databases, and cross-attack/cross-database.
Abstract: With advances in technology and the growing popularity of facial photo editing on social media, tools such as face swapping and face morphing have become increasingly accessible to the general public. This opens up possibilities for different kinds of face presentation attacks, which impostors can exploit to gain unauthorized access to a biometric system. Moreover, the wide availability of 3D printers has caused a shift from print attacks to 3D mask attacks. With the increasing variety of attacks, it is necessary to develop a generic and ubiquitous algorithm that takes a panoptic view of these attacks and can detect a spoofed image irrespective of the method used. The key contribution of this paper is the design of a deep learning-based panoptic algorithm for the detection of both digital and physical presentation attacks using a Cross Asymmetric Loss Function (CALF). The performance is evaluated for digital and physical attacks in three scenarios: ubiquitous environment, individual databases, and cross-attack/cross-database. Experimental results showcase the superior performance of the proposed presentation attack detection algorithm.
Citations
Posted Content
TL;DR: This paper introduces a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created with deep learning, while using far fewer parameters than traditional convolutional neural networks of similar performance.
Abstract: The revolution in computer hardware, especially in graphics processing units and tensor processing units, has enabled significant advances in computer graphics and artificial intelligence algorithms. In addition to their many beneficial applications in daily life and business, computer-generated/manipulated images and videos can be used for malicious purposes that violate security systems, privacy, and social trust. The deepfake phenomenon and its variations enable a normal user to use his or her personal computer to easily create fake videos of anybody from a short real online video. Several countermeasures have been introduced to deal with attacks using such videos. However, most of them are targeted at certain domains and are ineffective when applied to other domains or new attacks. In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning. It uses many fewer parameters than traditional convolutional neural networks with similar performance. Moreover, we explain, for the first time ever in the literature, the theory behind the application of capsule networks to the forensics problem through detailed analysis and visualization.

109 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...Other methods have been developed that use the available CNN architectures with customized components and were trained on spoofing databases [40, 41, 15, 42, 43]....

    [...]

Journal Article
TL;DR: A new framework for PAD is proposed using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN) and a novel loss function is introduced, which forces the network to learn a compact embedding for the bonafide class while keeping it far from the representation of attacks.
Abstract: Face recognition has evolved as a widely used biometric modality. However, its vulnerability against presentation attacks poses a significant security threat. Though presentation attack detection (PAD) methods try to address this issue, they often fail to generalize to unseen attacks. In this work, we propose a new framework for PAD using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN). A novel loss function is introduced, which forces the network to learn a compact embedding for the bonafide class while keeping it far from the representation of attacks. A one-class Gaussian Mixture Model is used on top of these embeddings for the PAD task. The proposed framework introduces a novel approach to learn a robust PAD system from bonafide and available (known) attack classes. This is particularly important as collecting bonafide data and simpler attacks is much easier than collecting a wide variety of expensive attacks. The proposed system is evaluated on the publicly available WMCA multi-channel face PAD database, which contains a wide variety of 2D and 3D attacks. Further, we have performed experiments with MLFP and SiW-M datasets using RGB channels only. Superior performance in unseen attack protocols shows the effectiveness of the proposed approach. Software, data, and protocols to reproduce the results are made available publicly.

77 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...[27] trained an Alexnet model with a combination of cross-entropy and focal losses....

    [...]

Journal Article
03 Apr 2020
TL;DR: This paper summarizes the different ways in which the robustness of a face recognition algorithm can be challenged, each of which can severely affect its intended operation.
Abstract: Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, the robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs are discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models.

53 citations


Cites background from "Crafting A Panoptic Face Presentati..."

  • ...Given the vulnerabilities, it is our belief that future research should focus primarily on developing (i) robust PAD algorithms and (ii) universal detectors (Mehta et al. 2019) capable of handling multiple attacks....

    [...]

Journal Article
Shan Jia, Xin Li, Chuanbo Hu, Guodong Guo, Zhengquan Xu
TL;DR: This work proposes a novel anti-spoofing method, based on factorized bilinear coding of multiple color channels (namely MC_FBC), that achieves state-of-the-art performance on both the authors' own WFFD and other face spoofing databases under various intra-database and inter-database testing scenarios.
Abstract: We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years. When compared with widely studied 2D face presentation attacks, 3D face spoofing attacks are more challenging because face recognition systems are more easily confused by the 3D characteristics of materials similar to real faces. In this work, we tackle the problem of detecting these realistic 3D face presentation attacks and propose a novel anti-spoofing method from the perspective of fine-grained classification. Our method, based on factorized bilinear coding of multiple color channels (namely MC_FBC), aims at learning subtle fine-grained differences between real and fake images. By extracting discriminative information from the RGB and YCbCr spaces and fusing their complementary information, we have developed a principled solution to 3D face spoofing detection. A large-scale wax figure face database (WFFD) with both images and videos has also been collected as super realistic attacks to facilitate the study of 3D face presentation attack detection. Extensive experimental results show that our proposed method achieves state-of-the-art performance on both our own WFFD and other face spoofing databases under various intra-database and inter-database testing scenarios.
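The abstract above relies on two color spaces, RGB and YCbCr. As a rough illustration of how the second can be derived from the first, here is a minimal sketch of the full-range BT.601 (JPEG-style) conversion for a single 8-bit pixel. This is only the color conversion step, not the MC_FBC method itself; the function names `rgb_to_ycbcr` and `six_channel` are hypothetical, and the choice of the full-range BT.601 variant is an assumption, since the abstract does not state which variant the authors used.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (JPEG convention).

    Assumption: the full-range variant is used; the abstract does not say which.
    Values are returned as floats and are not rounded or clamped.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def six_channel(pixel):
    """Hypothetical helper: concatenate the RGB and YCbCr values of one pixel."""
    r, g, b = pixel
    return (r, g, b) + rgb_to_ycbcr(r, g, b)


if __name__ == "__main__":
    # A gray pixel carries no chroma, so Cb and Cr sit at the 128 midpoint.
    print(six_channel((100, 100, 100)))
```

Luma (Y) separates brightness from the two chroma channels, which is one plausible reason a multi-channel detector might look at both representations.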

36 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...Existing methods tried to explore the difference between the real face skin and 3D fake face materials based on the reflectance properties using multispectral imaging [7], [8], texture analysis [9], [10], deep features [11], [12], or liveness cues [13], [14]....

    [...]

Journal Article
TL;DR: Li et al. propose a two-head contraction-expansion convolutional neural network (CNN) architecture for robust presentation attack detection, which uses the raw image and an edge-enhanced image to learn discriminating features for binary classification.

10 citations

References
Journal Article
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
TL;DR: Focal loss as discussed by the authors focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training, which improves the accuracy of one-stage detectors.
Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron .
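The down-weighting described in this abstract can be sketched for a single binary prediction as follows. This is a minimal illustration of the focal loss formula FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), not the Detectron implementation; the function names are hypothetical, and the defaults gamma = 2 and alpha = 0.25 are commonly used values assumed here for illustration.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross-entropy for one example.

    p: predicted probability of the positive class, y: label (0 or 1).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma.

    Well-classified examples (p_t close to 1) get a small modulating factor,
    so hard examples dominate the total loss.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.6, 0.9, 0.99):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

Setting `gamma=0` removes the modulating factor, leaving an alpha-weighted cross-entropy, which is a convenient sanity check when experimenting with the two parameters.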

5,734 citations

Proceedings Article
04 Dec 2006
TL;DR: These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

4,385 citations

Posted Content
TL;DR: This report shows how to examine the training validation/test loss function for subtle clues of underfitting and overfitting and suggests guidelines for moving toward the optimal balance point and discusses how to increase/decrease the learning rate/momentum to speed up training.
Abstract: Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most trainings are with suboptimal hyper-parameters, requiring unnecessarily long training times. Setting the hyper-parameters remains a black art that requires years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows how to examine the training validation/test loss function for subtle clues of underfitting and overfitting and suggests guidelines for moving toward the optimal balance point. Then it discusses how to increase/decrease the learning rate/momentum to speed up training. Our experiments show that it is crucial to balance every manner of regularization for each dataset and architecture. Weight decay is used as a sample regularizer to show how its optimal value is tightly coupled with the learning rates and momentums. Files to help replicate the results reported here are available.

723 citations


"Crafting A Panoptic Face Presentati..." refers methods in this paper

  • ...To train the proposed presentation attack detector, we use the AlexNet architecture with cyclic learning rates [27] and stochastic gradient descent (SGD) optimization with warm restarts and differential learning rates across the layers....

    [...]
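The excerpt above mentions SGD with warm restarts and differential learning rates across layers. One common way to realize such a schedule is cosine annealing that periodically resets to the maximum rate, with each layer group scaled by its own factor. The sketch below assumes that formulation; the function names, cycle lengths, rates, and scaling factors are all hypothetical, since the excerpt does not give the paper's actual hyperparameters.

```python
import math


def sgdr_lr(step, lr_max=0.01, lr_min=0.0, cycle_len=10, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts.

    The rate decays from lr_max to lr_min within a cycle, then jumps back to
    lr_max. Each new cycle is cycle_mult times longer than the previous one.
    All default values are illustrative assumptions.
    """
    t, length = step, cycle_len
    while t >= length:  # locate the cycle that contains this step
        t -= length
        length *= cycle_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / length))


def layer_group_lrs(base_lr, factors=(0.01, 0.1, 1.0)):
    """Differential learning rates: early layer groups get smaller rates.

    The three factors are hypothetical (for example, early conv layers,
    late conv layers, classifier head).
    """
    return [base_lr * f for f in factors]


if __name__ == "__main__":
    for step in range(0, 31, 5):
        print(step, layer_group_lrs(sgdr_lr(step)))
```

With these defaults the restarts fall at steps 10 and 30, so the rate at step 9 is close to `lr_min` and the rate at step 10 is back at `lr_max`.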

Proceedings Article
Zhiwei Zhang, Junjie Yan, Sifei Liu, Zhen Lei, Dong Yi, Stan Z. Li
06 Aug 2012
TL;DR: This paper releases a face antispoofing database that covers a diverse range of potential attack variations, together with a baseline algorithm for comparison that explores the high frequency information in the facial region to determine liveness.
Abstract: Face antispoofing has now attracted intensive attention, aiming to assure the reliability of face biometrics. We notice that currently most face antispoofing databases focus on data with little variation, which may limit the generalization performance of trained models since potential attacks in the real world are probably more complex. In this paper we release a face antispoofing database which covers a diverse range of potential attack variations. Specifically, the database contains 50 genuine subjects, and fake faces are made from the high quality records of the genuine faces. Three imaging qualities are considered, namely low quality, normal quality and high quality. Three fake face attacks are implemented, which include warped photo attack, cut photo attack and video attack. Therefore each subject contains 12 videos (3 genuine and 9 fake), and the final database contains 600 video clips. A test protocol is provided, which consists of 7 scenarios for a thorough evaluation from all possible aspects. A baseline algorithm is also given for comparison, which explores the high frequency information in the facial region to determine the liveness. We hope such a database can serve as an evaluation platform for future research in the literature.
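The video counts in this abstract follow from a simple product of imaging qualities and sample types. The short sketch below reproduces that arithmetic, assuming one genuine video per imaging quality (consistent with the stated "3 genuine and 9 fake"); the variable names are illustrative only.

```python
from itertools import product

# Figures taken from the abstract: 50 subjects, 3 imaging qualities,
# 1 genuine recording plus 3 attack types per quality (assumed layout).
subjects = 50
qualities = ["low", "normal", "high"]
sample_types = ["genuine", "warped photo", "cut photo", "video"]

per_subject = list(product(qualities, sample_types))
genuine = [s for s in per_subject if s[1] == "genuine"]
fake = [s for s in per_subject if s[1] != "genuine"]
total = subjects * len(per_subject)

if __name__ == "__main__":
    print(len(per_subject), len(genuine), len(fake), total)
```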

680 citations


"Crafting A Panoptic Face Presentati..." refers background or methods in this paper

  • ...[30] is a standard physical attack database consisting of 600 video samples from the warped photo, cut photo, and video attacks in three qualities: low, normal, and high....

    [...]

  • ...Majority of the existing anti-spoof methods involve extraction of discriminating features to analyze the face texture, such as Haralick texture features, local binary pattern (LBP), partial least square (PLS), and difference of Gaussian (DoG) [1, 2, 3, 20, 30]....

    [...]

Proceedings Article
26 Dec 2007
TL;DR: A real-time, non-intrusive liveness detection approach against photograph spoofing in face recognition that recognizes spontaneous eyeblinks and outperforms the cascaded Adaboost and HMM in the task of eyeblink detection.
Abstract: We present a real-time liveness detection approach against photograph spoofing in face recognition, by recognizing spontaneous eyeblinks in a non-intrusive manner. The approach requires no extra hardware except for a generic web camera. Eyeblink sequences often have a complex underlying structure. We formulate blink detection as inference in an undirected conditional graphical framework, and are able to learn compact and efficient observation and transition potentials from data. For the purpose of quick and accurate recognition of the blink behavior, eye closity, an easily-computed discriminative measure derived from the adaptive boosting algorithm, is developed, and then smoothly embedded into the conditional model. An extensive set of experiments is presented to show the effectiveness of our approach and how it outperforms the cascaded Adaboost and HMM in the task of eyeblink detection.

611 citations


"Crafting A Panoptic Face Presentati..." refers background in this paper

  • ...Dynamic anti-spoofing techniques mostly target blinking [13, 18], motion magnification [6], or liveness detection [8, 28], given a sequence of frames....

    [...]