scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Crafting A Panoptic Face Presentation Attack Detector

TL;DR: This paper designs a deep learning based panoptic algorithm for detection of both digital and physical presentation attacks using Cross Asymmetric Loss Function (CALF) and shows superior performance in three scenarios: ubiquitous environment, individual databases, and cross-attack/cross-database.
Abstract: With the advancements in technology and growing popularity of facial photo editing in the social media landscape, tools such as face swapping and face morphing have become increasingly accessible to the general public. It opens up the possibilities for different kinds of face presentation attacks, which can be taken advantage of by impostors to gain unauthorized access of a biometric system. Moreover, the wide availability of 3D printers has caused a shift from print attacks to 3D mask attacks. With increasing types of attacks, it is necessary to come up with a generic and ubiquitous algorithm with a panoptic view of these attacks, and can detect a spoofed image irrespective of the method used. The key contribution of this paper is designing a deep learning based panoptic algorithm for detection of both digital and physical presentation attacks using Cross Asymmetric Loss Function (CALF). The performance is evaluated for digital and physical attacks in three scenarios: ubiquitous environment, individual databases, and cross-attack/cross-database. Experimental results showcase the superior performance of the proposed presentation attack detection algorithm.
Citations
More filters
Posted Content
TL;DR: A capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning, uses many fewer parameters than traditional convolutional neural networks with similar performance.
Abstract: The revolution in computer hardware, especially in graphics processing units and tensor processing units, has enabled significant advances in computer graphics and artificial intelligence algorithms. In addition to their many beneficial applications in daily life and business, computer-generated/manipulated images and videos can be used for malicious purposes that violate security systems, privacy, and social trust. The deepfake phenomenon and its variations enable a normal user to use his or her personal computer to easily create fake videos of anybody from a short real online video. Several countermeasures have been introduced to deal with attacks using such videos. However, most of them are targeted at certain domains and are ineffective when applied to other domains or new attacks. In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning. It uses many fewer parameters than traditional convolutional neural networks with similar performance. Moreover, we explain, for the first time ever in the literature, the theory behind the application of capsule networks to the forensics problem through detailed analysis and visualization.

109 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...Other methods have been developed that use the available CNN architectures with customized components and were trained on spoofing databases [40, 41, 15, 42, 43]....

    [...]

Journal ArticleDOI
TL;DR: A new framework for PAD is proposed using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN) and a novel loss function is introduced, which forces the network to learn a compact embedding for bonafide class while being far from the representation of attacks.
Abstract: Face recognition has evolved as a widely used biometric modality. However, its vulnerability against presentation attacks poses a significant security threat. Though presentation attack detection (PAD) methods try to address this issue, they often fail in generalizing to unseen attacks. In this work, we propose a new framework for PAD using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network ( MCCNN ). A novel loss function is introduced, which forces the network to learn a compact embedding for bonafide class while being far from the representation of attacks. A one-class Gaussian Mixture Model is used on top of these embeddings for the PAD task. The proposed framework introduces a novel approach to learn a robust PAD system from bonafide and available (known) attack classes. This is particularly important as collecting bonafide data and simpler attacks are much easier than collecting a wide variety of expensive attacks. The proposed system is evaluated on the publicly available WMCA multi-channel face PAD database, which contains a wide variety of 2D and 3D attacks. Further, we have performed experiments with MLFP and SiW-M datasets using RGB channels only. Superior performance in unseen attack protocols shows the effectiveness of the proposed approach. Software, data, and protocols to reproduce the results are made available publicly.

77 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...[27] trained an Alexnet model with a combination of cross-entropy and focal losses....

    [...]

Journal ArticleDOI
03 Apr 2020
TL;DR: Different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working are summarized.
Abstract: Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models

53 citations


Cites background from "Crafting A Panoptic Face Presentati..."

  • ...Given the vulnerabilities, it is our belief that future research should focus primarily on developing (i) robust PAD algorithms and (ii) universal detectors (Mehta et al. 2019) capable of handling multiple attacks....

    [...]

Journal ArticleDOI
Shan Jia1, Xin Li2, Chuanbo Hu2, Guodong Guo2, Zhengquan Xu1 
TL;DR: This work proposes a novel anti-spoofing method, based on factorized bilinear coding of multiple color channels (namely MC\_FBC), that achieves the state-of-the-art performance on both the authors' own WFFD and other face spoofing databases under various intra-database and inter-database testing scenarios.
Abstract: We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years. When compared with widely studied 2D face presentation attacks, 3D face spoofing attacks are more challenging because face recognition systems are more easily confused by the 3D characteristics of materials similar to real faces. In this work, we tackle the problem of detecting these realistic 3D face presentation attacks and propose a novel anti-spoofing method from the perspective of fine-grained classification. Our method, based on factorized bilinear coding of multiple color channels (namely MC_FBC), targets at learning subtle fine-grained differences between real and fake images. By extracting discriminative and fusing complementary information from RGB and YCbCr spaces, we have developed a principled solution to 3D face spoofing detection. A large-scale wax figure face database (WFFD) with both images and videos has also been collected as super realistic attacks to facilitate the study of 3D face presentation attack detection. Extensive experimental results show that our proposed method achieves the state-of-the-art performance on both our own WFFD and other face spoofing databases under various intra-database and inter-database testing scenarios.

36 citations


Cites methods from "Crafting A Panoptic Face Presentati..."

  • ...Existing methods tried to explore the difference between the real face skin and 3D fake face materials based on the reflectance properties using multispectral imaging [7], [8], texture analysis [9], [10], deep features [11], [12], or liveness cues [13], [14]....

    [...]

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a two head contraction expansion convolutional neural network (CNN) architecture for robust presentation attack detection, which consists of raw image and edge enhanced image to learn discriminating features for binary classification.

10 citations

References
More filters
Proceedings Article
03 Dec 2012
TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support- vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensures high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

37,861 citations


"Crafting A Panoptic Face Presentati..." refers methods in this paper

  • ...In the proposed architecture, the FC layer is replaced by a two-class SVM classifier [9] to classify the input image as real or spoof....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described. But the detection performance is limited to 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

13,037 citations

Proceedings ArticleDOI
Tsung-Yi Lin1, Priya Goyal2, Ross Girshick2, Kaiming He2, Piotr Dollár2 
07 Aug 2017
TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.

12,161 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algo- rithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows back- ground regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection perfor- mance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

10,592 citations


"Crafting A Panoptic Face Presentati..." refers methods in this paper

  • ...Images are cropped using the Viola-Jones face detection algorithm [29] and resized to 224 × 224 pixels....

    [...]