Proceedings Article

CHIF: Convoluted Histogram Image Features for Detecting Silicone Mask based Face Presentation Attack

TL;DR: This research proposes a computationally efficient solution for detecting silicone mask-based presentation attacks that combines CNN filters with texture encoding: an image region is convolved with filters learned via CNN operations, and the result is binarized.
Abstract: Face recognition algorithms are generally vulnerable to presentation attacks, ranging from cost-effective methods such as print and replay to sophisticated media such as silicone masks. Carefully designed silicone masks exhibit real-life face texture once worn and can exhibit facial motions, which makes them challenging to detect. While several algorithms in the literature detect print and replay attacks, limited work has been done on detecting silicone mask-based attacks. In this research, we propose a computationally efficient solution for silicone mask-based presentation attacks that combines the power of CNN filters with texture encoding. The proposed framework binarizes an image region after convolving it with filters learned via CNN operations. On the challenging silicone mask face presentation attack database (SMAD), the proposed feature descriptor shows a 3.8% lower error rate than state-of-the-art algorithms.
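The principle described above (convolve a region with CNN filters, binarize the responses, pack the bits into one code per pixel, and histogram the codes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is a single grayscale channel, the eight 3×3 filters are random stand-ins for the pre-trained first-layer CNN filters the paper uses, the binarization threshold is assumed to be zero, the bit from filter i (0-based) is assumed to carry weight 2^i, and any division of the face into regions is omitted. The names `conv2d_valid` and `chif_like_histogram` are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (the operation CNN layers compute), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def chif_like_histogram(image: Matrix, filters: List[Matrix]) -> List[int]:
    """Hypothetical CHIF-style descriptor: one binary code per pixel, then a histogram."""
    responses = [conv2d_valid(image, f) for f in filters]
    hist = [0] * (2 ** len(filters))
    for y in range(len(responses[0])):
        for x in range(len(responses[0][0])):
            code = 0
            for i, resp in enumerate(responses):
                bit = 1 if resp[y][x] > 0 else 0  # assumed threshold: positive response -> 1
                code += bit * 2 ** i              # assumed bit weight 2^i for 0-based i
            hist[code] += 1
    return hist


# Usage: random stand-ins for eight 3x3 first-layer CNN filters (assumption).
rng = random.Random(0)
filters = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(8)]
image = [[rng.random() for _ in range(16)] for _ in range(16)]
hist = chif_like_histogram(image, filters)
print(len(hist), sum(hist))  # 256 bins; 14 * 14 = 196 valid pixel positions
```

With eight filters, every image maps to a fixed-length 256-bin histogram regardless of its size, which keeps the descriptor compact.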
Citations
Journal Article
TL;DR: This paper presents a comprehensive review of recent advances in deep learning-based face anti-spoofing (FAS). It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., ‘0’ for bonafide vs. ‘1’ for PAs), it also investigates recent methods with pixel-wise supervision; 2) in addition to traditional intra-dataset evaluation, it collects and analyzes the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, it summarizes deep learning applications under multi-modal (e.g., depth and infrared) and specialized (e.g., light field and flash) sensors.
Abstract: Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems from presentation attacks (PAs). As more and more realistic PAs of novel types spring up, early-stage FAS methods based on handcrafted features become unreliable due to their limited representation capacity. With the emergence of large-scale academic datasets in the recent decade, deep learning-based FAS achieves remarkable performance and dominates this area. However, existing reviews in this field mainly focus on handcrafted features, which are outdated and uninspiring for the progress of the FAS community. In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning-based FAS. It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., ‘0’ for bonafide vs. ‘1’ for PAs), we also investigate recent methods with pixel-wise supervision (e.g., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. We conclude this survey by emphasizing current open issues and highlighting potential prospects.
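The survey's first distinction, binary-label supervision versus pixel-wise supervision with a pseudo depth map, can be illustrated with two toy loss functions. This is a sketch only: the function names are hypothetical, and regressing an all-zero map for attack samples is an assumed convention that this abstract does not state.

```python
import math
from typing import List

Matrix = List[List[float]]


def binary_bce(p_attack: float, label: int) -> float:
    """Image-level binary cross-entropy; label 0 = bonafide, 1 = presentation attack."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_attack + eps) + (1 - label) * math.log(1.0 - p_attack + eps))


def pixelwise_depth_mse(pred: Matrix, label: int, pseudo_depth: Matrix) -> float:
    """Pixel-wise supervision: regress the pseudo depth map for bonafide faces and an
    all-zero map for attacks (assumed convention)."""
    target = pseudo_depth if label == 0 else [[0.0] * len(row) for row in pseudo_depth]
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


depth = [[0.2, 0.8], [0.8, 0.2]]  # toy pseudo depth map
print(binary_bce(0.9, 1))
print(pixelwise_depth_mse(depth, 0, depth))  # prediction equals the bonafide target
```

The binary loss gives one supervision signal per image, whereas the pixel-wise loss supervises every location of the predicted map.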

9 citations

Posted Content
TL;DR: This research proposes a deep learning-based network termed MixNet to detect presentation attacks in cross-database and unseen attack settings, and experiments show the effectiveness of the proposed algorithm.
Abstract: The non-intrusive nature and high accuracy of face recognition algorithms have led to their successful deployment across multiple applications, ranging from border access to mobile unlocking and digital payments. However, their vulnerability to sophisticated and cost-effective presentation attack mediums raises essential questions regarding their reliability. Several presentation attack detection algorithms have been presented in the literature; however, they are still far from meeting real-world needs. The major problem with existing work is generalizability against multiple attacks in both seen and unseen settings: algorithms that are useful for one kind of attack (such as print) perform unsatisfactorily for another type of attack (such as silicone masks). In this research, we propose a deep learning-based network termed MixNet to detect presentation attacks in cross-database and unseen attack settings. The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category. Experiments are performed on multiple challenging face presentation attack databases, such as SMAD and Spoof in the Wild (SiW-M). Extensive experiments and comparisons with existing state-of-the-art algorithms show the effectiveness of the proposed algorithm.

6 citations

Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this article, the authors propose a deep learning-based network termed MixNet to detect presentation attacks in cross-database and unseen attack settings; it utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.

5 citations

Posted Content
TL;DR: This article presents a comprehensive review of recent advances in deep learning-based face anti-spoofing, covering several novel and insightful components: 1) besides the traditional binary label (e.g., '0' for bonafide vs. '1' for PAs), it also investigates recent methods with pixel-wise supervision; 2) in addition to traditional intra-dataset evaluation, it collects and analyzes the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, it summarizes deep learning applications under multi-modal and specialized sensors.
Abstract: Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems from presentation attacks (PAs). As more and more realistic PAs of novel types spring up, traditional FAS methods based on handcrafted features become unreliable due to their limited representation capacity. With the emergence of large-scale academic datasets in the recent decade, deep learning-based FAS achieves remarkable performance and dominates this area. However, existing reviews in this field mainly focus on handcrafted features, which are outdated and uninspiring for the progress of the FAS community. In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning-based FAS. It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., '0' for bonafide vs. '1' for PAs), we also investigate recent methods with pixel-wise supervision (e.g., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB cameras, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. We conclude this survey by emphasizing current open issues and highlighting potential prospects.

4 citations

Journal Article
TL;DR: This article presents a unified PAD algorithm for different kinds of attacks, such as printed photos, video replay, 3D masks, silicone masks, and wax faces, which uses a combination of wavelet-decomposed raw sensor images and face region data to detect whether the input image is bonafide or attacked.
Abstract: Presentation attack detection (PAD) algorithms have become an integral requirement for the secure usage of face recognition systems. As face recognition algorithms and applications expand from constrained to unconstrained environments and to multispectral scenarios, PAD algorithms must also increase their scope and effectiveness. PAD algorithms should not be effective for only one environment or condition; rather, they should generalize to the many variabilities that a face recognition algorithm encounters. With this motivation, as the first contribution, the article presents a unified PAD algorithm for different kinds of attacks, such as printed photos, video replay, 3D masks, silicone masks, and wax faces. The proposed algorithm uses a combination of wavelet-decomposed raw input images from the sensor and face region data to detect whether the input image is bonafide or attacked. The second contribution of the article is the collection of a large presentation attack database in the NIR spectrum, containing images of individuals from two ethnicities. The database contains 500 print attack videos, which together comprise approximately 100,000 frames in the NIR spectrum. Extensive evaluation on NIR images, as well as on visible-spectrum images from existing benchmark databases, shows that the proposed algorithm yields state-of-the-art results and surpasses several complex state-of-the-art algorithms. For instance, on the benchmark datasets CASIA-FASD, Replay-Attack, and MSU-MFSD, the proposed algorithm achieves a maximum error of 0.92%, which is significantly lower than that of state-of-the-art attack detection algorithms.
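To show what a "wavelet-decomposed" image looks like, here is a one-level 2-D decomposition in plain Python. The abstract does not name the wavelet family, so the Haar wavelet (in its averaging form, dividing by 4) is an assumption, as is the subband naming; the function `haar2d_level1` is hypothetical and covers only the decomposition, not the PAD classifier.

```python
from typing import Dict, List

Matrix = List[List[float]]


def haar2d_level1(img: Matrix) -> Dict[str, Matrix]:
    """One level of a 2-D Haar decomposition (averaging variant, divide by 4)
    of an image with even height and width. Returns four half-size subbands."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    bands: Dict[str, Matrix] = {"LL": [], "LH": [], "HL": [], "HH": []}
    for y in range(0, h, 2):
        rows: Dict[str, List[float]] = {k: [] for k in bands}
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            rows["LL"].append((a + b + c + d) / 4)   # approximation (local mean)
            rows["LH"].append((a + b - c - d) / 4)   # top minus bottom: horizontal edges
            rows["HL"].append((a - b + c - d) / 4)   # left minus right: vertical edges
            rows["HH"].append((a - b - c + d) / 4)   # diagonal detail
        for k in bands:
            bands[k].append(rows[k])
    return bands


sub = haar2d_level1([[1.0, 2.0], [3.0, 4.0]])
print(sub)  # {'LL': [[2.5]], 'LH': [[-1.0]], 'HL': [[-0.5]], 'HH': [[0.0]]}
```

A flat region produces zero detail coefficients, so the LH, HL, and HH subbands isolate edges and fine texture, the kind of high-frequency content a PAD model can inspect.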

1 citation

References
Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won first place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
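The residual reformulation described above, y = F(x) + x, can be illustrated with a toy fully connected block in plain Python. This is a sketch under simplifying assumptions (dense layers instead of convolutions, no biases or batch normalization, and a ReLU applied after the addition); the function names are hypothetical. It shows why an identity mapping is easy to represent: with all weights at zero, F(x) = 0 and a non-negative input passes through unchanged.

```python
from typing import List

Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def linear(weights: List[Vector], v: Vector) -> Vector:
    """Matrix-vector product; each row of `weights` produces one output."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x: Vector, w1: List[Vector], w2: List[Vector]) -> Vector:
    """y = relu(F(x) + x) with F(x) = W2 relu(W1 x): the layers learn only the residual F."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])  # shortcut connection adds the input back


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))  # F(x) = 0, so the block acts as identity
```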

123,388 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
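The case for very small filters can be made concrete with receptive-field and parameter arithmetic. Assuming stride-1 convolutions with C input and C output channels and ignoring biases, a stack of n 3×3 layers sees a (2n+1)×(2n+1) window, so three stacked layers cover the same 7×7 window as a single 7×7 layer while using fewer weights:

```latex
\begin{aligned}
r_n &= 1 + 2n, \qquad r_3 = 7,\\
\text{weights}_{\,3 \times (3\times3)} &= 3\,(3^2 C^2) = 27C^2,\\
\text{weights}_{\,7\times7} &= 7^2 C^2 = 49C^2.
\end{aligned}
```

The stack also interleaves three non-linearities instead of one.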

55,235 citations


"CHIF: Convoluted Histogram Image Fe..." refers to methods in this paper

  • ...The decimal value of the binary string is calculated using the following equation: $\mathrm{CHIF}_8 = \sum_{i=1}^{8} s(x, y) \times 2^{i}$ (4). In this research, we have experimented with the first- and second-layer filters (which are rich in edge information [39]) from pre-trained VGG16, VGG19 [33], GoogLeNet [34], ResNet50, ResNet101 [14], and VGG-Face [30] for computing the CHIF image descriptor....

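Equation (4) can be checked with a worked example. Here $s_i \in \{0, 1\}$ is assumed to denote the binarized response of the i-th filter at pixel (x, y) (the extracted equation writes s(x, y) without an index), and the bit values are arbitrary:

```latex
\begin{aligned}
(s_1, \dots, s_8) &= (1, 0, 1, 1, 0, 0, 1, 0),\\
\mathrm{CHIF}_8 &= \sum_{i=1}^{8} s_i \times 2^{i} = 2^{1} + 2^{3} + 2^{4} + 2^{7} = 2 + 8 + 16 + 128 = 154.
\end{aligned}
```

With weights $2^i$ for i = 1 to 8, codes are even integers between 0 and $2^9 - 2 = 510$; weighting by $2^{i-1}$ instead would halve every code (77 here) and give the usual 0 to 255 range, with the same histogram up to a relabeling of bins.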

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting and show that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

49,914 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes Inception, a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Journal Article
TL;DR: This paper presents t-SNE, a technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
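The heavy-tailed similarity that lets t-SNE reduce crowding can be sketched for the low-dimensional map. In the t-SNE formulation, the map affinity q_ij is proportional to (1 + ||y_i - y_j||²)^-1, a Student-t kernel with one degree of freedom, normalized over all pairs i ≠ j. This sketch covers only that step; the high-dimensional affinities, perplexity calibration, and gradient descent are omitted, and the function name is hypothetical.

```python
from typing import List, Sequence


def tsne_q_matrix(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Low-dimensional t-SNE affinities: Student-t kernel (1 + ||y_i - y_j||^2)^-1,
    normalized so that all off-diagonal entries sum to 1; the diagonal stays 0."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                kernel[i][j] = 1.0 / (1.0 + d2)
    total = sum(sum(row) for row in kernel)
    return [[k / total for k in row] for row in kernel]


q = tsne_q_matrix([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(q[0][1], q[0][2])  # nearer pair gets the larger affinity
```

Because the kernel decays polynomially rather than exponentially, moderately dissimilar points can be placed far apart in the map without their affinity vanishing, which is what relieves crowding in the center.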

30,124 citations


"CHIF: Convoluted Histogram Image Fe..." refers to background in this paper

  • ...Finally, Figure 5 shows a t-SNE [22] plot in which we compare frames of a video with real faces and frames of a video with silicone mask faces....
