scispace - formally typeset
Proceedings ArticleDOI

Face anti-spoofing with multifeature videolet aggregation

01 Dec 2016-pp 1035-1040

...read more


Citations
More filters
Proceedings ArticleDOI

[...]

TL;DR: A novel two-stream CNN-based approach for face anti-spoofing is proposed, by extracting the local features and holistic depth maps from the face images, which facilitate CNN to discriminate the spoof patches independent of the spatial face areas.
Abstract: The face image is the most accessible biometric modality which is used for highly accurate face recognition systems, while it is vulnerable to many different types of presentation attacks. Face anti-spoofing is a very critical step before feeding the face image to biometric systems. In this paper, we propose a novel two-stream CNN-based approach for face anti-spoofing, by extracting the local features and holistic depth maps from the face images. The local features facilitate CNN to discriminate the spoof patches independent of the spatial face areas. On the other hand, holistic depth map examine whether the input image has a face-like depth. Extensive experiments are conducted on the challenging databases (CASIA-FASD, MSU-USSA, and Replay Attack), with comparison to the state of the art.

227 citations


Cites methods from "Face anti-spoofing with multifeatur..."

  • [...]

Journal ArticleDOI

[...]

TL;DR: This paper introduces the first-of-its-kind silicone mask attack database which contains 130 real and attacked videos to facilitate research in developing presentation attack detection algorithms for this challenging scenario.
Abstract: In movies, film stars portray another identity or obfuscate their identity with the help of silicone/latex masks. Such realistic masks are now easily available and are used for entertainment purposes. However, their usage in criminal activities to deceive law enforcement and automatic face recognition systems is also plausible. Therefore, it is important to guard biometrics systems against such realistic presentation attacks. This paper introduces the first-of-its-kind silicone mask attack database which contains 130 real and attacked videos to facilitate research in developing presentation attack detection algorithms for this challenging scenario. Along with silicone mask, there are several other presentation attack instruments that are explored in literature. The next contribution of this research is a novel multilevel deep dictionary learning-based presentation attack detection algorithm that can discern different kinds of attacks. An efficient greedy layer by layer training approach is formulated to learn the deep dictionaries followed by SVM to classify an input sample as genuine or attacked. Experimental are performed on the proposed SMAD database, some samples with real world silicone mask attacks, and four existing presentation attack databases, namely, replay-attack, CASIA-FASD, 3DMAD, and UVAD. The results show that the proposed algorithm yields better performance compared with state-of-the-art algorithms, in both intra-database and cross-database experiments.

125 citations


Cites background or methods from "Face anti-spoofing with multifeatur..."

  • [...]

  • [...]

  • [...]

  • [...]

Posted Content

[...]

TL;DR: In this article, a CNN architecture with proper constraints and supervisions is proposed to overcome the problem of having no ground truth for the decomposition of a spoof face into a spoof noise and a live face.
Abstract: Many prior face anti-spoofing works develop discriminative models for recognizing the subtle differences between live and spoof faces. Those approaches often regard the image as an indivisible unit, and process it holistically, without explicit modeling of the spoofing process. In this work, motivated by the noise modeling and denoising algorithms, we identify a new problem of face de-spoofing, for the purpose of anti-spoofing: inversely decomposing a spoof face into a spoof noise and a live face, and then utilizing the spoof noise for classification. A CNN architecture with proper constraints and supervisions is proposed to overcome the problem of having no ground truth for the decomposition. We evaluate the proposed method on multiple face anti-spoofing databases. The results show promising improvements due to our spoof noise modeling. Moreover, the estimated spoof noise provides a visualization which helps to understand the added spoof noise by each spoof medium.

103 citations

Book ChapterDOI

[...]

08 Sep 2018
TL;DR: A CNN architecture with proper constraints and supervisions is proposed to overcome the problem of having no ground truth for the decomposition of face de-spoofing, and the results show promising improvements due to the spoof noise modeling.
Abstract: Many prior face anti-spoofing works develop discriminative models for recognizing the subtle differences between live and spoof faces. Those approaches often regard the image as an indivisible unit, and process it holistically, without explicit modeling of the spoofing process. In this work, motivated by the noise modeling and denoising algorithms, we identify a new problem of face de-spoofing, for the purpose of anti-spoofing: inversely decomposing a spoof face into a spoof noise and a live face, and then utilizing the spoof noise for classification. A CNN architecture with proper constraints and supervisions is proposed to overcome the problem of having no ground truth for the decomposition. We evaluate the proposed method on multiple face anti-spoofing databases. The results show promising improvements due to our spoof noise modeling. Moreover, the estimated spoof noise provides a visualization which helps to understand the added spoof noise by each spoof medium.

95 citations


Cites background from "Face anti-spoofing with multifeatur..."

  • [...]

Proceedings ArticleDOI

[...]

14 Jun 2020
TL;DR: Yu et al. as discussed by the authors proposed a frame level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns via aggregating both intensity and gradient information.
Abstract: Face anti-spoofing (FAS) plays a vital role in face recognition systems. Most state-of-the-art FAS methods 1) rely on stacked convolutions and expert-designed network, which is weak in describing detailed fine-grained information and easily being ineffective when the environment varies (e.g., different illumination), and 2) prefer to use long sequence as input to extract dynamic features, making them difficult to deploy into scenarios which need quick response. Here we propose a novel frame level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns via aggregating both intensity and gradient information. A network built with CDC, called the Central Difference Convolutional Network (CDCN), is able to provide more robust modeling capacity than its counterpart built with vanilla convolution. Furthermore, over a specifically designed CDC search space, Neural Architecture Search (NAS) is utilized to discover a more powerful network structure (CDCN++), which can be assembled with Multiscale Attention Fusion Module (MAFM) for further boosting performance. Comprehensive experiments are performed on six benchmark datasets to show that 1) the proposed method not only achieves superior performance on intra-dataset testing (especially 0.2% ACER in Protocol-1 of OULU-NPU dataset), 2) it also generalizes well on cross-dataset testing (particularly 6.5% HTER from CASIA-MFSD to Replay-Attack datasets). The codes are available at https://github.com/ZitongYu/CDCN.

75 citations


References
More filters
Journal ArticleDOI

[...]

TL;DR: A novel approach for recognizing DTs is proposed and its simplifications and extensions to facial image analysis are also considered and both the VLBP and LBP-TOP clearly outperformed the earlier approaches.
Abstract: Dynamic texture (DT) is an extension of texture to the temporal domain. Description and recognition of DTs have attracted growing attention. In this paper, a novel approach for recognizing DTs is proposed and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), which are an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences of the local binary patterns on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events such as facial expressions in which local information and its spatial locations should also be taken into account. In experiments with two DT databases, DynTex and Massachusetts Institute of Technology (MIT), both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. The advantages of our approach include local processing, robustness to monotonic gray-scale changes, and simple computation

2,352 citations


"Face anti-spoofing with multifeatur..." refers background in this paper

  • [...]

Journal ArticleDOI

[...]

TL;DR: The inherent strengths of biometrics-based authentication are outlined, the weak links in systems employing biometric authentication are identified, and new solutions for eliminating these weak links are presented.
Abstract: Because biometrics-based authentication offers several advantages over other authentication methods, there has been a significant surge in the use of biometrics for user authentication in recent years. It is important that such biometrics-based authentication systems be designed to withstand attacks when employed in security-critical applications, especially in unattended remote applications such as e-commerce. In this paper we outline the inherent strengths of biometrics-based authentication, identify the weak links in systems employing biometrics-based authentication, and present new solutions for eliminating some of these weak links. Although, for illustration purposes, fingerprint authentication is used throughout, our analysis extends to other biometrics-based methods.

1,601 citations


"Face anti-spoofing with multifeatur..." refers background in this paper

  • [...]

[...]

01 Jan 2009
TL;DR: This thesis builds a human-assisted motion annotation system to obtain ground-truth motion, missing in the literature, for natural video sequences, and proposes SIFT flow, a new framework for image parsing by transferring the metadata information from the images in a large database to an unknown query image.
Abstract: The focus of motion analysis has been on estimating a flow vector for every pixel by matching intensities. In my thesis, I will explore motion representations beyond the pixel level and new applications to which these representations lead. I first focus on analyzing motion from video sequences. Traditional motion analysis suffers from the inappropriate modeling of the grouping relationship of pixels and from a lack of ground-truth data. Using layers as the interface for humans to interact with videos, we build a human-assisted motion annotation system to obtain ground-truth motion, missing in the literature, for natural video sequences. Furthermore, we show that with the layer representation, we can detect and magnify small motions to make them visible to human eyes. Then we move to a contour presentation to analyze the motion for textureless objects under occlusion. We demonstrate that simultaneous boundary grouping and motion analysis can solve challenging data, where the traditional pixel-wise motion analysis fails. In the second part of my thesis, I will show the benefits of matching local image structures instead of intensity values. We propose SIFT flow that establishes dense, semantically meaningful correspondence between two images across scenes by matching pixel-wise SIFT features. Using SIFT flow, we develop a new framework for image parsing by transferring the metadata information, such as annotation, motion and depth, from the images in a large database to an unknown query image. We demonstrate this framework using new applications such as predicting motion from a single image and motion synthesis via object transfer. Based on SIFT flow, we introduce a nonparametric scene parsing system using label transfer, with very promising experimental results suggesting that our system outperforms state-of-the-art techniques based on training classifiers. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

862 citations


"Face anti-spoofing with multifeatur..." refers methods in this paper

  • [...]

Proceedings Article

[...]

27 Sep 2012
TL;DR: This paper inspects the potential of texture features based on Local Binary Patterns (LBP) and their variations on three types of attacks: printed photographs, and photos and videos displayed on electronic screens of different sizes and concludes that LBP show moderate discriminability when confronted with a wide set of attack types.
Abstract: Spoofing attacks are one of the security traits that biometric recognition systems are proven to be vulnerable to. When spoofed, a biometric recognition system is bypassed by presenting a copy of the biometric evidence of a valid user. Among all biometric modalities, spoofing a face recognition system is particularly easy to perform: all that is needed is a simple photograph of the user. In this paper, we address the problem of detecting face spoofing attacks. In particular, we inspect the potential of texture features based on Local Binary Patterns (LBP) and their variations on three types of attacks: printed photographs, and photos and videos displayed on electronic screens of different sizes. For this purpose, we introduce REPLAY-ATTACK, a novel publicly available face spoofing database which contains all the mentioned types of attacks. We conclude that LBP, with ∼15% Half Total Error Rate, show moderate discriminability when confronted with a wide set of attack types.

582 citations


Additional excerpts

  • [...]

Proceedings ArticleDOI

[...]

20 Jun 2009
TL;DR: This paper proposes to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series, and proposes a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space.
Abstract: System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using (HOOF) as the output of the NLDS. We evaluate our approach to recognition of human actions in several scenarios and achieve encouraging results.

552 citations