
Counter-Measures to Photo Attacks in Face Recognition: a public
database and a baseline
André Anjos and Sébastien Marcel
Idiap Research Institute
Centre du Parc - rue Marconi 19
CH-1920 Martigny, Switzerland
{andre.anjos,sebastien.marcel}@idiap.ch
Abstract
A common technique to bypass 2-D face recognition systems is to use photographs of spoofed identities. Unfortunately, research on counter-measures to this type of attack has not kept up: even though such threats have been known for nearly a decade, there seems to exist no consensus on best practices, techniques or protocols for developing and testing spoofing detectors for face recognition. We attribute this delay, in part, to the unavailability of public databases and protocols to study solutions and compare results. To this end we introduce the publicly available PRINT-ATTACK database and exemplify how to use its companion protocol with a motion-based algorithm that detects correlations between the person's head movements and the scene context. The results are to be used as a basis for comparison with other counter-measure techniques. The PRINT-ATTACK database contains 200 videos of real accesses and 200 videos of spoof attempts using printed photographs of 50 different identities.
1. Introduction
Identity theft is a concern that prevents the mainstream adoption of biometrics as the de facto form of identification in commercial systems [1]. Contrary to password-protected systems, our biometric information is widely available and extremely easy to sample. A quick search on the internet suffices to unveil pre-labelled samples from users on specialized websites such as Flickr or Facebook. Images can also easily be captured at a distance without prior consent. Users cannot trust that these samples will not be dishonestly used to assume their identity before biometric recognition systems.
In this work we are particularly concerned with direct [2] print-attacks to unimodal 2-D (visual spectrum) face-recognition systems.¹ These so-called spoofing attempts [3] are direct attacks on the input sensors of the biometric system. Attackers in this case are assumed not to have access to the internals of the recognition system and manage to penetrate it by simply displaying printed photos of the attacked identity to the input camera. This type of attack is therefore very easy to reproduce and has great potential to succeed [4].
Despite the fact that solutions exist for spoof prevention using multi-modal techniques [5, 6, 7, 8], it is our belief that research on counter-measures based solely on unimodal 2-D imagery has not yet reached a mature state. There seems to exist no consensus on best practices and techniques to be deployed for attack detection using non-intrusive methods. The number of publications on the subject is small. A missing piece of this puzzle is the lack of standard databases to test counter-measures, together with a set of protocols to evaluate performance and allow for objective comparison.
This work introduces a publicly available database, protocols and a baseline technique to evaluate counter-measures to spoofing attacks in face recognition systems. The remainder of this text is organized as follows: Section 2 discusses the current state-of-the-art in anti-spoofing for 2-D face recognition systems. Section 3 describes the PRINT-ATTACK anti-spoofing database and defines protocols for its usage. Section 4 defines a baseline technique that can be used for comparison with other algorithms. Section 5 reports on the experimental setup and results. Finally, Section 6 concludes and discusses possible extensions of this work.
¹ We will refer to such systems simply as face recognition systems from this point onwards.

2. Literature Survey
Face recognition systems have long been known to respond weakly to attacks [9, 4] and are easily spoofed using a simple photograph of the enrolled person's face, which may be displayed in hard-copy or on a screen. In this short survey, we focus on methods that present counter-measures to this kind of attack.
Anti-spoofing for 2-D face recognition systems can be coarsely classified into three categories with respect to the clues used for attack detection: motion, texture analysis and liveness detection. In motion analysis one is interested in detecting clues generated when two-dimensional counterfeits, for example photos or video clips, are presented to the system's input camera. In many cases, planar objects move significantly differently from real human faces, which are 3-D objects, and such deformation patterns can be used for spoof detection. For example, [10] explores the Lambertian reflectance model to derive differences between the 2-D images of the face presented during an attack and a real (3-D) face presented in real-access attempts. It does so by deriving an equation that estimates the latent reflectance information that exists in images captured in both scenarios, using either a variational retinex-based method or a far simpler difference-of-Gaussians [11] based approach similar to [12]. This is the first work in the literature to propose a publicly available database specifically tailored towards the development of spoofing counter-measures. The authors of [13] present a technique to evaluate liveness based on a short sequence of images, using a binary detector that evaluates the trajectories of selected parts of the face presented to the input sensor through a simplified optical flow analysis followed by a heuristic classifier. The same authors introduce in [14] a method for fusing scores from different expert systems that concurrently observe the 3-D face motion scheme introduced in the previous work and liveness properties such as eye-blinks or mouth movements. In [15], the authors propose a method to detect attacks produced with planar media (such as paper or screens) using motion estimation by optical flow.
Texture analysis counter-measures take advantage of texture patterns that may look unnatural when examining the input image data. Examples of detectable texture patterns are printing failures or overall image blur. [12] describes a method for print-attack detection that exploits differences in the 2-D Fourier spectra between hard-copies of client faces and real accesses. The method works well for down-sampled photos of the attacked identity, but is likely to fail for higher-quality samples. In [16] the author proposes a method to detect spoofing attacks carried out with printed photos by analyzing the micro-textures present on the paper using a linear SVM classifier [17]. One limitation of this method is that the input image needs to be reasonably sharp.
Liveness detection tries to capture signs of life from the user images by analysing spontaneous movements that cannot be detected in photographs, such as eye-blinks. [18] and [19] present a real-time liveness detection method specifically against photo-spoofing using (spontaneous) eye-blinks, which typically occur once every 2-4 seconds in humans. The system developed uses an undirected conditional random field framework to model eye-blinking that relaxes the independence assumption of generative modelling and the state-dependence limitations of hidden Markov modelling. A later work by the same authors [20] augments the number of counter-measures deployed to include scene-context matching that helps prevent video-spoofing in stationary face-recognition systems.
3. The PRINT-ATTACK Database
The PRINT-ATTACK biometric (face) database² consists of short video recordings of both real-access and attack attempts for 50 different identities. To create the dataset, each person recorded a number of videos under 2 different stationary conditions:
controlled: In this case the background of the
scene is uniform and the light of a fluorescent lamp
illuminates the scene;
adverse: In this case the background of the scene
is non-uniform and day-light illuminates the scene.
Under these two different conditions, people were
asked to sit down in front of a custom acquisition sys-
tem built on an Apple 13-inch MacBook laptop and
capture two video sequences with a resolution of 320
by 240 pixels (QVGA), at 25 frames-per-second and of
15 seconds each (375 frames). Videos were recorded
using Apple’s Quicktime format (MOV files).
The laptop is positioned on top of a short support (15 cm) so that faces are captured up-front. The acquisition operator launches the capture program and asks the person to look into the laptop camera as they would normally do while waiting for a recognition system to do its task. The program shows a reproduction of the current image being captured and, overlaid, the output of a face-detector used to guide the person during the session. In this particular setup, faces are detected using a cascade of classifiers based on a variant of Local Binary Patterns (LBP) [21] referred to as the Modified Census Transform (MCT) [22]. The face-detector helps the user self-adjust the distance from the laptop camera and make sure that a face can be detected at all times during the acquisition. After the acquisition was finished, the operator would still verify by visual inspection that the videos did not contain problems and then proceed to acquire the next video.

² http://www.idiap.ch/dataset/printattack

Figure 1. Example hard-copies of client high-resolution pictures.
3.1. Collecting samples and generating the attacks
Under the same illumination and background settings used for the real-access video clips, the acquisition operator took two high-resolution pictures of each person using a 12.1 megapixel Canon PowerShot SX150 IS camera, to be used as the basis for the spoofing attempts. People were asked to cooperate with this part of the acquisition so as to maximize the chances of an attack succeeding. They were asked to look up-front, just as in the acquisition of the real-access attempts.
To realize the attacks, hard copies of the digital
photographs were printed on plain A4 paper using a
Triumph-Adler DCC 2520 color laser printer. Figure 1
shows some examples of printed copies. The left col-
umn contains samples taken from the controlled sce-
nario, while the right column shows samples from the
adverse scenario.
Using these images, the operator generated the attacks by displaying the printouts of each client to the same acquisition setup used for sampling the real-client accesses. Video clips of about 10 seconds were captured for each spoof attempt, in two different attack modes:
hand-based attacks: in this mode, the operator
holds the prints using their own hands;
fixed-support attacks: the operator glues the client
prints to the wall so they don’t move during the
spoof attempt.
The first set of (hand-based) attacks shows the shaking behavior that can be observed when people hold photographs of spoofed identities in front of cameras and that can sometimes trick eye-blinking detectors [19]. It differs from the second set, which is completely static and should be easier for liveness-based counter-measures to detect.
3.2. Performance Figures
A spoofing detection system is subject to two types of errors: either a real access is rejected (false rejection) or an attack is accepted (false acceptance). In order to measure the performance of a spoofing detection system, we use the Half Total Error Rate (HTER), which combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) and is defined as:

HTER(τ, D) = (FAR(τ, D) + FRR(τ, D)) / 2   [%]   (1)
where D denotes the dataset used. Since both the FAR and the FRR depend on the threshold τ, they are strongly related to each other: increasing the FAR will reduce the FRR and vice-versa. For this reason, results are often presented using either Receiver Operating Characteristic (ROC) or Detection-Error Trade-off (DET) [23] curves, which plot the FAR versus the FRR for different values of the threshold. Another widely used measure to summarise the performance of a system is the Equal Error Rate (EER), defined as the point along the ROC or DET curve where the FAR equals the FRR.
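The error rates above can be sketched in a few lines; the score convention (higher scores indicating a real access, acceptance when the score exceeds τ) is an assumption for illustration only:

```python
import numpy as np

def far_frr(real_scores, attack_scores, tau):
    """FAR: fraction of attacks accepted; FRR: fraction of real accesses
    rejected. Convention (assumed): higher score means 'real access', and a
    sample is accepted when its score exceeds tau."""
    far = np.mean(np.asarray(attack_scores) > tau)
    frr = np.mean(np.asarray(real_scores) <= tau)
    return far, frr

def hter(real_scores, attack_scores, tau):
    """Half Total Error Rate, Equation 1: (FAR + FRR) / 2."""
    far, frr = far_frr(real_scores, attack_scores, tau)
    return (far + frr) / 2.0

# toy scores: perfectly separated, so HTER is zero at tau = 0.5
real, attack = [0.9, 0.8, 0.7, 0.95], [0.1, 0.2, 0.3, 0.05]
print(hter(real, attack, tau=0.5))  # 0.0
```

Multiplying the result by 100 gives the percentage figure used in the tables below.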
3.3. Protocols
The set of 400 videos (200 real accesses and 200 attacks) is decomposed into 3 subsets allowing for training, development and testing of binary classifiers. Identities for each subset were chosen randomly but do not overlap, i.e. people that are in one of the subsets do not appear in any other set. This choice guarantees that identity-specific behavior (such as eye-blinking patterns or head poses) is not picked up by detectors and that final systems generalize well.
Moreover, each print-attack subset can be further sub-classified into two groups according to the attacking support used during the acquisition (hand-based or fixed-support). Counter-measures developed using this database should report error figures for both the separate and the aggregated groupings, from which it is possible to understand which types of attacks are better handled by the proposed method. Table 1 summarizes the number of videos taken for both real-access and print-attack attempts and how they are split into the different subsets and groups.

Type          Train   Devel.  Test    Total
Real-access   60      60      80      200
Print-attack  30+30   30+30   40+40   100+100
Total         120     120     160     400

Table 1. Number of videos in each database subset. Numbers displayed as sums indicate the amount of hand-based and fixed-support attacks available in each subset when relevant.
It is recommended that the training and development samples are used to train classifiers to discriminate between real accesses and attacks. One trivial example is to use the training set for training the classifier itself and the development data to estimate when to stop training. A second possibility, which may generalize less well, is to merge the training and development sets, using the merged set as training data with some other stopping criterion. Finally, the test set should be used solely to report error rates and performance curves. If a single number is desired, a threshold τ should be chosen on the development set and the HTER reported using the test set data. As a means of standardizing reports, we recommend choosing the threshold τ at the EER on the development set.
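The recommended procedure (τ chosen at the EER on the development set, a single HTER reported on the test set) might be sketched as follows; the scores and the simple threshold scan are illustrative assumptions, not the protocol's mandated implementation:

```python
import numpy as np

def eer_threshold(dev_real, dev_attack):
    """Scan every observed score as a candidate threshold and return the one
    where |FAR - FRR| on the development set is smallest (the EER operating
    point). Convention (assumed): higher score means 'real access'."""
    best_tau, best_gap = None, np.inf
    for tau in np.sort(np.concatenate([dev_real, dev_attack])):
        far = np.mean(dev_attack > tau)   # attacks wrongly accepted
        frr = np.mean(dev_real <= tau)    # real accesses wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_tau = abs(far - frr), tau
    return best_tau

# illustrative scores for development and test sets
dev_real = np.array([0.7, 0.8, 0.9]); dev_attack = np.array([0.1, 0.2, 0.3])
test_real = np.array([0.75, 0.85]);   test_attack = np.array([0.15, 0.6])

tau = eer_threshold(dev_real, dev_attack)
test_far = np.mean(test_attack > tau)
test_frr = np.mean(test_real <= tau)
test_hter = (test_far + test_frr) / 2.0   # the single number to report
```

Note that τ is fixed on development data only; the test set never influences the operating point.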
We now define a baseline technique that can be used as a comparison point for future work developed using this database, exemplifying how errors should be reported.
4. The Proposed Counter-Measure
Motion-based algorithms for anti-spoofing typically use complex methods such as optical-flow estimators to extract deformation patterns from the image being analyzed. Nevertheless, for stationary recognition systems, another, far simpler clue can be effectively used to distinguish between real accesses and attacks: the relative movement intensity between the face and the scene background. In the case of an attack using a photograph or a video clip, it should be possible to observe a high correlation between the total amounts of movement in these two regions of interest (RoI).
4.1. Feature Extraction
For this baseline technique we ignore the movement direction and focus on intensity only. The total motion in each RoI is calculated using a simple gray-scale frame difference and an area-based normalization technique that removes differences in size, so that different face/background regions remain comparable, as shown in Equation 2. M_D represents the motion coefficient for a given RoI support (x, y) ∈ D, with support area S_D. Effectively, M_D represents the average absolute gray-scale difference between two consecutive images (I_t and I_{t-1}) in the video stream. In the case of the face, the support region is provided by the same face detector used during the acquisition. The background coefficient is computed by making D the whole image and subtracting the part relative to the face prior to averaging. Noise arising from the face localisation is avoided by considering the face region not to move between two consecutive images.

M_D = (1 / S_D) Σ_{(x,y) ∈ D} |I_t(x, y) − I_{t−1}(x, y)|   (2)

The calculation of M_D, even considering both RoIs, can be implemented in a very efficient manner, allowing the variable to be computed for every pair of consecutive images in the observed sequence. Figure 2 shows the evolution of M_D for both face and background in two scenarios: a real access (a) and an attack (b). As can be observed, the motion variations exhibit greater correlation in the case of an attack. Also note that the M_D signal for an attack seems to exhibit more variation in time, characterized by the amount of signal energy and higher-frequency components.

Figure 2. Motion M_D calculated as a function of time (frame index) for the face and background signals, in a typical real-access (a) and an attack (b).
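A minimal sketch of Equation 2, assuming grayscale frames stored as NumPy arrays and an illustrative, hand-placed face bounding box (a real system would take the box from the face detector instead):

```python
import numpy as np

def motion_coefficient(frame_t, frame_tm1, mask):
    """M_D of Equation 2: average absolute gray-level difference between two
    consecutive frames, restricted to the RoI given by the boolean mask; the
    1/S_D area normalization keeps face and background values comparable."""
    diff = np.abs(frame_t.astype(float) - frame_tm1.astype(float))
    return diff[mask].sum() / mask.sum()

h, w = 240, 320                         # QVGA, as in the database
face = np.zeros((h, w), dtype=bool)
face[60:180, 110:210] = True            # hypothetical face-detector bounding box
background = ~face                      # whole image minus the face region

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, (h, w))
f1 = f0.copy()
f1[60:180, 110:210] = rng.integers(0, 256, (120, 100))  # only the face changes

m_face = motion_coefficient(f1, f0, face)        # large: the face moved
m_bg = motion_coefficient(f1, f0, background)    # 0.0: static background
```

The two masks partition the frame, mirroring the text's construction of the background RoI as the whole image minus the face region.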
4.2. Classification
To input the motion coefficients into a classifier while avoiding their variability in time, we extract 5 quantities that describe the signal pattern over windows of N non-overlapping images. The 5 quantities are the minimum of the signal in the window, the maximum, the average, the standard deviation and the ratio R between the sum of all non-DC spectral components and the DC component itself, based on the N-point Fourier transform of the signal in the window (see Equation 3).

R = ( Σ_{i=1}^{N−1} |FFT_i| ) / |FFT_0|   (3)
These quantities allow a trained classifier to evaluate the degree of synchronized shaking within the scene during the period of time defined by N. If there is no movement (fixed-support attack) or too much synchronized movement (hand-based attack), the input data is likely to come from a spoof attempt. Normal accesses will exhibit decorrelated movement between the two RoIs, as real users move independently of the background.
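The five window quantities of Section 4.2 can be sketched as follows; the example window values are made up, and the non-DC sum runs over FFT bins 1 to N−1:

```python
import numpy as np

def window_features(m_signal):
    """Five descriptors of one N-point window of M_D (Section 4.2): minimum,
    maximum, average, standard deviation, and the ratio R of Equation 3
    (summed non-DC FFT magnitudes over the DC magnitude)."""
    m = np.asarray(m_signal, dtype=float)
    spectrum = np.abs(np.fft.fft(m))            # N-point Fourier transform
    r = spectrum[1:].sum() / spectrum[0]        # non-DC energy vs. DC level
    return np.array([m.min(), m.max(), m.mean(), m.std(), r])

window = np.array([0.5, 1.2, 0.8, 1.5, 0.9, 1.1])  # illustrative M_D window
feats = window_features(window)                     # the 5 classifier inputs
```

A perfectly still window (constant M_D) yields R ≈ 0, matching the intuition that fixed-support attacks concentrate all energy in the DC component.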
4.3. Temporal Processing
In order to combine the temporal information with that of the window-based classifier, we accumulate the output over time for every block of N frames and apply a very simple binary decision scheme using a majority-wins approach. To every output, the threshold τ defined at the EER on the development set is applied: if the output is greater than τ, we label the window as a real access, with a value of 1; otherwise, we apply a label of 0. After a number M of decisions have been collected, we average the values attributed to each window and check whether the average exceeds 0.5. If that is the case, by majority of votes, we determine that the video comes from a real access; otherwise, from a spoof attempt.
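The majority-wins scheme above might be sketched as follows; the per-window outputs and the threshold are illustrative values (in practice τ comes from the development-set EER):

```python
import numpy as np

def video_decision(window_outputs, tau):
    """Majority-wins scheme of Section 4.3: threshold each per-window
    classifier output at tau (1 = real access, 0 = attack), average the
    labels, and declare a real access when the average exceeds 0.5."""
    labels = [1 if out > tau else 0 for out in window_outputs]
    return bool(np.mean(labels) > 0.5)   # True -> real access, False -> attack

# illustrative per-window classifier outputs for one video of M = 5 windows
outputs = [0.9, 0.8, 0.2, 0.7, 0.95]
is_real = video_decision(outputs, tau=0.5)   # 4 of 5 windows vote 'real'
```

Because each window votes independently, a single noisy window cannot flip the video-level decision.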
5. Experiments
For this work, the window size N has been arbitrarily fixed at 20. This value represents roughly a second of activity and allows the counter-measure to be applied at discrete moments when integrated into a face recognition framework. After the calculation of M_D, the input signal is broken into 20-point non-overlapping windows and fed to a multi-layer perceptron (MLP) classifier [24] with 5 hidden neurons, matching the number of inputs, and a single output node. Attempts to increase the number of neurons in the hidden layer did not show better generalization and increased the probability of over-fitting. Reducing the number of neurons in that layer degraded performance.
The network is trained using the resilient back-propagation algorithm [25], exclusively on the training-set video sequences. To avoid over-fitting and improve generalization, the development set is used to stop the training procedure as soon as the squared output error on that set reaches its first minimum. After training, a threshold is chosen at the equal error rate (EER) on the development set and, based on that value, the test set is used to evaluate the final performance of the classifier.

           Development           Test
Support    FAR      FRR      FAR      FRR      HTER
Hand       10.91%   10.93%   6.82%    7.71%    7.27%
Fixed      10.30%   10.28%   14.77%   7.29%    11.03%
All        10.61%   10.65%   10.45%   7.50%    8.98%

Table 2. Summary of results obtained by analyzing shaking behavior in print attacks.

Figure 3. DET curves (False Acceptance Rate versus False Rejection Rate, both in %) for the classifier leading to the results in Table 2, traced for the development and test sets using all data available in the respective sets (hand + fixed-support).
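For orientation, the 5-5-1 topology can be sketched as below. This is only an illustration of the architecture: the paper trains with RPROP [25] on the window features, while this sketch uses plain full-batch gradient descent on the squared error over a toy problem (class = sign of the first input), so every data value here is an assumption:

```python
import numpy as np

# 5 inputs -> 5 tanh hidden units -> 1 sigmoid output, matching the text.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (5, 5)); b1 = np.zeros(5)
W2 = rng.normal(0, 0.5, (5, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                        # hidden layer
    return h, 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)      # toy, separable labels

lr = 0.5
for _ in range(2000):
    h, out = forward(X)
    d2 = (out - y) * out * (1 - out)                # output-layer delta
    gW2 = h.T @ d2 / len(X); gb2 = d2.mean(0)
    dh = d2 @ W2.T * (1 - h ** 2)                   # back-propagated hidden delta
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

accuracy = np.mean((forward(X)[1] > 0.5) == y)      # training accuracy
```

RPROP replaces the fixed learning rate above with per-weight adaptive step sizes, which is what makes the paper's training robust to gradient magnitudes.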
5.1. Results
Table 2 summarizes the best results for classification on the print-attack development and test sets. Figure 3 shows the DET curves for both the development and test sets, based on the same classifier.
Naturally, the MLP weights are initialized randomly for the training procedure. To ensure stable convergence we repeated the training procedure several times (> 10), verifying that equivalent minima are reached for the squared error and that similar generalization is achieved by the MLP network. Other MLPs, trained using the same parameters, achieve similar results, with differences of only 1 percent in the test HTER.

References (partial)

- V. N. Vapnik. The Nature of Statistical Learning Theory.
- C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics).
- M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: the RPROP algorithm.
- T. Ahonen, A. Hadid and M. Pietikäinen. Face Recognition with Local Binary Patterns.
- A. Martin, G. Doddington, T. Kamm, M. Ordowski and M. Przybocki. The DET Curve in Assessment of Detection Task Performance.