
Counter-Measures to Photo Attacks in Face Recognition: a public
database and a baseline
André Anjos and Sébastien Marcel
Idiap Research Institute
Centre du Parc - rue Marconi 19
CH-1920 Martigny, Switzerland
{andre.anjos,sebastien.marcel}@idiap.ch
Abstract
A common technique to bypass 2-D face recognition systems is to use photographs of spoofed identities. Unfortunately, research on counter-measures to this type of attack has not kept up: even though such threats have been known for nearly a decade, there seems to exist no consensus on best practices, techniques or protocols for developing and testing spoofing detectors for face recognition. We attribute this delay, in part, to the unavailability of public databases and protocols to study solutions and compare results. To this end we introduce the publicly available PRINT-ATTACK database and exemplify how to use its companion protocol with a motion-based algorithm that detects correlations between the person's head movements and the scene context. The results are to be used as a basis for comparison with other counter-measure techniques. The PRINT-ATTACK database contains 200 videos of real accesses and 200 videos of spoof attempts using printed photographs of 50 different identities.
1. Introduction
Identity theft is a concern that prevents the mainstream adoption of biometrics as the de facto form of identification in commercial systems [1]. Contrary to password-protected systems, our biometric information is widely available and extremely easy to sample. A quick search on the internet suffices to unveil pre-labelled samples from users on specialized websites such as Flickr or Facebook. Images can also easily be captured at a distance without prior consent. Users cannot trust that these samples will not be dishonestly used to assume their identity before biometric recognition systems.
In this work we are particularly concerned with direct [2] print-attacks to unimodal 2-D (visual spectrum) face-recognition systems.¹ These so-called spoofing attempts [3] are direct attacks on the input sensors of the biometric system. Attackers in this case are assumed not to have access to the internals of the recognition system and manage to penetrate it by simply displaying printed photos of the attacked identity to the input camera. This type of attack is therefore very easy to reproduce and has great potential to succeed [4].
Despite the fact that solutions exist for spoof prevention using multi-modal techniques [5, 6, 7, 8], it is our belief that research on counter-measures based solely on unimodal 2-D imagery has not yet reached a mature state. There seems to exist no consensus on best practices and techniques to be deployed for attack detection using non-intrusive methods. The number of publications on the subject is small. A missing piece of this puzzle is the lack of standard databases to test counter-measures, together with a set of protocols to evaluate performance and allow for objective comparison.
This work introduces a publicly available database, protocols and a baseline technique to evaluate counter-measures to spoofing attacks in face recognition systems. The remainder of this text is organized as follows: Section 2 discusses the current state-of-the-art in anti-spoofing for 2-D face recognition systems. Section 3 describes the PRINT-ATTACK anti-spoofing database and defines protocols for its usage. Section 4 defines a baseline technique that can be used for comparison with other algorithms. Section 5 reports on the experimental setup and results. Finally, Section 6 concludes and discusses possible extensions of this work.
¹ We will refer to such systems simply as face recognition systems from this point onwards.

2. Literature Survey
Face recognition systems have long been known to respond weakly to attacks [9, 4] and are easily spoofed using a simple photograph of the enrolled person's face, which may be displayed in hard-copy or on a screen. In this short survey, we focus on methods that present counter-measures to this kind of attack.
Anti-spoofing for 2-D face recognition systems can be coarsely classified into three categories with respect to the clues used for attack detection: motion, texture analysis and liveness detection. In motion analysis one is interested in detecting clues generated when two-dimensional counterfeits, for example photos or video clips, are presented to the system's input camera. In many cases, planar objects move significantly differently from real human faces, which are 3-D objects, and such deformation patterns can be used for spoof detection. For example, [10] explores the Lambertian reflectance model to derive differences between the 2-D images of the face presented during an attack and a real (3-D) face presented in real-access attempts. It does so by deriving an equation that estimates the latent reflectance information that exists in images captured in both scenarios, using either a variational retinex-based method or a far simpler difference-of-Gaussians [11] based approach similar to [12]. This is the first work in the literature to propose a publicly available database specifically tailored towards the development of spoofing counter-measures. The authors of [13] present a technique to evaluate liveness based on a short sequence of images, using a binary detector that evaluates the trajectories of selected parts of the face presented to the input sensor through a simplified optical flow analysis followed by a heuristic classifier. The same authors introduce in [14] a method for fusing scores from different expert systems that concurrently observe the 3-D face motion scheme introduced in the previous work and liveness properties such as eye-blinks or mouth movements. In [15], the authors propose a method to detect attacks produced with planar media (such as paper or screens) using motion estimation by optical flow.
Texture analysis counter-measures take advantage of texture patterns that may look unnatural when examining the input image data. Examples of detectable texture patterns are printing failures or overall image blur. [12] describes a method for print-attack detection that exploits differences in the 2-D Fourier spectra between hard-copies of client faces and real accesses. The method works well for down-sampled photos of the attacked identity, but is likely to fail for higher-quality samples. In [16] the author proposes a method to detect spoofing attacks carried out with printed photos by analyzing the micro-textures present on the paper using a linear SVM classifier [17]. One limitation of this method is that the input image needs to be reasonably sharp.
Liveness detection tries to capture signs of life from the user images by analysing spontaneous movements that cannot be detected in photographs, such as eye-blinks. [18] and [19] present a real-time liveness detection method specifically against photo-spoofing using (spontaneous) eye-blinks, which typically occur once every 2-4 seconds in humans. The system developed uses an undirected conditional random field framework to model eye-blinking that relaxes the independence assumption of generative modelling and the state-dependence limitations of hidden Markov modelling. A later work by the same authors [20] augments the number of counter-measures deployed to include scene-context matching that helps prevent video-spoofing in stationary face-recognition systems.
3. The PRINT-ATTACK Database
The PRINT-ATTACK biometric (face) database² consists of short video recordings of both real-access and attack attempts for 50 different identities. To create the dataset, each person recorded a number of videos under 2 different stationary conditions:
controlled: In this case the background of the
scene is uniform and the light of a fluorescent lamp
illuminates the scene;
adverse: In this case the background of the scene
is non-uniform and day-light illuminates the scene.
Under these two different conditions, people were
asked to sit down in front of a custom acquisition sys-
tem built on an Apple 13-inch MacBook laptop and
capture two video sequences with a resolution of 320
by 240 pixels (QVGA), at 25 frames-per-second and of
15 seconds each (375 frames). Videos were recorded
using Apple’s Quicktime format (MOV files).
The laptop is positioned on top of a short support (15 cm) so that faces are captured up-front. The acquisition operator launches the capture program and asks the person to look into the laptop camera as they would normally do while waiting for a recognition system to do its task. The program shows a reproduction of the current image being captured and, overlaid, the output of a face-detector used to guide the person during the session. In this particular setup, faces are detected using a cascade of classifiers based on a variant of Local Binary Patterns (LBP) [21] referred to as the Modified Census Transform (MCT) [22]. The face-detector helps the user self-adjust the distance from the laptop camera and make sure that a face can be detected at all times during the acquisition. After the acquisition was finished, the operator would still verify by visual inspection that the videos did not contain problems and then proceed to acquire the next video.

² http://www.idiap.ch/dataset/printattack

Figure 1. Example hard-copies of client high-resolution pictures.
3.1. Collecting samples and generating the attacks
Under the same illumination and background settings used for the real-access video clips, the acquisition operator took two high-resolution pictures of each person using a 12.1 megapixel Canon PowerShot SX150 IS camera, to be used as the basis for the spoofing attempts. People were asked to cooperate with this part of the acquisition so as to maximize the chances of an attack succeeding. They were asked to look up-front, just as in the acquisition of the real-access attempts.
To realize the attacks, hard copies of the digital
photographs were printed on plain A4 paper using a
Triumph-Adler DCC 2520 color laser printer. Figure 1
shows some examples of printed copies. The left col-
umn contains samples taken from the controlled sce-
nario, while the right column shows samples from the
adverse scenario.
Using these images, the operator generated the attacks by displaying the printouts of each client to the same acquisition setup used for sampling the real-client accesses. Video clips of about 10 seconds were captured for each spoof attempt, in two different attack modes:
hand-based attacks: in this mode, the operator
holds the prints using their own hands;
fixed-support attacks: the operator glues the client
prints to the wall so they don’t move during the
spoof attempt.
The first set of (hand-based) attacks shows the shaking behavior that can be observed when people hold photographs of spoofed identities in front of cameras and that can sometimes trick eye-blinking detectors [19]. It differs from the second set, which is completely static and should be easier for liveness-based counter-measures to detect.
3.2. Performance Figures
A spoofing detection system is subject to two types of errors: either a real access is rejected (false rejection) or an attack is accepted (false acceptance). In order to measure the performance of a spoofing detection system, we use the Half Total Error Rate (HTER), which combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) and is defined as:

HTER(τ, D) = (FAR(τ, D) + FRR(τ, D)) / 2   [%]   (1)
where D denotes the dataset used. Since both the FAR and the FRR depend on the threshold τ, they are strongly related to each other: increasing the FAR will reduce the FRR and vice-versa. For this reason, results are often presented using either Receiver Operating Characteristic (ROC) or Detection-Error Trade-off (DET) [23] curves, which plot the FAR versus the FRR for different values of the threshold. Another widely used measure to summarise the performance of a system is the Equal Error Rate (EER), defined as the point along the ROC or DET curve where the FAR equals the FRR.
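The error rates above can be sketched in a few lines; the score convention (higher scores indicating a real access, acceptance when the score exceeds τ) is an assumption for illustration only:

```python
import numpy as np

def far_frr(real_scores, attack_scores, tau):
    """FAR: fraction of attacks accepted; FRR: fraction of real accesses
    rejected. Convention (assumed): higher score means 'real access', and a
    sample is accepted when its score exceeds tau."""
    far = np.mean(np.asarray(attack_scores) > tau)
    frr = np.mean(np.asarray(real_scores) <= tau)
    return far, frr

def hter(real_scores, attack_scores, tau):
    """Half Total Error Rate, Equation 1: (FAR + FRR) / 2."""
    far, frr = far_frr(real_scores, attack_scores, tau)
    return (far + frr) / 2.0

# toy scores: perfectly separated, so HTER is zero at tau = 0.5
real, attack = [0.9, 0.8, 0.7, 0.95], [0.1, 0.2, 0.3, 0.05]
print(hter(real, attack, tau=0.5))  # 0.0
```

Multiplying the result by 100 gives the percentage figure used in the tables below.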
3.3. Protocols
The set of 400 videos (200 real accesses and 200 attacks) is decomposed into 3 subsets allowing for training, development and testing of binary classifiers. Identities for each subset were chosen randomly but do not overlap, i.e. people that are in one of the subsets do not appear in any other set. This choice guarantees that identity-specific behavior (such as eye-blinking patterns or head poses) is not picked up by detectors and that final systems generalize well.
Moreover, each print-attack subset can be further sub-classified into two groups according to the attacking support used during the acquisition (hand-based or fixed-support). Counter-measures developed using this database should report error figures for both the separate and the aggregated groupings, from which it is possible to understand which types of attacks are better handled by the proposed method. Table 1 summarizes the number of videos taken for both real-access and print-attack attempts and how they are split into the different subsets and groups.

Type          Train   Devel.  Test    Total
Real-access   60      60      80      200
Print-attack  30+30   30+30   40+40   100+100
Total         120     120     160     400

Table 1. Number of videos in each database subset. Numbers displayed as sums indicate the amount of hand-based and fixed-support attacks available in each subset when relevant.
It is recommended that the training and development samples are used to train classifiers to discriminate between real accesses and attacks. One trivial example is to use the training set for training the classifier itself and the development data to estimate when to stop training. A second possibility, which may generalize less well, is to merge the training and development sets, using the merged set as training data with some other stopping criterion. Finally, the test set should be used solely to report error rates and performance curves. If a single number is desired, a threshold τ should be chosen on the development set and the HTER reported using the test set data. As a means of standardizing reports, we recommend choosing the threshold τ at the EER on the development set.
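The recommended procedure (τ chosen at the EER on the development set, a single HTER reported on the test set) might be sketched as follows; the scores and the simple threshold scan are illustrative assumptions, not the protocol's mandated implementation:

```python
import numpy as np

def eer_threshold(dev_real, dev_attack):
    """Scan every observed score as a candidate threshold and return the one
    where |FAR - FRR| on the development set is smallest (the EER operating
    point). Convention (assumed): higher score means 'real access'."""
    best_tau, best_gap = None, np.inf
    for tau in np.sort(np.concatenate([dev_real, dev_attack])):
        far = np.mean(dev_attack > tau)   # attacks wrongly accepted
        frr = np.mean(dev_real <= tau)    # real accesses wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_tau = abs(far - frr), tau
    return best_tau

# illustrative scores for development and test sets
dev_real = np.array([0.7, 0.8, 0.9]); dev_attack = np.array([0.1, 0.2, 0.3])
test_real = np.array([0.75, 0.85]);   test_attack = np.array([0.15, 0.6])

tau = eer_threshold(dev_real, dev_attack)
test_far = np.mean(test_attack > tau)
test_frr = np.mean(test_real <= tau)
test_hter = (test_far + test_frr) / 2.0   # the single number to report
```

Note that τ is fixed on development data only; the test set never influences the operating point.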
We now define a baseline technique that can be used as a comparison point for future work developed using this database, exemplifying how errors should be reported.
4. The Proposed Counter-Measure
Motion-based algorithms for anti-spoofing typically use complex methods such as optical-flow estimators to extract deformation patterns from the image being analyzed. Nevertheless, for stationary recognition systems, another, far simpler clue can be effectively used to distinguish between real accesses and attacks: the relative movement intensity between the face and the scene background. In the case of an attack using a photograph or a video clip, it should be possible to observe a high correlation between the total amounts of movement in these two regions of interest (RoI).
4.1. Feature Extraction
For this baseline technique we ignore the movement direction and focus on intensity only. The total motion in each RoI is calculated using a simple gray-scale frame difference and an area-based normalization technique that removes differences in size, so that different face/background regions remain comparable, as shown in Equation 2. M_D represents the motion coefficient for a given RoI support (x, y) ∈ D, with support area S_D. Effectively, M_D represents the average absolute gray-scale difference between two consecutive images (I_t and I_{t-1}) in the video stream. In the case of the face, the support region is provided by the same face detector used during the acquisition. The background coefficient is computed by making D the whole image and subtracting the part relative to the face prior to averaging. Noise arising from the face localisation is avoided by considering the face region not to move between two consecutive images.

M_D = (1 / S_D) Σ_{(x,y) ∈ D} |I_t(x, y) − I_{t−1}(x, y)|   (2)

The calculation of M_D, even considering both RoIs, can be implemented in a very efficient manner, allowing the variable to be computed for every pair of consecutive images in the observed sequence. Figure 2 shows the evolution of M_D for both face and background in two scenarios: a real access (a) and an attack (b). As can be observed, the motion variations exhibit greater correlation in the case of an attack. Also note that the M_D signal for an attack seems to exhibit more variation in time, characterized by the amount of signal energy and higher-frequency components.

Figure 2. Motion M_D calculated as a function of time (frame index) for the face and background signals, in a typical real-access (a) and an attack (b).
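A minimal sketch of Equation 2, assuming grayscale frames stored as NumPy arrays and an illustrative, hand-placed face bounding box (a real system would take the box from the face detector instead):

```python
import numpy as np

def motion_coefficient(frame_t, frame_tm1, mask):
    """M_D of Equation 2: average absolute gray-level difference between two
    consecutive frames, restricted to the RoI given by the boolean mask; the
    1/S_D area normalization keeps face and background values comparable."""
    diff = np.abs(frame_t.astype(float) - frame_tm1.astype(float))
    return diff[mask].sum() / mask.sum()

h, w = 240, 320                         # QVGA, as in the database
face = np.zeros((h, w), dtype=bool)
face[60:180, 110:210] = True            # hypothetical face-detector bounding box
background = ~face                      # whole image minus the face region

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, (h, w))
f1 = f0.copy()
f1[60:180, 110:210] = rng.integers(0, 256, (120, 100))  # only the face changes

m_face = motion_coefficient(f1, f0, face)        # large: the face moved
m_bg = motion_coefficient(f1, f0, background)    # 0.0: static background
```

The two masks partition the frame, mirroring the text's construction of the background RoI as the whole image minus the face region.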
4.2. Classification
To input the motion coefficients into a classifier while avoiding their variability in time, we extract 5 quantities that describe the signal pattern over windows of N non-overlapping images. The 5 quantities are the minimum of the signal in the window, the maximum, the average, the standard deviation and the ratio R between the sum of all non-DC spectral components and the DC component itself, based on the N-point Fourier transform of the signal in the window (see Equation 3).

R = ( Σ_{i=1}^{N−1} |FFT_i| ) / |FFT_0|   (3)
These quantities allow a trained classifier to evaluate the degree of synchronized shaking within the scene during the period of time defined by N. If there is no movement (fixed-support attack) or too much synchronized movement (hand-based attack), the input data is likely to come from a spoof attempt. Normal accesses will exhibit decorrelated movement between the two RoIs, as real users move independently of the background.
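The five window quantities of Section 4.2 can be sketched as follows; the example window values are made up, and the non-DC sum runs over FFT bins 1 to N−1:

```python
import numpy as np

def window_features(m_signal):
    """Five descriptors of one N-point window of M_D (Section 4.2): minimum,
    maximum, average, standard deviation, and the ratio R of Equation 3
    (summed non-DC FFT magnitudes over the DC magnitude)."""
    m = np.asarray(m_signal, dtype=float)
    spectrum = np.abs(np.fft.fft(m))            # N-point Fourier transform
    r = spectrum[1:].sum() / spectrum[0]        # non-DC energy vs. DC level
    return np.array([m.min(), m.max(), m.mean(), m.std(), r])

window = np.array([0.5, 1.2, 0.8, 1.5, 0.9, 1.1])  # illustrative M_D window
feats = window_features(window)                     # the 5 classifier inputs
```

A perfectly still window (constant M_D) yields R ≈ 0, matching the intuition that fixed-support attacks concentrate all energy in the DC component.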
4.3. Temporal Processing
In order to combine the temporal information with that of the window-based classifier, we accumulate the output over time for every block of N frames and apply a very simple binary decision scheme using a majority-wins approach. To every output, the threshold τ defined at the EER on the development set is applied: if the output is greater than τ, we label the window as a real access, with a value of 1; otherwise, we apply a label of 0. After a number M of decisions have been collected, we average the values attributed to each window and check whether the average exceeds 0.5. If that is the case, by majority of votes, we determine that the video comes from a real access; otherwise, from a spoof attempt.
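The majority-wins scheme above might be sketched as follows; the per-window outputs and the threshold are illustrative values (in practice τ comes from the development-set EER):

```python
import numpy as np

def video_decision(window_outputs, tau):
    """Majority-wins scheme of Section 4.3: threshold each per-window
    classifier output at tau (1 = real access, 0 = attack), average the
    labels, and declare a real access when the average exceeds 0.5."""
    labels = [1 if out > tau else 0 for out in window_outputs]
    return bool(np.mean(labels) > 0.5)   # True -> real access, False -> attack

# illustrative per-window classifier outputs for one video of M = 5 windows
outputs = [0.9, 0.8, 0.2, 0.7, 0.95]
is_real = video_decision(outputs, tau=0.5)   # 4 of 5 windows vote 'real'
```

Because each window votes independently, a single noisy window cannot flip the video-level decision.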
5. Experiments
For this work, the window size N has been arbitrarily fixed at 20. This value represents roughly a second of activity and allows the counter-measure to be applied at discrete moments when integrated into a face recognition framework. After the calculation of M_D, the input signal is broken into 20-point non-overlapping windows and fed to a multi-layer perceptron (MLP) classifier [24] with 5 hidden neurons, matching the number of inputs, and a single output node. Attempts to increase the number of neurons in the hidden layer did not show better generalization and increased the probability of over-fitting. Reducing the number of neurons in that layer degraded performance.
The network is trained using the resilient back-propagation algorithm [25], exclusively on the training-set video sequences. To avoid over-fitting and improve generalization, the development set is used to stop the training procedure as soon as the squared output error on that set reaches its first minimum. After training, a threshold is chosen at the equal error rate (EER) on the development set and, based on that value, the test set is used to evaluate the final performance of the classifier.

           Development           Test
Support    FAR      FRR      FAR      FRR      HTER
Hand       10.91%   10.93%   6.82%    7.71%    7.27%
Fixed      10.30%   10.28%   14.77%   7.29%    11.03%
All        10.61%   10.65%   10.45%   7.50%    8.98%

Table 2. Summary of results obtained by analyzing shaking behavior in print attacks.

Figure 3. DET curves (False Acceptance Rate versus False Rejection Rate, both in %) for the classifier leading to the results in Table 2, traced for the development and test sets using all data available in the respective sets (hand + fixed-support).
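For orientation, the 5-5-1 topology can be sketched as below. This is only an illustration of the architecture: the paper trains with RPROP [25] on the window features, while this sketch uses plain full-batch gradient descent on the squared error over a toy problem (class = sign of the first input), so every data value here is an assumption:

```python
import numpy as np

# 5 inputs -> 5 tanh hidden units -> 1 sigmoid output, matching the text.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (5, 5)); b1 = np.zeros(5)
W2 = rng.normal(0, 0.5, (5, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                        # hidden layer
    return h, 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)      # toy, separable labels

lr = 0.5
for _ in range(2000):
    h, out = forward(X)
    d2 = (out - y) * out * (1 - out)                # output-layer delta
    gW2 = h.T @ d2 / len(X); gb2 = d2.mean(0)
    dh = d2 @ W2.T * (1 - h ** 2)                   # back-propagated hidden delta
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

accuracy = np.mean((forward(X)[1] > 0.5) == y)      # training accuracy
```

RPROP replaces the fixed learning rate above with per-weight adaptive step sizes, which is what makes the paper's training robust to gradient magnitudes.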
5.1. Results
Table 2 summarizes the best results for classification on the print-attack development and test sets. Figure 3 shows the DET curves for both the development and test sets, based on the same classifier.
Naturally, the MLP weights are initialized randomly for the training procedure. To ensure stable convergence we repeated the training procedure several times (> 10), verifying that equivalent minima are reached for the squared error and that similar generalization is achieved by the MLP network. Other MLPs, trained using the same parameters, achieve similar results, with differences of only 1 percent in the test HTER.

References (partial)

- V. N. Vapnik. The Nature of Statistical Learning Theory.
- C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics).
- M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: the RPROP algorithm.
- T. Ahonen, A. Hadid and M. Pietikäinen. Face Recognition with Local Binary Patterns.
- A. Martin, G. Doddington, T. Kamm, M. Ordowski and M. Przybocki. The DET Curve in Assessment of Detection Task Performance.