
Analyzing and Reducing the Damage of Dataset Bias
to Face Recognition with Synthetic Data
Adam Kortylewski Bernhard Egger Andreas Schneider
Thomas Gerig Andreas Morel-Forster Thomas Vetter
Department of Mathematics and Computer Science
University of Basel
Abstract
It is well known that deep learning approaches to face
recognition suffer from various biases in the available train-
ing data. In this work, we demonstrate the large potential
of synthetic data for analyzing and reducing the negative
effects of dataset bias on deep face recognition systems. In
particular, we explore two complementary application areas
for synthetic face images: 1) Using fully annotated synthetic
face images we can study the face recognition rate as a
function of interpretable parameters such as face pose. This
enables us to systematically analyze the effect of different
types of dataset biases on the generalization ability of neu-
ral network architectures. Our analysis reveals that deeper
neural network architectures can generalize better to un-
seen face poses. Furthermore, our study shows that current
neural network architectures cannot disentangle face pose
and facial identity, which limits their generalization ability.
2) We pre-train neural networks with large-scale synthetic
data that is highly variable in face pose and the number of
facial identities. After a subsequent fine-tuning with real-
world data, we observe that the damage of dataset bias in
the real-world data is largely reduced. Furthermore, we
demonstrate that the size of real-world datasets can be re-
duced by 75% while maintaining competitive face recogni-
tion performance. The data and software used in this work
are publicly available¹.

¹ https://github.com/unibas-gravis/parametric-face-image-generator
1. Introduction
Deep face recognition systems [22, 21, 19] have
achieved remarkable performances on challenging datasets,
due to advances in deep learning [18] and the availability
of large-scale training data [10, 13, 25]. However, training
datasets for face recognition are biased regarding nuisance
variables, such as the face pose or the illumination condi-
tions, because they were mostly collected from the web. It
is well known that such biases have severe negative effects
on the generalization performance of machine learning sys-
tems [24, 14, 23, 17]. Therefore, the face recognition com-
munity faces two fundamental problems: 1) It is difficult to
systematically analyze the effects of dataset bias on the gen-
eralization performance, since a fine-grained annotation of
nuisance variables is practically infeasible on large-scale
datasets. 2) Deep face recognition systems do not gener-
alize well across benchmarks, due to the severe sampling
biases in public datasets (as illustrated in Section 4). This
causes well-known problems such as a lack of diversity and
fairness in face recognition [15]. It is unclear how such
damage from dataset bias can be undone.
We propose to overcome both problems by leveraging
synthetic face images which are generated with a paramet-
ric 3D Morphable Face Model [3, 7]. In particular, we in-
troduce a data generator which creates synthetic face im-
ages with precise annotation of parameters that define the
facial identity, such as shape and texture, but also of nui-
sance parameters, such as light, camera and head pose. In
our experiments, we explore two application areas for syn-
thetic images in the context of face recognition:
Systematic analysis of the damage from dataset
bias. We use fully annotated synthetic face images to
study the face recognition rate as a function of nui-
sance variables such as face pose. This enables us
to systematically study the effect of different types of
dataset biases on the generalization ability of neural
network architectures.
Pre-training with synthetic data. We generate large-
scale synthetic data for pre-training DCNNs and sub-
sequently fine-tune them with real-world data. The
parametric nature of the generator enables us to design
the distribution of nuisances in the synthetic data such
that it is highly variable in nuisance parameters that are
well known to be biased in real-world datasets (such as
pose and facial identity).

Based on our extensive experimental evaluation we gain
several novel insights about the effects of dataset bias on the
generalization ability of DCNNs at the task of face recog-
nition: i) It is well known that DCNNs with the VGG-16
architecture generalize better than those with the AlexNet ar-
chitecture at face recognition tasks. Using the presented
methodology we reveal that VGG-16 outperforms AlexNet,
because it generalizes much better to unseen face poses,
although it has significantly more parameters (Section 3.2).
ii) In a real world scenario, not all identities in the training
data share the same distribution of face poses. We simulate
this setting and observe that DCNNs cannot disentangle the
facial identity from the face pose, which limits their abil-
ity to generalize from biased data (Section 3.3). iii) Using
synthetic face images for pre-training, we can enhance the
generalization performance of deep neural networks consis-
tently across several benchmark datasets (Section 4.3). iv)
The amount of real-world data needed to achieve competi-
tive performance is reduced considerably (Section 4.3) after
pre-training with synthetic data. This offers a means to
concentrate data collection efforts on fewer but higher-
quality samples in terms of variability.
Curiously, despite the success of 3D Morphable Face
Models at facial image generation, we are not aware of any
previous work that uses this effective and easily accessible
approach to analyze and enhance face recognition systems.
2. Face Image Generator
We use a fully parametric generator for the synthesis of
face images with detailed annotation of the most relevant
nuisance transformations. Our generator is based on a 3D
Morphable Model [3] of face shape, color and expression.
In particular, we use the Basel Face Model 2017 (BFM-
2017) [7] which is learned from 200 neutral face scans and
160 expression deformations. Natural-looking, three-dimen-
sional faces with expressions can be generated by sampling
from the statistical distribution of the model. In order to
achieve a natural illumination in the synthetic face images,
we sample the spherical harmonics illumination parameters
from the Basel Illumination Prior (BIP) [5]. Using com-
puter graphics we generate a 2D image from a 3D face sam-
pled from the model. We use a non-parametric background
model that chooses random background textures from the
data provided in the describable texture database [4]. The
face image generator is built on the scalismo-faces software
framework [20]. The advantage of using 3DMMs for data
synthesis over related generative face models such as
GANs [2, 8] is that the 3DMM provides full control over
disentangled parameters that change the facial identity in
the terms of shape and albedo texture as well as pose, il-
lumination and facial expression. The proposed generator
enables us to generate an unlimited number of face images with
detailed labeling of the most relevant sources of image vari-
ation. Example images synthesized from the generator are
illustrated in Figure 1.

Figure 1: Synthetic face images sampled from our data gen-
erator. The facial identity in each row is the same. The top
row illustrates the precise control over image parameters,
where only the yaw pose is changed while all other nuisance
parameters are fixed (as used in Section 3). The bottom row
illustrates synthetic faces generated by randomly sampling
all nuisance variables (as used in Section 4).

Using the fine-grained annotation of
the synthetic data enables us to systematically analyze dif-
ferent DCNN architectures on a common ground at the task
of face recognition in the next section. Subsequently, we
study how the generalization performance is affected when
large-scale synthetic data is used for pre-training in Section
4.
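To make the generation procedure concrete, the following Python sketch
outlines one draw from the generator. All object and function names
(bfm, bip, renderer) are hypothetical placeholders, since the actual
implementation is built on the Scala-based scalismo-faces framework [20].

    import random

    def generate_sample(bfm, bip, backgrounds, renderer, yaw_range=(-1.571, 1.571)):
        """One draw from the face image generator (hypothetical API)."""
        # Facial identity and expression: coefficients of the BFM-2017 3DMM,
        # sampled from the model's statistical distribution.
        shape = bfm.sample_shape()            # hypothetical call
        color = bfm.sample_color()            # hypothetical call
        expression = bfm.sample_expression()  # hypothetical call

        # Nuisance parameters; all of them are recorded as labels.
        yaw = random.uniform(*yaw_range)         # head pose
        illumination = bip.sample()              # Basel Illumination Prior [5]
        background = random.choice(backgrounds)  # describable textures [4]

        image = renderer(shape, color, expression, yaw, illumination, background)
        labels = {"shape": shape, "color": color, "expression": expression,
                  "yaw": yaw, "illumination": illumination}
        return image, labels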
3. Analyzing the Damage of Dataset Bias
The fine-grained control over the image variation in the
training and test data enables us to decompose the total
recognition rate (TRR) as a function along the axis of nui-
sance transformations. With this tool at hand, we study how
biases in the training data, in particular missing viewpoints
of a face, affect the generalization of DCNNs to unseen data
at test time.
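Since every synthetic test image carries the yaw angle it was rendered
with, this decomposition is straightforward to compute. A minimal sketch,
assuming test results are given as (predicted identity, true identity, yaw)
records:

    from collections import defaultdict

    def recognition_rate_by_pose(records, bin_width):
        """Decompose the total recognition rate (TRR) along the yaw axis.

        records: iterable of (predicted_id, true_id, yaw_in_radians) tuples.
        Returns a dict mapping each yaw bin to its recognition rate.
        """
        correct, total = defaultdict(int), defaultdict(int)
        for predicted_id, true_id, yaw in records:
            b = round(yaw / bin_width) * bin_width  # snap to the sampling grid
            total[b] += 1
            correct[b] += int(predicted_id == true_id)
        return {b: correct[b] / total[b] for b in total}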
3.1. Experimental Setup
Figure 2 schematically illustrates our experimental
setup. We generate synthetic images of different facial iden-
tities and transform them along the axes of the nuisance
transformations that we want to study (Figure 2 (I)). In
this work we focus on studying the effects of biases in the
face pose only. We simulate strong background variations,
which are common in real world data, by sampling random
textures from our empirical background model. All other
nuisance parameters are fixed. We illustrate samples of the
face image generator with the nuisance transformations that
we consider in our experiments in Figure 2. After splitting
the synthetic data into a training and test set, we bias the
training data, e.g. by removing certain face poses (Figure 2
(II)). Subsequently, we train different DCNN architectures
on the biased training data (Figure 2 (III)) and evaluate
how well the DCNNs generalize to the unbiased test data.
The fully parametric nature of the synthetic data allows us
to evaluate the recognition rate as a function of the biased
nuisance transformation (Figure 2 (IV)).

Figure 2: Experimental setup for our empirical analysis of the effect of biased training data on the generalization ability of
different DCNN architectures. (I) We generate synthetic identities with a 3D Morphable Face Model and render them in
different face poses. We simulate background variation by overlaying the faces on different textures. (II) We bias the training
data by removing certain viewpoints from the training set. (III) We train common DCNN architectures on the biased training
data. (IV) The annotation of the test data makes it possible to analyze the recognition rate as a function of the face pose. It
provides fine-grained information about the generalization ability of the different DCNN architectures.
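In code, biasing the training data (step II) amounts to filtering the
generated samples by their annotated yaw angle. A minimal sketch, assuming
each sample carries the label dictionary produced by the generator sketch
in Section 2:

    def bias_training_set(samples, yaw_min, yaw_max):
        """Remove all views outside [yaw_min, yaw_max] (step II in Figure 2)."""
        return [(image, labels) for image, labels in samples
                if yaw_min <= labels["yaw"] <= yaw_max]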
In our experiments, we focus on comparing DCNNs with
a significantly diverging performance at face recognition
(AlexNet and VGG-16), as our methodology makes it pos-
sible to study why exactly one model performs better than
the other. We test these networks at the task of face classifi-
cation. Thus, the task is to recognize a face from an image,
for which the identity is known at training time. Another
common way of performing face recognition is to use the
neural representation of the penultimate layer and to per-
form recognition via nearest neighbor in this feature space
[19]. However, we focus on diagnosing the performance of
DCNNs on the task that they were explicitly optimized on.
Parameter Settings. The size of the images is set to
227 × 227 pixels. We train the DCNNs with stochastic gra-
dient descent (SGD) and backpropagation with the Caffe
deep learning framework [12] via the Nvidia DIGITS train-
ing system. Every DCNN is trained from scratch for 30
epochs with a base learning rate of l = 0.001 which is mul-
tiplied every 10 epochs by γ = 0.1. We use L2 regulariza-
tion with a weight regularization parameter of λ = l/100. If
not stated otherwise, the data is uniformly sampled across
the pose and illumination axes in the specified ranges. The
training data consists of 30 different identities, which we
obtain by randomly sampling the shape and appearance pa-
rameter of the 3DMM. The images in the test set always
reflect an unbiased sampling of the nuisance transformation
that we want to study. For the yaw pose, we sample the
parameter space at intervals of π/32 radians and for the
direction of light at intervals of π/16 radians. Each face
image is overlaid on 50
different background textures in the training as well as in
the test set.
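The resulting learning rate schedule and yaw sampling grid can be written
out explicitly. In the sketch below, the grid extent of [-π/2, π/2] is an
assumption that matches the pose ranges used in this section:

    import math

    base_lr, gamma, step_epochs, epochs = 0.001, 0.1, 10, 30
    weight_decay = base_lr / 100  # lambda = l / 100 = 1e-5

    # Step schedule: the learning rate is multiplied by gamma every 10 epochs,
    # giving 1e-3 (epochs 0-9), 1e-4 (epochs 10-19) and 1e-5 (epochs 20-29).
    schedule = [base_lr * gamma ** (epoch // step_epochs) for epoch in range(epochs)]

    # Yaw pose sampled at intervals of pi/32 radians (extent is an assumption).
    yaw_grid = [i * math.pi / 32 for i in range(-16, 17)]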
3.2. Common bias over all facial identities
In this section, we limit the range of nuisance transfor-
mations in the training data and analyze if DCNNs can gen-
eralize to the unobserved nuisance transformations. We ap-
ply the same bias to all identities in the training set (see
example in Figure 5a).
EXP-1: Bias in the range of the yaw pose. In the fol-
lowing experiments, we limit the range of the yaw pose in
the training data. The light direction is fixed to be frontal.
Figure 3a illustrates the recognition performance as a func-
tion of the yaw pose, when faces in the training set are
restricted to a yaw pose range of [-45°, 45°]. Both DC-
NNs achieve high recognition rates for the observed yaw
poses. However, the recognition performance drops signif-
icantly when faces are outside of the observed pose range.
The same generalization pattern can be observed when re-
stricting the faces at training time to a yaw pose range of
[-90°, 0°] (Figure 3b). In both experiments, the VGG-16
network achieves higher overall recognition rates, because
it generalizes better to larger unseen yaw poses.
Figure 3: Effect of restricting the range of yaw poses
at training time. (a) Yaw pose restricted to the range
[-45°, 45°]. AlexNet TRR: 77.6%; VGG-16 TRR: 85.9%.
(b) Yaw pose restricted to the range [-90°, 0°]. AlexNet
TRR: 81.8%; VGG-16 TRR: 86.9%. In both setups the
DCNNs cannot recognize faces well from previously un-
observed views. VGG-16 achieves a higher TRR due to the
better generalization to large unseen yaw poses.

EXP-2: Sparse sampling of the yaw pose. In Fig-
ure 4 we illustrate the effect of sampling the training data
more sparsely along the axis of the yaw pose. We first bias
the training set to yaw poses of -45° and 45°. VGG-16
achieves a TRR of 70.5% at test time, whereas AlexNet
only achieves 51.8%. Figure 4a illustrates how these TRRs
decompose as a function of the yaw pose. VGG-16 achieves
consistently higher recognition rates across all poses. Most
significantly, it is more than twice as good as AlexNet at
recognizing frontal faces. If we add frontal faces at train-
ing time (Figure 4b) VGG-16 achieves a TRR of 81.9%,
whereas AlexNet achieves 69.3%. Remarkably, VGG-
16 is now able to recognize all faces correctly across the
full range of [-45°, 45°], whereas the recognition rates
of AlexNet still drop significantly for poses in between
[-45°, 0°] and [0°, 45°]. Thus, the architecture of VGG-16
enables the DCNN to generalize well from only a few well-
distributed example views to other unseen views, although
it has more parameters than AlexNet.
Figure 4: Effect of sparsely sampling the yaw pose of faces
at training time. (a) Yaw pose sampled at -45° and 45°
(AlexNet TRR: 51.8%; VGG-16 TRR: 70.5%); VGG-16
generalizes much better to frontal poses than AlexNet. (b)
Yaw pose sampled at -45°, 0° and 45° (AlexNet TRR:
69.3%; VGG-16 TRR: 81.9%); VGG-16 generalizes per-
fectly across the full range [-45°, 45°], whereas AlexNet
still cannot generalize in between the sampled poses.

3.3. Disentanglement bias across facial identities

In the previous section, we have observed that DCNNs
generalize well as soon as a nuisance transformation is suf-
ficiently represented for each identity in the training. When
this was not the case, the generalization performance de-
creased significantly. In this section, we study if DCNNs
are capable of generalizing when the nuisance transformation is
densely reflected in the training data across multiple identi-
ties. In particular, each face identity in the training is varied
in a certain interval of the yaw pose. However, across all
identities the full yaw pose variation is reflected. In Fig-
ure 5b we schematically illustrate how this setup compares
to the one from the previous Section 3.2 (Figure 5a). We
call this type of bias disentanglement bias, since if DCNNs
are capable of disentangling the image variation induced by
the yaw pose from the face identity, then they would be able
to generalize well.
EXP-3: Disentanglement of pose variation. In this ex-
periment, half of the identities in the training set vary in the
yaw pose range of [-90°, 0°]. We refer to those identities
as the set Left-identities. The other half of the faces varies
in the range [0°, 90°] (Right-identities, Figure 5b).

Figure 5: Different types of biases illustrated on the exam-
ple of yaw pose. Faces with red background are part of the
training set. (a) The same bias is applied to all the identities
in the training set. Thus, the pose variation space is only
partially observed. We use this setup in Section 3.2. (b)
For each half of the identities an alternating half of the pose
transformation is applied. Thus, the full pose transforma-
tion space is reflected in the data (Section 3.3).

Figure 6
illustrates the recognition performance of DCNNs trained
on the full training set. We evaluate the Left-identities and
Right-identities separately (Figure 6a & Figure 6b). We ob-
serve that the DCNNs improve only slightly compared to
the setup where the yaw pose range is restricted to [-90°, 0°]
for all identities (dotted curves). Thus, both DCNNs cannot
benefit from the additional information in the training set.
We conclude that this phenomenon occurs because they are
not able to disentangle the image variation induced by the
pose variation and the identity change.
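For reference, the EXP-3 training-set construction can be sketched as
follows; that each label dictionary records the sample's identity under the
key "identity" is an assumption made for illustration:

    import math

    def disentanglement_bias(samples, identities):
        """Half the identities keep yaw in [-90°, 0°] (Left-identities),
        the other half in [0°, 90°] (Right-identities)."""
        left = set(identities[: len(identities) // 2])
        biased = []
        for image, labels in samples:
            if labels["identity"] in left:
                lo, hi = -math.pi / 2, 0.0
            else:
                lo, hi = 0.0, math.pi / 2
            if lo <= labels["yaw"] <= hi:
                biased.append((image, labels))
        return biased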
3.4. Discussion - Analysis with Synthetic Data
Our experiments in this section demonstrate that the full
control over the image variation makes it possible to decom-
pose the recognition score as a function of nuisance trans-
formations. This enabled us to systematically analyze and
compare DCNNs at the task of face recognition. In our ex-
periments we observed the following phenomena:
Deeper networks generalize better to unseen head
poses. A major reason why VGG-16 outperforms AlexNet
at face recognition is that it can generalize better to faces in
previously unseen face poses (Section 3.2).
Deep networks cannot disentangle face pose from fa-
cial identity. A major limitation of the analyzed DCNN ar-
chitectures is that they have severe difficulties generalizing
when facial identities do not share the same pose variation
(Section 3.3). Thus, deep networks cannot cleanly disentangle
the image variation caused by changes in the face pose from
the one induced by changes in the facial identity.
Figure 6: Testing disentanglement ability of DCNNs. Dot-
ted lines: DCNNs trained on a biased yaw pose (illustrated
in Figure 5a). Solid lines: Disentanglement setup (illus-
trated in Figure 5b). (a) Left-Identities with biased yaw pose
of [-90°, 0°]. (b) Right-Identities with biased yaw pose of
[0°, 90°]. DCNNs cannot make use of the additional infor-
mation about the pose transformation which is present in
the data in the disentanglement setup.
4. Reducing the Damage of Dataset Bias
In this section, we study the impact on the generalization
performance when using large-scale synthetic data for pre-
training of deep face recognition systems.
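At a high level, this is a standard two-stage scheme; the sketch below
states it explicitly, with train standing in for any supervised training
loop and the data arguments as hypothetical placeholders:

    def pretrain_then_finetune(model, synthetic_data, real_data, train):
        # Stage 1: pre-train on large-scale synthetic data that is highly
        # variable in face pose and in the number of facial identities.
        train(model, synthetic_data)
        # Stage 2: fine-tune the same weights on the (possibly biased and
        # much smaller) real-world dataset.
        train(model, real_data)
        return model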
4.1. Experimental Setup
Our face recognition experiments are based on the pub-
licly available OpenFace framework [1]. For face detection
and alignment we use a publicly available multi-task CNN²
[26]. In case the face detection fails, we use the face boxes
as defined in the individual datasets³. We train the FaceNet-
NN4 architecture that was originally proposed by Schroff et
al. [21] with the vanilla setting, as provided in the OpenFace
framework. The aligned images are scaled to 96×96 pixels.

² https://github.com/kpzhang93/MTCNN_face_detection_alignment
³ For LFW and IJB-A these face boxes are provided in the dataset; for
Multi-PIE we use the annotations provided in [6].
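The detection-and-alignment step, including its fallback, can be
summarized as follows. This is a minimal sketch rather than the OpenFace
implementation; detect and align stand in for the MTCNN detector and the
alignment routine:

    def preprocess(image, dataset_face_box, detect, align, size=96):
        """Detect, align, and crop one face for FaceNet-NN4 training.

        detect: face detector, e.g. an MTCNN wrapper returning a box or None.
        align:  alignment function (image, box, size) -> cropped face image.
        """
        box = detect(image)
        if box is None:
            # Fall back to the face box shipped with the dataset (LFW, IJB-A)
            # or to the Multi-PIE annotations of [6].
            box = dataset_face_box
        return align(image, box, size)  # scaled to size x size pixels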

References

V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, 1999.

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.

TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure offace similarity, and achieves state-of-the-art face recognition performance using only 128-bytes perface.