
Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network

Anh Tuấn Trần¹, Tal Hassner²,³, Iacopo Masi¹, and Gérard Medioni¹

¹Institute for Robotics and Intelligent Systems, USC, CA, USA
²Information Sciences Institute, USC, CA, USA
³The Open University of Israel, Israel
Abstract
The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable and change for different photos of the same subject or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state of the art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.
1. Introduction
Single view 3D face shape estimation methods were originally proposed as a means for face recognition [4, 7, 28]. This makes sense because 3D shapes are discriminative (different people have different face shapes) yet invariant to lighting, texture changes and more. Indeed, previous work showed that when available, high resolution 3D face scans are excellent face representations which can even be used to distinguish between the faces of identical twins [9].
Curiously, however, despite their widespread use, single view face reconstruction methods are rarely employed by modern face recognition systems. The highly successful 3D Morphable Models (3DMM), for example, were only ever used for recognition in limited, controlled viewing conditions [4, 7, 11, 17, 28]. To our knowledge, there are no reports of successfully using single view face shape estimation (3DMM or any other method) to recognize faces in challenging unconstrained, in the wild settings.

Figure 1: Unconstrained, single view, 3D face shape reconstruction. (a) Input images of the same subject with disruptive poses and occlusions. (b-e) 3D reconstructions using (b) single-view 3DMM [33], (c) flow based method [13], (d) 3DDFA [47], (e) our proposed approach. (b-c) present different 3D shapes for the same subject and (d) appears generic, whereas our method (e) is robust, producing similar discriminative 3D shapes for different views.
An important reason why this may be so is that these methods can be unstable in unconstrained viewing conditions. We later verify this quantitatively, but it can also be seen in Fig. 1, which presents 3D shapes estimated from three unconstrained photos by three different methods (Fig. 1 (b-d)). Clearly, though the same subject appears in all photos, shapes produced by the same method are either very different (b,c) or highly regularized and generic (d). It is therefore unsurprising that these shapes are poor representations for recognition. It also explains why some recent methods use coarse, simple 3D shape approximations only as proxies when rendering faces to new views, rather than as face representations [13, 15, 25, 26, 39].
Contrary to previous work, we show that robust and discriminative 3D face shapes can, in fact, be estimated from single, unconstrained images (Fig. 1 (e)). We propose estimating 3D facial shapes using a very deep convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from single face photos. We identify the shortage of labeled training data as an obstacle to using data-hungry CNNs for this purpose. We address this problem with a novel means for generating a huge labeled training set of unconstrained faces and their 3DMM representations. Coupled with additional technical novelties, we obtain a method which is fast, robust and accurate.
The accuracy of our estimated shapes is verified on the MICC data set [1] and quantitatively shown to surpass the accuracy of other 3D reconstruction methods. We further show that our estimated shapes are robust and discriminative by presenting face recognition results on the Labeled Faces in the Wild (LFW) [18], YouTube Faces (YTF) [42] and IJB-A [23] benchmarks. To our knowledge, this is the first time single image 3D face shapes are successfully used to represent faces from modern, unconstrained face recognition benchmarks. Finally, to promote reproduction of our results, we publicly release our code and models.¹

¹ Please see www.openu.ac.il/home/hassner/projects/CNN3DMM for updates.
2. Related work
Over the years, many attempts were made to estimate the 3D surface of a face appearing in a single view. Before listing them, it is important to mention recent multi image methods which use image sets for reconstruction (e.g., [24, 30, 34, 35, 38]). Although these methods produce accurate 3D reconstructions, they require many images from multiple sources to produce a single 3D face shape, whereas we reconstruct faces from single images.

Methods for single view 3D face reconstruction can broadly be categorized into the following types.
Statistical shape representations, such as the widely popular 3DMM [5, 6, 11, 28, 32, 40, 45], use many aligned 3D face shapes to learn a distribution of 3D faces, represented as a high dimensional subspace. Each point on this subspace is a parameter vector representing facial geometry and sometimes expression and texture. Reconstruction is performed by searching for a point on this subspace that represents a face similar to the one in the input image. These methods do not attempt to produce discriminative facial geometries and indeed, as mentioned earlier, were only used for face recognition under controlled settings.
The very recent method of [31] also uses a CNN to regress 3DMM parameters for face photos. They too recognize the absence of training data as a major concern. Contrary to us, they propose synthesizing training faces with known geometry by sampling from the 3DMM distribution. This approach produces synthetic looking photos which can easily cause overfitting problems when training large networks [26]. They were therefore able to train only a shallow residual network (seven layers, compared to our 101) and their estimated shapes were not shown to be more robust or discriminative than those of other methods.
Scene assumption methods. In order to obtain correct reconstructions, some methods make strong assumptions on the scene and the viewing conditions in the input image. Shape from shading methods [21], for example, make assumptions on the light sources, facial reflectance and more. Others instead use facial symmetry [12]. The assumptions they and others make often do not hold in practice, limiting the application of these methods to controlled settings.
Example based methods, beginning with the work of [14] and more recently [13, 39], modify the 3D surface of example face shapes, fitting them to the face appearing in the input photo. These methods favor robustness to challenging viewing conditions over detailed reconstructions. They were thus used in face recognition only to synthesize new views from unseen poses.
Landmark fitting methods. Finally, some reconstruction techniques fit a 3D surface to detected facial landmarks rather than to face intensities directly. These include methods designed for videos (e.g., [19, 36]) and the CNN based approaches of [20, 47]. These focus more on landmark detection than 3D shape estimation and so do not attempt to produce detailed and discriminative facial geometries.
3. Regressing 3DMM parameters with a CNN
We propose to regress 3DMM face shape parameters directly from an input photo using a very deep CNN. Ostensibly, CNNs are ideal for this task: after all, they are being successfully applied to many related computer vision tasks. But despite their success, apart from [31], we are unaware of published reports of using CNNs for 3DMM parameter regression.

We believe CNNs were not used here because this is a regression problem where both the input photo and the output 3DMM shape parameters are high dimensional. Solving such problems requires deep networks, and these need massive amounts of training data. Unfortunately, existing unconstrained face sets with ground truth 3D shapes are far too small for this purpose, and obtaining large quantities of 3D face scans is labor intensive and impractical.
We therefore instead leverage three key observations.

1. As discussed in Sec. 2, accurate 3D estimates can be obtained by using multiple images of the same face.

2. Unlike the limited availability of ground truth 3D face shapes, there is certainly no shortage of challenging face sets containing multiple photos per subject.

3. Highly effective deep networks are available for the related task of extracting robust and discriminative face representations for face recognition.
From (1), we have a reasonable way of producing 3D face shape estimates for training, as surrogates for ground truth shapes: by using a robust method for multi-view 3DMM estimation. Getting multiple photos for enough subjects is very easy (2). This abundance of examples further allows balancing any reconstruction errors with potentially limitless subjects to train on. Finally, (3), a state of the art CNN for face recognition may be fine-tuned to this problem. It should already be tuned for unconstrained facial appearance variations and trained to produce similar, discriminative outputs for different images of the same face.
3.1. Generating training data
To generate training data, we use a simple yet effective multi image 3DMM estimation method, loosely based on the one recently proposed by [30]. We run it on the unconstrained faces in the CASIA WebFace dataset [46]. These multi image 3DMM estimates are then used as ground truth 3D face shapes when training our CNN 3DMM regressor.

Multi image 3DMM reconstruction is performed by first estimating 3DMM parameters from the 500k single images in CASIA. 3DMM estimates for images of the same subject are then aggregated into a single 3DMM per subject (10k subjects). This process is described next (see also Fig. 2).
The 3DMM representation. Our system uses the popular Basel Face Model (BFM) [28]. It is a publicly available 3DMM representation and one of the state of the art methods for single view 3D face modeling.

A face is modeled by decoupling its shape and texture, giving the following two independent generative models:

S = \hat{s} + W_S \alpha, \qquad T = \hat{t} + W_T \beta. \qquad (1)
Here, the vectors ŝ and t̂ are the mean face shape and texture, computed over the aligned facial 3D scans in the Basel Faces collection and represented by the concatenated 3D coordinates of the 3D point clouds and the concatenated RGB values of their textures. The matrices W_S and W_T are the principal components, computed from the same aligned facial scans. Finally, α and β are each 99D parameter vectors, representing shape and texture respectively.
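As a concrete illustration of Eq. (1), the following minimal sketch combines the BFM bases in NumPy. It assumes the mean vectors and principal component matrices have already been loaded into arrays; the function and variable names here are ours, not part of the BFM distribution.

```python
import numpy as np

def generate_face(mean_shape, pc_shape, alpha, mean_tex, pc_tex, beta):
    """Eq. (1): S = s_hat + W_S @ alpha, T = t_hat + W_T @ beta."""
    S = mean_shape + pc_shape @ alpha   # (3N,) concatenated XYZ coordinates
    T = mean_tex + pc_tex @ beta        # (3N,) concatenated RGB values
    return S.reshape(-1, 3), T.reshape(-1, 3)

# A random face can be sampled by drawing alpha, beta from the model's
# Gaussian prior, e.g. alpha = sigma_shape * np.random.randn(99), where
# sigma_shape holds the (assumed) per-component standard deviations.
```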
Single image 3DMM fitting. Fitting a 3DMM to each training image is performed with a slightly modified version of the two standard methods of [8] and [33]. Given an image I, we estimate parameter vectors α* and β* which represent a face similar to the one in I (Eq. (1)). Unlike previous work, we begin processing by applying the CLNF [22] state of the art facial landmark detector. It provides K = 68 facial landmarks p_k ∈ ℝ², k ∈ 1..K, and a confidence score value w (which we use later on).
Landmarks are used to obtain an initial estimate for the pose of the input face in the reference 3DMM coordinate system. Pose is represented by six degrees of freedom, for rotation, r = [r_α, r_β, r_γ], and translation, t = [t_X, t_Y, t_Z], and estimated similar to [13]. 3DMM fitting then proceeds by optimizing over the shape, texture, pose, illumination, and color model following [8]. We found that CLNF makes occasional localization errors. To introduce more stability, our optimization also uses the edge-based cost of [33]. For more details on this optimization, we refer to [8] and [33].
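To make the pose initialization step concrete, the sketch below recovers the six pose parameters from the 2D landmarks and their corresponding vertices on the mean 3DMM shape with a standard PnP solver. This is only an assumed, illustrative realization of the step described above (the actual estimation follows [13], which we do not reproduce here); the camera matrix `K_cam` and the helper name are hypothetical.

```python
import cv2
import numpy as np

def initial_pose(landmarks_2d, landmarks_3d, K_cam):
    """Estimate r = [r_alpha, r_beta, r_gamma], t = [t_X, t_Y, t_Z] via PnP."""
    ok, rvec, tvec = cv2.solvePnP(
        landmarks_3d.astype(np.float64),  # (68, 3) vertices on the mean face
        landmarks_2d.astype(np.float64),  # (68, 2) CLNF detections
        K_cam.astype(np.float64),         # 3x3 intrinsic camera matrix
        None)                             # no lens distortion assumed
    return rvec.ravel(), tvec.ravel()
```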
Once the optimization converges, we take the shape and texture parameters, α* and β*, from the last iteration as our single image 3DMM estimate for the input image I. Importantly, though this process is known to be computationally expensive, it is applied in our pipeline only in preprocessing, once for every training image. We later show our CNN regressor to be much faster.
Multi image 3DMM fitting. Although a number of multi image 3D face shape estimation methods were proposed in the past, we found the following simple approach, inspired by the very recent work of [30], to be particularly effective.

Specifically, we pool the shape and texture 3DMM parameters γ_i = [α_i, β_i], i ∈ 1..N, across all the N single view estimates belonging to the same subject. Pooling is performed by element wise weighted averaging of the N 3DMM vectors, resulting in a single 3DMM estimate for that subject, γ̂. That is,

\hat{\gamma} = \sum_{i=1}^{N} w_i \cdot \gamma_i, \qquad \text{with} \qquad \sum_{i=1}^{N} w_i = 1, \qquad (2)
where the w_i are normalized per-image confidences provided by the CLNF facial landmark detector.

Note that unlike [30], we do not use a rank-list based on distances of normals as a quality measure to pool 3DMM parameters, instead taking the landmark detection confidence measure for these weights. Following this process, each CASIA subject is associated with a single, pooled 3DMM parameter vector γ̂. For ease of notation, henceforth we drop the hat when denoting pooled features, assuming all training set 3DMM parameters were pooled.
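In code, the pooling of Eq. (2) is a one-liner over the per-image estimates. The sketch below assumes the single-image fits are stacked in an (N, 198) array (99 shape plus 99 texture coefficients each) together with their CLNF confidences; the names are illustrative.

```python
import numpy as np

def pool_3dmm(gammas, confidences):
    """Eq. (2): confidence-weighted average of per-image 3DMM vectors."""
    w = np.asarray(confidences, dtype=np.float64)
    w = w / w.sum()                              # normalize weights to sum to 1
    return (w[:, None] * np.asarray(gammas)).sum(axis=0)
```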
3.2. Learning to regress pooled 3DMM
Following the process described in Sec. 3.1, each subject in our data set is associated with a number of images and a single, pooled 3DMM. We now use this data to learn a function which, ideally, regresses the same pooled 3DMM feature vector for different photos of the same subject.

To this end, we use a state of the art CNN, trained for face recognition. We use the very deep ResNet architecture [16] with 101 layers, recently trained for face recognition by [26]. We modify its last fully-connected layer to output the 198D 3DMM feature vector γ. The network is then fine-tuned on CASIA images using the pooled 3DMM estimates as target values; different images of the same subject are presented to the CNN with the same target 3DMM shape. We note that we also tried the VGG-Face CNN of [27] with 16 layers. Its results were similar to those obtained by the ResNet architecture, though somewhat lower.

Figure 2: Overview of our process. (a) Large quantities of unconstrained photos are used to fit a single 3DMM for each subject. (b) This is done by first fitting single image 3DMM shape and texture parameters to each image separately. Then, all 3DMM estimates for the same subject are pooled together for a single estimate per subject. (c) These pooled estimates are used in place of expensive ground truth face scans to train a very deep CNN to regress 3DMM parameters directly.
The asymmetric Euclidean loss. Training our network requires some care when defining its loss function. 3DMM vectors, by construction, belong to a multivariate Gaussian distribution with its mean on the origin, representing the mean face (Sec. 3.1). Consequently, during training, using the standard Euclidean loss to minimize distances between estimated and target 3DMM vectors will favor estimates closer to the origin: these will have a higher probability of being closer to their target values than those further away. In practice, we found that a network trained with the Euclidean loss tends to output less detailed faces (Fig. 3).

To counter this bias towards a mean face shape, we introduce an asymmetric Euclidean loss. It is designed to encourage the network to favor estimates further away from the origin by decoupling under-estimation errors (errors on the side of the 3DMM target closer to the origin) from over-estimation errors (where the estimate is further out from the origin than the target). It is defined by:

L(\gamma_p, \gamma) = \lambda_1 \cdot \underbrace{\lVert \gamma^{+} - \gamma^{\max} \rVert_2^2}_{\text{over-estimate}} + \lambda_2 \cdot \underbrace{\lVert \gamma_p^{+} - \gamma^{\max} \rVert_2^2}_{\text{under-estimate}}, \qquad (3)

using the element-wise operators:

\gamma^{+} \doteq \mathrm{abs}(\gamma) = \mathrm{sign}(\gamma) \cdot \gamma; \qquad \gamma_p^{+} \doteq \mathrm{sign}(\gamma) \cdot \gamma_p, \qquad (4)

\gamma^{\max} \doteq \max(\gamma^{+}, \gamma_p^{+}). \qquad (5)
Here, γ is the target pooled 3DMM value, γ_p is the regressed 3DMM output, and λ_{1,2} control the trade-off between the over- and under-estimation errors. When both equal 1, this reduces to the traditional Euclidean loss. In practice, we set λ_1 = 1, λ_2 = 3, thus changing the behavior of the training process, allowing it to escape under-fitting faster and encouraging the network to produce more detailed, realistic 3D face models (Fig. 3).

Figure 3: Effect of our loss function: (left) input image, (a) generic model, (b) regressed shape and texture with a regular ℓ₂ loss and (c) our proposed asymmetric ℓ₂ loss.
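Since the operators in Eqs. (4)-(5) are all element-wise, the loss is straightforward to implement. Below is a minimal PyTorch sketch under our reading of the equations; the function name is ours.

```python
import torch

def asymmetric_l2_loss(pred, target, lam1=1.0, lam2=3.0):
    """Eqs. (3)-(5): pred is gamma_p, target is the pooled gamma."""
    sign = torch.sign(target)
    gamma_plus = sign * target                       # abs(target), Eq. (4)
    gamma_p_plus = sign * pred                       # pred, flipped into target's orthant
    gamma_max = torch.max(gamma_plus, gamma_p_plus)  # element-wise max, Eq. (5)
    over = ((gamma_plus - gamma_max) ** 2).sum()     # over-estimation term
    under = ((gamma_p_plus - gamma_max) ** 2).sum()  # under-estimation term
    return lam1 * over + lam2 * under

# With lam1 = lam2 = 1 this reduces to the ordinary squared Euclidean
# loss; lam2 = 3 penalizes under-shooting toward the mean face harder.
```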
Network hyperparameters. Eq. (3) is solved using Stochastic Gradient Descent (SGD) with a mini-batch of size 144, momentum set to 0.9 and ℓ₂ regularization over the weights with a weight decay of 0.0005. When performing back-propagation, we learn the inner product layer (fc) after pool5 faster, setting its learning rate to 0.01, since it is trained from scratch for the regression problem. Other network weights are updated with a learning rate an order of magnitude lower. When the validation loss saturates, we decrease learning rates by an order of magnitude, until the validation loss stops decreasing.
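This schedule translates directly into per-layer learning rates. The following is a hypothetical PyTorch rendition (the paper does not specify a framework here); the torchvision loader and module names are our assumptions.

```python
import torch
from torchvision.models import resnet101

# ResNet-101 with its final fc layer replaced to regress the 198D gamma.
model = resnet101(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 198)

optimizer = torch.optim.SGD(
    [
        # The new regression head is trained from scratch, so it gets
        # the faster learning rate of 0.01.
        {'params': model.fc.parameters(), 'lr': 0.01},
        # All other (fine-tuned) weights: an order of magnitude lower.
        {'params': [p for n, p in model.named_parameters()
                    if not n.startswith('fc.')], 'lr': 0.001},
    ],
    momentum=0.9, weight_decay=0.0005)  # mini-batch size 144 in the paper
```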
Discussion: Render-free 3DMM estimator. It is important to note that by choosing to use a CNN to regress 3DMM parameters, we obtain a function that is render-free. That is, 3DMM parameters are regressed directly from the input image, without an optimization process which renders the face and compares it to the photo, as do existing methods for 3DMM estimation (including our method for generating training data in Sec. 3.1). By using a CNN, we therefore hope to gain not only improved accuracy, but also much faster 3DMM estimation speeds.
3.3. Parameter based 3D-3D recognition
The CNN we train in Sec. 3.2 represents a function f : I ↦ γ_p, giving us 3DMM parameters γ_p for an input image I. We later use our 3DMM estimates in face recognition benchmarks, to test how robust and discriminative they are. We next describe the method used for that purpose to evaluate the similarity of two face shapes and textures, to determine if they represent the same subject.
3D-3D recognition with a single image. We perform face recognition using the 3DMM parameters regressed by our network: by using the 3DMM parameters γ_p as face descriptors. Because different benchmarks often exhibit specific appearance biases, we apply Principal Component Analysis (PCA), learned from the training splits of the test benchmark, to adapt our estimated parameter vectors to the benchmark. Signed, element wise square rooting of these vectors is then used to further improve representation power [29]. Finally, the similarity of two faces, s(γ_{p1}, γ_{p2}), is evaluated by computing their cosine score:

s(\gamma_{p1}, \gamma_{p2}) = \frac{\gamma_{p1} \cdot \gamma_{p2}^{T}}{\lVert \gamma_{p1} \rVert \cdot \lVert \gamma_{p2} \rVert}. \qquad (6)
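Put together, the single-image matching pipeline is only a few lines. The sketch below uses scikit-learn's PCA as one plausible stand-in for the benchmark-adaptation step; the function names and the choice of library are ours.

```python
import numpy as np
from sklearn.decomposition import PCA

def adapt(train_gammas, gammas):
    """Benchmark adaptation: PCA fit on the training split, then
    signed element-wise square rooting [29]."""
    pca = PCA().fit(train_gammas)
    x = pca.transform(gammas)
    return np.sign(x) * np.sqrt(np.abs(x))

def cosine_score(g1, g2):
    """Eq. (6): cosine similarity of two adapted descriptors."""
    return float(g1 @ g2) / (np.linalg.norm(g1) * np.linalg.norm(g2))
```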
3D-3D recognition with multiple images. In some scenarios, a subject is represented by a set of images rather than just one. This is the case in the YTF benchmark [42], where videos are used, each containing multiple frames, and in the recent IJB-A [23], which uses templates containing heterogeneous visual data (images, videos and possibly more).

We use the same pipeline for image sets as for single images. Here, however, 3DMM parameters for different images or frames are first pooled using Eq. (2). Unlike the process applied in Sec. 3.1, all images here have equal weights, as we do not run landmark detection prior to 3DMM fitting with our CNN (see below). When using templates with both videos and images, following [26], we first pool the 3DMM estimates for the frames in each video separately, obtaining one 3DMM per video. We then pool these 3DMMs with those of the other images in the same template, as sketched below.
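A minimal, self-contained sketch of this template pooling, with the equal weights just described (names are illustrative):

```python
import numpy as np

def pool_template(video_frame_gammas, image_gammas):
    """Average frames within each video first, then average the
    resulting per-video 3DMMs together with the still-image 3DMMs."""
    per_video = [np.mean(frames, axis=0) for frames in video_frame_gammas]
    pooled = per_video + [np.asarray(g) for g in image_gammas]
    return np.mean(pooled, axis=0)
```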
Face alignment. Facial landmark detection and face alignment are known to improve recognition accuracy (e.g., [43, 15]). In fact, the recent, related work of [17] manually assigned landmarks before using their 3DMM fitting method for recognition on controlled images. We, however, did not align faces beyond using the bounding boxes provided with these data sets. We found our method robust to misalignments and so spared the runtime alignment would require.
4. Experimental results
We test our proposed method, comparing the accuracy of its estimated 3D shapes, its speed and its ability to represent faces for recognition with those of existing methods. Importantly, we are unaware of any previous work on single view 3D face shape estimation which reported as many quantitative tests as we do, in terms of the number of benchmarks used, the number of baseline methods compared with and the level of difficulty of the photos used in these tests.
Method           | 3DRMSE   | RMSE     | log10 (x10^4) | Rel (x10^4) | Sec.
-----------------|----------|----------|---------------|-------------|------
Generic          | 1.88±.52 | 3.48±.76 | 28±7          | 65±16       | n/a
3DMM [33]        | 1.75±.42 | 3.64±.94 | 29±8          | 68±18       | 120
Flow-based [13]  | 1.83±.39 | 3.29±.70 | 27±6          | 62±14       | 13.3
Us               | 1.57±.33 | 3.18±.77 | 26±6          | 59±14       | .088
Generic+pool     | 1.88±.52 | 3.48±.76 | 28±7          | 65±16       | n/a
3DMM [33]+pool*  | 1.60±.46 | 3.31±.98 | 27±9          | 62±20       | 120
3DDFA [47]+pool  | 1.83±.58 | 3.45±.85 | 28±7          | 65±17       | .146
[19]             | 1.84±.32 | 3.73±.62 | 30±5          | 68±11       | .372
[2]+pool         | 1.84±.58 | 3.45±.85 | 28±6          | 65±13       | 52.3
Us+pool          | 1.53±.29 | 3.14±.70 | 25±6          | 58±13       | .088

Table 1: 3D estimation accuracy and per-image speed (Sec., in seconds) on the MICC dataset. Top: single view methods; bottom: multi frame. See text for details on measures. 3DRMSE is in real-world mm. * denotes the method used to produce the training data in Sec. 3.1. Lower values are better.
Figure 4: Qualitative comparison of surface errors, visualized as heat maps with real world mm errors on MICC face videos and their ground truth 3D shapes. Left to right, top to bottom: frame from input; 3D ground-truth; generic face; estimates for flow-based method [13], Huber et al. [19], 3DDFA [47], Bas et al. [2], 3DMM+pool [33], us+pool.
Specifically, we evaluate the accuracy of our estimated 3D shapes using videos and photos and their corresponding scanned, ground truth 3D shapes from the MICC Florence Faces dataset [1] (Sec. 4.1). To test how discriminative and robust our shapes are when estimated from unconstrained images, we perform single image and multi image face recognition using the LFW [18], YTF [42] and the new IARPA JANUS Benchmark-A (IJB-A) [23] (Sec. 4.3). Finally, we also provide qualitative results in Sec. 4.4.

As baseline 3D reconstruction methods we used standard 3DMM fitting [33] (implemented by us), the flow-based method of [13], the edge based method of [2], the multi resolution, multi-view approach of [19] and the 3DDFA of [47], all tested with their authors' implementations.
4.1. 3D shape reconstruction accuracy
The MICC dataset [1] contains challenging face videos of 53 subjects. The videos span the range of controlled to challenging unconstrained outdoor settings. For each of the subjects in these videos, the data set also contains a ground-truth 3D model acquired using a high-precision structured-light scanning system. This allows comparing our 3D
