Proceedings ArticleDOI

A morphable model for the synthesis of 3D faces

01 Jul 1999-pp 187-194
TL;DR: A new technique for modeling textured 3D faces that transforms the shape and texture of example faces into a vector space representation and regulates the naturalness of modeled faces, avoiding faces with an “unlikely” appearance.
Abstract: In this paper, a new technique for modeling textured 3D faces is introduced. 3D faces can either be generated automatically from one or more photographs, or modeled directly through an intuitive user interface. Users are assisted in two key problems of computer aided face modeling. First, new face images or new 3D face models can be registered automatically by computing dense one-to-one correspondence to an internal face model. Second, the approach regulates the naturalness of modeled faces avoiding faces with an “unlikely” appearance. Starting from an example set of 3D face models, we derive a morphable face model by transforming the shape and texture of the examples into a vector space representation. New faces and expressions can be modeled by forming linear combinations of the prototypes. Shape and texture constraints derived from the statistics of our example faces are used to guide manual modeling or automated matching algorithms. We show 3D face reconstructions from single images and their applications for photo-realistic image manipulations. We also demonstrate face manipulations according to complex parameters such as gender, fullness of a face or its distinctiveness.

Summary (3 min read)

1 Introduction

  • Computer aided modeling of human faces still requires a great deal of expertise and manual control to avoid unrealistic, non-face-like results.
  • A limited number of labeled feature points marked in one face, e.g., the tip of the nose, the eye corner and less prominent points on the cheek, must be located precisely in another face.
  • In separating natural faces from non-faces, human knowledge is even more critical.

1.2 Organization of the paper

  • Along with a 3D reconstruction, the algorithm can compute correspondence, based on the morphable model.
  • In Section 5, the authors introduce an iterative method for building a morphable model automatically from a raw data set of 3D face scans when no correspondences between the exemplar faces are available.

2 Database

  • The laser scans provide head structure data in a cylindrical representation, with radii $r(h, \phi)$ of surface points sampled at 512 equally-spaced angles $\phi$ and at 512 equally-spaced vertical steps $h$.
  • Additionally, the RGB-color values $R(h, \phi)$, $G(h, \phi)$, and $B(h, \phi)$ were recorded in the same spatial resolution and stored in a texture map with 8 bits per channel.
  • All faces were without makeup, accessories, and facial hair.
  • The subjects were scanned wearing bathing caps that were removed digitally.

3 Morphable 3D Face Model

  • Morphing between faces requires full correspondence between all of the faces.
  • The algorithm for computing correspondence will be described in Section 5.
  • A morphable face model was then constructed using a data set of $m$ exemplar faces, each represented by its shape vector $S_i$ and texture vector $T_i$.
  • The deviation of a prototype from the average is added to (+) or subtracted from (−) the average.

3.1 Facial attributes

  • Shape and texture coefficients $\alpha_i$ and $\beta_i$ in their morphable face model do not correspond to the facial attributes used in human language.
  • While some facial attributes can easily be related to biophysical measurements [13, 10], such as the width of the mouth, others such as facial femininity or being more or less bony can hardly be described by numbers.
  • The authors describe a method for mapping facial attributes, defined by a hand-labeled set of example faces, to the parameter space of their morphable model.
  • At each position in face space (that is, for any possible face), the authors define shape and texture vectors that, when added to or subtracted from a face, will manipulate a specific attribute while keeping all other attributes as constant as possible.
  • A different kind of facial attribute is its “distinctiveness”, which is commonly manipulated in caricatures.

4 Matching a morphable model to images

  • A crucial element of their framework is an algorithm for automatically matching the morphable face model to one or more images.
  • For high-resolution 3D meshes, variations in $I_{model}$ across each triangle $k \in \{1, \ldots, n_t\}$ are small, so $E_I$ may be approximated by $E_I \approx \sum_{k=1}^{n_t} a_k \, \|I_{input}(\bar p_{x,k}, \bar p_{y,k}) - I_{model,k}\|^2$, where $a_k$ is the image area covered by triangle $k$.
  • It then determines $a_k$, and detects hidden surfaces and cast shadows in a two-pass z-buffer technique.
  • With parameters $\rho_j$ fixed, coefficients $\alpha_j$ and $\beta_j$ are optimized independently for each segment.

5 Building a morphable model

  • The authors describe how to build the morphable model from a set of unregistered 3D prototypes, and to add a new face to the existing morphable model, increasing its dimensionality.
  • The key problem is to compute a dense point-to-point correspondence between the vertices of the faces.
  • Since the method described in Section 4.1 finds the best match of a given face only within the range of the morphable model, it cannot add new dimensions to the vector space of faces.
  • To determine residual deviations between a novel face and the best match within the model, as well as to set unregistered prototypes in correspondence, the authors use an optic flow algorithm that computes correspondence between two faces without the need for a morphable model [35].

5.1 3D Correspondence using Optic Flow

  • Initially designed to find corresponding points in grey-level images $I(x, y)$, a gradient-based optic flow algorithm [2] is modified to establish correspondence between a pair of 3D scans $I(h, \phi)$ (Equation 8), taking into account color and radius values simultaneously [35]; a rough illustrative sketch follows this list.
  • The authors therefore perform a smooth interpolation based on simulated relaxation of a system of flow vectors that are coupled with their neighbors.
  • The quadratic coupling potential is equal for all flow vectors.
  • On high-contrast areas, components of flow vectors orthogonal to edges are bound to the result of the previous optic flow computation.
  • Given a definition of shape and texture vectors $S_{ref}$ and $T_{ref}$ for the reference face, $S$ and $T$ for each face in the database can be obtained by means of the point-to-point correspondence provided by $(\delta h(h, \phi), \delta\phi(h, \phi))$.
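A minimal sketch of the gradient-based estimation underlying this step, assuming each scan is a (512, 512, 4) array holding (radius, R, G, B) per $(h, \phi)$ sample; the windowed least-squares form (Lucas-Kanade style), the stride, and all names are illustrative rather than the authors' implementation, which also couples neighboring flow vectors by simulated relaxation:

```python
import numpy as np

def local_flow(I1, I2, window=5):
    """Estimate a flow field (dh, dphi) aligning scan I1 to scan I2 by
    solving the linearized constancy equations over a window around
    every sample, using radius and color channels simultaneously."""
    gh, gp = np.gradient(I1, axis=(0, 1))   # derivatives along h and phi
    It = I2 - I1                            # difference between the scans
    H, W, _ = I1.shape
    flow = np.zeros((H, W, 2))
    r = window // 2
    for y in range(r, H - r, 4):            # coarse stride keeps the demo fast
        for x in range(r, W - r, 4):
            A = np.stack([gh[y-r:y+r+1, x-r:x+r+1].ravel(),
                          gp[y-r:y+r+1, x-r:x+r+1].ravel()], axis=1)
            b = -It[y-r:y+r+1, x-r:x+r+1].ravel()
            flow[y, x], *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow
```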

5.2 Bootstrapping the model

  • Starting from an arbitrary face as the temporary reference, preliminary correspondence between all other faces and this reference is computed using the optic flow algorithm.
  • The average of these registered faces serves as a new reference face.
  • The first morphable model is then formed by the most significant components as provided by a standard PCA decomposition.
  • The current morphable model is now matched to each of the 3D faces according to the method described in Section 4.1.
  • The texture estimate can be improved by additional texture extraction (4).

6 Results

  • The authors built a morphable face model by automatically establishing correspondence between all of their 200 exemplar faces.
  • The authors' interactive face modeling system enables human users to create new characters and to modify facial attributes by varying the model coefficients.
  • The whole matching procedure was performed in $10^5$ iterations.
  • To human observers who also know only the input image, the results obtained with their method look correct.
  • The authors therefore apply a different method for transferring all details of the painting to novel views.

8 Acknowledgment

  • The authors thank Michael Langer, Alice O’Toole, Tomaso Poggio, Heinrich Bülthoff and Wolfgang Straßer for reading the manuscript and for many insightful and constructive comments.
  • In particular, the authors thank Marney Smyth and Alice O’Toole for their perseverance in helping us to obtain the following.


A Morphable Model For The Synthesis Of 3D Faces
Volker Blanz Thomas Vetter
Max-Planck-Institut für biologische Kybernetik,
Tübingen, Germany
Abstract
In this paper, a new technique for modeling textured 3D faces is
introduced. 3D faces can either be generated automatically from
one or more photographs, or modeled directly through an intuitive
user interface. Users are assisted in two key problems of computer
aided face modeling. First, new face images or new 3D face mod-
els can be registered automatically by computing dense one-to-one
correspondence to an internal face model. Second, the approach
regulates the naturalness of modeled faces avoiding faces with an
“unlikely” appearance.
Starting from an example set of 3D face models, we derive a
morphable face model by transforming the shape and texture of the
examples into a vector space representation. New faces and expres-
sions can be modeled by forming linear combinations of the proto-
types. Shape and texture constraints derived from the statistics of
our example faces are used to guide manual modeling or automated
matching algorithms.
We show 3D face reconstructions from single images and their
applications for photo-realistic image manipulations. We also
demonstrate face manipulations according to complex parameters
such as gender, fullness of a face or its distinctiveness.
Keywords: facial modeling, registration, photogrammetry, mor-
phing, facial animation, computer vision
1 Introduction
Computer aided modeling of human faces still requires a great deal
of expertise and manual control to avoid unrealistic, non-face-like
results. Most limitations of automated techniques for face synthe-
sis, face animation or for general changes in the appearance of an
individual face can be described either as the problem of finding
corresponding feature locations in different faces or as the problem
of separating realistic faces from faces that could never appear in
the real world. The correspondence problem is crucial for all mor-
phing techniques, both for the application of motion-capture data
to pictures or 3D face models, and for most 3D face reconstruction
techniques from images. A limited number of labeled feature points
marked in one face, e.g., the tip of the nose, the eye corner and less
prominent points on the cheek, must be located precisely in another
face. The number of manually labeled feature points varies from
MPI für biol. Kybernetik, Spemannstr. 38, 72076 Tübingen, Germany.
E-mail: {volker.blanz, thomas.vetter}@tuebingen.mpg.de
[Figure 1 diagram components: 2D Input, Face Analyzer, Morphable Face Model, 3D Database, Modeler, 3D Output]
Figure 1: Derived from a dataset of prototypical 3D scans of faces,
the morphable face model contributes to two main steps in face
manipulation: (1) deriving a 3D face model from a novel image,
and (2) modifying shape and texture in a natural way.
application to application, but usually ranges from 50 to 300.
Only a correct alignment of all these points allows acceptable in-
termediate morphs, a convincing mapping of motion data from the
reference to a new model, or the adaptation of a 3D face model to
2D images for ‘video cloning’. Human knowledge and experience
is necessary to compensate for the variations between individual
faces and to guarantee a valid location assignment in the different
faces. At present, automated matching techniques can be utilized
only for very prominent feature points such as the corners of eyes
and mouth.
A second type of problem in face modeling is the separation of
natural faces from non faces. For this, human knowledge is even
more critical. Many applications involve the design of completely
new natural looking faces that can occur in the real world but which
have no “real” counterpart. Others require the manipulation of an
existing face according to changes in age, body weight or simply to
emphasize the characteristics of the face. Such tasks usually require
time-consuming manual work combined with the skills of an artist.
In this paper, we present a parametric face modeling technique
that assists in both problems. First, arbitrary human faces can be
created simultaneously controlling the likelihood of the generated
faces. Second, the system is able to compute correspondence be-
tween new faces. Exploiting the statistics of a large dataset of 3D
face scans (geometric and textural data, Cyberware™) we built
a morphable face model and recover domain knowledge about face
variations by applying pattern classification methods. The mor-
phable face model is a multidimensional 3D morphing function that
is based on the linear combination of a large number of 3D face
scans. Computing the average face and the main modes of vari-
ation in our dataset, a probability distribution is imposed on the
morphing function to avoid unlikely faces. We also derive paramet-
ric descriptions of face attributes such as gender, distinctiveness,
“hooked” noses or the weight of a person, by evaluating the distri-
bution of exemplar faces for each attribute within our face space.
Having constructed a parametric face model that is able to gener-
ate almost any face, the correspondence problem turns into a mathe-
matical optimization problem. New faces, images or 3D face scans,
can be registered by minimizing the difference between the new
face and its reconstruction by the face model function. We devel-

oped an algorithm that adjusts the model parameters automatically
for an optimal reconstruction of the target, requiring only a mini-
mum of manual initialization. The output of the matching proce-
dure is a high quality 3D face model that is in full correspondence
with our morphable face model. Consequently all face manipula-
tions parameterized in our model function can be mapped to the
target face. The prior knowledge about the shape and texture of
faces in general that is captured in our model function is sufficient
to make reasonable estimates of the full 3D shape and texture of a
face even when only a single picture is available. When applying
the method to several images of a person, the reconstructions reach
almost the quality of laser scans.
1.1 Previous and related work
Modeling human faces has challenged researchers in computer
graphics since its beginning. Since the pioneering work of Parke
[25, 26], various techniques have been reported for modeling the
geometry of faces [10, 11, 22, 34, 21] and for animating them
[28, 14, 19, 32, 22, 38, 29]. A detailed overview can be found in
the book of Parke and Waters [24].
The key part of our approach is a generalized model of human
faces. Similar to the approach of DeCarlo et al. [10], we restrict
the range of allowable faces according to constraints derived from
prototypical human faces. However, instead of using a limited set
of measurements and proportions between a set of facial landmarks,
we directly use the densely sampled geometry of the exemplar faces
obtained by laser scanning (Cyberware™). The dense model-
ing of facial geometry (several thousand vertices per face) leads
directly to a triangulation of the surface. Consequently, there is no
need for variational surface interpolation techniques [10, 23, 33].
We also added a model of texture variations between faces. The
morphable 3D face model is a consequent extension of the interpo-
lation technique between face geometries, as introduced by Parke
[26]. Computing correspondence between individual 3D face data
automatically, we are able to increase the number of vertices used
in the face representation from a few hundreds to tens of thousands.
Moreover, we are able to use a higher number of faces, and thus
to interpolate between hundreds of basis faces rather than just a
few. The goal of such an extended morphable face model is to rep-
resent any face as a linear combination of a limited basis set of face
prototypes. Representing the face of an arbitrary person as a linear
combination (morph) of “prototype” faces was first formulated for
image compression in telecommunications [8]. Image-based linear
2D face models that exploit large data sets of prototype faces were
developed for face recognition and image coding [4, 18, 37].
Different approaches have been taken to automate the match-
ing step necessary for building up morphable models. One class
of techniques is based on optic flow algorithms [5, 4] and another
on an active model matching strategy [12, 16]. Combinations of
both techniques have been applied to the problem of image match-
ing [36]. In this paper we extend this approach to the problem of
matching 3D faces.
The correspondence problem between different three-
dimensional face data has been addressed previously by Lee
et al.[20]. Their shape-matching algorithm differs significantly
from our approach in several respects. First, we compute the
correspondence in high resolution, considering shape and texture
data simultaneously. Second, instead of using a physical tissue
model to constrain the range of allowed mesh deformations, we use
the statistics of our example faces to keep deformations plausible.
Third, we do not rely on routines that are specifically designed to
detect the features exclusively found in faces, e.g., eyes, nose.
Our general matching strategy can be used not only to adapt the
morphable model to a 3D face scan, but also to 2D images of faces.
Unlike a previous approach [35], the morphable 3D face model is
now directly matched to images, avoiding the detour of generat-
ing intermediate 2D morphable image models. As a consequence,
head orientation, illumination conditions and other parameters can
be free variables subject to optimization. It is sufficient to use rough
estimates of their values as a starting point of the automated match-
ing procedure.
Most techniques for ‘face cloning’, the reconstruction of a 3D
face model from one or more images, still rely on manual assistance
for matching a deformable 3D face model to the images [26, 1, 30].
The approach of Pighin et al. [28] demonstrates the high realism
that can be achieved for the synthesis of faces and facial expressions
from photographs where several images of a face are matched to a
single 3D face model. Our automated matching procedure could be
used to replace the manual initialization step, where several corre-
sponding features have to be labeled in the presented images.
For the animation of faces, a variety of methods have been pro-
posed. For a complete overview we again refer to the book of
Parke and Waters [24]. The techniques can be roughly separated
in those that rely on physical modeling of facial muscles [38, 17],
and in those applying previously captured facial expressions to a
face [25, 3]. These performance based animation techniques com-
pute the correspondence between the different facial expressions of
a person by tracking markers glued to the face from image to im-
age. To obtain photo-realistic face animations, up to 182 markers
are used [14]. Working directly on faces without markers, our au-
tomated approach extends this number to its limit. It matches the
full number of vertices available in the face model to images. The
resulting dense correspondence fields can even capture changes in
wrinkles and map these from one face to another.
1.2 Organization of the paper
We start with a description of the database of 3D face scans from
which our morphable model is built.
In Section 3, we introduce the concept of the morphable face
model, assuming a set of 3D face scans that are in full correspon-
dence. Exploiting the statistics of a dataset, we derive a parametric
description of faces, as well as the range of plausible faces. Ad-
ditionally, we define facial attributes, such as gender or fullness of
faces, in the parameter space of the model.
In Section 4, we describe an algorithm for matching our flexible
model to novel images or 3D scans of faces. Along with a 3D re-
construction, the algorithm can compute correspondence, based on
the morphable model.
In Section 5, we introduce an iterative method for building a mor-
phable model automatically from a raw data set of 3D face scans
when no correspondences between the exemplar faces are available.
2 Database
Laser scans (Cyberware™) of 200 heads of young adults (100
male and 100 female) were used. The laser scans provide head structure data in a cylindrical representation, with radii $r(h, \phi)$ of surface points sampled at 512 equally-spaced angles $\phi$, and at 512 equally-spaced vertical steps $h$. Additionally, the RGB-color values $R(h, \phi)$, $G(h, \phi)$, and $B(h, \phi)$ were recorded in the same spatial resolution and were stored in a texture map with 8 bits per channel.
All faces were without makeup, accessories, and facial hair. The subjects were scanned wearing bathing caps that were removed digitally. Additional automatic pre-processing of the scans, which for most heads required no human interaction, consisted of a vertical cut behind the ears, a horizontal cut to remove the shoulders, and a normalization routine that brought each face to a standard orientation and position in space. The resultant faces were represented by approximately 70,000 vertices and the same number of color values.
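As an aside, the cylindrical parameterization maps directly to Cartesian vertex positions. The sketch below (head height, axis conventions, and the function name are assumptions for illustration, not the scanner's specification) converts a radius image $r(h, \phi)$ into the flat vertex list from which the shape vectors of Section 3 are built:

```python
import numpy as np

def cylindrical_to_vertices(radii, height=0.3):
    """radii: (512, 512) array with radii[i, j] = r at vertical step h_i,
    angle phi_j; returns an (n, 3) array of X, Y, Z vertex positions."""
    n_h, n_phi = radii.shape
    h = np.linspace(0.0, height, n_h)[:, None]                        # vertical position
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)[None, :]
    X = radii * np.cos(phi)                                           # around the axis
    Y = np.broadcast_to(h, radii.shape)                               # along the axis
    Z = radii * np.sin(phi)
    return np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
```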

3 Morphable 3D Face Model
The morphable model is based on a data set of 3D faces. Morphing
between faces requires full correspondence between all of the faces.
In this section, we will assume that all exemplar faces are in full
correspondence. The algorithm for computing correspondence will
be described in Section 5.
We represent the geometry of a face with a shape vector $S = (X_1, Y_1, Z_1, X_2, \ldots, Y_n, Z_n)^T \in \mathbb{R}^{3n}$ that contains the $X, Y, Z$-coordinates of its $n$ vertices. For simplicity, we assume that the number of valid texture values in the texture map is equal to the number of vertices. We therefore represent the texture of a face by a texture vector $T = (R_1, G_1, B_1, R_2, \ldots, G_n, B_n)^T \in \mathbb{R}^{3n}$ that contains the $R, G, B$ color values of the $n$ corresponding vertices.
A morphable face model was then constructed using a data set of $m$ exemplar faces, each represented by its shape vector $S_i$ and texture vector $T_i$. Since we assume all faces in full correspondence (see Section 5), new shapes $S_{model}$ and new textures $T_{model}$ can be expressed in barycentric coordinates as a linear combination of the shapes and textures of the $m$ exemplar faces:
$$S_{mod} = \sum_{i=1}^{m} a_i S_i, \qquad T_{mod} = \sum_{i=1}^{m} b_i T_i, \qquad \sum_{i=1}^{m} a_i = \sum_{i=1}^{m} b_i = 1.$$
We define the morphable model as the set of faces $(S_{mod}(\vec a), T_{mod}(\vec b))$, parameterized by the coefficients $\vec a = (a_1, a_2, \ldots, a_m)^T$ and $\vec b = (b_1, b_2, \ldots, b_m)^T$.¹ Arbitrary new faces can be generated by varying the parameters $\vec a$ and $\vec b$ that control shape and texture.
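In code, the model definition amounts to convex combinations of stacked exemplar vectors; a minimal sketch (array layout and names are assumptions):

```python
import numpy as np

def morph(shapes, textures, a, b):
    """shapes, textures: (m, 3n) arrays of exemplar vectors;
    a, b: (m,) coefficients that each sum to one."""
    assert np.isclose(a.sum(), 1.0) and np.isclose(b.sum(), 1.0)
    return a @ shapes, b @ textures

# Standard morphing between two faces (m = 2), cf. footnote 1:
# S, T = morph(shapes[:2], textures[:2],
#              np.array([t, 1 - t]), np.array([t, 1 - t]))
```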
For a useful face synthesis system, it is important to be able to quantify the results in terms of their plausibility of being faces. We therefore estimated the probability distribution for the coefficients $a_i$ and $b_i$ from our example set of faces. This distribution enables us to control the likelihood of the coefficients $a_i$ and $b_i$ and consequently regulates the likelihood of the appearance of the generated faces.
We fit a multivariate normal distribution to our data set of 200 faces, based on the averages of shape $\bar S$ and texture $\bar T$ and the covariance matrices $C_S$ and $C_T$ computed over the shape and texture differences $\Delta S_i = S_i - \bar S$ and $\Delta T_i = T_i - \bar T$.
A common technique for data compression known as Principal Component Analysis (PCA) [15, 31] performs a basis transformation to an orthogonal coordinate system formed by the eigenvectors $s_i$ and $t_i$ of the covariance matrices (in descending order according to their eigenvalues)²:
$$S_{model} = \bar S + \sum_{i=1}^{m-1} \alpha_i s_i, \qquad T_{model} = \bar T + \sum_{i=1}^{m-1} \beta_i t_i, \qquad (1)$$
$\vec\alpha, \vec\beta \in \mathbb{R}^{m-1}$.
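A sketch of how such a basis could be computed, using an SVD of the centered data matrix so that the $3n \times 3n$ covariance matrix is never formed explicitly; the divisor in the eigenvalue estimate and all names are assumptions:

```python
import numpy as np

def build_pca_model(S):
    """S: (m, 3n) matrix of exemplar shape vectors, one per row.
    Returns the average, the m-1 eigenvectors s_i (rows, in descending
    eigenvalue order), and the eigenvalues sigma_i^2 of C_S."""
    S_mean = S.mean(axis=0)
    U, sv, Vt = np.linalg.svd(S - S_mean, full_matrices=False)
    eigvals = sv**2 / (S.shape[0] - 1)
    return S_mean, Vt[:S.shape[0] - 1], eigvals[:S.shape[0] - 1]

def synthesize(S_mean, basis, alpha):
    """Equation (1): S_model = S_mean + sum_i alpha_i * s_i."""
    return S_mean + alpha @ basis
```

The texture model is built the same way from the $T_i$.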
The probability for coefficients $\vec\alpha$ is given by
$$p(\vec\alpha) \sim \exp\Big[-\tfrac{1}{2}\sum_{i=1}^{m-1}(\alpha_i/\sigma_i)^2\Big], \qquad (2)$$
with $\sigma_i^2$ being the eigenvalues of the shape covariance matrix $C_S$. The probability $p(\vec\beta)$ is computed similarly.
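Up to an additive constant, the log-prior is then a diagonal quadratic form, e.g.:

```python
import numpy as np

def log_prior(alpha, eigvals):
    """log p(alpha) of Equation (2) up to a constant; eigvals holds the
    sigma_i^2 returned by the PCA sketch above."""
    return -0.5 * np.sum(alpha**2 / eigvals)
```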
Segmented morphable model: The morphable model described in equation (1) has $m - 1$ degrees of freedom for texture and $m - 1$ for shape. The expressiveness of the model can
¹Standard morphing between two faces ($m = 2$) is obtained if the parameters $a_1, b_1$ are varied between 0 and 1, setting $a_2 = 1 - a_1$ and $b_2 = 1 - b_1$.
²Due to the subtracted average vectors $\bar S$ and $\bar T$, the dimensions of $\mathrm{Span}\{\Delta S_i\}$ and $\mathrm{Span}\{\Delta T_i\}$ are at most $m - 1$.
Figure 2: A single prototype adds a large variety of new faces to the
morphable model. The deviation of a prototype from the average is
added (+) or subtracted (-) from the average. A standard morph (*)
is located halfway between average and the prototype. Subtracting
the differences from the average yields an ’anti’-face (#). Adding
and subtracting deviations independently for shape (S) and texture
(T) on each of four segments produces a number of distinct faces.
be increased by dividing faces into independent subregions that are
morphed independently, for example into eyes, nose, mouth and a
surrounding region (see Figure 2). Since all faces are assumed to
be in correspondence, it is sufficient to define these regions on a
reference face. This segmentation is equivalent to subdividing the
vector space of faces into independent subspaces. A complete 3D
face is generated by computing linear combinations for each seg-
ment separately and blending them at the borders according to an
algorithm proposed for images by [7].
3.1 Facial attributes
Shape and texture coefficients $\alpha_i$ and $\beta_i$ in our morphable face model do not correspond to the facial attributes used in human language. While some facial attributes can easily be related to biophysical measurements [13, 10], such as the width of the mouth, others such as facial femininity or being more or less bony can hardly be described by numbers. In this section, we describe a method for mapping facial attributes, defined by a hand-labeled set of example faces, to the parameter space of our morphable model. At each position in face space (that is, for any possible face), we define shape and texture vectors that, when added to or subtracted from a face, will manipulate a specific attribute while keeping all other attributes as constant as possible.
In a performance based technique [25], facial expressions can be transferred by recording two scans of the same individual with different expressions, and adding the differences $\Delta S = S_{expression} - S_{neutral}$, $\Delta T = T_{expression} - T_{neutral}$ to a different individual in a neutral expression.
Unlike facial expressions, attributes that are invariant for each individual are more difficult to isolate. The following method allows us to model facial attributes such as gender, fullness of faces, darkness of eyebrows, double chins, and hooked versus concave noses (Figure 3). Based on a set of faces $(S_i, T_i)$ with manually assigned labels $\mu_i$ describing the markedness of the attribute, we compute

weighted sums
$$\Delta S = \sum_{i=1}^{m} \mu_i (S_i - \bar S), \qquad \Delta T = \sum_{i=1}^{m} \mu_i (T_i - \bar T). \qquad (3)$$
Multiples of $(\Delta S, \Delta T)$ can now be added to or subtracted from any individual face. For binary attributes, such as gender, we assign constant values $\mu_A$ for all $m_A$ faces in class $A$, and $\mu_B \neq \mu_A$ for all $m_B$ faces in $B$. Affecting only the scaling of $\Delta S$ and $\Delta T$, the choice of $\mu_A$, $\mu_B$ is arbitrary.
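A compact sketch of Equation (3) and its use (array shapes and names are assumptions; `mu` holds the hand-assigned markedness labels, e.g. constants $\mu_A$/$\mu_B$ for a binary attribute such as gender):

```python
import numpy as np

def attribute_direction(S, T, mu, S_mean, T_mean):
    """S, T: (m, 3n) labeled exemplar vectors; mu: (m,) labels;
    S_mean, T_mean: the database averages used in Equation (3)."""
    return mu @ (S - S_mean), mu @ (T - T_mean)

def apply_attribute(face_S, face_T, dS, dT, strength=1.0):
    # add or subtract multiples of the direction to edit the attribute
    return face_S + strength * dS, face_T + strength * dT
```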
To justify this method, let $\mu(S, T)$ be the overall function describing the markedness of the attribute in a face $(S, T)$. Since $\mu(S, T)$ is not available per se for all $(S, T)$, the regression problem of estimating $\mu(S, T)$ from a sample set of labeled faces has to be solved. Our technique assumes that $\mu(S, T)$ is a linear function. Consequently, in order to achieve a change $\Delta\mu$ of the attribute, there is only a single optimal direction $(\Delta S, \Delta T)$ for the whole space of faces. It can be shown that Equation (3) defines the direction with minimal variance-normalized length $\|\Delta S\|_M^2 = \langle \Delta S, C_S^{-1} \Delta S\rangle$, $\|\Delta T\|_M^2 = \langle \Delta T, C_T^{-1} \Delta T\rangle$.
A different kind of facial attribute is its “distinctiveness”, which is commonly manipulated in caricatures. The automated production of caricatures has been possible for many years [6]. This technique can easily be extended from 2D images to our morphable face model. Individual faces are caricatured by increasing their distance from the average face. In our representation, shape and texture coefficients $\alpha_i, \beta_i$ are simply multiplied by a constant factor.
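In model coordinates this is a one-line operation (the factor is illustrative):

```python
def caricature(alpha, beta, factor=1.5):
    """Scale the deviation from the average face; factor > 1 exaggerates
    distinctiveness, factor < 1 attenuates it."""
    return factor * alpha, factor * beta
```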
[Figure 3 panels: ORIGINAL, CARICATURE, MORE MALE, FEMALE, SMILE, FROWN, HOOKED NOSE]
Figure 3: Variation of facial attributes of a single face. The appear-
ance of an original face can be changed by adding or subtracting
shape and texture vectors specific to the attribute.
4 Matching a morphable model to images
A crucial element of our framework is an algorithm for automati-
cally matching the morphable face model to one or more images.
Providing an estimate of the face’s 3D structure (Figure 4), it closes
the gap between the specific manipulations described in Section 3.1,
and the type of data available in typical applications.
Coefficients of the 3D model are optimized along with a set of
rendering parameters such that they produce an image as close as
possible to the input image. In an analysis-by-synthesis loop, the
algorithm creates a texture mapped 3D face from the current model
parameters, renders an image, and updates the parameters accord-
ing to the residual difference. It starts with the average head and
with rendering parameters roughly estimated by the user.
Model Parameters: Facial shape and texture are defined by coefficients $\alpha_j$ and $\beta_j$, $j = 1, \ldots, m - 1$ (Equation 1). Rendering parameters $\vec\rho$ contain camera position (azimuth and elevation), object scale, image plane rotation and translation, intensity $i_{r,amb}, i_{g,amb}, i_{b,amb}$ of ambient light, and intensity
[Figure 4 stages: 2D Input; Initializing the Morphable Model (rough interactive alignment of 3D average head); Automated 3D Shape and Texture Reconstruction; Illumination Corrected Texture Extraction; Detail]
Figure 4: Processing steps for reconstructing 3D shape and texture of a new face from a single image. After a rough manual alignment of the average 3D head (top row), the automated matching procedure fits the 3D morphable model to the image (center row). In the right column, the model is rendered on top of the input image. Details in texture can be improved by illumination-corrected texture extraction from the input (bottom row).
$i_{r,dir}, i_{g,dir}, i_{b,dir}$ of directed light. In order to handle photographs taken under a wide variety of conditions, $\vec\rho$ also includes color contrast as well as offset and gain in the red, green, and blue channel. Other parameters, such as camera distance, light direction, and surface shininess, remain fixed to the values estimated by the user.
From parameters $(\vec\alpha, \vec\beta, \vec\rho)$, colored images
$$I_{model}(x, y) = (I_{r,mod}(x, y), I_{g,mod}(x, y), I_{b,mod}(x, y))^T \qquad (4)$$
are rendered using perspective projection and the Phong illumination model. The reconstructed image is supposed to be closest to the input image in terms of Euclidean distance
$$E_I = \sum_{x,y} \|I_{input}(x, y) - I_{model}(x, y)\|^2.$$
Matching a 3D surface to a given image is an ill-posed problem.
Along with the desired solution, many non-face-like surfaces lead
to the same image. It is therefore essential to impose constraints
on the set of solutions. In our morphable model, shape and texture
vectors are restricted to the vector space spanned by the database.
Within the vector space of faces, solutions can be further restricted by a tradeoff between matching quality and prior probabilities, using $P(\vec\alpha)$, $P(\vec\beta)$ from Section 3 and an ad-hoc estimate of $P(\vec\rho)$. In terms of Bayes decision theory, the problem is to find the set of parameters $(\vec\alpha, \vec\beta, \vec\rho)$ with maximum posterior probability, given an image $I_{input}$. While $\vec\alpha$, $\vec\beta$, and rendering parameters $\vec\rho$ completely determine the predicted image $I_{model}$, the observed image $I_{input}$ may vary due to noise. For Gaussian noise

with a standard deviation $\sigma_N$, the likelihood to observe $I_{input}$ is $p(I_{input} \mid \vec\alpha, \vec\beta, \vec\rho) \sim \exp[-\frac{1}{2\sigma_N^2} E_I]$. Maximum posterior probability is then achieved by minimizing the cost function
$$E = \frac{1}{\sigma_N^2} E_I + \sum_{j=1}^{m-1} \frac{\alpha_j^2}{\sigma_{S,j}^2} + \sum_{j=1}^{m-1} \frac{\beta_j^2}{\sigma_{T,j}^2} + \sum_{j} \frac{(\rho_j - \bar\rho_j)^2}{\sigma_{\rho,j}^2}. \qquad (5)$$
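A direct transcription of Equation (5), assuming the image term $E_I$ has already been evaluated (all argument names are illustrative):

```python
import numpy as np

def map_cost(E_I, alpha, beta, rho, sigma_N,
             sigma_S2, sigma_T2, rho_bar, sigma_rho2):
    """Equation (5): data term plus Gaussian priors on shape, texture,
    and rendering parameters; sigma_S2/sigma_T2 are PCA eigenvalues."""
    return (E_I / sigma_N**2
            + np.sum(alpha**2 / sigma_S2)
            + np.sum(beta**2 / sigma_T2)
            + np.sum((rho - rho_bar)**2 / sigma_rho2))
```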
The optimization algorithm described below uses an estimate of $E$ based on a random selection of surface points. Predicted color values $I_{model}$ are easiest to evaluate in the centers of triangles. In the center of triangle $k$, texture $(\bar R_k, \bar G_k, \bar B_k)^T$ and 3D location $(\bar X_k, \bar Y_k, \bar Z_k)^T$ are averages of the values at the corners. Perspective projection maps these points to image locations $(\bar p_{x,k}, \bar p_{y,k})^T$. Surface normals $n_k$ of each triangle $k$ are determined by the 3D locations of the corners. According to Phong illumination, the color components $I_{r,model}$, $I_{g,model}$ and $I_{b,model}$ take the form
$$I_{r,model,k} = (i_{r,amb} + i_{r,dir}\,(n_k \cdot l))\,\bar R_k + i_{r,dir}\, s\, (r_k \cdot v_k)^\nu \qquad (6)$$
where $l$ is the direction of illumination, $v_k$ the normalized difference of camera position and the position of the triangle's center, and $r_k = 2(n_k \cdot l)n_k - l$ the direction of the reflected ray. $s$ denotes surface shininess, and $\nu$ controls the angular distribution of the specular reflection. Equation (6) reduces to $I_{r,model,k} = i_{r,amb}\bar R_k$ if a shadow is cast on the center of the triangle, which is tested in a method described below.
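A vectorized sketch of Equation (6) for one channel at all triangle centers (the clamp on the specular base is an added numerical guard, not part of the equation; names are assumptions):

```python
import numpy as np

def phong_channel(R, n, v, l, i_amb, i_dir, s, nu, shadowed):
    """R: (nt,) mean albedo per triangle; n, v: (nt, 3) unit normals and
    view directions; l: (3,) unit light direction; shadowed: (nt,) bool."""
    ndotl = n @ l
    r = 2.0 * ndotl[:, None] * n - l                   # reflected ray directions
    spec = np.clip(np.einsum('ij,ij->i', r, v), 0.0, None) ** nu
    color = (i_amb + i_dir * ndotl) * R + i_dir * s * spec
    return np.where(shadowed, i_amb * R, color)        # shadowed: ambient only
```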
For high resolution 3D meshes, variations in $I_{model}$ across each triangle $k \in \{1, \ldots, n_t\}$ are small, so $E_I$ may be approximated by
$$E_I \approx \sum_{k=1}^{n_t} a_k\, \|I_{input}(\bar p_{x,k}, \bar p_{y,k}) - I_{model,k}\|^2,$$
where $a_k$ is the image area covered by triangle $k$. If the triangle is occluded, $a_k = 0$.
In gradient descent, contributions from different triangles of the mesh would be redundant. In each iteration, we therefore select a random subset $K \subset \{1, \ldots, n_t\}$ of 40 triangles $k$ and replace $E_I$ by
$$E_K = \sum_{k \in K} \|I_{input}(\bar p_{x,k}, \bar p_{y,k}) - I_{model,k}\|^2. \qquad (7)$$
The probability of selecting $k$ is $p(k \in K) \sim a_k$. This method of stochastic gradient descent [16] is not only more efficient computationally, but also helps to avoid local minima by adding noise to the gradient estimate.
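The area-weighted subset selection might look as follows (names are assumptions):

```python
import numpy as np

def sample_triangles(areas, k=40, rng=None):
    """Draw a subset K of k triangle indices with p(k in K) ~ a_k;
    occluded triangles have a_k = 0 and are never chosen."""
    rng = rng or np.random.default_rng()
    return rng.choice(len(areas), size=k, replace=False,
                      p=areas / areas.sum())
```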
Before the first iteration, and once every 1000 steps, the algorithm computes the full 3D shape of the current model, and 2D positions $(p_x, p_y)^T$ of all vertices. It then determines $a_k$, and detects hidden surfaces and cast shadows in a two-pass z-buffer technique. We assume that occlusions and cast shadows are constant during each subset of iterations.
Parameters are updated depending on analytical derivatives of the cost function $E$, using $\alpha_j \mapsto \alpha_j - \lambda_j \cdot \partial E/\partial\alpha_j$, and similarly for $\beta_j$ and $\rho_j$, with suitable factors $\lambda_j$.
Derivatives of texture and shape (Equation 1) yield derivatives of 2D locations $(\bar p_{x,k}, \bar p_{y,k})^T$, surface normals $n_k$, vectors $v_k$ and $r_k$, and $I_{model,k}$ (Equation 6) using the chain rule. From Equation (7), partial derivatives $\partial E_K/\partial\alpha_j$, $\partial E_K/\partial\beta_j$, and $\partial E_K/\partial\rho_j$ can be obtained.
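The resulting update loop is plain first-order descent; a sketch, with `grad_E` standing in for the analytic derivatives described above:

```python
def descend(params, grad_E, lambdas, steps=1000):
    """params, lambdas: NumPy arrays stacking (alpha, beta, rho) and the
    per-parameter step sizes lambda_j; grad_E(params) returns dE/dparams."""
    for _ in range(steps):
        params = params - lambdas * grad_E(params)
    return params
```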
Coarse-to-Fine: In order to avoid local minima, the algorithm follows a coarse-to-fine strategy in several respects:
a) The first set of iterations is performed on a down-sampled version of the input image with a low resolution morphable model.
b) We start by optimizing only the first coefficients $\alpha_j$ and $\beta_j$ controlling the first principal components, along with all parameters
[Figure 5 stages: Pair of Input Images; Automated Simultaneous Matching / Reconstruction of 3D Shape and Texture; 3D Result; Illumination Corrected Texture Extraction; New Views (Reconstruction, Original)]
Figure 5: Simultaneous reconstruction of 3D shape and texture of a
new face from two images taken under different conditions. In the
center row, the 3D face is rendered on top of the input images.
$\rho_j$. In subsequent iterations, more and more principal components are added.
c) Starting with a relatively large $\sigma_N$, which puts a strong weight on prior probability in equation (5) and ties the optimum towards the prior expectation value, we later reduce $\sigma_N$ to obtain maximum matching quality.
d) In the last iterations, the face model is broken down into segments (Section 3). With parameters $\rho_j$ fixed, coefficients $\alpha_j$ and $\beta_j$ are optimized independently for each segment. This increased number of degrees of freedom significantly improves facial details.
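An illustrative schedule capturing steps a)-c); all values are assumptions, not the paper's settings:

```python
# (image_scale, active_principal_components, sigma_N): resolution and the
# number of optimized components grow while the noise estimate shrinks.
schedule = [
    (0.25,  10, 10.0),
    (0.50,  40,  3.0),
    (1.00, 199,  1.0),   # all m-1 components; segments follow in step d)
]
```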
Multiple Images: It is straightforward to extend this technique to the case where several images of a person are available (Figure 5). While shape and texture are still described by a common set of $\alpha_j$ and $\beta_j$, there is now a separate set of $\rho_j$ for each input image. $E_I$ is replaced by a sum of image distances for each pair of input and model images, and all parameters are optimized simultaneously.
Illumination-Corrected Texture Extraction: Specific features of individual faces that are not captured by the morphable model, such as blemishes, are extracted from the image in a subsequent texture adaptation process. Extracting texture from images is a technique widely used in constructing 3D models from images (e.g. [28]). However, in order to be able to change pose and illumination, it is important to separate pure albedo at any given point from the influence of shading and cast shadows in the image. In our approach, this can be achieved because our matching procedure provides an estimate of 3D shape, pose, and illumination conditions. Subsequent to matching, we compare the prediction $I_{mod,i}$ for each vertex $i$ with $I_{input}(p_{x,i}, p_{y,i})$, and compute the change in texture $(R_i, G_i, B_i)$ that accounts for the difference. In areas occluded in the image, we rely on the prediction made by the model. Data from multiple images can be blended using methods similar to [28].
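The paper does not spell out the correction formula; one plausible reading, given the Phong model of Equation (6), divides the rendered shading out of the input image per visible vertex (everything here is an assumption for illustration):

```python
import numpy as np

def corrected_texture(T_model, I_model, I_input, visible, eps=1e-6):
    """Per-vertex (n, 3) colors; 'visible' is an (n,) boolean mask.
    Keeps the model's prediction wherever the vertex is occluded."""
    shading = I_model / np.clip(T_model, eps, None)      # predicted illumination
    T_image = I_input / np.clip(shading, eps, None)      # albedo implied by image
    return np.where(visible[:, None], T_image, T_model)
```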
4.1 Matching a morphable model to 3D scans
The method described above can also be applied to register new
3D faces. Analogous to images, where perspective projection

Citations
Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still-and video-based face recognition research, and provide some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system.This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Journal ArticleDOI
TL;DR: This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections, using computer graphics.
Abstract: This paper presents a method for face recognition across variations in pose, ranging from frontal to profile views, and across a wide range of illuminations, including cast shadows and specular reflections. To account for these variations, the algorithm simulates the process of image formation in 3D space, using computer graphics, and it estimates 3D shape and texture of faces from single images. The estimate is achieved by fitting a statistical, morphable model of 3D faces to images. The model is learned from a set of textured 3D scans of heads. We describe the construction of the morphable model, an algorithm to fit the model to images, and a framework for face identification. In this framework, faces are represented by model parameters for 3D shape and texture. We present results obtained with 4,488 images from the publicly available CMU-PIE database and 1,940 images from the FERET database.

2,187 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A Supervised Descent Method (SDM) is proposed for minimizing a Non-linear Least Squares (NLS) function and achieves state-of-the-art performance in the problem of facial feature detection.
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian nor the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs. cmu.edu/intraface.

2,138 citations


Cites methods from "A morphable model for the synthesis..."

  • ...Parameterized Appearance Models (PAMs), such as Active Appearance Models [11, 14, 2], Morphable Models [6, 19], Eigentracking [5], and template tracking [22, 30] build an object appearance and shape representation by computing Principal Component Analysis (PCA) on a set of manually labeled data....

Journal ArticleDOI
TL;DR: This work proposes an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm and shows that the effects of appearance variation during fitting can be precomputed (“projected out”) using this algorithm and how it can be extended to include a global shape normalising warp.
Abstract: Active Appearance Models (AAMs) and the closely related concepts of Morphable Models and Active Blobs are generative models of a certain visual phenomenon. Although linear in both shape and appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities. Fitting an AAM to an image consists of minimising the error between the input image and the closest model instances i.e. solving a nonlinear optimisation problem. We propose an efficient fitting algorithm for AAMs based on the inverse compositional image alignment algorithm. We show that the effects of appearance variation during fitting can be precomputed (“projected out”) using this algorithm and how it can be extended to include a global shape normalising warp, typically a 2D similarity transformation. We evaluate our algorithm to determine which of its novel aspects improve AAM fitting performance.

1,775 citations


Cites background or methods from "A morphable model for the synthesis..."

  • ...This class includes Active Appearance Models (AAMs) (Cootes et al., 1998a, 2001; Edwards, 1999; Edwards et al., 1998; Lanitis et al., 1997), Shape AAMs (Cootes et al., 1998b; Cootes and Kittipanyangam, 2002; Cootes et al., 2001), Direct Appearance Models (Hou et al., 2001), Active Blobs (Sclaroff and Isidoro, 2003), and Morphable Models (Blanz and Vetter, 1999; Jones and Poggio, 1998; Vetter and Poggio, 1997) as well as possibly others....

  • ...Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, efficiency, Gauss-Newton gradient descent, inverse compositional image alignment...

  • ...For example, Levenberg-Marquardt was used in Sclaroff and Isidoro (1998) and a stochastic gradient descent algorithm was used in Blanz and Vetter (1999) and Jones and Poggio (1998)....

  • ...Another thing that makes empirical evaluation hard is the wide variety of AAM fitting algorithms (Blanz and Vetter, 1999; Cootes et al., 1998a, 2001; Jones and Poggio, 1998; Sclaroff and Isidoro, 1998) and the lack of a standard test set....

Journal ArticleDOI
TL;DR: It is demonstrated that the suggested model can enable a model of object recognition in cortex to expand from recognizing individual objects in isolation to sequentially recognizing all objects in a more complex scene.

1,269 citations


Cites methods from "A morphable model for the synthesis..."

  • ...To test attentional modulation of object recognition beyond paper clips, we also tested stimuli consisting of synthetic faces rendered from 3D models, which were obtained by scanning the faces of human subjects (Vetter & Blanz, 1999)....

References
Book
01 May 1986
TL;DR: In this article, the authors present a graphical representation of data using Principal Component Analysis (PCA) for time series and other non-independent data, as well as a generalization and adaptation of principal component analysis.
Abstract: Introduction * Properties of Population Principal Components * Properties of Sample Principal Components * Interpreting Principal Components: Examples * Graphical Representation of Data Using Principal Components * Choosing a Subset of Principal Components or Variables * Principal Component Analysis and Factor Analysis * Principal Components in Regression Analysis * Principal Components Used with Other Multivariate Techniques * Outlier Detection, Influential Observations and Robust Estimation * Rotation and Interpretation of Principal Components * Principal Component Analysis for Time Series and Other Non-Independent Data * Principal Component Analysis for Special Types of Data * Generalizations and Adaptations of Principal Component Analysis

17,446 citations

Journal ArticleDOI
Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

6,200 citations

Book ChapterDOI
02 Jun 1998
TL;DR: A novel method of interpreting images using an Active Appearance Model (AAM), a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example.
Abstract: We demonstrate a novel method of interpreting images using an Active Appearance Model (AAM). An AAM contains a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example. During a training phase we learn the relationship between model parameter displacements and the residual errors induced between a training image and a synthesised model example. To match to an image we measure the current residuals and use the model to predict changes to the current parameters, leading to a better fit. A good overall match is obtained in a few iterations, even from poor starting estimates. We describe the technique in detail and give results of quantitative performance tests. We anticipate that the AAM algorithm will be an important method for locating deformable objects in many applications.

3,905 citations

Journal ArticleDOI
TL;DR: In this article, a method for the representation of (pictures of) faces is presented, which results in the characterization of a face, to within an error bound, by a relatively low-dimensional vector.
Abstract: A method is presented for the representation of (pictures of) faces. Within a specified framework the representation is ideal. This results in the characterization of a face, to within an error bound, by a relatively low-dimensional vector. The method is illustrated in detail by the use of an ensemble of pictures taken for this purpose.

2,089 citations

Book
01 Jan 1994
TL;DR: The newest techniques are explained and the latest normative data on age-related changes in measurements (i.e., population norms) are completely examined in nearly 20 chapters and six appendices of this reference.
Abstract: The second edition of this reference provides an update on the best methods for the measurement of the surfaces of the head and neck. The newest techniques are explained and the latest normative data on age-related changes in measurements (i.e., population norms) are completely examined in nearly 20 chapters and six appendices. Topics covered include sources of error in anthropometry and anthroposcopy; age-related changes in selected measurements of the craniofacial complex; anthropometry of minor defects in the craniofacial complex; anthropometry in craniomaxillofacial surgery, genetics, and aesthetic surgery of the nose; and anthropometry of the attractive face. Other chapters discuss the reconstruction of a photograph of a missing child, medical photography in clinical practice, interpolation of growth curves, racial and ethnic morphometric differences in the craniofacial complex, and basic statistical methods in clinical research. Complementing the text throughout are detailed illustrations and highly informative tables.

1,069 citations

Frequently Asked Questions (20)
Q1. What have the authors contributed in "A morphable model for the synthesis of 3D faces"?

In this paper, a new technique for modeling textured 3D faces is introduced. Starting from an example set of 3D face models, the authors derive a morphable face model by transforming the shape and texture of the examples into a vector space representation. The authors show 3D face reconstructions from single images and their applications for photo-realistic image manipulations. The authors also demonstrate face manipulations according to complex parameters such as gender, fullness of a face or its distinctiveness. 

The authors plan to speed up their matching algorithm by implementing a simplified Newton-method for minimizing the cost function ( Equation 5 ). While the current database is sufficient to model Caucasian faces of middle age, the authors would like to extend it to children, to elderly people as well as to other races. The authors also plan to incorporate additional 3D face examples representing the time course of facial expressions and visemes, the face variations during speech. Automated reconstruction of hair styles from images is one of the future challenges. 

To determine residual deviations between a novel face and the best match within the model, as well as to set unregistered prototypes in correspondence, the authors use an optic flow algorithm that computes correspondence between two faces without the need for a morphable model [35].

Other parameters, such as camera distance, light direction, and surface shininess, remain fixed to the values estimated by the user. 

Instead of the time-consuming computation of derivatives for each iteration step, a global mapping of the matching error into parameter space can be used [9].

The morphable face model is a multidimensional 3D morphing function that is based on the linear combination of a large number of 3D face scans. 

Because the optic flow algorithm does not incorporate any constraints on the set of solutions, it fails on some of the more unusual faces in the database.

The authors also derive parametric descriptions of face attributes such as gender, distinctiveness, “hooked” noses or the weight of a person, by evaluating the distribution of exemplar faces for each attribute within their face space. 

In terms of Bayes decision theory, the problem is to find the set of parameters $(\vec\alpha, \vec\beta, \vec\rho)$ with maximum posterior probability, given an image $I_{input}$.

The morphable 3D face model is a consequent extension of the interpolation technique between face geometries, as introduced by Parke [26]. 

The authors plan to speed up their matching algorithm by implementing a simplified Newton-method for minimizing the cost function (Equation 5). 

The authors tested the expressive power of their morphable model by automatically reconstructing 3D faces from photographs of arbitrary Caucasian faces of middle age that were not in the database. 

The goal of such an extended morphable face model is to represent any face as a linear combination of a limited basis set of face prototypes. 

A morphable face model was then constructed using a data set of $m$ exemplar faces, each represented by its shape vector $S_i$ and texture vector $T_i$. Since the authors assume all faces in full correspondence (see Section 5), new shapes $S_{model}$ and new textures $T_{model}$ can be expressed in barycentric coordinates as a linear combination of the shapes and textures of the $m$ exemplar faces.

Most techniques for ‘face cloning’, the reconstruction of a 3D face model from one or more images, still rely on manual assistance for matching a deformable 3D face model to the images [26, 1, 30]. 

The face can be combined with other 3D graphic objects, such as glasses or hats, and then be rendered in front of the background, computing cast shadows or new illumination conditions (Fig. 7). 

Computer aided modeling of human faces still requires a great deal of expertise and manual control to avoid unrealistic, non-face-like results. 

For animation, the missing part of the head can be automatically replaced by a standard hair style or a hat, or by hair that is modeled using interactive manual segmentation and adaptation to a 3D model [30, 28]. 

Most limitations of automated techniques for face synthesis, face animation or for general changes in the appearance of an individual face can be described either as the problem of finding corresponding feature locations in different faces or as the problem of separating realistic faces from faces that could never appear in the real world. 

As demonstrated in Figures 6 and 7, the results can be used for automatic post-processing of a face within the original picture or movie sequence.