
A Morphable Face Albedo Model

TL;DR: In this paper, the authors combine photometric face capture and statistical 3D face appearance modelling to build the first morphable face albedo model, and demonstrate it in a state-of-the-art analysis-by-synthesis 3DMM fitting pipeline.
Abstract: In this paper, we bring together two divergent strands of research: photometric face capture and statistical 3D face appearance modelling. We propose a novel lightstage capture and processing pipeline for acquiring ear-to-ear, truly intrinsic diffuse and specular albedo maps that fully factor out the effects of illumination, camera and geometry. Using this pipeline, we capture a dataset of 50 scans and combine them with the only existing publicly available albedo dataset (3DRFE) of 23 scans. This allows us to build the first morphable face albedo model. We believe this is the first statistical analysis of the variability of facial specular albedo maps. This model can be used as a plug-in replacement for the texture model of the Basel Face Model and we make our new albedo model publicly available. We ensure careful spectral calibration such that our model is built in a linear sRGB space, suitable for inverse rendering of images taken by typical cameras. We demonstrate our model in a state-of-the-art analysis-by-synthesis 3DMM fitting pipeline, are the first to integrate specular map estimation and outperform the Basel Face Model in albedo reconstruction.

Summary (3 min read)

1. Introduction

  • 3D Morphable Models (3DMMs) were proposed over 20 years ago [4] as a dense statistical model of 3D face geometry and texture.
  • Existing 3DMMs are built using ill-defined “textures” that bake in shading, shadowing, specularities, light source colour, camera spectral sensitivity and colour transformations.
  • Capturing truly intrinsic face appearance parameters is a well studied problem in graphics but this work has been done largely independently of the computer vision and 3DMM communities.
  • In this paper the authors present a novel capture setup and processing pipeline for measuring ear-to-ear diffuse and specular albedo maps.
  • The authors capture their own dataset of 50 faces, combine this with the 3DRFE dataset [27] and build a statistical albedo model that can be used as a drop-in replacement for existing texture models.

2. Data capture

  • A lightstage exploits the phenomenon that specular reflection from a dielectric material preserves the plane of polarisation of linearly polarised incident light whereas subsurface diffuse reflection randomises it.
  • A polarising filter on each light source is oriented such that a specular reflection towards the viewer has the same plane of polarisation.
  • The first image, $I_{\text{para}}$, is captured with the camera's polarising filter parallel to this plane, passing both specular and diffuse reflection; the second, $I_{\text{perp}}$, has the polarising filter oriented perpendicularly, blocking the specular but still permitting transmission of the diffuse reflectance.
  • The authors augment the photometric camera with additional cameras providing multiview, single-shot images captured in sync with the photometric images.
  • The authors' participants range in age from 18 to 67 and cover skin types I-V of the Fitzpatrick scale [11].

3. Data processing

  • The authors then warp the 3DMM template mesh to the scan geometry.
  • As well as other sources of alignment error, since the three photometric views are not acquired simultaneously, there is likely to be non-rigid deformation of the face between these views.
  • For this reason, in Section 3.3 the authors propose a robust algorithm for stitching the photometric views without blurring potentially misaligned features.

3.1. Multiview stereo

  • The authors commence by applying uncalibrated structure-from-motion followed by dense multiview stereo [1] to all 24 viewpoints (see Fig. 2, blue boxed images).
  • Solving this uncalibrated multiview reconstruction problem provides both the base mesh (see Fig. 2, bottom left) to which the authors fit the 3DMM template and also intrinsic and extrinsic camera parameters for the three photometric views.
  • These form the input to their stitching process.

3.2. Template fitting

  • The authors use the Basel Face Pipeline [14] which uses smooth deformations based on Gaussian Processes.
  • The authors adapted the threshold used to exclude vertices from the optimisation at the different levels (32mm, 16mm, 8mm, 4mm, 2mm, 1mm, 0.5mm from coarse to fine) to achieve better performance on missing parts of the scans.
  • Besides this minor change, the authors used the Basel Face Pipeline as is, with between 25 and 45 manually annotated landmarks (eyes: 8, nose: 9, mouth: 6, eyebrows: 4, ears: 18).
  • The authors used the template of the BFM 2017 for registration, which makes their model compatible with that model.

3.3. Sampling and stitching

  • The authors stitch the multiple photometric viewpoints into seamless diffuse and specular per-vertex albedo maps using Poisson blending.
  • Blending in the gradient domain via solution of a Poisson equation was first proposed by Pérez et al. [23] for 2D images.
  • The approach allows us to avoid visible seams where texture or geometry from different views are inconsistent.
  • Otherwise, the authors take the dot product between the surface normal and view vectors as the weight, giving preference to observations whose projected resolution is higher.
  • The authors define an additional selection matrix $S_{v_{k+1}}$ that selects all triangles not selected in any view (i.e. that have no non-zero weight).

3.4. Calibrated colour transformation

  • The authors' photometric camera captures RAW linear images.
  • The authors transform these to linear sRGB space using a colour transformation matrix computed from light SPD and camera spectral sensitivity calibrations, discretised at D evenly spaced wavelengths.
  • The authors measure the spectral power distribution of the LEDs used in their lightstage, e ∈ RD, using a B&W Tek BSR111E-VIS spectroradiometer.
  • The first performs white balancing: $T_{\text{wb}}(C, e) = \operatorname{diag}(C^{\top} e)^{-1}$ (4). The second converts from the camera-specific colour space to the standardised XYZ space: $T_{\text{raw2xyz}}(C) = C_{\text{CIE}} C^{+}$ (5), where $C_{\text{CIE}} \in \mathbb{R}^{D \times 3}$ contains the wavelength-discrete CIE 1931 2-degree colour matching function and $C^{+}$ is the pseudoinverse of $C$.
  • To preserve white balance, the authors rescale each row of $T_{\text{raw2xyz}}(C)$ so that white is mapped to white.
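
As a concrete illustration, the following numpy sketch assembles such a transformation from the spectral calibrations. The function name, the transpose placement in Eq. (5) (whose superscripts were lost in extraction) and the row normalisation used to preserve white balance are assumptions, not the authors' exact formulation.

```python
import numpy as np

def build_colour_transform(C, e, C_cie):
    """Build a linear RAW -> XYZ transform from spectral calibrations.

    C     : (D, 3) camera spectral sensitivities
    e     : (D,)   light source spectral power distribution
    C_cie : (D, 3) CIE 1931 2-degree colour matching functions,
            sampled at the same D wavelengths
    """
    # White balancing (Eq. 4): normalise each channel by its
    # response to the light source spectrum.
    T_wb = np.diag(1.0 / (C.T @ e))

    # Camera colour space -> XYZ (Eq. 5) via the pseudoinverse of C.
    # The transpose placement is an assumption, chosen so that the
    # product is a valid 3x3 matrix.
    T_raw2xyz = C_cie.T @ np.linalg.pinv(C).T

    # Assumed row rescaling to preserve white balance: rows sum to
    # one so that (1,1,1) maps to (1,1,1).
    T_raw2xyz /= T_raw2xyz.sum(axis=1, keepdims=True)

    return T_raw2xyz @ T_wb
```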

4. Integrating 3DRFE

  • The authors augment their own dataset by additionally including the 23 scans from the 3DRFE dataset [27].
  • This enables them to estimate geometric camera calibration parameters from the 3D vertex positions and corresponding 2D UV coordinates.
  • The authors fit the BFM template to the meshes in the same way as for their own data (see Section 3.2).
  • To account for variation in overall skin brightness, during capture the camera gain (ISO) was adjusted for each subject.
  • The authors apply this transformation to all of the linearised, ISO-normalised albedo maps to give the final set of maps used in their model.
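
Estimating camera parameters from 3D-to-2D correspondences is classical camera resectioning. The sketch below shows the textbook Direct Linear Transform as one way this can be done; it is illustrative only (the function name and the use of UV coordinates directly as 2D points are assumptions), and the resulting matrix can be decomposed into intrinsics and extrinsics by RQ decomposition.

```python
import numpy as np

def estimate_projection_dlt(X, x):
    """Direct Linear Transform: estimate P (3x4) such that x ~ P X.

    X : (n, 3) 3D vertex positions (n >= 6)
    x : (n, 2) corresponding 2D coordinates
    """
    rows = []
    for Xi, (u, v) in zip(X, x):
        Xh = np.append(Xi, 1.0)  # homogeneous 3D point
        # Two independent rows of the cross-product constraint
        rows.append(np.concatenate([np.zeros(4), -Xh, v * Xh]))
        rows.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
    A = np.asarray(rows)
    # The null vector of A (smallest singular value) gives P.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```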

5. Modelling

  • The authors model diffuse and specular albedo using a linear statistical model learnt with PCA: $x(b) = Pb + \bar{x}$ (6), where $P \in \mathbb{R}^{3n \times d}$ contains the $d$ principal components, $\bar{x} \in \mathbb{R}^{3n}$ is the vectorised average map and $x : \mathbb{R}^{d} \mapsto \mathbb{R}^{3n}$ is the generator function that maps from the low-dimensional parameter vector $b \in \mathbb{R}^{d}$.
  • For triangles on the boundary between masked and non-masked regions the authors encourage zero gradient.
  • For this reason, the authors replace specular albedo values in the eyeball region by a robust maximum (95th percentile) of the estimated specular albedo values in that region (see Fig. 3(e)).
  • The authors use symmetry augmentation in their modelling.
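
A minimal numpy sketch of the PCA model in Eq. (6), assuming the training maps are already stitched, masked and symmetry-augmented; the function and variable names are illustrative.

```python
import numpy as np

def build_albedo_model(maps):
    """Learn the linear model of Eq. (6), x(b) = P b + x_bar, by PCA.

    maps : (N, 3n) training matrix, one vectorised albedo map per row
    """
    x_bar = maps.mean(axis=0)
    # Thin SVD of the centred data yields the principal components.
    _, s, Vt = np.linalg.svd(maps - x_bar, full_matrices=False)
    P = Vt.T                                 # (3n, d) principal components
    sigma = s / np.sqrt(maps.shape[0] - 1)   # per-component std. dev.
    return P, x_bar, sigma

def generate(P, x_bar, b):
    """Generator function x : R^d -> R^3n of Eq. (6)."""
    return P @ b + x_bar
```

Random plausible maps can then be drawn by sampling `b` with entries scaled by `sigma`, e.g. `generate(P, x_bar, np.random.randn(len(sigma)) * sigma)`.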

6. Experiments

  • The final model is a combination of the proposed diffuse and specular albedo model to model facial appearance and the BFM 2017 to model face shape and expressions.
  • The authors adopted the publicly available model adaptation framework based on [26] and compare it directly to model adaptation results based on the BFM in Fig. 7.
  • The authors perform the experiment on the LFW dataset [18] exactly as proposed in [14], exchanging only the model (including applying gamma) and using statistical specular albedo maps during model adaptation.
  • These images were captured by SLR cameras in auto mode with no polarisation, representing realistic images in approximately ambient light.
  • The authors apply the inverse rendering framework with the same configuration, except for limiting the illumination condition to an ambient one, and observe better albedo reconstruction performance for their proposed model compared to the BFM in every single case.

7. Conclusion

  • The authors built and make available the first statistical model of facial diffuse and specular albedo.
  • The model fills a gap in the 3DMM literature and might be beneficial in various directions.
  • Besides applications for computer graphics and vision, the authors also see a benefit for studying human face perception.
  • B. Egger and J. Tenenbaum are supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
  • The authors acknowledge Abhishek Dutta for the original design and construction of their light stage.



A Morphable Face Albedo Model

William A. P. Smith¹, Alassane Seck²,³, Hannah Dee³, Bernard Tiddeman³, Joshua Tenenbaum⁴, Bernhard Egger⁴

¹University of York, UK   ²ARM Ltd, UK   ³Aberystwyth University, UK   ⁴MIT - BCS, CSAIL & CBMM, USA

william.smith@york.ac.uk, alou.kces@live.co.uk, {hmd1,bpt}@aber.ac.uk, {jbt,egger}@mit.edu
Figure 1: First 3 principal components of our statistical diffuse (left) and specular (middle) albedo models. Both are visualised in linear sRGB space. Right: rendering of the combined model under frontal illumination in nonlinear sRGB space.
1. Introduction
3D Morphable Models (3DMMs) were proposed over 20 years ago [4] as a dense statistical model of 3D face geometry and texture. They can be used as a generative model of 2D face appearance by combining shape and texture parameters with illumination and camera parameters that are provided as input to a graphics renderer. Using such a model in an analysis-by-synthesis framework allows a principled disentangling of the contributing factors of face appearance in an image. More recently, 3DMMs and differentiable renderers have been used as model-based decoders to train convolutional neural networks (CNNs) to regress 3DMM parameters directly from a single image [29].

The ability of these methods to disentangle intrinsic (geometry and reflectance) from extrinsic (illumination and camera) parameters relies upon the 3DMM capturing only intrinsic parameters, with geometry and reflectance modelled independently. 3DMMs are usually built from captured data [4, 22, 5, 7]. This necessitates a face capture setup in which not only 3D geometry but also intrinsic face reflectance properties, e.g. diffuse albedo, can be measured. A recent large scale survey of 3DMMs [10] identified a lack of intrinsic face appearance datasets as a critical limiting factor in advancing the state-of-the-art. Existing 3DMMs are built using ill-defined "textures" that bake in shading, shadowing, specularities, light source colour, camera spectral sensitivity and colour transformations. Capturing truly intrinsic face appearance parameters is a well studied problem in graphics but this work has been done largely independently of the computer vision and 3DMM communities.

In this paper we present a novel capture setup and processing pipeline for measuring ear-to-ear diffuse and specular albedo maps. We use a lightstage to capture multiple photometric views of a face. We compute geometry using uncalibrated multiview stereo, warp a template to the raw scanned meshes and then stitch seamless per-vertex diffuse and specular albedo maps. We capture our own dataset of 50 faces, combine this with the 3DRFE dataset [27] and build a statistical albedo model that can be used as a drop-in replacement for existing texture models. We make this model publicly available. To demonstrate the benefits of our model, we use it with a state-of-the-art fitting algorithm and show improvements over existing texture models.
1.1. Related work
3D Morphable Face Models

The original 3DMM of Blanz and Vetter [4] was built using 200 scans captured in a Cyberware laser scanner which also provides a colour texture map. Ten years later the first publicly available 3DMM, the Basel Face Model (BFM) [22], was released. Again, this was built from 200 scans, this time captured using a structured light system from ABW-3D. Here, texture is captured by three cameras synchronised with three flashes with diffusers, providing relatively consistent illumination. The later BFM 2017 [14] used largely the same data from the same scanning setup. More recently, attempts have been made to scale up training data to better capture variability across the population. Both the large scale face model (LSFM) [5] (10k subjects) and the Liverpool-York Head Model (LYHM) [7] (1.2k subjects) use shape and textures captured by a 3DMD multiview structured light scanner under relatively uncontrolled illumination conditions. Ploumpis et al. [24] show how to combine the LSFM and LYHM but do so only for shape, not for texture. All of these previous models use texture maps that are corrupted by shading effects related to geometry and the illumination environment, mix specular and diffuse reflectance and are specific to the camera with which they were captured. Gecer et al. [12] use a Generative Adversarial Network (GAN) to learn a non-linear texture model from high resolution scanned textures. Although this enables them to capture high frequency details usually lost by linear models, it does not resolve the issues with the source textures.

Recently, there have been attempts to learn 3DMMs directly from in-the-wild data simultaneously with learning to fit the model to images [30, 28]. The advantage of such approaches is that they can exploit the vast resource of available 2D face images. However, the separation of illumination and albedo is ambiguous while non-Lambertian effects are usually neglected and so these methods do not currently provide intrinsic appearance models of a quality comparable with those built from captured textures.

Face Capture

Existing methods for face capture fall broadly into two categories: photometric and geometric. Geometric methods rely on finding correspondences between features in multiview images, enabling the triangulation of 3D position. These methods are relatively robust, can operate in uncontrolled illumination conditions, provide instantaneous capture and can provide high quality shape estimates [3]. They are sufficiently mature that commercial systems are widely available, for example using structured light stereo, multiview stereo or laser scanning. However, the texture maps captured by these systems are nothing other than an image of the face under a particular set of environmental conditions and hence are useless for relighting. Worse, since appearance is view-dependent (the position of specularities changes with viewing direction), no one single appearance can explain the set of multiview images.

On the other hand, photometric analysis allows estimation of additional reflectance properties such as diffuse and specular albedo [21], surface roughness [15] and index of refraction [16] through analysis of the intensity and polarisation state of reflected light. This separation of appearance into geometry and reflectance is essential for the construction of 3DMMs that truly disentangle the different factors of appearance. The required setups are usually much more restrictive, complex and not yet widely commercially available. Hence, the availability of datasets has been extremely limited, particularly of the scale required for learning 3DMMs. There is a single publicly available dataset of scans, the 3D Relightable Facial Expression (3DRFE) database [27], captured using the setup of Ma et al. [21].

Ma et al. [21] were the first to propose the use of polarised spherical gradient illumination in a lightstage. This serves two purposes. On the one hand, spherical gradient illumination provides a means to perform photometric stereo that avoids problems caused by binary shadowing in point source photometric stereo. On the other hand, the use of polarising filters on the lights and camera enables separation of diffuse and specular reflectance which, for the constant illumination case, allows measurement of intrinsic albedo. This was extended to realtime performance capture by Wilson et al. [31] who showed how a certain sequence of illumination conditions allowed for temporal upsampling of the photometric shape estimates. The main drawback of the lightstage setup is that the required illumination polariser orientation is view dependent and so diffuse/specular separation is only possible for a single viewpoint, which does not permit capturing full ear-to-ear face models. Ghosh et al. [17] made an empirical observation that using two illumination fields with locally orthogonal patterns of polarisation allows approximate specular/diffuse separation from any viewpoint on the equator. Although practically useful, in this configuration specular and diffuse reflectance is not fully separated. More generally, lightstage albedo bakes in ambient occlusion (which depends on geometry) and RGB values are dependent on the light source spectra and camera spectral sensitivities.

3D Morphable Model Fitting

The estimation of 3DMM parameters (shape, expression, colour, illumination and camera) is an ongoing inverse rendering challenge. Most approaches focus on shape estimation only and omit the reconstruction of colour/albedo and illumination, e.g. [20]. The few methods taking colour into account suffer from the ambiguity between albedo and illumination demonstrated in Egger et al. [9]. This ambiguity is especially hard to overcome for two reasons: 1. no publicly available face model captures real diffuse or specular albedo, 2. most models have a strong bias towards Caucasian faces which results in a strongly biased prior. The reflectance models used for inverse rendering are usually dramatically simplified and the specular term is either omitted or constant. Genova et al. [13] point out the limitation of no statistics on specularity and use a heuristic for their specular term. Romdhani et al. [25] use the position of specularities as shape cues but again with homogeneous specular maps. The work of Yamaguchi et al. [32] demonstrates the value of separate estimation of specular and diffuse albedo; however, they do not explore the statistics or build a generative model and their approach is not available to the community. Current limitations are mainly caused by the lack of a publicly available diffuse and specular albedo model.
2. Data capture
A lightstage exploits the phenomenon that specular reflection from a dielectric material preserves the plane of polarisation of linearly polarised incident light whereas subsurface diffuse reflection randomises it. This allows separation of specular and diffuse reflectance by capturing a pair of images under polarised illumination. A polarising filter on each light source is oriented such that a specular reflection towards the viewer has the same plane of polarisation. The first image, $I_{\text{para}}$, has a polarising filter in front of the camera oriented parallel to the plane of polarisation of the specularly reflected light, allowing both specular and diffuse transmission. The second, $I_{\text{perp}}$, has the polarising filter oriented perpendicularly, blocking the specular but still permitting transmission of the diffuse reflectance. The difference, $I_{\text{para}} - I_{\text{perp}}$, gives only the specular reflection.
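
In code, the separation is a per-pixel subtraction. The sketch below assumes radiometrically linear, identically exposed images; the factor of 2 applied to the diffuse image is our assumption (a polariser passes roughly half of the unpolarised diffuse reflection), not a calibration step stated by the authors.

```python
import numpy as np

def separate_reflectance(I_para, I_perp):
    """Diffuse/specular separation from a polarised image pair.

    I_para : image with camera polariser parallel (diffuse + specular)
    I_perp : image with camera polariser perpendicular (diffuse only)
    Both are linear (RAW) images with identical exposure.
    """
    # The difference cancels the diffuse term, leaving only specular.
    specular = np.clip(I_para - I_perp, 0.0, None)
    # Assumed radiometric correction: the polariser passes roughly
    # half of the unpolarised diffuse reflection.
    diffuse = 2.0 * I_perp
    return diffuse, specular
```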
Setup. Our setup comprises a custom-built lightstage with polarised LED illumination, a single photometric camera (Nikon D200) with optoelectric polarising filter (LC-Tec FPM-L-AR) and seven additional cameras (Canon 7D) to provide multiview coverage. We use 41 ultra bright white LEDs mounted on a geodesic dome of diameter 1.8m. Each LED has a rotatable linear polarising filter in front of it. Their orientation is tuned by placing a sphere of low diffuse albedo and high specular albedo (a black snooker ball) in the centre of the dome and adjusting the filter orientation until the specular reflection is completely cancelled in the photometric camera's view. Since we only seek to estimate albedo maps, we require only the constant illumination condition in which all LEDs are set to maximum brightness.

In contrast to previous lightstage-based methods, we capture multiple virtual viewpoints by capturing the face in different poses, specifically frontal and left/right profile. This provides full ear-to-ear coverage for the single polarisation-calibrated photometric viewpoint. The optoelectric polarising filter enables the parallel/perpendicular conditions to be captured in rapid succession without requiring mechanical filter rotation. We augment the photometric camera with additional cameras providing multiview, single-shot images captured in sync with the photometric images. We position these additional cameras to provide overlapping coverage of the face. We do not rely on a fixed geometric calibration, so the exact positioning of these cameras is unimportant and we allow the cameras to autofocus between captures. In our setup, we use 7 such cameras in addition to the photometric view giving a total of 8 simultaneous views. Since we repeat the capture three times, we have 24 effective views. For synchronisation, we control camera shutters and the polarisation state of the photometric camera using an MBED microcontroller. A complete dataset for a face is shown in Fig. 2.

Participants. We captured 50 individuals (13 females) in our setup. Our participants range in age from 18 to 67 and cover skin types I-V of the Fitzpatrick scale [11].
3. Data processing
In order to merge these views and to provide a rough base mesh, we perform a multiview reconstruction. We then warp the 3DMM template mesh to the scan geometry. As well as other sources of alignment error, since the three photometric views are not acquired simultaneously, there is likely to be non-rigid deformation of the face between these views. For this reason, in Section 3.3 we propose a robust algorithm for stitching the photometric views without blurring potentially misaligned features. We provide an implementation of our sampling, weighting and blending pipeline as an extension of the MatlabRenderer toolbox [2].
3.1. Multiview stereo
We commence by applying uncalibrated structure-from-motion followed by dense multiview stereo [1] to all 24 viewpoints (see Fig. 2, blue boxed images). Solving this uncalibrated multiview reconstruction problem provides both the base mesh (see Fig. 2, bottom left) to which we fit the 3DMM template and also intrinsic and extrinsic camera parameters for the three photometric views. These form the input to our stitching process.
Figure 2: Overview of our capture and blending pipeline. Images within a blue box are captured simultaneously. Photometric image pairs within a dashed orange box are captured sequentially with perpendicular/parallel polarisation state respectively.

3.2. Template fitting

To build a 3DMM from raw scanning data, we establish correspondence to a template. We use the Basel Face Pipeline [14] which uses smooth deformations based on Gaussian Processes. We adapted the threshold used to exclude vertices from the optimisation at the different levels (32mm, 16mm, 8mm, 4mm, 2mm, 1mm, 0.5mm from coarse to fine) to achieve better performance on missing parts of the scans. Besides this minor change we used the Basel Face Pipeline as is, with between 25 and 45 manually annotated landmarks (eyes: 8, nose: 9, mouth: 6, eyebrows: 4, ears: 18). We used the template of the BFM 2017 for registration, which makes our model compatible with that model.
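
A sketch of the per-level vertex exclusion described above, assuming a closest-point residual is available per template vertex; this is purely illustrative of the thresholding idea, not the Basel Face Pipeline's actual implementation.

```python
import numpy as np

# Per-level residual thresholds, coarse to fine (mm)
LEVELS_MM = [32.0, 16.0, 8.0, 4.0, 2.0, 1.0, 0.5]

def active_vertices(residuals_mm, level):
    """Mask of template vertices kept in the optimisation at one level.

    residuals_mm : (n,) distance from each template vertex to its
                   closest point on the scan; large where the scan has
                   holes (e.g. missing ears or occluded regions)
    """
    return residuals_mm < LEVELS_MM[level]
```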
3.3. Sampling and stitching
We stitch the multiple photometric viewpoints into seamless diffuse and specular per-vertex albedo maps using Poisson blending. Blending in the gradient domain via solution of a Poisson equation was first proposed by Pérez et al. [23] for 2D images. The approach allows us to avoid visible seams where texture or geometry from different views are inconsistent.

For each viewpoint, $v \in V = \{v_1, \ldots, v_k\}$, we sample RGB intensities onto the $n$ vertices of the mesh, $I_v \in \mathbb{R}^{n \times 3}$. Then, for each view we compute a per-triangle confidence value for each of the $t$ triangles, $w_v \in \mathbb{R}^t$. For each triangle, this is defined as the minimum per-vertex weight for each vertex in the triangle, where the per-vertex weights are defined as follows. If the vertex is not visible in that view, the weight is set to zero. We also set the weight to zero if the vertex projection is within a threshold distance of the occluding boundary to avoid sampling background onto the mesh. Otherwise, we take the dot product between the surface normal and view vectors as the weight, giving preference to observations whose projected resolution is higher.

Next, we define a selection matrix for each view, $S_v \in \{0,1\}^{m_v \times t}$, that selects a triangle if view $v$ has the highest weight for that triangle:

$$\left[S_v^{\top} \mathbf{1}_{m_v}\right]_i = 1 \iff \forall u \in V \setminus \{v\},\; w_i^u < w_i^v. \quad (1)$$

We define an additional selection matrix $S_{v_{k+1}}$ that selects all triangles not selected in any view (i.e. that have no non-zero weight). Hence, every triangle is selected exactly once and $\sum_{i=1}^{k+1} m_{v_i} = t$. We similarly define per-vertex selection matrices $\tilde{S}_v \in \{0,1\}^{\tilde{m}_v \times n}$ that select the vertices for which view $v$ has the highest per-vertex weights.

We write a screened Poisson equation as a linear system.
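
To make the weighting and winner-takes-all selection concrete, here is a small numpy sketch. The names are illustrative, and the selection matrices $S_v$ are represented implicitly as an index array rather than as sparse 0/1 matrices.

```python
import numpy as np

def triangle_weights(vertex_weights, faces):
    """Per-triangle confidence for one view: the minimum per-vertex
    weight over the triangle's three vertices.

    vertex_weights : (n,) zero for invisible/near-boundary vertices,
                     otherwise the dot product of normal and view vector
    faces          : (t, 3) vertex indices of each triangle
    """
    return vertex_weights[faces].min(axis=1)

def select_views(W):
    """Winner-takes-all view selection behind the matrices S_v (Eq. 1).

    W : (k, t) stacked per-view triangle weights
    Returns a (t,) view index per triangle; index k labels triangles
    with no non-zero weight in any view (the extra S_{v_{k+1}}).
    """
    best = np.argmax(W, axis=0)
    best[W.max(axis=0) == 0.0] = W.shape[0]
    return best
```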

Citations

01 Jan 2010
TL;DR: A passive stereo system for capturing the 3D geometry of a face in a single-shot under standard light sources is described, modified of standard stereo refinement methods to capture pore-scale geometry, using a qualitative approach that produces visually realistic results.
Abstract: This paper describes a passive stereo system for capturing the 3D geometry of a face in a single-shot under standard light sources. The system is low-cost and easy to deploy. Results are submillimeter accurate and commensurate with those from state-of-the-art systems based on active lighting, and the models meet the quality requirements of a demanding domain like the movie industry. Recovered models are shown for captures from both high-end cameras in a studio setting and from a consumer binocular-stereo camera, demonstrating scalability across a spectrum of camera deployments, and showing the potential for 3D face modeling to move beyond the professional arena and into the emerging consumer market in stereoscopic photography. Our primary technical contribution is a modification of standard stereo refinement methods to capture pore-scale geometry, using a qualitative approach that produces visually realistic results. The second technical contribution is a calibration method suited to face capture systems. The systemic contribution includes multiple demonstrations of system robustness and quality. These include capture in a studio setup, capture off a consumer binocular-stereo camera, scanning of faces of varying gender and ethnicity and age, capture of highly-transient facial expression, and scanning a physical mask to provide ground-truth validation.

254 citations

Journal ArticleDOI
TL;DR: A detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed is provided in this paper, where the challenges in building and applying these models, namely, capture, modeling, image formation, and image analysis, are still active research topics, and the state-of-the-art in each of these areas are reviewed.
Abstract: In this article, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely, capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research, and highlighting the broad range of current and future applications.

205 citations

Posted Content
TL;DR: This paper harnesses the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images and achieves for the first time, to the best of the authors' knowledge, facial texture reconstruction with high-frequency details.
Abstract: A lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the recent works, the texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction is still not capable of modeling facial texture with high-frequency details. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful facial texture prior from a large-scale 3D texture dataset. Then, we revisit the original 3D Morphable Models (3DMMs) fitting making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. In order to be robust towards initialisation and expedite the fitting process, we propose a novel self-supervised regression based approach. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.

27 citations

Posted Content
TL;DR: This work presents a new neural representation for face reflectance where all components of the reflectance responsible for the final appearance from a monocular image are estimated, and outperforms existing monocular reflectance reconstruction methods due to better capturing of physical effects, such as sub-surface scattering, specularities, self-shadows and other higher-order effects.
Abstract: The reflectance field of a face describes the reflectance properties responsible for complex lighting effects including diffuse, specular, inter-reflection and self shadowing. Most existing methods for estimating the face reflectance from a monocular image assume faces to be diffuse with very few approaches adding a specular component. This still leaves out important perceptual aspects of reflectance as higher-order global illumination effects and self-shadowing are not modeled. We present a new neural representation for face reflectance where we can estimate all components of the reflectance responsible for the final appearance from a single monocular image. Instead of modeling each component of the reflectance separately using parametric models, our neural representation allows us to generate a basis set of faces in a geometric deformation-invariant space, parameterized by the input light direction, viewpoint and face geometry. We learn to reconstruct this reflectance field of a face just from a monocular image, which can be used to render the face from any viewpoint in any light condition. Our method is trained on a light-stage training dataset, which captures 300 people illuminated with 150 light conditions from 8 viewpoints. We show that our method outperforms existing monocular reflectance reconstruction methods, in terms of photorealism due to better capturing of physical primitives, such as sub-surface scattering, specularities, self-shadows and other higher-order effects.

14 citations

References
Journal ArticleDOI
29 Jun 2009
TL;DR: This technique provides a direct estimate of the per‐pixel specular roughness and thus does not require off‐line numerical optimization that is typical for the measure‐and‐fit approach to classical BRDF modeling.
Abstract: This paper presents a novel method for estimating specular roughness and tangent vectors, per surface point, from polarized second order spherical gradient illumination patterns. We demonstrate that for isotropic BRDFs, only three second order spherical gradients are sufficient to robustly estimate spatially varying specular roughness. For anisotropic BRDFs, an additional two measurements yield specular roughness and tangent vectors per surface point. We verify our approach with different illumination configurations which project both discrete and continuous fields of gradient illumination. Our technique provides a direct estimate of the per-pixel specular roughness and thus does not require off-line numerical optimization that is typical for the measure-and-fit approach to classical BRDF modeling.

109 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The results show that the proposed encoder-decoder network expands the capacity of 3DMM for capturing discriminative shape features and facial detail, and thus outperforms existing methods both in 3D face reconstruction accuracy and in face recognition accuracy.
Abstract: This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously. Unlike existing 3D face reconstruction methods, our proposed method directly regresses dense 3D face shapes from single 2D images, and tackles identity and residual (i.e., non-identity) components in 3D face shapes explicitly and separately based on a composite 3D face shape model with latent representations. We devise a training process for the proposed network with a joint loss measuring both face identification error and 3D face shape reconstruction error. To construct training data we develop a method for fitting 3D morphable model (3DMM) to multiple 2D images of a subject. Comprehensive experiments have been done on MICC, BU3DFE, LFW and YTF databases. The results show that our method expands the capacity of 3DMM for capturing discriminative shape features and facial detail, and thus outperforms existing methods both in 3D face reconstruction accuracy and in face recognition accuracy.

102 citations

Proceedings ArticleDOI
28 May 2019
TL;DR: This work proposes two methods for solving the problem of combining two or more 3DMMs that are built using different templates that perhaps only partly overlap, have different representation capabilities and are built from different datasets that may not be publicly-available.
Abstract: Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D surfaces of an object class. In this context, we identify an interesting question that has previously not received research attention: is it possible to combine two or more 3DMMs that (a) are built using different templates that perhaps only partly overlap, (b) have different representation capabilities and (c) are built from different datasets that may not be publicly-available? In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new head and face model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. The resulting combined model achieves state-of-the-art performance and outperforms existing head models by a large margin. Finally, as an application experiment, we reconstruct full head representations from single, unconstrained images by utilizing our proposed large-scale model in conjunction with the Face-Warehouse blendshapes for handling expressions.

77 citations

Proceedings ArticleDOI
15 Dec 2010
TL;DR: In this article, the Stokes reflectance field of circularly polarized spherical illumination is used to estimate diffuse albedo and specular roughness of isotropic BRDFs.
Abstract: We present a novel method for surface reflectometry from a few observations of a scene under a single uniform spherical field of circularly polarized illumination. The method is based on a novel analysis of the Stokes reflectance field of circularly polarized spherical illumination and yields per-pixel estimates of diffuse albedo, specular albedo, index of refraction, and specular roughness of isotropic BRDFs. To infer these reflectance parameters, we measure the Stokes parameters of the reflected light at each pixel by taking four photographs of the scene, consisting of three photographs with differently oriented linear polarizers in front of the camera, and one additional photograph with a circular polarizer. The method only assumes knowledge of surface orientation, for which we make a few additional photometric measurements. We verify our method with three different lighting setups, ranging from specialized to off-the-shelf hardware, which project either discrete or continuous fields of spherical illumination. Our technique offers several benefits: it estimates a more detailed model of per-pixel surface reflectance parameters than previous work, it requires a relatively small number of measurements, it is applicable to a wide range of material types, and it is completely viewpoint independent.

70 citations

Journal ArticleDOI
TL;DR: A direct application of the algorithm with the 3D Morphable Model leads to a fully automatic face recognition system with competitive performance on the Multi-PIE database without any database adaptation.
Abstract: We present a novel fully probabilistic method to interpret a single face image with the 3D Morphable Model. The new method is based on Bayesian inference and makes use of unreliable image-based information. Rather than searching a single optimal solution, we infer the posterior distribution of the model parameters given the target image. The method is a stochastic sampling algorithm with a propose-and-verify architecture based on the Metropolis---Hastings algorithm. The stochastic method can robustly integrate unreliable information and therefore does not rely on feed-forward initialization. The integrative concept is based on two ideas, a separation of proposal moves and their verification with the model (Data-Driven Markov Chain Monte Carlo), and filtering with the Metropolis acceptance rule. It does not need gradients and is less prone to local optima than standard fitters. We also introduce a new collective likelihood which models the average difference between the model and the target image rather than individual pixel differences. The average value shows a natural tendency towards a normal distribution, even when the individual pixel-wise difference is not Gaussian. We employ the new fitting method to calculate posterior models of 3D face reconstructions from single real-world images. A direct application of the algorithm with the 3D Morphable Model leads us to a fully automatic face recognition system with competitive performance on the Multi-PIE database without any database adaptation.

68 citations