Automatic Multi-view Face Recognition via 3D Model Based Pose Regularization
Koichiro Niinuma, Hu Han, and Anil K. Jain
Department of Computer Science and Engineering
Michigan State University, East Lansing, MI, U.S.A.
{niinumak, hhan, jain}@msu.edu
Abstract

One of the major challenges encountered by face recognition lies in the difficulty of handling arbitrary pose variations. While different approaches have been developed for face recognition across pose variations, many methods either require manual landmark annotations or assume the face poses to be known. These constraints prevent many face recognition systems from working automatically. In this paper, we propose a fully automatic method for multi-view face recognition. We first build a 3D model from each frontal target face image, which is used to generate synthetic target face images. The pose of a query face image is also estimated using a multi-view face detector, so that the synthetic target face images can be generated to resemble the pose variation of the query face image. Procrustes analysis is then applied to align the synthetic target images and the query image, and block based MLBP features are extracted for face matching. Experimental results on two public-domain databases (Color FERET and PubFig) and a Mobile face database collected using mobile phones show that the proposed approach outperforms two state-of-the-art face matchers (FaceVACS and MKD-SRC) in automatic multi-view face recognition. The proposed approach can also be easily extended to leverage existing face recognition systems for automatic multi-view face recognition.
1. Introduction

The goal of automated face recognition (AFR) is to automatically recognize a person from digital images or video sequences containing his or her face. AFR has attracted substantial attention in the past decades due to its wide applications in real-world scenarios [31], ranging from mobile phone authentication to surveillance. While AFR in controlled conditions, such as frontal or near-frontal poses, neutral expressions, and near uniform illumination, has shown impressive performance, AFR in uncontrolled environments, with arbitrary poses, non-uniform illumination, and partial occlusion, remains a challenging problem [31].
Figure 1. Scenarios and example images of face recognition in uncontrolled environments. (a) Profile face image which led to an arrest (http://tonn.rssing.com/chan-1762138/all_p47.html), (b) non-frontal face image in the FERET dataset [27], (c) non-frontal face image in the Mobile dataset collected in our laboratory, (d) non-frontal face image in the PubFig dataset [17].
One typical application of AFR in uncontrolled conditions is the identification or authorization of individuals with face images or videos captured by mobile devices, such as handheld terminals, mobile phones, or surveillance cameras (see Fig. 1). In these scenarios, there is a high possibility that the face images are captured without the cooperation of subjects. As a result, faces in the query images can be in arbitrary poses. Fig. 1 (a) shows an example where a profile face image led to the arrest of a robbery suspect. Despite the potential value of non-frontal face images in forensic applications, arbitrary pose variations have become one of the primary stumbling blocks preventing most existing systems from performing face recognition automatically.

This paper proposes a fully automatic multi-view face recognition method that

1. does not necessitate manual landmark annotations or the assumption of known poses within a limited range,
2. achieves higher performance than two state-of-the-art face matchers in several scenarios with different pose variations,
3. and naturally facilitates the application of existing AFR systems in uncontrolled environments.

To Appear in The IEEE 6th International Conference on Biometrics: Theory, Applications and Systems (BTAS), Sept. 29-Oct. 2, 2013, Washington DC, USA

Table 1. A comparison of existing methods for multi-view face recognition.

| Category | Publication | Approach | Pose assumed to be known? | Manual annotation required for non-frontal face image? | Databases used (pose variations) |
|---|---|---|---|---|---|
| Pose invariant feature extraction (holistic) | Sharma et al. [33] | Partial Least Squares, Bilinear Model, Canonical Correlation Analysis | Yes | Yes | FERET (±60°), CMU-PIE (±90°), Multi-PIE (±90°) |
| | Li et al. [19] | Partial Least Squares | Yes | Yes | Multi-PIE (±90°), CMU-PIE (±90°) |
| | Fischer et al. [10] | Partial Least Squares | Yes | Yes | Multi-PIE (±90°) |
| | Prince et al. [29] | Tied Factor Analysis | Yes | Yes | FERET (±90°), CMU-PIE (±90°), XM2VTS (±90°) |
| | Li et al. [18] | Linear Regression | Yes | Yes | FERET (±60°), CMU-PIE (±90°) |
| | Blanz and Vetter [6] | 3D Morphable Model | No | Yes | FERET (±60°), CMU-PIE (±90°) |
| | Wang et al. [36] | Orthogonal Discriminant Vector | No | No | FERET (±25°), CMU-PIE (±15°), Yale B, AR |
| Pose invariant feature extraction (local) | Kanade and Yamada [16] | Subregion Based Probabilistic Model | Yes | Yes | CMU-PIE (±90°) |
| | Ashraf et al. [4] | Probabilistic Stack-flow | Yes | Yes | FERET (±60°) |
| | Lucey and Chen [22] | Patch-whole Sparse Registration | No | Yes | FERET (±60°) |
| | Castillo and Jacobs [7] | Stereo Matching | No | Yes | CMU-PIE (±90°) |
| | Arashloo and Kittler [3] | Markov Random Field | No | No* | CMU-PIE (±90°), XM2VTS |
| | Liao et al. [21] | Multi-keypoint Descriptor | No | No | PubFig (arbitrary) |
| Pose normalization (to frontal) | Chai et al. [8] | Linear Regression | Yes | No | CMU-PIE (±45°) |
| | Sarfraz and Hellwich [32] | Multivariate Regression | Yes | No | CMU-PIE (±90°), FERET (±60°) |
| | Li et al. [20] | Morphable Displacement Field | Yes | No | FERET (±60°), CMU-PIE (±90°) |
| | Teijeiro-Mosquera et al. [35] | Active Appearance Model | No | No | CMU-PIE (±45°) |
| | Asthana et al. [5] | View Based Active Appearance Model | No | No | FERET (±40°), CMU-PIE (±45°), Multi-PIE (±45°), FacePix (±45°) |
| | Ding et al. [9] | Random Forest Embedded Active Shape Model | No | No | FERET (±60°), CMU-PIE (±67.5°), CAS-PEAL (±45°) |
| Pose normalization (to non-frontal) | Prabhu et al. [28] | 3D Generic Elastic Model | No | No | Multi-PIE (±60°), video clips |
| | Han and Jain [12] | 3D Modeling from two images | No | Yes | FERET (±22.5°) |
| | Our approach | 3D Model Based Pose Regularization | No | No | FERET (±90°), Mobile (±90°), PubFig (arbitrary) |

*A bounding box is required, but it is not clear whether the bounding box is obtained manually or automatically.
1.1. Related Work

Over the past couple of decades, many methods have been proposed to handle the pose variation problem in AFR (see Table 1). These approaches for multi-view face recognition can be grouped into two main categories: (i) pose invariant feature extraction, and (ii) pose normalization. Approaches in the first category aim to provide a common representation which maximizes the correlation among a subject's face images with different poses. They can be further classified into (i) holistic representation, and (ii) local representation. For holistic representation, linear regression, partial least squares (PLS), the Bilinear Model (BLM), Canonical Correlation Analysis (CCA), and the 3D Morphable Model are widely used approaches [6, 10, 18, 19, 29, 33, 36] that project face images with different poses into latent spaces, where a pose-independent representation is obtained. The merit of these approaches is that the pose variation problem and feature representation are solved simultaneously. However, many holistic methods assume that the poses of face images are known: for example, the poses provided in the databases are directly used to build pose-specific models, and only the model covering the pose of a testing image is used for recognition. Additionally, holistic representations can easily be affected by face deformations due to large pose variations. By contrast, local representations that extract features from individual patches of a face are more robust to large pose variations. Markov Random Fields [3], the subregion based probabilistic model [16], probabilistic stack-flow [4], patch-whole sparse registration [22], and stereo matching [7] are representative approaches of this category. However, most local representation based approaches [3, 4, 7, 16, 22] require manual landmarks to establish the local patch correspondence between frontal and non-frontal face images.

Figure 2. An overview of the proposed approach for multi-view automated face recognition.
Approaches for pose invariant feature extraction usually involve the design of new feature representations and matching methods. By contrast, pose normalization approaches, which transform face images with different poses into face images with the same pose, make it possible to directly use existing feature representation and matching methods for multi-view face recognition. Since face recognition techniques for frontal or near-frontal poses have been widely studied, a natural approach is to transform non-frontal face images into frontal images. Linear and multivariate regression, the Active Shape Model (ASM), and the Active Appearance Model (AAM) are representative approaches [5, 8, 9, 20, 32, 35] used to recover frontal face images from non-frontal views. However, as observed in [5], the recovered frontal face images can be of poor quality due to self-occlusion under large poses. Li et al. [20] avoided matching corrupted facial regions in the recovered frontal images by generating occlusion masks. Instead of recovering a frontal image and dropping its corrupted facial regions, a different approach is to generate non-frontal views from frontal images, so that the generated non-frontal views resemble the poses in testing face images. This idea was explored by Park et al. [25] using 3D face data. However, 3D sensing is still expensive and the acquisition time can be slow. Moreover, legacy databases consist of 2D images, and subjects may not be available to provide their 3D images. Under these circumstances, 3D face models reconstructed from frontal face images can substitute for real 3D faces. The 3D Morphable Model and the 3D generic elastic model (3D GEM) are typical approaches [12, 28] used for generating non-frontal images from frontal views.

Despite various studies on pose invariant feature extraction and pose normalization, most face recognition systems cannot perform fully automatic multi-view face recognition due to the requirement of manual landmark annotations and the assumption of known poses. These constraints limit the application of these systems in real scenarios.
1.2. Proposed Method

We propose a new fully automatic multi-view face recognition method based on 3D model based pose regularization, which also extends existing face recognition systems to multi-view scenarios. Fig. 2 illustrates the proposed approach, which consists of two main modules: (i) pose regularization based on a 3D model, and (ii) face matching with block based multi-scale LBP (MLBP) features. Unlike previous pose normalization approaches, where non-frontal face images are transformed into frontal images, the proposed 3D model based pose regularization method generates synthetic target images to resemble the pose variations in query images. We should point out that generating non-frontal views from frontal face images is much easier and more accurate than recovering frontal views from non-frontal face images. This is because it is difficult to automatically detect the accurate landmarks under large pose variations that are required to build a 3D face model. Additionally, since many areas of a face are significantly occluded under large pose variations, it is problematic to recover the frontal view for the occluded facial regions.

The proposed pose regularization approach is similar to the novel view rendering based on 3D GEM [28], but the proposed method uses a simplified 3D Morphable Model [6]. Additionally, instead of aligning the synthetic target images and testing face images based on eye positions (for face images with large pose variations, one of the two eyes is invisible, so face alignment based on two eyes no longer works), we perform face alignment using Procrustes analysis under large pose variations. Moreover, our face matching method with block based MLBP features provides better robustness against face illumination and expression variations (following the discussion in [13], additional face preprocessing methods might be integrated with MLBP to further improve robustness). Finally, we show the extensibility of the proposed approach by replacing our MLBP based face matcher with two state-of-the-art face matching systems.

2. 3D Model Based Pose Regularization

As shown in Fig. 2, to perform pose regularization, we first build a 3D model from each frontal target face image, which is used to generate synthetic target face images. The pose of a query face image is also estimated so that the generated synthetic target face images are able to resemble the pose variation of the query face image.
2.1. 3D Modeling from a Frontal Image

In this work, we utilize a simplified 3D Morphable Model [6] without texture fitting, due to its robustness and computational efficiency. We derive our 3D shape model from the USF Human ID 3-D database [2], which includes the 3D face shape and texture of 100 subjects captured with a 3D scanner. The original 3D face includes 75,972 vertices, but for efficient computation, we interactively select 76 vertices based on the 76 keypoints defined in an open source Active Shape Model (Stasm [23]). Given the 100 3D faces, the 3D shape of a new face can be represented using a PCA model

$$S = \bar{S} + \sum_{k=1}^{K} \alpha_k W_k, \qquad (1)$$

where $S$ is the shape of a new 3D face, $\bar{S}$ is the average 3D shape of the 100 3D faces from the USF Human ID 3-D database, $W_k$ is the shape eigenvector corresponding to the $k$-th largest eigenvalue, and $\alpha_k$ is the coefficient for the $k$-th shape eigenvector.
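A minimal sketch of how such a PCA shape model could be built and evaluated, assuming a hypothetical `shapes` array holding the 100 USF meshes already reduced to the same 76 vertices; the names and the choice K = 20 are ours, not the paper's:

```python
# Sketch of the PCA shape model in Eq. (1). `shapes` is a hypothetical
# (100, 76*3) array, one flattened 3D shape per row.
import numpy as np

def build_shape_model(shapes, K=20):
    S_bar = shapes.mean(axis=0)                        # average 3D shape
    U, s, Vt = np.linalg.svd(shapes - S_bar, full_matrices=False)
    W = Vt[:K]                                         # top-K shape eigenvectors
    return S_bar, W

def synthesize_shape(S_bar, W, alpha):
    # Eq. (1): S = S_bar + sum_k alpha_k * W_k
    return S_bar + alpha @ W
```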
A 2D face image is a projection of a 3D face onto a 2D plane under a set of transformations such as translation, rotation, scaling, and projection. Based on this face imaging process, the shape of a 3D face can be recovered from its 2D projection (facial landmarks in a 2D face image) by minimizing the following cost function [26]

$$e(P, R, T, s, \{\alpha_k\}_{k=1}^{K}) = \| P_{2D} - s \cdot P R T S \|_{L_2}, \qquad (2)$$

where $P_{2D}$ is the set of facial landmarks detected using Stasm, $P$ is an orthogonal projection from 3D to 2D, and $R$, $T$, $s$ are the rotation, translation, and scaling operations applied to the 3D face shape $S$, respectively.
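One way to minimize Eq. (2) is ordinary non-linear least squares. The sketch below uses SciPy, parameterizes R by Euler angles, folds the projection and translation into a 2D offset, and keeps the scale in log space; these are our simplifications under stated assumptions, not necessarily the authors' exact optimization:

```python
# Sketch of fitting shape and pose by minimizing the reprojection error of
# Eq. (2). P2D is a (76, 2) array of detected landmarks; S_bar, W come from
# the PCA model above.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, P2D, S_bar, W, K):
    angles, t, log_s = params[:3], params[3:5], params[5]
    alpha = params[6:6 + K]
    S = (S_bar + alpha @ W).reshape(-1, 3)             # Eq. (1)
    R = Rotation.from_euler('xyz', angles).as_matrix()
    proj = np.exp(log_s) * (S @ R.T)[:, :2] + t        # orthographic projection
    return (proj - P2D).ravel()

def fit_shape_and_pose(P2D, S_bar, W, K=20):
    x0 = np.zeros(6 + K)                               # identity pose, mean shape
    res = least_squares(residuals, x0, args=(P2D, S_bar, W, K))
    return res.x
```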
We directly use the input frontal face image as the texture corresponding to the frontal 3D facial shape. When a novel view of the face is generated, we directly map the frontal face image to the novel view based on a Delaunay triangulation of the 2D facial landmarks. Compared with the statistical face texture model used in the 3D Morphable Model, the texture mapping in our simplified 3D model is more efficient. Additionally, texture mapping retains the detailed and realistic features that are important for face recognition.
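A sketch of such piecewise-affine texture mapping with OpenCV and SciPy; `warp_texture` and its arguments are illustrative names, under the assumption that the source and destination landmark arrays are already in correspondence:

```python
# Sketch of Delaunay-based texture mapping: triangulate the frontal landmarks,
# then warp each triangle of the frontal image to its projected position in
# the novel view. Warping the full image per triangle is wasteful but simple.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def warp_texture(frontal_img, src_pts, dst_pts):
    tri = Delaunay(src_pts)                            # triangles over landmarks
    out = np.zeros_like(frontal_img)
    for simplex in tri.simplices:
        src = np.float32(src_pts[simplex])
        dst = np.float32(dst_pts[simplex])
        M = cv2.getAffineTransform(src, dst)           # per-triangle affine map
        warped = cv2.warpAffine(frontal_img, M, (out.shape[1], out.shape[0]))
        mask = np.zeros(out.shape[:2], np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst), 1)     # rasterize the triangle
        out[mask == 1] = warped[mask == 1]
    return out
```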
2.2. Generating Synthetic Target Images

The 3D facial shape S recovered from (1) and (2) has a frontal pose. By transforming S using different translation, rotation, scaling, and projection transformations, we can easily generate novel synthetic target images from a target face image. Figs. 3 (a) and (b) show two target face images and their synthetic images under 19 novel views (±90° with an interval of 10°) using our 3D face model.

However, the synthetic face images should not be generated arbitrarily. In fact, to reduce the pose difference between a target face image and a query face image, the synthetic target face images should be generated to resemble the pose of the query face image. Although in some public-domain face databases (e.g., FERET) the poses of individual face images are available, in many other multi-view face databases the poses are not known. Under these circumstances, automatic pose estimation from arbitrary face images is necessary in order to perform fully automatic face recognition. In our approach, we utilize a mixture of tree-structured part models (MTSPM) [37] to estimate the pose from a single query face image. (In [37], the authors discuss the computational cost of MTSPM and state that pose estimation with MTSPM can be performed in real time.) With the MTSPM, we are also able to detect a set of facial landmarks, which makes it possible to perform alignment between the synthetic target images and the query image. Based on the pose estimated for a query image, only synthetic target face images with similar poses are generated for face matching.

However, generating synthetic target images online would increase the computational cost of face matching too much. In our approach, we adopt a more efficient strategy, sketched in the code below. Specifically, after we build a 3D model from each target face image, 19 synthetic target images are generated for each target image offline. Upon matching a target image to a query image, only synthetic images with poses similar to the query image are selected for matching. (Since the pose estimated by MTSPM is prone to error, multiple synthetic images with similar poses, rather than a single one, are used for matching.) In our experiments, five synthetic target images are typically used for face matching (see the red rectangles in Fig. 3). This strategy makes it possible for our system to perform large scale face recognition without increasing the computational cost significantly.
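A minimal sketch of this offline rendering / online selection strategy; `render_view` stands in for the 3D-model-based synthesis above and is a hypothetical callable, as are the other names:

```python
# Offline: render 19 yaw views per target once. Online: compare the query
# only against the n views nearest to its estimated yaw.
import numpy as np

YAWS = np.arange(-90, 91, 10)                          # 19 poses, 10-degree steps

def build_gallery(targets, render_view):
    # {target_id: list of 19 synthetic images}, computed offline
    return {tid: [render_view(img, y) for y in YAWS]
            for tid, img in targets.items()}

def select_views(gallery_views, query_yaw, n=5):
    # the n synthetic views whose yaw is closest to the query's estimated yaw
    order = np.argsort(np.abs(YAWS - query_yaw))
    return [gallery_views[i] for i in order[:n]]
```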
Figure 3. Examples of (a) original target face images, (b) synthetic target images generated offline using 3D face models, and (c) query images. The red rectangles indicate the online selection of synthetic target images based on the pose estimation from query images.

2.3. Face Alignment

By building a 3D model and generating synthetic target face images that resemble the poses of query images, we are able to reduce the pose disparity between them. However, face alignment is still necessary for the subsequent feature extraction and face matching steps. Holistic face alignment based on the two eyes (e.g., using the Inter-Pupil Distance (IPD)) has been a widely used approach for frontal or near-frontal face images [28]. However, IPD based face alignment becomes problematic for non-frontal poses: under large pose variations, one of the two eyes is often not visible, and even when both eyes are visible in non-frontal images, IPD based alignment can lead to an artificial increase in the overall size of the face image.

In our approach, we apply Procrustes analysis [11] to align the synthetic target images and a query image based on the facial landmarks from the 3D face model and the keypoints detected by MTSPM. Although the numbers of keypoints defined in the 3D face model and in MTSPM are different, the keypoint sequence in each model is fixed. This makes it possible to manually establish the keypoint correspondence between the two models. We have manually identified 19 landmarks in MTSPM that have corresponding landmarks in the 3D model, and the Procrustes analysis is performed on these 19 corresponding landmark pairs.
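A sketch of this landmark-based alignment using scikit-image: estimate a similarity transform (the Procrustes-style rotation, scale, and translation) from the 19 corresponding pairs and warp the query into the target frame. The function and argument names are ours:

```python
# Align a query image to the synthetic-target frame from 19 corresponding
# landmark pairs, each given as (x, y) coordinates.
import numpy as np
from skimage import transform

def align_query(query_img, query_pts19, target_pts19, out_shape=(256, 192)):
    tf = transform.SimilarityTransform()
    # estimate the map from target coordinates to query coordinates, which is
    # exactly the inverse map skimage's warp expects
    tf.estimate(np.asarray(target_pts19), np.asarray(query_pts19))
    return transform.warp(query_img, tf, output_shape=out_shape)
```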
3. Face Matching

Given the aligned synthetic target face images and a query face image, we extract MLBP [24] features for face matching. In our experiments, we use MLBP features which are a concatenation of LBP histograms with 8 neighbors sampled at radii R = {1, 3, 5, 7}. We first divide a holistic face image (256×192) into 768 sub-regions (non-overlapping blocks of 8×8 pixels). Then, MLBP features are extracted from the individual blocks and concatenated to represent a face.

Given two n-dimensional MLBP histograms x and y extracted from two face images, the chi-squared distance χ² is calculated as a measure of the dissimilarity between the two face images:

$$\chi^2(x, y) = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{(x_i + y_i)/2}, \qquad (3)$$

where $x_i$ and $y_i$ are the features of the $i$-th bin. Since we have multiple synthetic target images, multiple distances are calculated; the final distance between a target and a query is the minimum of these distances.
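A compact sketch of this matcher with scikit-image; the 59-bin non-rotation-invariant uniform LBP encoding is our assumption, since the paper does not specify the exact histogram variant:

```python
# Block based MLBP features (radii 1, 3, 5, 7, 8 neighbors, 8x8-pixel blocks)
# compared with the chi-squared distance of Eq. (3); the final target-to-query
# score is the minimum over the selected synthetic views.
import numpy as np
from skimage.feature import local_binary_pattern

def mlbp_features(img, block=8, radii=(1, 3, 5, 7)):
    feats = []
    for r in radii:
        lbp = local_binary_pattern(img, P=8, R=r, method='nri_uniform')
        for i in range(0, img.shape[0], block):
            for j in range(0, img.shape[1], block):
                h, _ = np.histogram(lbp[i:i + block, j:j + block],
                                    bins=59, range=(0, 59))
                feats.append(h)
    return np.concatenate(feats).astype(float)

def chi2(x, y, eps=1e-10):
    return np.sum((x - y) ** 2 / ((x + y) / 2 + eps))  # Eq. (3)

def match(target_feats_list, query_feats):
    return min(chi2(t, query_feats) for t in target_feats_list)
```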
4. Experiments and Results

4.1. Databases and Baselines

Two public-domain databases (Color FERET [27] and PubFig [17]) and a Mobile dataset collected in our laboratory using mobile phones are used to evaluate the performance of the proposed approach for fully automatic multi-view face recognition. (The LFW database [14] also includes arbitrary pose variations, but the images of many subjects are captured in the same environment.) The Color FERET database includes facial images with multiple poses from 994 subjects. We use one frontal image (fa) per subject as the target, and images with 6 non-frontal poses (ql, qr, hl, hr, pl, pr) as the query. The FERET database has advanced the development of multi-view face recognition systems in the past ten years. However, the FERET database was collected under a well controlled scenario: for example, the participants were required to rotate the head and body to pre-designed directions (see http://www.itl.nist.gov/iad/humanid/feret/feret_master.html), and the background and illumination in the face images are nearly uniform. To replicate the scenarios of face recognition from images or videos captured using mobile devices, we have collected a Mobile dataset consisting of 112 subjects using an iPhone 4S. For each subject, one or two frontal face images (only one of which is used as a target) and around 10 non-frontal face images were captured at several locations inside a building (see some example images in Fig. 7). Compared with the FERET database, the Mobile dataset has fewer subjects but more challenging background and illumination variations, as well as motion blur due to movement of the hand. The PubFig database [17] contains 200 famous personalities collected from the Internet, where 60 subjects are designated for algorithm development and the remaining 140 subjects for algorithm evaluation. Since our method is a non-learning based approach, we directly evaluate it using the 140 subjects from the evaluation set. One frontal face image per subject is used as the target set, and 513 non-frontal images with arbitrary poses are used as the query set.

The proposed approach is fully automatic in performing multi-view face recognition. For a fair comparison, a state-of-the-art system (multi-keypoint descriptor based sparse representation, MKD-SRC [21]) reviewed in Table 1 that is also fully automatic in multi-view face recognition is used as the baseline. (We would like to provide comparisons with more existing systems, but most are unavailable.) Additionally, we also compare the proposed approach with a Commercial-Off-The-Shelf (COTS) face matching system (FaceVACS [1]).

While most existing systems are evaluated under small yaw rotations, we evaluate the proposed approach under three scenarios: (i) small yaw rotations (typically, both eyes are visible); (ii) large yaw rotations; and (iii) arbitrary pose variations. We also investigated the extensibility of the proposed approach by replacing our MLBP based face matcher with the two state-of-the-art face matching systems (Sec. 1.2).

Citations

A Comprehensive Survey on Pose-Invariant Face Recognition (journal article)
TL;DR: The inherent difficulties in PIFR are discussed and a comprehensive review of established techniques is presented, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches.

A Comprehensive Survey on Pose-Invariant Face Recognition (preprint)
TL;DR: A comprehensive review of pose-invariant face recognition methods, comparing pose-robust feature extraction approaches, multi-view subspace learning approaches, face synthesis approaches, and hybrid approaches.

Gaussian mixture 3D morphable face model (journal article)
TL;DR: Experiments in fitting the GM-3DMM to 2D face images to facilitate their geometric and photometric normalisation for pose and illumination invariant face recognition demonstrate the merits of the proposed mixture of Gaussians 3D face model.

Multi angle optimal pattern-based deep learning for automatic facial expression recognition (journal article)
TL;DR: The novel Multi-Angle Optimal Pattern-based Deep Learning (MAOP-DL) method is presented to rectify the problem of sudden illumination changes and to find the proper alignment of a feature set by using multi-angle-based optimal configurations and facial alignment.

Face Recognition Using a Unified 3D Morphable Model (book chapter)
TL;DR: The proposed solution involves a novel approach to learn a subspace spanned by perturbations caused by the missing modes of variation and image degradations, using 3D face data reconstructed from 2D images rather than 3D capture.
References

Multiresolution gray-scale and rotation invariant texture classification with local binary patterns (journal article)
TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis.

Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits "natural" variability in factors such as pose, lighting, race, accessories, occlusions, and background.

The FERET evaluation methodology for face-recognition algorithms (journal article)
TL;DR: Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems.

Generalized procrustes analysis (John C. Gower, Mar 1975, journal article)
TL;DR: Investigates the problem of translating, rotating, reflecting and scaling configurations to minimize a goodness-of-fit criterion, where Gi is the centroid of the points in p-dimensional space.

Face detection, pose estimation, and landmark localization in the wild (conference paper)
TL;DR: Shows that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize, unlike dense graph structures, in real-world, cluttered images.