UvA-DARE (Digital Academic Repository), University of Amsterdam (https://dare.uva.nl)

Citation (APA): Dibeklioğlu, H., Salah, A. A., & Gevers, T. (2013). Like Father, Like Son: Facial Expression Dynamics for Kinship Verification. In 2013 IEEE International Conference on Computer Vision (ICCV 2013), 1-8 December 2013, Sydney, NSW, Australia (pp. 1497-1504). IEEE Computer Society. https://doi.org/10.1109/ICCV.2013.189

Document version: author accepted manuscript.

Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
Hamdi Dibeklioğlu¹,², Albert Ali Salah³, and Theo Gevers¹

¹ Intelligent Systems Lab Amsterdam, University of Amsterdam, Amsterdam, The Netherlands
² Pattern Recognition & Bioinformatics Group, Delft University of Technology, Delft, The Netherlands
³ Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey

h.dibeklioglu@tudelft.nl, salah@boun.edu.tr, th.gevers@uva.nl
Abstract
Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state of the art in this problem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On average, 72.89% verification accuracy is achieved on spontaneous smiles.
1. Introduction
Automatic detection of kinship from facial appearance is a difficult problem with several applications, including social media analysis [20, 21], finding missing children and child adoption cases [9], and coaching for imitation and personification. Kinship is a genetic relationship between two family members, including parent-child, sibling-sibling, and grandparent-grandchild relations. Since a genetic test may not always be available for checking kinship, an unobtrusive and rapid computer vision solution is potentially very useful. This paper proposes such a novel approach for kinship detection.

Kinship may need to be verified between people of different sex and different ages (e.g. father-daughter), which makes this problem especially challenging. Humans use an aggregate of different features to judge kinship from facial images [1]. Furthermore, depending on the age of the person assessed for kinship, humans use different sets of features, consistent with the expected aging-related changes in facial form. For example, upper-face cues are used more prominently for children, as the lower face does not fully form until adulthood [13]. Automatic kinship detection methods also employ aggregate sets of features, including color, geometry, and appearance. In Section 2 we summarize the recent related work in this area.
All the methods proposed so far to verify kinship work with images. In contrast, in this paper we propose a method that uses facial dynamics to verify kinship from videos. Our approach makes intuitive sense: we all know people who do not look like their parents until they smile. Furthermore, the findings of [14] show that the appearance of spontaneous facial expressions of born-blind people and their sighted relatives is similar. However, the resemblance between facial expressions depends not only on the appearance of the expression but also on its dynamics, as each expression is created by a combination of voluntary and involuntary muscle movements. This is the key insight behind this paper. We verify this insight empirically, and show that dynamic features obtained during facial expressions have discriminatory power for kinship verification. This is the first work that uses dynamic features for kinship detection. By combining dynamic and spatio-temporal features, we approach the problem of automatic kinship verification. We use the recently collected UvA-NEMO Smile Database [3] in our experiments, compare our method with three recent approaches from the literature [8, 9, 21], and report state-of-the-art results.
2. Related work
In one of the first works on kinship verification, Fang et al. used skin, hair, and eye color, facial geometry measures, as well as holistic texture features computed on texture gradients of the whole face [8]. They selected the most discriminative inherited features. Color-based features performed better than the other features in general, since a good registration between individual face images was largely lacking in their approach. In the present study, we use their approach as a baseline under controlled registration conditions.
Different feature descriptors have been evaluated for the kinship verification problem in the literature. In [9], eye, mouth, and nose regions are matched via DAISY descriptors. During matching, good matches are not expected on all features but only on some; therefore, typically, the top few matching features are used for verification. In [21], Gabor-based Gradient Orientation Pyramid (GGOP) descriptors are proposed and used to model facial appearance for kinship verification. Support vector machines (SVM) with radial basis function kernels are used as the classifier. A mean accuracy of around 70% is reported on 800 image pairs, which is well within the range of human kinship estimation. In [11], the Self Similarity Representation of Weber face (SSRW) algorithm is proposed. Each face is represented by only its reflectance, and difference-of-Gaussian filters are used to select keypoints that represent each face. SVM classifiers with different kernel functions are contrasted, and a linear kernel is found to be the most suitable. While SVM seems to be the classifier of choice for kinship verification, in [12] a metric learning approach is adopted: samples that have the kinship relation are pulled close, and other samples are pushed apart. In this space, the transformation is complemented by defining a margin for kinship.
The evaluation protocols used for the kinship verification problem typically make use of pairs of photographs, where each pair is either a positive sample (i.e. kin) or a negative one. In [9], 100 face pairs with kinship and 100 pairs without are selected from family photos; there is no decomposition of results into specific kinship categories. In [8], [21], and [20], photos of celebrities were downloaded from the Internet. In these studies, as well as in [12], four kinship relations (Father-Son, Father-Daughter, Mother-Son, and Mother-Daughter) are analyzed separately. The largest database reported in the literature so far is the KinFaceW-II image database, with 250 pairs of kinship relations for each of these four categories.
In [14], Peleg et al. analyze the spontaneous facial expressions of born-blind people and their sighted relatives. They show that such expressions carry a unique family signature. Occurrences of a set of facial movements are used to classify the families of blind subjects. Results show 64% correct classification on average, with 60% for joy expressions. These results justify our motivation. Although [14] focused on facial movements for this task, it did not analyze the dynamics of expressions in terms of duration, intensity, speed, and acceleration, which is an empirical contribution of this paper.
3. Method
In this paper, we propose to combine spatio-temporal facial features and facial expression dynamics for kinship verification. To this end, videos of enjoyment smiles are used. Our system analyzes the entire duration of a smile, starting from a moderately frontal and neutral face, through the unfolding of the smile, to the return to the neutral state. Unlike other approaches proposed in the literature, our method works with videos of faces rather than images. This is the first approach using videos for kinship verification.
Figure 1. (a) The facial feature points used in this study with their indices, (b) the 3D mesh model and visualization of the amplitude signals, which are defined as the mean of the left/right amplitude signals on the face. For simplicity, visualizations are shown on a single side of the face.
We summarize the proposed method here. Our approach starts with face detection in the first frame and the localization of 17 facial landmarks, which are subsequently tracked during the rest of the video. Using the tracked landmarks, displacement signals of the eyebrows, eyelids, cheeks, and lip corners are computed. Afterwards, the mean displacement signal of the lip corners is analyzed and the three main temporal phases of the smile (i.e. onset, apex, and offset) are estimated. Then, facial expression dynamics of the eyebrows, eyelids, cheeks, and lip corners are extracted from each phase separately. To describe the change in appearance between the neutral and the expressive face (i.e. the apex of the expression), temporal Completed Local Binary Pattern (CLBP) descriptors are computed from the eye, cheek, and lip regions. After a feature selection step, the most informative dynamic features are identified and combined with the temporal CLBP features. Finally, the resulting features are classified using SVMs. In the rest of this section we provide more detailed information on each of these steps.
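For concreteness, a minimal sketch of the final classification step could look as follows. The absolute-difference pairing of the two subjects' feature vectors and the RBF kernel are assumptions made only for illustration; they are not details taken from this paper.

```python
# Minimal sketch (not the authors' exact pipeline): verify kinship for a pair of
# smile videos by combining the per-video feature vectors and classifying the
# pair with an SVM. The absolute-difference pairing and the RBF kernel are
# assumptions; the paper only states that the selected dynamic features are
# concatenated with temporal CLBP features and classified using SVMs.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pair_features(dyn_a, clbp_a, dyn_b, clbp_b):
    """Combine the features of two subjects into one pair descriptor."""
    feat_a = np.concatenate([dyn_a, clbp_a])
    feat_b = np.concatenate([dyn_b, clbp_b])
    return np.abs(feat_a - feat_b)  # assumed pairing scheme

def train_verifier(X, y):
    """X: one row per video pair, y: 1 = kin, 0 = not kin."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf
```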
3.1. Landmark detection and tracking
Both the correct detection and the accurate tracking of facial landmarks are crucial for normalizing and aligning faces, and for extracting consistent dynamic features. In the first frame of the input video, 17 facial landmarks (i.e. centers of eyebrows, eyebrow corners, eye corners, centers of upper eyelids, cheek centers, nose tip, and lip corners) are detected using a recent landmarking approach [4] (see Fig. 1(a)). This method models Gabor wavelet features of a neighborhood of the landmarks using incremental mixtures of factor analyzers, and uses a shape prior to ensure the integrity of the landmark constellation. It follows a coarse-to-fine strategy: landmarks are initially detected at a coarse level and then fine-tuned at higher resolution. These points are then tracked by a piecewise Bézier volume deformation (PBVD) tracker [18] during the rest of the video.

Initially, the PBVD tracker warps a generic 3D mesh model of the face (see Fig. 1(b)) to fit the facial landmarks in the first frame of the image sequence. The generic face model is formed by 16 surface patches. These patches are embedded in Bézier volumes to guarantee the continuity and smoothness of the model. A point x(u, v, w) in a Bézier volume can be defined as

x(u, v, w) = \sum_{i=0}^{n} \sum_{j=0}^{m} \sum_{k=0}^{l} b_{i,j,k} \, B_i^n(u) \, B_j^m(v) \, B_k^l(w),    (1)

where the control points are denoted by b_{i,j,k}, and the mesh variables 0 < {u, v, w} < 1 control the shape of the volume. B_i^n(u) denotes a Bernstein polynomial, and can be written as

B_i^n(u) = \binom{n}{i} u^i (1 - u)^{n-i}.    (2)

Once the face model is fitted, the 3D motion of the head, as well as the individual motions of the facial landmarks, can be tracked based on the movements of the mesh points. 2D movements on the face (estimated by template matching between frames, at different resolutions) are modeled as a projection of the 3D movement onto the image plane. The 3D movement is then calculated from the projective motion of several points.
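Equations (1) and (2) can be evaluated directly. The short sketch below computes a single Bézier-volume point from its control points; the (n+1, m+1, l+1, 3) control-point layout is an assumption for illustration, and the PBVD tracker itself is not reproduced here.

```python
# Minimal numeric sketch of Eqs. (1)-(2): evaluate one point of a Bezier volume
# from its control points. Control-point array layout (n+1, m+1, l+1, 3) is an
# assumption for illustration only.
import numpy as np
from math import comb

def bernstein(n, i, u):
    """B_i^n(u) = C(n, i) * u^i * (1 - u)^(n - i)."""
    return comb(n, i) * (u ** i) * ((1.0 - u) ** (n - i))

def bezier_volume_point(ctrl, u, v, w):
    """x(u, v, w) = sum_{i,j,k} b_{i,j,k} B_i^n(u) B_j^m(v) B_k^l(w)."""
    n, m, l = (s - 1 for s in ctrl.shape[:3])
    x = np.zeros(3)
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(l + 1):
                x += ctrl[i, j, k] * bernstein(n, i, u) * bernstein(m, j, v) * bernstein(l, k, w)
    return x
```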
3.2. Registration
Faces in each frame need to be aligned before the feature extraction step. To this end, the 3D pose of the face is estimated and normalized using the tracked 3D landmarks \tilde{l}_i (see Fig. 1(a)). Since a plane can be constructed from three non-collinear points, three stable landmarks (the eye centers and the nose tip) are used to define a normalizing plane P. The eye centers c_1 = (\tilde{l}_7 + \tilde{l}_9)/2 and c_2 = (\tilde{l}_{10} + \tilde{l}_{12})/2 are the middle points between the inner and outer eye corners. Then, the angles between the positive normal vector of P and the unit vectors on the X (horizontal), Y (vertical), and Z (perpendicular) axes give the relative head pose. The computed angles \theta_z and \theta_y give the exact roll and yaw angles of the face with respect to the camera, respectively. Nevertheless, the estimated pitch angle \theta_x is a subject-dependent measure, since it depends on the constellation of the eye corners and the nose tip. If the face in the first frame is assumed to be approximately frontal, the actual pitch angle \theta'_x can be calculated by subtracting this initial value. After estimating the pose of the head, the tracked landmarks are normalized with respect to rotation, scale, and translation. The aligned points l'_i are defined as

l'_i = \left( \tilde{l}_i - \frac{c_1 + c_2}{2} \right) R(-\theta'_x, -\theta_y, -\theta_z) \, \frac{100}{\rho(c_1, c_2)},    (3)

R(\theta_x, \theta_y, \theta_z) = R_x(\theta_x) R_y(\theta_y) R_z(\theta_z),    (4)

where R_x, R_y, and R_z are the 3D rotation matrices for the given angles, and \rho denotes the Euclidean distance between the given points. On the normalized face, the middle point between the eye centers is located at the origin and the inter-ocular distance (the distance between the eye centers) is set to 100 pixels. Since the normalized face is approximately frontal with respect to the camera, we ignore the depth (Z) values of the normalized feature points l'_i and denote them as l_i.
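A minimal sketch of the alignment in Eqs. (3)-(4) is given below; the row-vector convention (points multiplied on the left of R) is an assumption made for illustration.

```python
# Sketch of Eqs. (3)-(4): center the tracked 3D landmarks on the midpoint of the
# eye centers, undo the estimated head rotation, and scale so that the
# inter-ocular distance becomes 100 pixels.
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def align_landmarks(pts, c1, c2, theta_x, theta_y, theta_z):
    """pts: (17, 3) tracked landmarks; c1, c2: 3D eye centers; angles: head pose."""
    R = rot_x(-theta_x) @ rot_y(-theta_y) @ rot_z(-theta_z)  # R(-θ'x, -θy, -θz)
    centered = pts - (c1 + c2) / 2.0
    scale = 100.0 / np.linalg.norm(c1 - c2)                  # 100 / ρ(c1, c2)
    aligned = centered @ R * scale
    # The normalized face is roughly frontal, so depth values are discarded.
    return aligned[:, :2]
```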
3.3. Temporal segmentation
In the proposed method, dynamic and spatio-temporal features are extracted from videos of smiling persons. We choose to use the smile, since it is the most frequently performed facial expression and is used to convey several different meanings such as enjoyment, politeness, fear, and embarrassment [5]. A smile can be defined as the upward movement of the lip corners, which corresponds to Action Unit 12 in the facial action coding system (FACS) [6]. Anatomically, the zygomatic major muscle contracts and raises the corners of the lips during a smile [7].

Most facial expressions are composed of three non-overlapping phases, namely the onset, apex, and offset. The onset is the initial phase of a facial expression and spans the duration from the neutral to the expressive state. The apex is the stable peak period (which may be very short) of the expression between onset and offset. Likewise, the offset is the final phase from the expressive back to the neutral state. Following the normalization step, we detect these three temporal phases of the smiles.

For this purpose, the amplitude signal S of the smile is estimated as the mean (Euclidean) distance of the lip corners to the lip center during the smile. The computed amplitude signal is then normalized by the length of the lip. Since the faces are normalized, the center and the length of the lip are calculated only once, in the first frame. Afterwards, the longest continuous increase in S is defined as the onset phase. Similarly, the offset phase is detected as the longest continuous decrease in S. The phase between the last frame of the onset and the first frame of the offset defines the apex.
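A minimal sketch of this segmentation, assuming the smile amplitude signal S is given as a smoothed 1-D array with one value per frame:

```python
# Sketch of the temporal segmentation: the onset is the longest continuous
# increase in the smile amplitude S, the offset is the longest continuous
# decrease, and the apex is the span in between.
import numpy as np

def longest_run(S, increasing=True):
    """Return (start, end) frame indices of the longest monotone run in S."""
    best = (0, 0)
    start = 0
    for t in range(1, len(S)):
        ok = S[t] > S[t - 1] if increasing else S[t] < S[t - 1]
        if not ok:
            start = t
        if t - start > best[1] - best[0]:
            best = (start, t)
    return best

def segment_smile(S):
    onset = longest_run(S, increasing=True)
    offset = longest_run(S, increasing=False)
    apex = (onset[1], offset[0])  # last onset frame to first offset frame
    return onset, apex, offset
```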
3.4. Features
We extract two types of features from the faces. What we call dynamic features are based on the movement of landmark points in the registered faces over the expression duration; they do not contain appearance information. In contrast, what we call spatio-temporal features denotes appearance features obtained from multiple frames jointly, and thus contain both spatial and temporal appearance information. These features are explained in detail next.

3.4.1 Extraction of dynamic features
To describe the smile dynamics, we use the horizontal and vertical movements of the tracked landmarks and extract a set of dynamic features separately from different face regions. Vertical and horizontal amplitude signals are computed from the movements of the eyebrows, eyelids, cheeks, and lip corners. The (normalized) eye aperture D_eyelid and the displacements of the eyebrows D_eyebrow, cheeks D_cheek, and lip corners D_lip are estimated as follows:

D_{eyelid}(t) = \frac{\left\| \frac{l^t_7 + l^t_9}{2} - l^t_8 \right\|}{2\rho(l^t_7, l^t_9)} + \frac{\left\| \frac{l^t_{10} + l^t_{12}}{2} - l^t_{11} \right\|}{2\rho(l^t_{10}, l^t_{12})},    (5)

D_{eyebrow}(t) = \frac{\left\| \frac{l^1_1 + l^1_2 + l^1_3}{3} - l^t_2 \right\|}{2\rho(l^1_1, l^1_3)} + \frac{\left\| \frac{l^1_4 + l^1_5 + l^1_6}{3} - l^t_5 \right\|}{2\rho(l^1_4, l^1_6)},    (6)

D_{cheek}(t) = \frac{\left\| \frac{l^1_{13} + l^1_{14}}{2} - l^t_{13} \right\| + \left\| \frac{l^1_{13} + l^1_{14}}{2} - l^t_{14} \right\|}{2\rho(l^1_{13}, l^1_{14})},    (7)

D_{lip}(t) = \frac{\left\| \frac{l^1_{16} + l^1_{17}}{2} - l^t_{16} \right\| + \left\| \frac{l^1_{16} + l^1_{17}}{2} - l^t_{17} \right\|}{2\rho(l^1_{16}, l^1_{17})},    (8)

where l^t_i denotes the 2D location of the i-th point in frame t. Then, the vertical (y) components of D_eyebrow, D_eyelid, D_cheek, and D_lip, and the horizontal (x) components of D_cheek and D_lip are extracted (see Fig. 1(b)). The extracted sequences are smoothed by the 4253H-twice method [19]. These estimates are hereafter referred to as amplitude signals. Finally, the amplitude signals are split into the three phases (onset, apex, and offset), which have been previously defined using the smile amplitude S.
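A minimal sketch of Eqs. (5)-(8), assuming the aligned 2D landmarks are stored as an array of shape (frames, 17, 2) indexed as in Fig. 1(a):

```python
# Sketch of the displacement signals in Eqs. (5)-(8). `L` is assumed to have
# shape (T, 17, 2); the 1-based landmark labels of Fig. 1(a) are shifted to
# 0-based array indices.
import numpy as np

def dist(a, b):
    return np.linalg.norm(a - b, axis=-1)

def displacement_signals(L):
    l = lambda i: L[:, i - 1, :]    # landmark i over all frames
    l1 = lambda i: L[0, i - 1, :]   # landmark i in the first frame

    d_eyelid = (dist((l(7) + l(9)) / 2, l(8)) / (2 * dist(l(7), l(9)))
                + dist((l(10) + l(12)) / 2, l(11)) / (2 * dist(l(10), l(12))))
    d_eyebrow = (dist((l1(1) + l1(2) + l1(3)) / 3, l(2)) / (2 * dist(l1(1), l1(3)))
                 + dist((l1(4) + l1(5) + l1(6)) / 3, l(5)) / (2 * dist(l1(4), l1(6))))
    mid_cheek = (l1(13) + l1(14)) / 2
    d_cheek = (dist(mid_cheek, l(13)) + dist(mid_cheek, l(14))) / (2 * dist(l1(13), l1(14)))
    mid_lip = (l1(16) + l1(17)) / 2
    d_lip = (dist(mid_lip, l(16)) + dist(mid_lip, l(17))) / (2 * dist(l1(16), l1(17)))
    return d_eyelid, d_eyebrow, d_cheek, d_lip
```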
The proposed dynamic features and their definitions are given in Table 1. It is important to note that the defined features are extracted separately from each phase of the smile. As a result, we obtain three feature sets for each of the six amplitude signals (see Fig. 1(b)). For a more detailed analysis, the corresponding speed V(t) = \frac{dD}{dt} and acceleration A(t) = \frac{d^2 D}{dt^2} signals are computed in addition to the amplitudes. In Table 1, signals marked with superscript (+) and (−) denote the increasing and decreasing segments of the related signal, respectively. For example, D^+ pools the increasing segments of D. η gives the length (number of frames) of a given signal, and ω is the frame rate of the video. For each amplitude signal, three 15-dimensional feature vectors (one per phase) are generated by concatenating these features; the combination of all the feature vectors forms the joint dynamic feature vector. In some cases, features cannot be calculated. For example, if we extract features from the amplitude signal of the lip corners D_lip using the onset phase, then the decreasing segments will be an empty set (η(D^−) = 0). For such exceptions, all the features describing the related segments are set to zero. This is done to have a generic feature vector format which has the same features for the different phases of each amplitude signal.

Table 1. Definitions of the extracted features.

Feature                  Definition
Duration:                η(D^+)/ω,  η(D^−)/ω,  η(D)/ω
Duration Ratio:          η(D^+)/η(D),  η(D^−)/η(D)
Maximum Amplitude:       max(D)
Mean Amplitude:          ΣD / η(D)
Maximum Speed:           max(V^+),  max(|V^−|)
Mean Speed:              ΣV^+ / η(V^+),  Σ|V^−| / η(V^−)
Maximum Acceleration:    max(A^+),  max(|A^−|)
Mean Acceleration:       ΣA^+ / η(A^+),  Σ|A^−| / η(A^−)
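A minimal sketch of the 15 per-phase features of Table 1 is given below; the finite-difference scheme used for the speed and acceleration signals is an assumption, as the paper does not specify it here.

```python
# Sketch of the 15 per-phase features of Table 1, computed for one phase of one
# amplitude signal D at frame rate omega. Empty increasing/decreasing segments
# are filled with zeros, as described in the text. D is assumed to have at
# least two frames.
import numpy as np

def _pos_neg(x):
    """Samples belonging to the increasing (+) and decreasing (-) segments of x."""
    dx = np.diff(x)
    return x[1:][dx > 0], x[1:][dx < 0]

def _safe(fn, seg):
    return fn(seg) if len(seg) else 0.0

def phase_features(D, omega):
    V = np.gradient(D) * omega   # speed (assumed finite-difference scheme)
    A = np.gradient(V) * omega   # acceleration
    D_inc, D_dec = _pos_neg(D)
    V_pos, V_neg = V[V > 0], V[V < 0]
    A_pos, A_neg = A[A > 0], A[A < 0]
    n = len(D)
    return np.array([
        len(D_inc) / omega, len(D_dec) / omega, n / omega,   # durations
        len(D_inc) / n, len(D_dec) / n,                      # duration ratios
        np.max(D), np.sum(D) / n,                            # max / mean amplitude
        _safe(np.max, V_pos), _safe(lambda s: np.max(np.abs(s)), V_neg),
        _safe(np.mean, V_pos), _safe(lambda s: np.mean(np.abs(s)), V_neg),
        _safe(np.max, A_pos), _safe(lambda s: np.max(np.abs(s)), A_neg),
        _safe(np.mean, A_pos), _safe(lambda s: np.mean(np.abs(s)), A_neg),
    ])
```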
3.4.2 Extraction of spatio-temporal features
To describe the temporal changes in the appearance of faces, we employ a recently proposed spatio-temporal local texture descriptor, namely Completed Local Binary Patterns from Three Orthogonal Planes (CLBP-TOP) [16]. CLBP-TOP is a straightforward extension of the Completed Local Binary Patterns (CLBP) operator [10] to dynamic textures (image sequences): CLBP histograms are extracted from the three orthogonal planes XY, XT, and YT individually and concatenated into a single feature vector. Here, X and Y refer to the spatial extent of the image, and T denotes time. CLBP-TOP regards the face sequence as a volume, and the neighborhood of each pixel is defined in a three-dimensional space, whereas CLBP uses only the X and Y dimensions of a single image. The difference between CLBP and the original LBP operator is that, in addition to the sign of the local difference, CLBP encodes the center pixel of the local neighborhood and the magnitude of the difference.
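The sketch below illustrates only the "three orthogonal planes" idea with a simplified, sign-only LBP code on the central XY, XT, and YT slices of a block; the actual CLBP-TOP descriptor used here additionally encodes the center pixel and the magnitude of the local differences, and is computed over the full volume with more neighbors per plane.

```python
# Simplified illustration of the three-orthogonal-planes idea: a 4-neighbour,
# sign-only LBP code (CLBP_S-like) on the central XY, XT and YT slices of a
# grey-level block, concatenated as normalized histograms. This is NOT the full
# CLBP-TOP descriptor of the paper.
import numpy as np

def lbp_sign_codes(plane):
    """4-neighbour sign-LBP codes for the interior pixels of a 2-D array."""
    c = plane[1:-1, 1:-1]
    neighbours = [plane[:-2, 1:-1], plane[2:, 1:-1], plane[1:-1, :-2], plane[1:-1, 2:]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, nb in enumerate(neighbours):
        codes |= ((nb >= c).astype(np.uint8) << bit)
    return codes

def lbp_top_histogram(block):
    """block: (T, H, W) grey-level volume; returns concatenated XY/XT/YT histograms."""
    T, H, W = block.shape
    planes = [block[T // 2],        # XY plane at the central frame
              block[:, H // 2, :],  # XT plane at the central row
              block[:, :, W // 2]]  # YT plane at the central column
    hists = [np.bincount(lbp_sign_codes(p).ravel(), minlength=16) for p in planes]
    return np.concatenate([h / max(h.sum(), 1) for h in hists])
```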
We extract CLBP-TOP features from the previously detected smile onsets, since the onset phase shows the change from a neutral to an expressive face. On the selected frames, faces are normalized with respect to roll rotation using the eye centers c_1 and c_2. Then, each face is resized and cropped as shown in Fig. 2(a). For scaling and normalization, the inter-ocular distance d_io is set to 50 pixels. The resulting normalized face images have a resolution of 125 × 100 pixels. To provide more comparable onset du-
