
Evaluating AAM Fitting Methods for Facial Expression Recognition
Akshay Asthana¹, Jason Saragih², Michael Wagner³, Roland Goecke¹,³
¹ RSISE, Australian National University, Australia
² Robotics Institute, Carnegie Mellon University, USA
³ Faculty of Information Sciences and Engineering, University of Canberra, Australia
aasthana@rsise.anu.edu.au, jsaragih@andrew.cmu.edu, Michael.Wagner@canberra.edu.au, roland.goecke@ieee.org
Abstract
The human face is a rich source of information for the
viewer and facial expressions are a major component in
judging a person’s affective state, intention and personality.
Facial expressions are an important part of human-human
interaction and have the potential to play an equally im-
portant part in human-computer interaction. This paper
evaluates various Active Appearance Model (AAM) fitting
methods, including both the original formulation as well as
several state-of-the-art methods, for the task of automatic
facial expression recognition. The AAM is a powerful sta-
tistical model for modelling and registering deformable ob-
jects. The results of the fitting process are used in a facial
expression recognition task using a region-based interme-
diate representation related to Action Units, with the ex-
pression classification task realised using a Support Vec-
tor Machine. Experiments are performed for both person-
dependent and person-independent setups. Overall, the best
facial expression recognition results were obtained by us-
ing the Iterative Error Bound Minimisation method, which
consistently resulted in accurate face model alignment and
facial expression recognition even when the initial face de-
tection used to initialise the fitting procedure was poor.
1. Introduction
Facial expressions are an important component of in-
terpersonal communication. Despite their non-verbal na-
ture, they convey a lot of information about the person and
the person’s affective state, intention and personality. Par-
ticularly for the recognition of the affective state, humans
rely heavily on analysing facial expressions [10, 18]. Fa-
cial expressions also support verbal communication due to
their complementary nature to the acoustic side of the spo-
ken words. Unlike humans, current computer systems can
hardly recognise the affective state of a human user. In fact,
even the problem of recognising facial expressions is still
largely unsolved although some progress has been made in
recent years (Section 2). Providing human-machine interfaces
with the capability of recognising facial expressions
(and subsequently the affective state) of a user would allow
computer systems to monitor a person's state and to react in
a suitable way.

Figure 1: System Overview - Facial Expression Recogniser
While much progress has been made on the issue of
classification of facial expressions, for example via Support
Vector Machines (SVM) [13], one of the open questions is
on the problem of how to extract useful features from the
face in an image or video frame. In this paper, we compare
the performance of six Active Appearance Model (AAM)
fitting methods for the task of automatic facial expression
recognition, which serves two purposes. Firstly, it gives the
reader a practical guide to the usefulness of particular AAM
fitting methods for facial expression recognition in realistic
conditions. Secondly, it provides an (indirect) solution to
the problem of objectively evaluating the performance of
AAM fitting methods.¹ The methods were tested on facial
expression images from the Cohn-Kanade database [14] in
both a person-dependent (PDFER) and person-independent

¹ While it is possible and common practice to evaluate the performance
via a manually obtained ground truth, the required manual annotation is in
itself error-prone and subjective.
978-1-4244-4799-2/09/$25.00 ©2009 IEEE

(PIFER) setup. The system uses a real-time facial feature
tracker based on the AAM to extract the shape vector in an
image. This shape vector is further processed and a compact
feature vector, representing the facial features, is obtained.
This feature vector, along with the label of the expression
associated with it, is used for training an expression classi-
fier that utilises the SVM for classification into six univer-
sal facial expressions as well as a neutral expression. After
the system has been trained for recognising a set of expres-
sions, it accepts an image as input, followed by the same
process of tracking facial features and extracting a feature
vector. The SVM-based expression recogniser then uses
this extracted feature vector to classify the expression as
Neutral or one of six universal expressions (Anger, Disgust,
Fear, Joy, Sorrow, Surprise). This procedure is illustrated
in Figure 1. The remainder of this paper is structured as fol-
lows. Section 2 provides an overview of related work. The
overview of AAM and various AAM fitting methods com-
pared in this paper is given in Section 3. Our face region-
based intermediate representation is presented in Section 4.
The SVM classifier employed in FER experiments is de-
scribed in Section 5. Section 6 details the experiments and
discusses the results. Finally, Section 7 provides the con-
clusions and an outlook on future work.
2. Related Work
For many decades, developing a fast, accurate and robust
automated system for recognising a face and facial expres-
sions has been a goal in computer vision. In [11], the Facial
Action Coding System (FACS) that defines the human face
by a number of Action Units (AUs) and represents the fa-
cial expressions by different combinations of these AUs was
proposed. Since the classification into AUs is based on fa-
cial anatomy, practically all expressions can be represented
by this coding scheme. Hence, FACS is by far the most
widely used method for facial expression recognition. How-
ever, one of the inherent difficulties with the FACS coding
scheme is that it requires a highly trained human expert to
manually score each frame of a video. As well as being
an extremely tedious process, manual FACS scoring suf-
fers from inconsistencies between scorers. In [8], a com-
prehensive comparative study of various approaches for an
automatic facial action recognition system was presented,
where techniques such as optical flow analysis, local fea-
ture analysis, Gabor wavelets, principal component anal-
ysis (PCA), linear discriminant analysis (LDA), and inde-
pendent component analysis (ICA) were employed. More
recently in [3], various machine learning techniques were
compared, coupled with appearance based features for fa-
cial expression and action recognition. However, one of the
major drawbacks of all these approaches is that they ignore the spatial
arrangement and motion of the anatomical features, such as
eyes, mouth, eyebrows and chin. As a result, these methods
are highly susceptible to changes in pose, illumination and
other sources of variation regularly encountered in a real
world environment [16].
In recent years, a powerful technique based on de-
formable models has become popular for non-rigid object
tracking and has started to make its way into the field of
real-time face and facial expression recognition. In this de-
formable model based approach, the non-rigid shape and vi-
sual texture (intensity and color) of an object are statistically
modelled using a low dimensional representation obtained
by applying PCA to a set of labelled training data. After
these models have been created, they can be parametrised
to fit a new image of the object, which might vary in shape
or texture or both. One of the deformable model based ap-
proaches, known as the Active Appearance Model [9], has
become very popular for tracking non-rigid objects such as
the human face.
The utility of AAM tracking in the context of real-time
analysis of facial expressions has previously been demon-
strated in a number of works. In [19], the authors present an
approach for gender-based expression recognition based on
AAM tracking followed by the classification via SVM into
4 basic expressions (happy, sad, angry and neutral). The ex-
periments were performed on still images and a maximum
accuracy of 79.9% for gender based expression classifica-
tion and 76.4% for gender independent expression classifi-
cation was reported. In [16], the authors compare 3 differ-
ent feature representations and subsequently utilise SVM
for the classification of different expressions and AUs. The
authors also state that the Nearest Neighbor (NN) classi-
fier based on PCA or LDA can be used, although no im-
provement in the performance was reported when using
NN instead of SVM. The three types of evaluated features
were S-PTS (similarity normalised shape), S-APP (similar-
ity normalised appearance) and C-APP (canonical appear-
ance). (S-PTS)+(C-APP) features performed better than S-
PTS and S-APP. (S-PTS)+(C-APP) features are obtained
by concatenating the similarity normalised shape and the
shape normalised (canonical) appearance. That work vali-
dates the assertion that features based on AAMs can be used
for accurate expression recognition. However, the authors
used a person-dependent AAM tracker to extract the feature
vectors for the experiment and also reported a major diffi-
culty in tackling the problem of subject head movement.
In [7], a real-time approach for expression recognition in
video by utilising AAM tracking and spectral graph clus-
tering was presented. However, the tracking was limited
to the mouth region only. In contrast, a template-based fa-
cial feature tracker was used in [17], followed by a SVM-
based expression classification. An accuracy of 71.8% for
person-independent expression recognition and 87.5% for
person-dependent expression recognition was reported. In
other work, different intermediate representations of AAM

tracked shape and appearance vectors for training the ex-
pression classifiers have been investigated (see [6], for ex-
ample) and the application of rough set theory for AAM
based expression recognition has also been pursued [4].
3. Active Appearance Model (AAM)
The AAM is a powerful class of generative methods for
modelling and registering deformable visual objects which
has been very popular in recent years due to its excellent
performance. The power of this generative model stems
both from its compact representation of appearance (com-
prising shape and texture) as well as its rapid fitting to un-
seen images.
For constructing the AAM [9], each annotated training
image is aligned into a common coordinate frame by Pro-
crustes analysis. The modes of shape variation are obtained
by applying PCA to the set of aligned shapes. The texture
variation is similarly modelled by applying PCA to a set of
images, warped to a canonical frame defined using the mean
shape of the aligned shapes. As a result, a parametrised
model is formed that is capable of representing large varia-
tion in shape and texture by a small set of parameters.
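The construction pipeline just described - Procrustes alignment of the annotated training shapes followed by PCA to obtain the modes of variation - can be sketched as follows. This is a simplified illustration, not the authors' implementation; the 95% retained-variance cut-off and the fixed number of alignment passes are arbitrary choices:

```python
import numpy as np

def procrustes_align(shapes):
    """Align a set of (N, 2) landmark arrays to their evolving mean
    (translation, scale and rotation removed) - a simplified
    generalised Procrustes analysis."""
    aligned = [s - s.mean(axis=0) for s in shapes]      # remove translation
    aligned = [s / np.linalg.norm(s) for s in aligned]  # remove scale
    mean = aligned[0]
    for _ in range(5):                                  # a few refinement passes
        for i, s in enumerate(aligned):
            u, _, vt = np.linalg.svd(s.T @ mean)        # optimal rotation (orthogonal Procrustes)
            aligned[i] = s @ (u @ vt)
        mean = np.mean(aligned, axis=0)
        mean /= np.linalg.norm(mean)
    return np.array(aligned), mean

def build_shape_model(shapes, var_kept=0.95):
    """PCA on the aligned shapes: returns the mean shape and the
    modes of shape variation retaining `var_kept` of the variance."""
    aligned, mean = procrustes_align(shapes)
    data = aligned.reshape(len(aligned), -1) - mean.ravel()
    _, s, vt = np.linalg.svd(data, full_matrices=False)
    var = (s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(var), var_kept)) + 1
    return mean, vt[:k]        # modes: k eigenvectors of shape variation
```

The texture model is built the same way, with PCA applied to images warped to the canonical frame of the mean shape rather than to landmark coordinates.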
The process of finding the model parameters p that best
fit the given image I is known as AAM fitting and is per-
formed by updating the model parameters p iteratively:²

    Δp = U(·; p) ∘ F(I; p), where p ← p + Δp    (1)

where F is a feature extraction function that represents image
I at its current parameter settings, Δp is the update
to be applied to the current parameters and U is the
vector-valued update function. The accuracy of prediction for
updating the parameters p generally depends on a good coupling
between F and U. The AAM fitting algorithms can
be broadly classified into two categories [21]:
Generative fitting deals with the problem of fitting as min-
imisation/maximisation of some measure of fitness be-
tween the model’s texture and the warped image re-
gion. This approach is attractive as it has a clear in-
tuitive basis for its formulation. However, it suffers
from a number of drawbacks, such as limited general-
isability as well as difficulties in attaining rapid fitting.
Some examples of generative fitting, compared in the
paper, are FJ, POIC, SIC and RIC (see below).
Discriminative Fitting deals with the problem of fitting by
directly learning a fixed relationship between the fea-
tures and the parameter updates, by using the features
extracted from parameter settings which are perturbed
² Notation: Vectors are written in lowercase bold. Functions are written
in upper-case calligraphic font, with ∘ denoting their composition, for
example: A(B(x); y) = A(·; y) ∘ B(x).
from their optimal setting in each training image. Al-
though this approach lacks the elegance of the genera-
tive approach, it has been shown to overcome some of
the limitations of its generative counterpart. Some dis-
criminative AAM methods, compared in the paper, are
IEBM and HFBID (see below). Other methods also
exist, e.g. [15].
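Both families instantiate the same iterative update of Eq. (1); the loop itself can be sketched generically, with the feature extractor F and update function U passed in as callables (a hypothetical interface for illustration, not the authors' code):

```python
import numpy as np

def fit_aam(image, p0, extract_features, update_fn, max_iters=30, tol=1e-6):
    """Generic AAM fitting loop from Eq. (1):
    dp = U(F(I; p); p),  p <- p + dp.
    `extract_features` plays the role of F (e.g. the texture residual
    at the current parameters) and `update_fn` the role of U (e.g. a
    pre-computed linear regressor)."""
    p = p0.copy()
    for _ in range(max_iters):
        f = extract_features(image, p)   # F(I; p)
        dp = update_fn(f, p)             # U(.; p)
        p = p + dp
        if np.linalg.norm(dp) < tol:     # converged
            break
    return p
```

Generative and discriminative methods differ only in how `update_fn` is obtained: by minimising a texture-fitness measure, or by learning the feature-to-update mapping offline.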
3.1. Fixed Jacobian Method (FJ)
Proposed in [5], it is one of the original algorithms de-
veloped for AAM fitting and deals with the problem of fit-
ting as the minimisation of the least squares fit between the
model’s texture and the warped image region, where it is as-
sumed that the Jacobian of the error is fixed for all settings
of the model parameters. This enables a linear update model
to be pre-computed through the pseudo-inverse of the fixed
Jacobian, estimated offline through numerical differentia-
tion, averaging over the training set. Since the assumption
of a fixed Jacobian holds only loosely, the method requires
the use of an adjustable step size, where at each iteration the
predicted parameter updates are halved until a reduction in
the texture difference between the model and the cropped
image is attained. The result is a reasonably efficient and
accurate fitting procedure. However, if the object exhibits
large variation in shape and texture, its performance dete-
riorates because of the assumption of a fixed linear update
model which can be too restrictive.
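One FJ iteration with the adjustable step size can be sketched as below; the `model.residual` interface and the cap on the number of halvings are assumptions made for illustration:

```python
import numpy as np

def fj_update(image, p, model, J_pinv, max_halvings=6):
    """One Fixed-Jacobian iteration: predict dp with the pre-computed
    pseudo-inverse of the (fixed) error Jacobian, then halve the step
    until the texture error decreases (adjustable step size)."""
    r = model.residual(image, p)          # texture difference at p
    e = np.dot(r, r)
    dp = -J_pinv @ r                      # linear update from fixed Jacobian
    for _ in range(max_halvings):
        r_new = model.residual(image, p + dp)
        if np.dot(r_new, r_new) < e:      # error reduced: accept the step
            return p + dp
        dp = dp / 2.0                     # otherwise halve and retry
    return p                              # no improving step found
```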
3.2. Project-out Inverse Compositional Method
(POIC)
Proposed in [2], it is one of the fastest AAM fitting al-
gorithms to date and belongs to the class of fitting methods
using the inverse compositional approach, where the roles
of image and model in the error function are reversed. In
this adaptation of inverse-compositional image alignment,
the fitness function, which measures the difference between
the model’s appearance and the cropped image region, is
grouped into two components: one which lies within the
subspace of appearance deformations and another which is
orthogonal to it. This procedure requires optimisation over
the shape parameters only, assuming the optimal choice (in
a maximum likelihood sense) of the texture parameters is
chosen at each iteration. Since the minimisation of the
fitness function depends only on the subspace orthogonal
to the texture variation, a fixed linear update model can
be computed analytically over the shape parameters only.
This better justifies the assumption of a linear update model
as compared to FJ and is also extremely fast. However,
this approach does not work well when the object exhibits
large variation in shape and texture with respect to the mean
shape and texture, limiting its usage to relatively simple ap-
plications (e.g. PDFER) [12].

3.3. Simultaneous Inverse Compositional Method
(SIC)
Proposed in [1], it is another adaptation of inverse
compositional image alignment for AAM fitting that ad-
dresses the problem of the significant shape and texture
variability by finding the optimal shape and texture parame-
ters simultaneously. Although the derivative of the warping
function can be pre-computed, the linear update model has
to be recomputed at each iteration as it depends on the cur-
rent appearance parameters. However, rather than recom-
puting the linear update model at every iteration using the
current estimate of appearance parameters, it can be approx-
imated by evaluating it at the mean appearance parameter
values, allowing the update model to be pre-computed, thus
dramatically improving the computational efficiency. Also,
since the work presented in this paper deals with the prob-
lem of AAM fitting, building the linear update model based
on the mean appearance parameters, which on average are
closer to the true parameter values, is an optimal choice.
3.4. Robust Inverse Compositional Method (RIC)
In [1], the idea of the inverse compositional method for
AAM fitting is extended further by using an M-estimator
(robust penaliser) instead of the least squares fitting crite-
rion, resulting in an iteratively reweighted least squares fit-
ting scheme. This method requires the normalisation of the
mean subtracted cropped image (error image) w.r.t. the di-
rection of appearance variability [1]. For this purpose, the
error image is first projected onto the subspace of appear-
ance variability. This projected error image is used for gen-
erating the model’s appearance that is later subtracted from
the error image to get the measure of the fitness function.
An assumption of spatial coherence of the outliers helps in
reducing the computational complexity to a certain extent,
but it is still slower than the efficient approximation of SIC.
3.5. Iterative Error Bound Minimisation Methods
(IEBM)
A novel linear update scheme is proposed in [20], which
is based on reducing the error bounds over the data, rather
than the typical least squares criterion. It uses the optimal-
ity property of Support Vector Regression (SVR), i.e. each
sample is adjusted to achieve its respective parameter set-
ting where the error is minimised, giving priority to those
samples that produce maximum error. Combined with an
iterative scheme, all samples in the training set are guided
towards their solution, placing a higher priority on samples
with large errors. IEBM focuses on building the update
model by utilising the information from various combina-
tions of parameter settings. Since this update model is learnt
offline, the method is extremely efficient.
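The offline, perturbation-based training common to the discriminative methods can be sketched as follows. Note the stand-in: IEBM fits its linear regressor by iteratively minimising SVR error bounds, whereas this self-contained sketch uses an ordinary least-squares solve; only the data-generation scheme (perturb the optimum, extract features there, regress the restoring update) is depicted:

```python
import numpy as np

def train_update_model(images, true_params, extract_features,
                       n_perturb=20, scale=0.1, seed=0):
    """Discriminative training sketch: perturb the ground-truth
    parameters of each training image, extract features at the
    perturbed settings, and regress the update that would restore
    the optimum."""
    rng = np.random.default_rng(seed)
    X, Y = [], []
    for img, p_star in zip(images, true_params):
        for _ in range(n_perturb):
            dp = rng.normal(0.0, scale, size=p_star.shape)   # synthetic misalignment
            X.append(extract_features(img, p_star + dp))
            Y.append(-dp)                                    # target: undo the perturbation
    R, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    return R   # fixed linear update model: dp = features @ R
```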
Action Units | Regional Description | Feature IDs
AU1, AU2, AU4 | Intra-feature movement between Eyebrows; Inter-feature movement between Eyebrow and Eyes | V1, V2, V3, H1
AU5, AU7, AU41, AU42, AU43, AU44, AU45, AU46 | Intra-feature movement of the Eyes | V6, V7, H2, H3
AU6 | Intra-feature movement of the Cheeks | H8
AU9 | Intra-feature movement of the Nose | V8, H4
AU10 | Inter-feature movement between Nose and Mouth | V13
AU12, AU15, AU16, AU18, AU20, AU22, AU23, AU24, AU25, AU28 | Intra-feature movement of the Mouth | V9, V10, V11, V12, H5, H6, H7
AU27 | Inter-feature movement between Nose and Mouth | V5
AU17, AU26 | Inter-feature movement between Nose and Chin | V4

Table 1: Features Description
3.6. Haar-like Feature Based Iterative-
Discriminative Method (HFBID)
The IEBM method was extended further in [21] by using
a nonlinear update model for AAM fitting that uses multi-
modal weak learners, based on Haar-like features, which
allow efficient online evaluation using the integral image.
To avoid overlearning, the boosting procedure is embedded
into an iterative framework with an intermediate resampling
step. This process affords well regularised update models
through limiting the ensemble size and indirectly increas-
ing the sample size. As with IEBM, this method was shown
to exhibit high fitting speed and accuracy. However, its per-
formance was not compared with IEBM. Implementation
details are described in [21].
4. Interpreting Action Units (AUs)
The design of AUs is anatomically motivated and makes
FACS highly adaptable and capable of representing almost
every expression [11]. However, since the scope of most of
the vision based expression recognition systems is based on
changes in appearance, it might not be possible to extract
the information needed to indicate the activation of AUs
based on anatomical facts. For example, it is the contrac-
tion of the corrugator muscle that activates AU4 and the
contraction of the frontalis muscle that activates AU1 [22].
To overcome this problem, we group the AUs together on
a regional basis (Table 1) and use a set of rules to extract
the information within the region, regarding the appear-
ance changes, that can be utilised for expression recognition
(Figure 2).

Figure 2: Features for regional scheme

V_η and H_η are the normalisation distances used to normalise
the feature vector with respect to the varying size of faces of
different people. The advantage of using such a scheme is that
we no longer require information on the individual muscles that
activate an AU (for example, AU25 is activated by 3 different
muscles [11]), yet we still extract the necessary information
about the appearance changes from the region concerned.
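A sketch of such region-based features: each is a vertical or horizontal distance between two tracked landmarks, divided by a face-size normalisation distance (V_η or H_η) so that features are comparable across faces of different sizes. The landmark index triples here are placeholders; the actual 21 features follow the specific pairs of Figure 2, which are not reproduced:

```python
import numpy as np

def region_features(pts, pairs, v_norm, h_norm):
    """Compute normalised regional features from tracked landmarks.
    `pts` is an (N, 2) array of landmark (x, y) positions; `pairs`
    lists (i, j, axis) triples, axis 1 for a vertical (V) feature
    and axis 0 for a horizontal (H) feature."""
    feats = []
    for i, j, axis in pairs:
        d = abs(pts[i][axis] - pts[j][axis])          # landmark distance
        feats.append(d / (v_norm if axis == 1 else h_norm))  # size-normalised
    return np.array(feats)
```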
5. Classification of Facial Expressions
Inherently, the support vector machine (SVM) is a two-
class problem classifier. Since our implementation of FER
is a 7-class problem (Neutral and 6 basic emotions [10]),
the ‘one-against-one’ method [13] is adopted to construct
a multi-class SVM to handle this problem. Another issue
concerned with designing an SVM based classifier is the
choice of the kernel. We used the Linear kernel for PDFER
and Radial Basis Function (RBF) kernel for PIFER. The
problems of PDFER and PIFER differ in the level of com-
plexity of the data to be classified. For PDFER, a single
classifier needs to be trained on the features of a single per-
son, which makes it a much simpler classification problem.
The mapping of data in a plane using a linear kernel will
suffice, whereas for PIFER, the single classifier needs to be
trained on the features extracted from all the diverse people
in the database. Hence, the RBF kernel is used for PIFER,
as it has the capacity to map the features in the higher di-
mension and provide better classification for this complex
data.
Since we treat PDFER and PIFER as separate problems,
the use of different kernels does not affect the performance
evaluation for various AAM fitting methods. The important
point to note here is that a linear kernel is used to evalu-
ate the performance of all fitting methods for PDFER, and
a RBF kernel is used to evaluate the performance of all fit-
ting methods for PIFER. Hence, the consistency in the eval-
uation process is maintained as far as the classification of
expressions is concerned.
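The one-against-one construction can be sketched as follows; a least-squares linear classifier stands in for the pairwise SVMs so the sketch stays self-contained (in practice a library SVM with a linear or RBF kernel would fill that role):

```python
import itertools
import numpy as np

class OneVsOneClassifier:
    """'One-against-one' multi-class scheme as used for the 7-class
    FER task: train one binary classifier per class pair and predict
    by majority vote over all pairs."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for a, b in itertools.combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            Xp = np.hstack([X[mask], np.ones((mask.sum(), 1))])  # bias term
            t = np.where(y[mask] == a, 1.0, -1.0)
            w, *_ = np.linalg.lstsq(Xp, t, rcond=None)           # stand-in for an SVM
            self.models_[(a, b)] = w
        return self

    def predict(self, X):
        Xp = np.hstack([X, np.ones((len(X), 1))])
        votes = np.zeros((len(X), len(self.classes_)), dtype=int)
        idx = {c: i for i, c in enumerate(self.classes_)}
        for (a, b), w in self.models_.items():
            pred = Xp @ w
            votes[pred >= 0, idx[a]] += 1   # pair winner casts one vote
            votes[pred < 0, idx[b]] += 1
        return self.classes_[np.argmax(votes, axis=1)]
```

For a 7-class problem this trains 21 pairwise classifiers; swapping the kernel (linear for PDFER, RBF for PIFER) changes only the binary classifier, not the voting scheme.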
6. Experiments
The proposed PDFER and PIFER have been success-
fully implemented using each of the fitting algorithms dis-
cussed in Section 3. Previously in [12], it was shown that
POIC does not work well for objects varying significantly
in shape and texture, i.e. it is only suited for tracking of
simpler objects. Hence, POIC was excluded from our per-
son independent experiments (PIFER), where adaptability
to changes in size, shape and texture is the essence of the
problem. The linear regressor of IEBM and non-linear re-
gressor of HFBID were trained with the initial bounds of
±10°, ±0.1, ±20 pixels, and ±1.5 standard deviations of
rotation, scale, translation and non-rigid shape parameters,
respectively. Both PDFER and PIFER can be used to recog-
nise any arbitrary set of expressions, but are evaluated here
in a 7-class setup.
Our experimental dataset contained 3424 images of 30
subjects (15 females / 15 males) chosen randomly from the
Cohn-Kanade Database (CKDb) [14], with each speaker ex-
pressing 6 basic expressions starting from a Neutral expres-
sion. Overall, the dataset contained 992 images for Neu-
tral, 448 images for Anger, 296 images for Disgust, 346
images for Fear, 532 images for Joy, 423 images for Sor-
row and 387 images for Surprise. Further, the arbitrary se-
lection of speakers from CKDb ensured that the diversity of
the database with respect to the gender of the speaker and
shape, size and texture of the faces was maintained.
The experiments, for the results presented in this paper,
have been independently conducted for a person dependent
scenario (PDFER) and for a person independent scenario
(PIFER). For PDFER, 30 real-time AAM trackers, one for
each subject, were trained separately, whereas, for PIFER,
a single real-time AAM tracker was used that was trained
to track the facial features across the 30 speakers in the
database. It should be noted here that 30 images per person
were used to train the AAM trackers. The shape vector of
length 138, representing 69 landmark points tracked by the
AAM was further processed using the strategy presented in
Section 4 and a feature vector of length 21 was extracted us-
ing the scheme given in Figure 2. This feature vector is used
throughout our experiments for expression recognition. A
5-fold cross-validation scheme is used to evaluate the per-
formance and utility of each fitting algorithm for PDFER
and PIFER.
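The 5-fold protocol can be sketched generically; `train_predict` is a placeholder wrapper around whichever classifier and fitting method is being evaluated:

```python
import numpy as np

def five_fold_scores(X, y, train_predict, k=5, seed=0):
    """k-fold cross-validation: split the data into k folds, train on
    k-1 folds and test on the held-out fold, returning the per-fold
    accuracies. `train_predict(Xtr, ytr, Xte)` returns predicted labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for f in range(k):
        te = folds[f]
        tr = np.concatenate([folds[g] for g in range(k) if g != f])
        pred = train_predict(X[tr], y[tr], X[te])
        scores.append(np.mean(pred == y[te]))   # held-out accuracy
    return scores
```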
In the experiments, the AAM parameters for each im-
age in the database were perturbed for initialisation to sim-
ulate the misalignments, in the initialisation, that might be
encountered by the use of any generic face detector under
real world conditions. This testing strategy helps us to test
the robustness of the system based on each of the fitting
algorithms. For PDFER, the AAM parameters were per-

References

A Practical Guide to Support Vector Classification. TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.

Active appearance models (journal article). Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

Affective Computing (book). TL;DR: Key issues in affective computing, "computing that relates to, arises from, or influences emotions", are presented and new applications are presented for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction.

Active Appearance Models (book chapter). TL;DR: A novel method of interpreting images using an Active Appearance Model (AAM), a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example.
Frequently Asked Questions

Q1. What is the way to reduce the texture variation of an object?

Since the minimisation of the fitness function depends only on the subspace orthogonal to the texture variation, a fixed linear update model can be computed analytically over the shape parameters only.

A future direction for this research is to take advantage of accurate ID fitting algorithms to tackle the problem of pose-invariant expression recognition.

Under extreme conditions, IEBM marginally outperforms HFBID by taking advantage of its linear regressor, whose predictive domain is much simpler than that of the nonlinear regressor used by HFBID.

As the perturbation is increased to ±10 pixels for initialisation, a slight dip in accuracy for FJ, POIC and SIC is observed; however, HFBID, IEBM and RIC are able to maintain almost the same accuracy for PDFER, whereas for PIFER, the accuracy of FJ, SIC and RIC deteriorates even further.

In comparison for PIFER, on increasing the perturbation to ±25 pixels for initialisation, FJ, SIC and RIC are unable to converge; however, both HFBID and IEBM still maintain high accuracy.

Going a step further for PDFER, on increasing the perturbation to ±30 pixels for initialisation, FJ, POIC, SIC and RIC are unable to converge, whereas HFBID and IEBM still maintain high accuracy, with IEBM performing better at 94.19% accuracy compared to HFBID's 88.03%.