FACIAL EXPRESSION RECOGNITION USING CLUSTERING DISCRIMINANT NON-NEGATIVE MATRIX FACTORIZATION

Symeon Nikitidis†∗, Anastasios Tefas, Nikos Nikolaidis†∗ and Ioannis Pitas†∗
Informatics and Telematics Institute, Center for Research and Technology Hellas, Greece
Department of Informatics, Aristotle University of Thessaloniki, Greece
{nikitidis,tefas,nikolaid,pitas}@aiia.csd.auth.gr
ABSTRACT
Non-negative Matrix Factorization (NMF) is among the most popular subspace methods, widely used in a variety of image processing problems. Recently, a discriminant NMF method that incorporates Linear Discriminant Analysis criteria and achieves an efficient decomposition of the provided data into its discriminant parts has been proposed. However, this approach suffers from several limitations, since it assumes that the underlying data distribution forms compact sets, which is often unrealistic. To remedy this limitation, we assume that the data inside each class form a varying number of clusters and apply a Clustering based Discriminant Analysis. The proposed method incorporates appropriate discriminant constraints in the NMF decomposition cost function in order to address the problem of finding discriminant projections that enhance class separability in the reduced dimensional projection space. Experimental results on the Cohn-Kanade database verify the effectiveness of the proposed method in the facial expression recognition task.
Index Terms: Non-negative matrix factorization, subspace methods, clustering discriminant analysis, facial expression recognition
1. INTRODUCTION
NMF [1] is a matrix decomposition algorithm that requires both the data matrix being decomposed and the resulting factors to contain non-negative elements. This non-negativity constraint has been exploited in a variety of applications, since many types of data in practical problems are non-negative. For instance, numerous NMF-based methods operating on data derived from text documents [2, 3] or images have been developed in image processing and pattern recognition, and have proved efficient compared with other traditional dimensionality reduction algorithms.
Recently, numerous practical applications have been proposed, creating specialized NMF-based algorithms applied to various problems in diverse fields.
(The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 211471 (i3DPost).)
A supervised NMF
learning method that aims to extract discriminant facial parts is the Discriminant NMF (DNMF) algorithm proposed in [4]. DNMF incorporates Fisher's criterion in the NMF decomposition and achieves a more efficient decomposition of the provided data into its discriminant parts, thus enhancing separability between classes compared with conventional NMF. However, the incorporation of Linear Discriminant Analysis (LDA) [5] inside DNMF has certain deficiencies. More precisely, there are two main disadvantages in this approach. Firstly, LDA assumes that the sample vectors of the classes are generated from underlying multivariate Normal distributions with a common covariance matrix but different means. Secondly, since LDA assumes that each class is represented by a single cluster, the problem of nonlinearly separable classes cannot be solved. Unfortunately, in real world applications, data distributions usually do not correspond to compact sets.
To remedy the aforementioned limitations, we relax the assumption that each class consists of a single compact cluster and instead assume that the data inside each class form several clusters, each approximated by a Gaussian distribution. Consequently, we approximate the underlying sample distribution of each class as a mixture of Gaussians and use the corresponding criteria from the Clustering based Discriminant Analysis (CDA) introduced in [6]. By incorporating these discriminant constraints in the NMF decomposition we derive the proposed method, called Subclass Discriminant NMF (SDNMF). The SDNMF algorithm addresses the general problem of finding discriminant projections that enhance class separability in the reduced dimensionality projection space.
2. NMF BASICS
Focusing on the application of NMF to facial image data, NMF aims to approximate a facial image by a linear combination of elements, the so-called basis images, that correspond to facial parts. Thus, the non-negativity constraints imply that the combinations of the multiple basis images are practically additions of ideally non-overlapping facial parts that attempt to accurately reconstruct the image being decomposed.

[2011 18th IEEE International Conference on Image Processing. 978-1-4577-1303-3/11/$26.00 ©2011 IEEE]

Let $I$ be a facial image database comprised of $L$ images belonging to $n$ different classes, and let $X \in \mathbb{R}^{F \times L}_{+}$ be the data matrix whose columns are $F$-dimensional feature vectors obtained by scanning each facial image in the database row-wise. Thus $x_{i,j}$ is the $i$-th element of the $j$-th column vector $\mathbf{x}_j$. NMF considers factorizations of the form:

$$X \approx ZH \qquad (1)$$

where $Z \in \mathbb{R}^{F \times M}_{+}$ is a matrix containing the basis images, while $H \in \mathbb{R}^{M \times L}_{+}$ contains the coefficients of the linear combinations of the basis images required to reconstruct each original facial image in the database. Obviously, useful factorizations for real world applications arise when the linear subspace transformation projects the data from the original $F$-dimensional space to an $M$-dimensional subspace with $M \ll F$.
To measure the cost of the decomposition in (1), one popular approach is to use the Kullback-Leibler (KL) divergence [7, 8]. The cost of the decomposition in (1) is then measured as the sum of the KL divergences between all images in the database and their respective reconstructed versions obtained from the factorization. Consequently, the cost for factorizing $X$ into $ZH$ is evaluated as:

$$D_{NMF}(X\|ZH) = \sum_{j=1}^{L} KL(\mathbf{x}_j \| Z\mathbf{h}_j) = \sum_{j=1}^{L} \sum_{i=1}^{F} \left( x_{i,j} \ln\frac{x_{i,j}}{\sum_k z_{i,k} h_{k,j}} + \sum_k z_{i,k} h_{k,j} - x_{i,j} \right). \qquad (2)$$
Using the Expectation Maximization (EM) algorithm and an appropriately designed auxiliary function, it has been shown in [9] that the following multiplicative update rules for $h_{k,j}$ and $z_{i,k}$ yield the desired factors, while guaranteeing a non-increasing behavior of the cost function in (2). The update rule at the $t$-th iteration for $h^{(t)}_{k,j}$ is given by:

$$h^{(t)}_{k,j} = h^{(t-1)}_{k,j} \frac{\sum_i z^{(t-1)}_{i,k} \frac{x_{i,j}}{\sum_l z^{(t-1)}_{i,l} h^{(t-1)}_{l,j}}}{\sum_i z^{(t-1)}_{i,k}}, \qquad (3)$$

while for $z^{(t)}_{i,k}$ the update rule is given by:

$$\acute{z}^{(t)}_{i,k} = z^{(t-1)}_{i,k} \frac{\sum_j h^{(t)}_{k,j} \frac{x_{i,j}}{\sum_l z^{(t-1)}_{i,l} h^{(t)}_{l,j}}}{\sum_j h^{(t)}_{k,j}}. \qquad (4)$$

Finally, the basis images matrix $Z$ is normalized so that the elements of each of its column vectors sum up to one:

$$z^{(t)}_{i,k} = \frac{\acute{z}^{(t)}_{i,k}}{\sum_l \acute{z}^{(t)}_{l,k}}. \qquad (5)$$
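The update rules (3)-(5) can be sketched in code. Below is a minimal NumPy implementation of the KL-divergence multiplicative updates with the column normalization of Eq. (5); the function names, iteration count, and numerical-stability constant are our choices, not from the paper.

```python
import numpy as np

def nmf_kl(X, M, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative KL-divergence NMF (Lee & Seung), Eqs. (3)-(5):
    alternately update H and Z, then renormalize the columns of Z."""
    rng = np.random.default_rng(seed)
    F, L = X.shape
    Z = rng.random((F, M)) + eps
    H = rng.random((M, L)) + eps
    Z /= Z.sum(axis=0, keepdims=True)            # enforce Eq. (5) at start
    for _ in range(n_iter):
        # Eq. (3): H update; R holds the element-wise ratio x_ij / (ZH)_ij.
        R = X / (Z @ H + eps)
        H *= (Z.T @ R) / (Z.sum(axis=0)[:, None] + eps)
        # Eq. (4): Z update, followed by the normalization of Eq. (5).
        R = X / (Z @ H + eps)
        Z *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        Z /= Z.sum(axis=0, keepdims=True)
    return Z, H

def kl_cost(X, Z, H, eps=1e-9):
    """Eq. (2): generalized KL divergence between X and its reconstruction ZH."""
    V = Z @ H + eps
    return float(np.sum(X * np.log((X + eps) / V) + V - X))
```

In practice the cost (2) drops rapidly during the first iterations and the column normalization keeps the basis images on a comparable scale.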
3. PROPOSED METHOD
In this section we present the clustering based discriminant analysis that we perform and demonstrate how the derived discriminant constraints are incorporated in the NMF decomposition cost function, creating the proposed SDNMF optimization problem. Next, we derive the proposed multiplicative update rules that optimize SDNMF.
3.1. Clustering based Discriminant Analysis
To facilitate CDA in the $n$-class facial image database $I$, let us denote the number of clusters composing the $r$-th class by $C_r$, the total number of clusters in the database by $C = \sum_{i=1}^{n} C_i$, and the number of facial images belonging to the $\theta$-th cluster of the $r$-th class by $N^{(r)(\theta)}$. Let us also define the mean vector of the $\theta$-th cluster of the $r$-th class by $\boldsymbol{\mu}^{(r)(\theta)} = [\mu^{(r)(\theta)}_1 \ldots \mu^{(r)(\theta)}_M]^T$, which is evaluated over the $N^{(r)(\theta)}$ facial images, while the vector $\boldsymbol{\eta}^{(r)(\theta)}_{\rho} = [\eta^{(r)(\theta)}_{\rho,1} \ldots \eta^{(r)(\theta)}_{\rho,M}]^T$ corresponds to the feature vector of the $\rho$-th image of the $\theta$-th cluster of the $r$-th class. Using the above notation we can define the within-cluster scatter matrix $S_w$ as:

$$S_w = \sum_{r=1}^{n} \sum_{\theta=1}^{C_r} \sum_{\rho=1}^{N^{(r)(\theta)}} \left( \boldsymbol{\eta}^{(r)(\theta)}_{\rho} - \boldsymbol{\mu}^{(r)(\theta)} \right) \left( \boldsymbol{\eta}^{(r)(\theta)}_{\rho} - \boldsymbol{\mu}^{(r)(\theta)} \right)^T \qquad (6)$$

and the between-cluster scatter matrix $S_b$ as:

$$S_b = \sum_{i=1}^{n} \sum_{\substack{r=1 \\ r \neq i}}^{n} \sum_{j=1}^{C_i} \sum_{\theta=1}^{C_r} \left( \boldsymbol{\mu}^{(i)(j)} - \boldsymbol{\mu}^{(r)(\theta)} \right) \left( \boldsymbol{\mu}^{(i)(j)} - \boldsymbol{\mu}^{(r)(\theta)} \right)^T. \qquad (7)$$
Matrix $S_w$ represents the scatter of the sample vector coefficients around their cluster mean. It is reasonable to require the dispersion of the samples that belong to the same cluster of a certain class to be as small as possible, since this denotes a high concentration of these samples around their cluster mean and, consequently, more compact cluster formation. In order to measure the sample dispersion inside clusters we compute the trace of the within-cluster scatter matrix $S_w$. Furthermore, matrix $S_b$ describes the scatter of the mean vectors between all clusters that belong to different classes. To separate clusters belonging to different classes, we desire to maximize the difference between the mean of every cluster of a certain class and the mean of every cluster of each other class. Therefore, the trace of $S_b$ is desired to be as large as possible.
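Since only the traces of $S_w$ and $S_b$ enter the cost below, they can be computed without forming the full $M \times M$ scatter matrices, using $tr[\mathbf{v}\mathbf{v}^T] = \|\mathbf{v}\|^2$. A small NumPy sketch (the helper name and input conventions are ours):

```python
import numpy as np

def scatter_traces(Y, labels, clusters):
    """Traces of the within-cluster (Eq. 6) and between-cluster (Eq. 7)
    scatter matrices. Y: (L, M) array, one projected sample per row;
    labels[k]: class index of sample k; clusters[k]: its cluster index
    within that class."""
    # Cluster means mu^{(r)(theta)}, keyed by (class, cluster).
    means = {}
    for r in set(labels):
        for t in {c for c, l in zip(clusters, labels) if l == r}:
            idx = [k for k in range(len(labels))
                   if labels[k] == r and clusters[k] == t]
            means[(r, t)] = Y[idx].mean(axis=0)
    # tr[S_w]: squared distances of samples to their own cluster mean.
    tr_sw = sum(float(np.sum((Y[k] - means[(labels[k], clusters[k])]) ** 2))
                for k in range(len(labels)))
    # tr[S_b]: squared distances between cluster means of *different* classes
    # (ordered pairs, matching the double sum over i and r != i in Eq. 7).
    tr_sb = sum(float(np.sum((m1 - m2) ** 2))
                for (r1, _), m1 in means.items()
                for (r2, _), m2 in means.items() if r1 != r2)
    return tr_sw, tr_sb
```

Note that Eq. (7) sums over ordered class pairs, so each unordered pair of cluster means is counted twice; the sketch reproduces that convention.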
3.2. Subclass Discriminant Non-negative Matrix Factorization (SDNMF)

In order to incorporate the clustering based discriminant constraints derived from the performed CDA in the NMF decomposition, we reformulate the NMF cost function by adding appropriate penalty terms. Since we desire the trace of matrix $S_w$ to be as small as possible and, at the same time, the trace of $S_b$ to be as large as possible, the new cost function is formulated as:

$$D_{SDNMF}(X\|ZH) = D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] \qquad (8)$$

where $\alpha$ and $\beta$ are positive constants, $tr[\cdot]$ is the trace operator, while the factor $\frac{1}{2}$ is used to simplify subsequent derivations. Consequently, the new minimization problem is formulated as:

$$\min_{Z,H} D_{SDNMF}(X\|ZH) \quad \text{subject to: } z_{i,k} \geq 0,\; h_{k,j} \geq 0,\; \sum_i z_{i,k} = 1,\; \forall i, j, k, \qquad (9)$$

which requires the minimization of (8) subject to the non-negativity constraints applied to the elements of both the weights matrix $H$ and the basis images matrix $Z$.
The constrained optimization problem in (9) is solved by introducing Lagrange multipliers $\phi = [\phi_{i,k}] \in \mathbb{R}^{F \times M}$ and $\psi = [\psi_{k,j}] \in \mathbb{R}^{M \times L}$, each associated with the constraints $z_{i,k} \geq 0$ and $h_{k,j} \geq 0$, respectively. Thus the Lagrangian function $\mathcal{L}$ is formulated as:

$$\mathcal{L} = D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + \sum_i \sum_k \phi_{i,k} z_{i,k} + \sum_k \sum_j \psi_{k,j} h_{k,j}$$
$$= D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + tr[\phi Z^T] + tr[\psi H^T]. \qquad (10)$$
Consequently, the optimization problem in (9) is equivalent to the minimization of the Lagrangian, $\arg\min_{Z,H} \mathcal{L}$. To minimize $\mathcal{L}$, we first obtain its partial derivatives with respect to $h_{i,j}$ and $z_{i,j}$ and set them equal to zero:

$$\frac{\partial \mathcal{L}}{\partial h_{i,j}} = -\sum_k \frac{x_{k,j} z_{k,i}}{\sum_l z_{k,l} h_{l,j}} + \sum_l z_{l,i} + \psi_{i,j} + \frac{\alpha}{2} \frac{\partial tr[S_w]}{\partial h_{i,j}} - \frac{\beta}{2} \frac{\partial tr[S_b]}{\partial h_{i,j}} = 0$$

$$\frac{\partial \mathcal{L}}{\partial z_{i,j}} = -\sum_l \frac{x_{i,l} h_{j,l}}{\sum_k z_{i,k} h_{k,l}} + \sum_l h_{j,l} + \phi_{i,j} + \frac{\alpha}{2} \frac{\partial tr[S_w]}{\partial z_{i,j}} - \frac{\beta}{2} \frac{\partial tr[S_b]}{\partial z_{i,j}} = 0. \qquad (11)$$
According to the KKT conditions [10], $\phi_{i,j} z_{i,j} = 0$ and also $\psi_{i,j} h_{i,j} = 0$. Consequently, we obtain the following equalities:

$$\frac{\partial \mathcal{L}}{\partial h_{i,j}} h_{i,j} = 0 \Rightarrow -\sum_k \frac{x_{k,j} z_{k,i}}{\sum_l z_{k,l} h_{l,j}} h_{i,j} + \sum_l z_{l,i} h_{i,j} + \alpha \left( h_{i,j} - \mu^{(r)(\theta)}_i \right) h_{i,j} - \frac{\beta}{N^{(r)(\theta)}} \mu^{(r)(\theta)}_i (C - C_r) h_{i,j} + \frac{\beta}{N^{(r)(\theta)}} \sum_{\substack{m=1 \\ m \neq r}}^{n} \sum_{g=1}^{C_m} \mu^{(m)(g)}_i h_{i,j} = 0 \qquad (12)$$

$$\frac{\partial \mathcal{L}}{\partial z_{i,j}} z_{i,j} = 0 \Rightarrow -\sum_l \frac{x_{i,l} h_{j,l}}{\sum_k z_{i,k} h_{k,l}} z_{i,j} + \sum_l h_{j,l} z_{i,j} = 0. \qquad (13)$$
Solving the quadratic equation resulting from (12) for $h_{i,j}$ leads to the proposed multiplicative update rule for the weight coefficients, which for the $t$-th iteration is defined as:

$$h^{(t)}_{i,j} = \frac{A + \sqrt{A^2 + T}}{2 \left( \alpha - \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \right)} \qquad (14)$$

where

$$T = 4 \left( \alpha - \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \right) h^{(t-1)}_{i,j} \sum_k z^{(t-1)}_{k,i} \frac{x_{k,j}}{\sum_n z^{(t-1)}_{k,n} h^{(t-1)}_{n,j}},$$

$h_{i,j}$ denotes the $i$-th feature element of the coefficient vector of the $\rho$-th image belonging to the $\theta$-th cluster of the $r$-th facial class, and $A$ is defined as:

$$A = \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \sum_{\lambda, \lambda \neq j} h_{i,\lambda} - \frac{\beta}{N^{(r)(\theta)}} \sum_{\substack{m=1 \\ m \neq r}}^{n} \sum_{g=1}^{C_m} \mu^{(m)(g)}_i - 1. \qquad (15)$$

The update rule for $z_{i,j}$ is directly derived by solving (13) and is the same as in (4).
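One pass of the $H$ update in Eqs. (14)-(15) might be implemented as sketched below. The sketch assumes the cluster means $\mu^{(m)(g)}$ are recomputed from the current coefficients $H$, and that $\alpha$ and $\beta$ are chosen so the quadratic coefficient in the denominator of (14) stays positive; the function name and data conventions are ours, not from the paper.

```python
import numpy as np

def sdnmf_update_H(X, Z, H, labels, clusters, alpha, beta, eps=1e-9):
    """One multiplicative H update per Eqs. (12), (14), (15).
    labels[j] / clusters[j]: class r and within-class cluster theta of image j
    (column j of X and H). Assumes alpha - [alpha + (beta/N)(C-C_r)]/N > 0."""
    M, L = H.shape
    classes = sorted(set(labels))
    n_clusters = {r: len({c for c, l in zip(clusters, labels) if l == r})
                  for r in classes}
    C = sum(n_clusters.values())                 # total number of clusters
    # Group image indices and compute cluster means mu^{(r)(theta)} from H.
    members = {}
    for j, (r, t) in enumerate(zip(labels, clusters)):
        members.setdefault((r, t), []).append(j)
    mu = {key: H[:, idx].mean(axis=1) for key, idx in members.items()}
    # (Z^T [X / ZH])_{i,j}: the data-fit factor shared with Eq. (3).
    R = Z.T @ (X / (Z @ H + eps))
    H_new = H.copy()
    for (r, t), idx in members.items():
        N = len(idx)
        coef = alpha + (beta / N) * (C - n_clusters[r])
        a = alpha - coef / N                     # denominator factor in (14)
        # Sum of other-class cluster means, Sigma_{m != r} Sigma_g mu^{(m)(g)}.
        other = sum((m for (rr, _), m in mu.items() if rr != r), np.zeros(M))
        cluster_sum = H[:, idx].sum(axis=1)
        for j in idx:
            # Eq. (15): sum over lambda != j runs over the other cluster members.
            A = (coef / N) * (cluster_sum - H[:, j]) - (beta / N) * other - 1.0
            T = 4.0 * a * H[:, j] * R[:, j]      # non-negative when a > 0
            H_new[:, j] = (A + np.sqrt(A * A + T)) / (2.0 * a)
    return H_new
```

Since $T \geq 0$, we have $A + \sqrt{A^2 + T} \geq 0$, so the update preserves the non-negativity of $H$ by construction.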
4. EXPERIMENTAL RESULTS
We compared the performance of the proposed SDNMF method with the DNMF and conventional NMF algorithms on the facial expression recognition problem using the popular Cohn-Kanade [11] database. In order to form the training and test sets, face detection was performed and the resulting Regions Of Interest (ROIs) were manually aligned with respect to the eye positions. Each facial image in the database was isotropically scaled to a fixed size of 30 × 40 pixels (30 columns and 40 rows) and converted to grayscale. Each such fixed-size facial image was scanned row-wise to form a feature vector $\mathbf{x} = [f_1 \ldots f_{1200}]^T$ ($f_i$ being the luminance of the $i$-th pixel), which is used to form the training and test sets.

We performed 5-fold cross-validation on the available data samples, where the training set was used to learn the basis images of the low-dimensional projection space, while the test set was used to report the facial expression recognition accuracy rates in the respective learned projection space. Classification was performed by feeding the projected low-dimensional discriminant facial expression representations to a linear SVM classifier.
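The evaluation pipeline above can be sketched in code. The paper feeds the projected representations to a linear SVM; the sketch below projects test images by iterating the $H$ update of Eq. (3) with $Z$ held fixed (the paper does not spell out its exact projection step) and, to keep the example dependency-free, substitutes a nearest class centroid classifier for the SVM. All function names are ours.

```python
import numpy as np

def project(Z, X, n_iter=100, eps=1e-9):
    """Project (test) images X onto the learned basis Z by running the
    multiplicative H update of Eq. (3) with Z held fixed, a common
    out-of-sample projection for NMF subspaces."""
    rng = np.random.default_rng(0)
    H = rng.random((Z.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (Z.T @ (X / (Z @ H + eps))) / (Z.sum(axis=0)[:, None] + eps)
    return H

def nearest_centroid_predict(H_train, y_train, H_test):
    """Stand-in classifier (the paper uses a linear SVM): assign each
    test column to the class with the closest mean training coefficient."""
    classes = sorted(set(y_train))
    cents = np.stack([H_train[:, [j for j, y in enumerate(y_train) if y == c]].mean(axis=1)
                      for c in classes])
    d = ((H_test.T[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return [classes[k] for k in d.argmin(axis=1)]
```

Swapping the centroid rule for an actual linear SVM (e.g. scikit-learn's LinearSVC) recovers the pipeline described above.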
[Figure 1: plot of average recognition accuracy rate (%, 40-75) versus projection subspace dimensionality (20-200) for NMF, DNMF, and SDNMF with $C_r = 2$ and $C_r = 3$.]

Fig. 1. Average facial expression recognition accuracy rate versus the dimensionality of the projection subspace in the Cohn-Kanade database.
Figure 1 shows the average expression recognition accuracy rates versus the projection subspace dimensionality. The highest recognition rates achieved by each examined method, as well as the respective subspace dimensionalities, are summarized in Table 1. As can be seen, SDNMF outperforms both the NMF and DNMF methods.
Table 1. Best average expression recognition accuracy rates in the Cohn-Kanade database

Method          | Accuracy Rate | Subspace Dimensionality
SDNMF $C_r = 2$ | 69.05%        | 190
SDNMF $C_r = 3$ | 68.31%        | 182
DNMF            | 66.08%        | 166
NMF             | 64.85%        | 134
5. CONCLUSIONS
We proposed a novel method that addresses the general problem of finding discriminant projections that enhance class separability by incorporating CDA in the NMF decomposition. To solve the SDNMF problem, we developed a multiplicative update rule that considers not only each sample's class origin but also the cluster structure inside each class. We compared the performance of the SDNMF algorithm with NMF and DNMF, and the experimental results verified the effectiveness of the proposed method in the facial expression recognition task.
6. REFERENCES
[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.

[2] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2003, pp. 267–273.

[3] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, "Text mining using nonnegative matrix factorizations," in IEEE International Conference on Data Mining (ICDM), 2004, pp. 452–456.

[4] S. Zafeiriou, A. Tefas, I. Buciu, and I. Pitas, "Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification," IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 683–695, 2006.

[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, July 1997.

[6] X. Chen and T. Huang, "Facial expression recognition: a clustering-based approach," Pattern Recognition Letters, vol. 24, no. 9-10, pp. 1295–1302, 2003.

[7] S. Della Pietra, V. Della Pietra, and J. Lafferty, "Statistical learning algorithms based on Bregman distances," in Proceedings of the Canadian Workshop on Information Theory, Toronto, Canada, 1997.

[8] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances," Computational Learning Theory, pp. 158–169, 2000.

[9] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems (NIPS), 2000, pp. 556–562.

[10] R. Fletcher, Practical Methods of Optimization, 2nd ed., Wiley-Interscience, New York, NY, USA, 1987.

[11] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," March 2000, pp. 46–53.