FACIAL EXPRESSION RECOGNITION USING CLUSTERING DISCRIMINANT NON-NEGATIVE MATRIX FACTORIZATION

Symeon Nikitidis†∗, Anastasios Tefas, Nikos Nikolaidis†∗ and Ioannis Pitas†∗
Informatics and Telematics Institute, Center for Research and Technology Hellas, Greece
Department of Informatics, Aristotle University of Thessaloniki, Greece
{nikitidis,tefas,nikolaid,pitas}@aiia.csd.auth.gr
ABSTRACT
Non-negative Matrix Factorization (NMF) is among the most popular subspace methods, widely used in a variety of image processing problems. Recently, a discriminant NMF method that incorporates Linear Discriminant Analysis criteria and achieves an efficient decomposition of the provided data into its discriminant parts has been proposed. However, this approach suffers from several limitations, since it assumes that the underlying data distribution forms compact sets, which is often unrealistic. To remedy this limitation, we assume that the data inside each class form a varying number of clusters and apply a Clustering based Discriminant Analysis. The proposed method incorporates appropriate discriminant constraints in the NMF decomposition cost function in order to address the problem of finding discriminant projections that enhance class separability in the reduced dimensional projection space. Experimental results on the Cohn-Kanade database verify the effectiveness of the proposed method in the facial expression recognition task.
Index Terms: Non-negative matrix factorization, subspace methods, clustering discriminant analysis, facial expression recognition
1. INTRODUCTION
NMF [1] is a matrix decomposition algorithm that requires both the data matrix being decomposed and the resulting factors to contain non-negative elements. This non-negativity constraint has been exploited in a variety of applications, since many types of data in practical problems are non-negative. For instance, numerous NMF-based methods operating on data derived from text documents [2, 3] or images have been developed in image processing and pattern recognition, and have proved efficient compared with other traditional dimensionality reduction algorithms.
Recently, numerous practical applications have been proposed, creating specialized NMF-based algorithms applied to various problems in diverse fields.
(The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 211471 (i3DPost).)
A supervised NMF
learning method that aims to extract discriminant facial parts is the Discriminant NMF (DNMF) algorithm proposed in [4]. DNMF incorporates Fisher's criterion in the NMF decomposition and achieves a more efficient decomposition of the provided data into its discriminant parts, thus enhancing separability between classes compared with conventional NMF. However, the incorporation of Linear Discriminant Analysis (LDA) [5] inside DNMF has certain deficiencies. More precisely, there are two main disadvantages in this approach. Firstly, LDA assumes that the sample vectors of the classes are generated from underlying multivariate Normal distributions with a common covariance matrix but different means. Secondly, since LDA assumes that each class is represented by a single cluster, the problem of nonlinearly separable classes cannot be solved. Unfortunately, in real world applications, data distributions usually do not correspond to compact sets.
To remedy the aforementioned limitations, we relax the assumption that each class consists of a single compact cluster and instead assume that the data inside each class form several clusters, each approximated by a Gaussian distribution. Consequently, we approximate the underlying sample distribution of each class as a mixture of Gaussians and use the corresponding criteria from the Clustering based Discriminant Analysis (CDA) introduced in [6]. By incorporating these discriminant constraints in the NMF decomposition we derive the proposed method, called Subclass Discriminant NMF (SDNMF). The SDNMF algorithm addresses the general problem of finding discriminant projections that enhance class separability in the reduced dimensionality projection space.
2. NMF BASICS
Focusing on the application of NMF to facial image data, NMF aims to approximate a facial image by a linear combination of elements, the so-called basis images, that correspond to facial parts. Thus, the non-negativity constraints imply that the combinations of the multiple basis images are practically additions of ideally non-overlapping facial parts that attempt to accurately reconstruct the image being decomposed.

[2011 18th IEEE International Conference on Image Processing. 978-1-4577-1303-3/11/$26.00 ©2011 IEEE]

Let $I$ be a facial image database comprised of $L$ images belonging to $n$ different classes, and let $X \in \mathbb{R}^{F \times L}_{+}$ be the data matrix whose columns are $F$-dimensional feature vectors obtained by scanning each facial image in the database row-wise. Thus $x_{i,j}$ is the $i$-th element of the $j$-th column vector $\mathbf{x}_j$. NMF considers factorizations of the form:

$$X \approx ZH \qquad (1)$$

where $Z \in \mathbb{R}^{F \times M}_{+}$ is a matrix containing the basis images, while $H \in \mathbb{R}^{M \times L}_{+}$ contains the coefficients of the linear combinations of the basis images required to reconstruct each original facial image in the database. Obviously, useful factorizations for real world applications arise when the linear subspace transformation projects the data from the original $F$-dimensional space to an $M$-dimensional subspace with $M \ll F$.
To measure the cost of the decomposition in (1), one popular approach is to use the Kullback-Leibler (KL) divergence [7, 8]. The cost of the decomposition in (1) is then measured as the sum of the KL divergences between all images in the database and their respective reconstructed versions obtained from the factorization. Consequently, the cost for factorizing $X$ into $ZH$ is evaluated as:

$$D_{NMF}(X\|ZH) = \sum_{j=1}^{L} KL(\mathbf{x}_j \| Z\mathbf{h}_j) = \sum_{j=1}^{L} \sum_{i=1}^{F} \left( x_{i,j} \ln\frac{x_{i,j}}{\sum_k z_{i,k} h_{k,j}} + \sum_k z_{i,k} h_{k,j} - x_{i,j} \right). \qquad (2)$$
Using the Expectation Maximization (EM) algorithm and an appropriately designed auxiliary function, it has been shown in [9] that the following multiplicative update rules for $h_{k,j}$ and $z_{i,k}$ yield the desired factors, while guaranteeing a non-increasing behavior of the cost function in (2). The update rule at the $t$-th iteration for $h^{(t)}_{k,j}$ is given by:

$$h^{(t)}_{k,j} = h^{(t-1)}_{k,j} \frac{\sum_i z^{(t-1)}_{i,k} \frac{x_{i,j}}{\sum_l z^{(t-1)}_{i,l} h^{(t-1)}_{l,j}}}{\sum_i z^{(t-1)}_{i,k}}, \qquad (3)$$

while for $z^{(t)}_{i,k}$ the update rule is given by:

$$\acute{z}^{(t)}_{i,k} = z^{(t-1)}_{i,k} \frac{\sum_j h^{(t)}_{k,j} \frac{x_{i,j}}{\sum_l z^{(t-1)}_{i,l} h^{(t)}_{l,j}}}{\sum_j h^{(t)}_{k,j}}. \qquad (4)$$

Finally, the basis images matrix $Z$ is normalized so that the elements of each of its column vectors sum up to one:

$$z^{(t)}_{i,k} = \frac{\acute{z}^{(t)}_{i,k}}{\sum_l \acute{z}^{(t)}_{l,k}}. \qquad (5)$$
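The update rules (3)-(5) can be sketched in code. Below is a minimal NumPy implementation of the KL-divergence multiplicative updates with the column normalization of Eq. (5); the function names, iteration count, and numerical-stability constant are our choices, not from the paper.

```python
import numpy as np

def nmf_kl(X, M, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative KL-divergence NMF (Lee & Seung), Eqs. (3)-(5):
    alternately update H and Z, then renormalize the columns of Z."""
    rng = np.random.default_rng(seed)
    F, L = X.shape
    Z = rng.random((F, M)) + eps
    H = rng.random((M, L)) + eps
    Z /= Z.sum(axis=0, keepdims=True)            # enforce Eq. (5) at start
    for _ in range(n_iter):
        # Eq. (3): H update; R holds the element-wise ratio x_ij / (ZH)_ij.
        R = X / (Z @ H + eps)
        H *= (Z.T @ R) / (Z.sum(axis=0)[:, None] + eps)
        # Eq. (4): Z update, followed by the normalization of Eq. (5).
        R = X / (Z @ H + eps)
        Z *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        Z /= Z.sum(axis=0, keepdims=True)
    return Z, H

def kl_cost(X, Z, H, eps=1e-9):
    """Eq. (2): generalized KL divergence between X and its reconstruction ZH."""
    V = Z @ H + eps
    return float(np.sum(X * np.log((X + eps) / V) + V - X))
```

In practice the cost (2) drops rapidly during the first iterations and the column normalization keeps the basis images on a comparable scale.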
3. PROPOSED METHOD
In this section we present the clustering based discriminant analysis that we perform and demonstrate how the derived discriminant constraints are incorporated in the NMF decomposition cost function, creating the proposed SDNMF optimization problem. Next, we derive the proposed multiplicative update rules that optimize SDNMF.
3.1. Clustering based Discriminant Analysis
To facilitate CDA in the $n$-class facial image database $I$, let us denote the number of clusters composing the $r$-th class by $C_r$, the total number of clusters in the database by $C = \sum_{i=1}^{n} C_i$, and the number of facial images belonging to the $\theta$-th cluster of the $r$-th class by $N^{(r)(\theta)}$. Let us also define the mean vector of the $\theta$-th cluster of the $r$-th class by $\boldsymbol{\mu}^{(r)(\theta)} = [\mu^{(r)(\theta)}_1 \ldots \mu^{(r)(\theta)}_M]^T$, which is evaluated over the $N^{(r)(\theta)}$ facial images, while the vector $\boldsymbol{\eta}^{(r)(\theta)}_{\rho} = [\eta^{(r)(\theta)}_{\rho,1} \ldots \eta^{(r)(\theta)}_{\rho,M}]^T$ corresponds to the feature vector of the $\rho$-th image of the $\theta$-th cluster of the $r$-th class. Using the above notation we can define the within-cluster scatter matrix $S_w$ as:

$$S_w = \sum_{r=1}^{n} \sum_{\theta=1}^{C_r} \sum_{\rho=1}^{N^{(r)(\theta)}} \left( \boldsymbol{\eta}^{(r)(\theta)}_{\rho} - \boldsymbol{\mu}^{(r)(\theta)} \right) \left( \boldsymbol{\eta}^{(r)(\theta)}_{\rho} - \boldsymbol{\mu}^{(r)(\theta)} \right)^T \qquad (6)$$

and the between-cluster scatter matrix $S_b$ as:

$$S_b = \sum_{i=1}^{n} \sum_{\substack{r=1 \\ r \neq i}}^{n} \sum_{j=1}^{C_i} \sum_{\theta=1}^{C_r} \left( \boldsymbol{\mu}^{(i)(j)} - \boldsymbol{\mu}^{(r)(\theta)} \right) \left( \boldsymbol{\mu}^{(i)(j)} - \boldsymbol{\mu}^{(r)(\theta)} \right)^T. \qquad (7)$$
Matrix $S_w$ represents the scatter of the sample vector coefficients around their cluster mean. It is reasonable to require the dispersion of the samples that belong to the same cluster of a certain class to be as small as possible, since this denotes a high concentration of these samples around their cluster mean and, consequently, more compact cluster formation. In order to measure the sample dispersion inside clusters we compute the trace of the within-cluster scatter matrix $S_w$. Furthermore, matrix $S_b$ describes the scatter of the mean vectors between all clusters that belong to different classes. To separate clusters belonging to different classes, we desire to maximize the difference between the mean of every cluster of a certain class and the mean of every cluster of each other class. Therefore, the trace of $S_b$ is desired to be as large as possible.
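Since only the traces of $S_w$ and $S_b$ enter the cost below, they can be computed without forming the full $M \times M$ scatter matrices, using $tr[\mathbf{v}\mathbf{v}^T] = \|\mathbf{v}\|^2$. A small NumPy sketch (the helper name and input conventions are ours):

```python
import numpy as np

def scatter_traces(Y, labels, clusters):
    """Traces of the within-cluster (Eq. 6) and between-cluster (Eq. 7)
    scatter matrices. Y: (L, M) array, one projected sample per row;
    labels[k]: class index of sample k; clusters[k]: its cluster index
    within that class."""
    # Cluster means mu^{(r)(theta)}, keyed by (class, cluster).
    means = {}
    for r in set(labels):
        for t in {c for c, l in zip(clusters, labels) if l == r}:
            idx = [k for k in range(len(labels))
                   if labels[k] == r and clusters[k] == t]
            means[(r, t)] = Y[idx].mean(axis=0)
    # tr[S_w]: squared distances of samples to their own cluster mean.
    tr_sw = sum(float(np.sum((Y[k] - means[(labels[k], clusters[k])]) ** 2))
                for k in range(len(labels)))
    # tr[S_b]: squared distances between cluster means of *different* classes
    # (ordered pairs, matching the double sum over i and r != i in Eq. 7).
    tr_sb = sum(float(np.sum((m1 - m2) ** 2))
                for (r1, _), m1 in means.items()
                for (r2, _), m2 in means.items() if r1 != r2)
    return tr_sw, tr_sb
```

Note that Eq. (7) sums over ordered class pairs, so each unordered pair of cluster means is counted twice; the sketch reproduces that convention.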
3.2. Subclass Discriminant Non-negative Matrix Factorization (SDNMF)

In order to incorporate the clustering based discriminant constraints derived from the performed CDA in the NMF decomposition, we reformulate the NMF cost function by adding appropriate penalty terms. Since we desire the trace of matrix $S_w$ to be as small as possible and, at the same time, the trace of $S_b$ to be as large as possible, the new cost function is formulated as:

$$D_{SDNMF}(X\|ZH) = D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] \qquad (8)$$

where $\alpha$ and $\beta$ are positive constants, $tr[\cdot]$ is the trace operator, while the factor $\frac{1}{2}$ is used to simplify subsequent derivations. Consequently, the new minimization problem is formulated as:

$$\min_{Z,H} D_{SDNMF}(X\|ZH) \quad \text{subject to: } z_{i,k} \geq 0,\; h_{k,j} \geq 0,\; \sum_i z_{i,k} = 1,\; \forall i, j, k, \qquad (9)$$

which requires the minimization of (8) subject to the non-negativity constraints applied to the elements of both the weights matrix $H$ and the basis images matrix $Z$.
The constrained optimization problem in (9) is solved by introducing Lagrange multipliers $\phi = [\phi_{i,k}] \in \mathbb{R}^{F \times M}$ and $\psi = [\psi_{k,j}] \in \mathbb{R}^{M \times L}$, each associated with the constraints $z_{i,k} \geq 0$ and $h_{k,j} \geq 0$, respectively. Thus the Lagrangian function $\mathcal{L}$ is formulated as:

$$\mathcal{L} = D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + \sum_i \sum_k \phi_{i,k} z_{i,k} + \sum_k \sum_j \psi_{k,j} h_{k,j}$$
$$= D_{NMF}(X\|ZH) + \frac{\alpha}{2} tr[S_w] - \frac{\beta}{2} tr[S_b] + tr[\phi Z^T] + tr[\psi H^T]. \qquad (10)$$
Consequently, the optimization problem in (9) is equivalent to the minimization of the Lagrangian, $\arg\min_{Z,H} \mathcal{L}$. To minimize $\mathcal{L}$, we first obtain its partial derivatives with respect to $h_{i,j}$ and $z_{i,j}$ and set them equal to zero:

$$\frac{\partial \mathcal{L}}{\partial h_{i,j}} = -\sum_k \frac{x_{k,j} z_{k,i}}{\sum_l z_{k,l} h_{l,j}} + \sum_l z_{l,i} + \psi_{i,j} + \frac{\alpha}{2} \frac{\partial tr[S_w]}{\partial h_{i,j}} - \frac{\beta}{2} \frac{\partial tr[S_b]}{\partial h_{i,j}} = 0$$

$$\frac{\partial \mathcal{L}}{\partial z_{i,j}} = -\sum_l \frac{x_{i,l} h_{j,l}}{\sum_k z_{i,k} h_{k,l}} + \sum_l h_{j,l} + \phi_{i,j} + \frac{\alpha}{2} \frac{\partial tr[S_w]}{\partial z_{i,j}} - \frac{\beta}{2} \frac{\partial tr[S_b]}{\partial z_{i,j}} = 0. \qquad (11)$$
According to the KKT conditions [10], $\phi_{i,j} z_{i,j} = 0$ and also $\psi_{i,j} h_{i,j} = 0$. Consequently, we obtain the following equalities:

$$\frac{\partial \mathcal{L}}{\partial h_{i,j}} h_{i,j} = 0 \Rightarrow -\sum_k \frac{x_{k,j} z_{k,i}}{\sum_l z_{k,l} h_{l,j}} h_{i,j} + \sum_l z_{l,i} h_{i,j} + \alpha \left( h_{i,j} - \mu^{(r)(\theta)}_i \right) h_{i,j} - \frac{\beta}{N^{(r)(\theta)}} \mu^{(r)(\theta)}_i (C - C_r) h_{i,j} + \frac{\beta}{N^{(r)(\theta)}} \sum_{\substack{m=1 \\ m \neq r}}^{n} \sum_{g=1}^{C_m} \mu^{(m)(g)}_i h_{i,j} = 0 \qquad (12)$$

$$\frac{\partial \mathcal{L}}{\partial z_{i,j}} z_{i,j} = 0 \Rightarrow -\sum_l \frac{x_{i,l} h_{j,l}}{\sum_k z_{i,k} h_{k,l}} z_{i,j} + \sum_l h_{j,l} z_{i,j} = 0. \qquad (13)$$
Solving the quadratic equation resulting from (12) for $h_{i,j}$ leads to the proposed multiplicative update rule for the weight coefficients, which for the $t$-th iteration is defined as:

$$h^{(t)}_{i,j} = \frac{A + \sqrt{A^2 + T}}{2 \left( \alpha - \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \right)} \qquad (14)$$

where

$$T = 4 \left( \alpha - \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \right) h^{(t-1)}_{i,j} \sum_k z^{(t-1)}_{k,i} \frac{x_{k,j}}{\sum_n z^{(t-1)}_{k,n} h^{(t-1)}_{n,j}},$$

$h_{i,j}$ denotes the $i$-th feature element of the coefficient vector of the $\rho$-th image belonging to the $\theta$-th cluster of the $r$-th facial class, and $A$ is defined as:

$$A = \left[ \alpha + \frac{\beta}{N^{(r)(\theta)}} (C - C_r) \right] \frac{1}{N^{(r)(\theta)}} \sum_{\lambda, \lambda \neq j} h_{i,\lambda} - \frac{\beta}{N^{(r)(\theta)}} \sum_{\substack{m=1 \\ m \neq r}}^{n} \sum_{g=1}^{C_m} \mu^{(m)(g)}_i - 1. \qquad (15)$$

The update rule for $z_{i,j}$ is directly derived by solving (13) and is the same as in (4).
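One pass of the $H$ update in Eqs. (14)-(15) might be implemented as sketched below. The sketch assumes the cluster means $\mu^{(m)(g)}$ are recomputed from the current coefficients $H$, and that $\alpha$ and $\beta$ are chosen so the quadratic coefficient in the denominator of (14) stays positive; the function name and data conventions are ours, not from the paper.

```python
import numpy as np

def sdnmf_update_H(X, Z, H, labels, clusters, alpha, beta, eps=1e-9):
    """One multiplicative H update per Eqs. (12), (14), (15).
    labels[j] / clusters[j]: class r and within-class cluster theta of image j
    (column j of X and H). Assumes alpha - [alpha + (beta/N)(C-C_r)]/N > 0."""
    M, L = H.shape
    classes = sorted(set(labels))
    n_clusters = {r: len({c for c, l in zip(clusters, labels) if l == r})
                  for r in classes}
    C = sum(n_clusters.values())                 # total number of clusters
    # Group image indices and compute cluster means mu^{(r)(theta)} from H.
    members = {}
    for j, (r, t) in enumerate(zip(labels, clusters)):
        members.setdefault((r, t), []).append(j)
    mu = {key: H[:, idx].mean(axis=1) for key, idx in members.items()}
    # (Z^T [X / ZH])_{i,j}: the data-fit factor shared with Eq. (3).
    R = Z.T @ (X / (Z @ H + eps))
    H_new = H.copy()
    for (r, t), idx in members.items():
        N = len(idx)
        coef = alpha + (beta / N) * (C - n_clusters[r])
        a = alpha - coef / N                     # denominator factor in (14)
        # Sum of other-class cluster means, Sigma_{m != r} Sigma_g mu^{(m)(g)}.
        other = sum((m for (rr, _), m in mu.items() if rr != r), np.zeros(M))
        cluster_sum = H[:, idx].sum(axis=1)
        for j in idx:
            # Eq. (15): sum over lambda != j runs over the other cluster members.
            A = (coef / N) * (cluster_sum - H[:, j]) - (beta / N) * other - 1.0
            T = 4.0 * a * H[:, j] * R[:, j]      # non-negative when a > 0
            H_new[:, j] = (A + np.sqrt(A * A + T)) / (2.0 * a)
    return H_new
```

Since $T \geq 0$, we have $A + \sqrt{A^2 + T} \geq 0$, so the update preserves the non-negativity of $H$ by construction.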
4. EXPERIMENTAL RESULTS
We compared the performance of the proposed SDNMF method with the DNMF and conventional NMF algorithms on the facial expression recognition problem using the popular Cohn-Kanade [11] database. In order to form the training and test sets, face detection was performed and the resulting Regions Of Interest (ROIs) were manually aligned with respect to the eye positions. Each facial image in the database was isotropically scaled to a fixed size of 30 × 40 pixels (30 columns and 40 rows) and converted to grayscale. Each such fixed-size facial image was scanned row-wise to form a feature vector $\mathbf{x} = [f_1 \ldots f_{1200}]^T$ ($f_i$ being the luminance of the $i$-th pixel), which is used to form the training and test sets.

We performed 5-fold cross-validation on the available data samples, where the training set was used to learn the basis images of the low-dimensional projection space, while the test set was used to report the facial expression recognition accuracy rates in the respective learned projection space. Classification was performed by feeding the projected low-dimensional discriminant facial expression representations to a linear SVM classifier.
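The evaluation pipeline above can be sketched in code. The paper feeds the projected representations to a linear SVM; the sketch below projects test images by iterating the $H$ update of Eq. (3) with $Z$ held fixed (the paper does not spell out its exact projection step) and, to keep the example dependency-free, substitutes a nearest class centroid classifier for the SVM. All function names are ours.

```python
import numpy as np

def project(Z, X, n_iter=100, eps=1e-9):
    """Project (test) images X onto the learned basis Z by running the
    multiplicative H update of Eq. (3) with Z held fixed, a common
    out-of-sample projection for NMF subspaces."""
    rng = np.random.default_rng(0)
    H = rng.random((Z.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (Z.T @ (X / (Z @ H + eps))) / (Z.sum(axis=0)[:, None] + eps)
    return H

def nearest_centroid_predict(H_train, y_train, H_test):
    """Stand-in classifier (the paper uses a linear SVM): assign each
    test column to the class with the closest mean training coefficient."""
    classes = sorted(set(y_train))
    cents = np.stack([H_train[:, [j for j, y in enumerate(y_train) if y == c]].mean(axis=1)
                      for c in classes])
    d = ((H_test.T[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return [classes[k] for k in d.argmin(axis=1)]
```

Swapping the centroid rule for an actual linear SVM (e.g. scikit-learn's LinearSVC) recovers the pipeline described above.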
[Figure 1: plot of average recognition accuracy rate (%, 40-75) versus projection subspace dimensionality (20-200) for NMF, DNMF, and SDNMF with $C_r = 2$ and $C_r = 3$.]

Fig. 1. Average facial expression recognition accuracy rate versus the dimensionality of the projection subspace in the Cohn-Kanade database.
Figure 1 shows the average expression recognition accuracy rates versus the projection subspace dimensionality. The highest recognition rates achieved by each examined method, as well as the respective subspace dimensionalities, are summarized in Table 1. As can be seen, SDNMF outperforms both the NMF and DNMF methods.
Table 1. Best average expression recognition accuracy rates in the Cohn-Kanade database

Method          | Accuracy Rate | Subspace Dimensionality
SDNMF $C_r = 2$ | 69.05%        | 190
SDNMF $C_r = 3$ | 68.31%        | 182
DNMF            | 66.08%        | 166
NMF             | 64.85%        | 134
5. CONCLUSIONS
We proposed a novel method that addresses the general problem of finding discriminant projections that enhance class separability by incorporating CDA in the NMF decomposition. To solve the SDNMF problem, we developed a multiplicative update rule that considers not only each sample's class origin but also the cluster structure inside each class. We compared the performance of the SDNMF algorithm with NMF and DNMF, and the experimental results verified the effectiveness of the proposed method in the facial expression recognition task.
6. REFERENCES
[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.

[2] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2003, pp. 267–273.

[3] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, "Text mining using nonnegative matrix factorizations," in IEEE International Conference on Data Mining (ICDM), 2004, pp. 452–456.

[4] S. Zafeiriou, A. Tefas, I. Buciu, and I. Pitas, "Exploiting discriminant information in nonnegative matrix factorization with application to frontal face verification," IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 683–695, 2006.

[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, July 1997.

[6] X. Chen and T. Huang, "Facial expression recognition: a clustering-based approach," Pattern Recognition Letters, vol. 24, no. 9-10, pp. 1295–1302, 2003.

[7] S. Della Pietra, V. Della Pietra, and J. Lafferty, "Statistical learning algorithms based on Bregman distances," in Proceedings of the Canadian Workshop on Information Theory, Toronto, Canada, 1997.

[8] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances," Computational Learning Theory, pp. 158–169, 2000.

[9] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems (NIPS), 2000, pp. 556–562.

[10] R. Fletcher, Practical Methods of Optimization, 2nd ed., Wiley-Interscience, New York, NY, USA, 1987.

[11] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," March 2000, pp. 46–53.