IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 1, JANUARY 2003

Brief Papers

Face Recognition Using LDA-Based Algorithms

Juwei Lu, Kostantinos N. Plataniotis, and Anastasios N. Venetsanopoulos

Manuscript received January 15, 2001; revised April 16, 2002. The authors are with the Multimedia Laboratory, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: kostas@dsp.toronto.edu). Digital Object Identifier 10.1109/TNN.2002.806647.
Abstract—Low-dimensional feature representation with enhanced discriminatory power is of paramount importance to face recognition (FR) systems. Most traditional linear discriminant analysis (LDA)-based methods suffer from the disadvantage that their optimality criteria are not directly related to the classification ability of the obtained feature representation. Moreover, their classification accuracy is affected by the "small sample size" (SSS) problem which is often encountered in FR tasks. In this short paper, we propose a new algorithm that deals with both of these shortcomings in an efficient and cost-effective manner. The method proposed here is compared, in terms of classification accuracy, to other commonly used FR methods on two face databases. Results indicate that the performance of the proposed method is overall superior to that of traditional FR approaches, such as the Eigenfaces, Fisherfaces, and D-LDA methods.

Index Terms—Direct LDA, Eigenfaces, face recognition, Fisherfaces, fractional-step LDA, linear discriminant analysis (LDA), principal component analysis (PCA).
I. INTRODUCTION

Feature selection for face representation is one of the central issues in face recognition (FR) systems. Among the various solutions to the problem (see [1], [2] for a survey), the most successful seem to be the appearance-based approaches, which generally operate directly on images or appearances of face objects and process the images as two-dimensional (2-D) holistic patterns, thereby avoiding the difficulties associated with three-dimensional (3-D) modeling and shape or landmark detection [2]. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two powerful tools used for data reduction and feature extraction in the appearance-based approaches. Two state-of-the-art FR methods, Eigenfaces [3] and Fisherfaces [4], built on these two techniques, respectively, have proved to be very successful.
It is generally believed that, when it comes to solving problems of pattern classification, LDA-based algorithms outperform PCA-based ones, since the former optimizes the low-dimensional representation of the objects with a focus on the most discriminant feature extraction, while the latter simply achieves object reconstruction [4]-[6]. However, the classification performance of traditional LDA is often degraded by the fact that its separability criteria are not directly related to its classification accuracy in the output space [7]. A solution to the problem is to introduce weighting functions into LDA. Object classes that are closer together in the output space, and thus can potentially result in misclassification, should be more heavily weighted in the input space. This idea has been further extended in [7] with the introduction of the fractional-step linear discriminant analysis algorithm (F-LDA), where the dimensionality reduction is implemented in a few small fractional steps, allowing the relevant distances to be more accurately weighted. Although the method has been successfully tested on low-dimensional patterns, it cannot be directly applied to high-dimensional patterns, such as the face images used in this paper [it should be noted at this point that a typical image pattern of size (112 × 92) (Fig. 2) results in a vector of dimension D = 10 304], due to two factors: 1) the computational difficulty of the eigendecomposition of matrices in the high-dimensional image space; and 2) the degenerated scatter matrices caused by the so-called "small sample size" (SSS) problem, which widely exists in FR tasks where the number of training samples is smaller than the dimensionality of the samples [4]-[6].
The traditional solution to the SSS problem requires the incorporation of a PCA step into the LDA framework. In this approach, PCA is used as a preprocessing step for dimensionality reduction so as to discard the null space of the within-class scatter matrix of the training data set. Then LDA is performed in the lower dimensional PCA subspace [4]. However, it has been shown that the discarded null space may contain significant discriminatory information [5], [6]. To prevent this from happening, solutions without a separate PCA step, called direct LDA (D-LDA) methods, have been presented recently [5], [6]. In the D-LDA framework, data are processed directly in the original high-dimensional input space, avoiding the loss of significant discriminatory information due to the PCA preprocessing step.
In this paper, we introduce a new feature representation method for FR tasks. The method combines the strengths of the D-LDA and F-LDA approaches, while at the same time overcoming their shortcomings and limitations. In the proposed framework, hereafter DF-LDA, we first lower the dimensionality of the original input space by introducing a new variant of D-LDA that results in a low-dimensional, SSS-free subspace where the most discriminatory features are preserved. The variant of D-LDA developed here utilizes a modified Fisher's criterion to avoid a problem resulting from the use of the zero eigenvalues of the within-class scatter matrix as possible divisors in [6]. Also, a weighting function is introduced into the proposed variant of D-LDA, so that a subsequent F-LDA step can be applied to carefully reorient the SSS-free subspace, resulting in a set of optimal discriminant features for face representation.
II. DIRECT FRACTIONAL-STEP LDA (DF-LDA)
The problem of low-dimensional feature representation in FR systems can be stated as follows. Given a set of N training face images {z_i}_{i=1}^{N}, each of which is represented as a vector of length D (i.e., z_i ∈ R^D), belonging to one of C classes {Z_i}_{i=1}^{C}, where D = I_w × I_h is the image size and R^D denotes a D-dimensional real space, the objective is to find a transformation φ, based on the optimization of certain separability criteria, to produce a representation y_i = φ(z_i), where y_i ∈ R^M with M ≪ D. The representation should enhance the separability of the different face objects under consideration.
A. Where are the Optimal Discriminant Features?

Let S_b and S_w denote the between- and within-class scatter matrices of the training image set, respectively. LDA-like approaches such as the Fisherface method [4] find a set of basis vectors, denoted by Ψ, that maximizes the ratio between S_b and S_w:

Ψ = arg max_Ψ |Ψ^T S_b Ψ| / |Ψ^T S_w Ψ|.   (1)
Assuming that S_w is nonsingular, the basis vectors Ψ correspond to the first M eigenvectors with the largest eigenvalues of S_w^{-1} S_b. The M-dimensional representation is then obtained by projecting the original face images onto the subspace spanned by the M eigenvectors. However, a degenerated S_w in (1) may be generated due to the SSS problem widely existing in most FR tasks. It was noted in the introduction that a possible solution is to apply a PCA step in order to remove the null space of S_w prior to the maximization in (1). Nevertheless, it has recently been shown that the null space of S_w may contain significant discriminatory information [5], [6]. As a consequence, some significant discriminatory information may be lost due to this preprocessing PCA step.
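For concreteness, the two scatter matrices and the maximization in (1) can be written in a few lines of NumPy. This is a minimal sketch on synthetic low-dimensional data, where S_w is nonsingular so the classical solution applies; all names and toy sizes are illustrative, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))            # 30 samples of dimensionality 5
y = np.repeat(np.arange(3), 10)         # C = 3 classes, 10 samples each

mean_all = X.mean(axis=0)
S_b = np.zeros((5, 5))                  # between-class scatter
S_w = np.zeros((5, 5))                  # within-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_b += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    S_w += (Xc - mc).T @ (Xc - mc)

# With S_w nonsingular, the basis maximizing (1) consists of the leading
# eigenvectors of inv(S_w) S_b; at most C - 1 = 2 eigenvalues are nonzero.
evals, evecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(evals.real)[::-1]
Psi = evecs.real[:, order[:2]]          # the two most discriminant directions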
The basic premise of the D-LDA methods, which attempt to solve the SSS problem without a PCA step, is that the null space of S_w contains significant discriminant information if the projection of S_b is not zero in that direction, and that no significant information will be lost if the null space of S_b is discarded. Assuming that A and B represent the null spaces of S_b and S_w, while A′ = R^D − A and B′ = R^D − B are the complement spaces of A and B, respectively, the optimal discriminant subspace sought by D-LDA is the intersection space (A′ ∩ B). The method in [6] first diagonalizes S_b to find A′ when seeking the solution of (1), while [5] first diagonalizes S_w to find B. Although it appears that the two methods are not significantly different, it may be intractable to calculate B when the size of S_w is large, which is the case in most FR applications. For example, a typical face pattern of size (112 × 92) results in S_b and S_w matrices with dimensionality (10 304 × 10 304). Fortunately, the rank of S_b is determined by rank(S_b) = C − 1, with C the number of image classes, which is usually a small value in most FR tasks; e.g., C = 40 in the ORL database, resulting in rank(S_b) = 39. A′ can easily be found by solving for the eigenvectors of a (39 × 39) matrix rather than the original (10 304 × 10 304) matrix, through an algebraic transformation [3], [6]. Then (A′ ∩ B) can be obtained by solving for the null space of the projection of S_w into A′, while the projection is a small matrix of size (39 × 39).
Based on the analysis given above, it can be seen that the most significant discriminant information exists in the intersection subspace (A′ ∩ B), which is usually low-dimensional, so that it becomes possible to further apply sophisticated techniques, such as the rotation strategy of the LDA subspace used in F-LDA, to derive the optimal discriminant features from the intersection subspace.
B. Variant of D-LDA

The maximization process in (1) is not directly linked to the classification error, which is the criterion of performance used to measure the success of the FR procedure. Modified versions of the method, such as the F-LDA approach, use a weighting function in the input space to penalize those classes that are close together and can potentially lead to misclassifications in the output space. Thus, the weighted between-class scatter matrix can be expressed as

Ŝ_b = Σ_{i=1}^{C} Φ_i Φ_i^T   (2)

where Φ_i = (C_i/N)^{1/2} Σ_{j=1}^{C} (w(d_ij))^{1/2} (z̄_i − z̄_j), z̄_i is the mean of class Z_i, C_i is the number of elements in Z_i, and d_ij = ||z̄_i − z̄_j|| is the Euclidean distance between the means of class i and class j. The weighting function w(d_ij) is a monotonically decreasing function of the distance d_ij. The only constraint is that the weight should drop faster than the Euclidean distance between the means of class i and class j, with the authors in [7] recommending weighting functions of the form w(d) = d^{-2p} with p = 2, 3, ....
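As an illustration, the weighted between-class scatter of (2) can be assembled directly from the class means and class sizes. The helper below follows the reconstruction given above with w(d) = d^{-2p}; the function name and the default exponent are assumptions of this sketch, not values fixed by the paper.

import numpy as np

def weighted_Sb(means, counts, p=4):
    """means: (C, D) class means; counts: (C,) class sizes; w(d) = d**(-2p)."""
    C, D = means.shape
    N = counts.sum()
    Sb = np.zeros((D, D))
    for i in range(C):
        phi = np.zeros(D)
        for j in range(C):
            if i == j:
                continue
            diff = means[i] - means[j]
            d = np.linalg.norm(diff)      # assumed nonzero for distinct means
            phi += d ** (-p) * diff       # sqrt(w(d)) = d**(-p)
        Sb += (counts[i] / N) * np.outer(phi, phi)
    return Sb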
Most LDA-based algorithms, including Fisherfaces [4] and D-LDA [6], utilize the conventional Fisher's criterion given by (1). In this work, we propose the utilization of a variant of the conventional metric. The proposed metric can be expressed as follows:

Ψ = arg max_Ψ |Ψ^T Ŝ_b Ψ| / |Ψ^T (Ŝ_b + S_w) Ψ|   (3)

where S_w is the within-class scatter matrix and Ŝ_b is the weighted between-class scatter matrix defined in (2). This modified Fisher's criterion can be proven to be equivalent to the conventional one by introducing the analysis of [11], where it was shown that, for u ≥ 0, v > 0, and u + v > 0, the function h_1(u, v) = u/v attains its maximum (including positive infinity) at the same point at which the function h_2(u, v) = u/(u + v) attains its maximum; intuitively, h_2 = h_1/(1 + h_1) is a strictly increasing function of h_1, so maximizing one maximizes the other.
For the reasons explained in Section II-A, we start by solving the eigenvalue problem of Ŝ_b. It is intractable to directly compute the eigenvectors of Ŝ_b, which is a large (D × D) matrix. Fortunately, the first m (≤ C − 1) most significant eigenvectors of Ŝ_b, which correspond to nonzero eigenvalues, can be indirectly derived from the eigenvectors of the matrix (Φ_b^T Φ_b), with size (C × C), where Φ_b = [Φ_1, ..., Φ_C] so that Ŝ_b = Φ_b Φ_b^T [3]. Let λ_i and e_i be the ith eigenvalue and its corresponding eigenvector of (Φ_b^T Φ_b), i = 1, ..., C, sorted in decreasing eigenvalue order. Since (Φ_b Φ_b^T)(Φ_b e_i) = λ_i (Φ_b e_i), v_i = λ_i^{-1/2} Φ_b e_i is a unit-norm eigenvector of Ŝ_b.

[Fig. 1. Pseudocode for the computation of the DF-LDA algorithm.]

To remove the null space of Ŝ_b, only the first m eigenvectors V = [v_1, ..., v_m], whose corresponding eigenvalues are greater than 0, are used, where m ≤ C − 1. It is not difficult to see that V^T Ŝ_b V = Λ_b, with Λ_b = diag(λ_1, ..., λ_m), an (m × m) diagonal matrix. Let U = V Λ_b^{-1/2}. Projecting Ŝ_b and S_w into the subspace spanned by U, we have U^T Ŝ_b U = I and U^T S_w U. Then, we diagonalize U^T (Ŝ_b + S_w) U = I + U^T S_w U, which is a tractable matrix of size (m × m). Let p_i be its ith eigenvector, i = 1, ..., m, sorted in increasing order according to the corresponding eigenvalues λ′_i. In the set of ordered eigenvectors, those that correspond to the smallest eigenvalues maximize the ratio in (1) and should be considered the most discriminatory features. We can discard the eigenvectors with the largest eigenvalues, and denote the M′ (≤ m) selected eigenvectors as P = [p_1, ..., p_{M′}]. Defining the matrix P in this way, we obtain P^T (U^T (Ŝ_b + S_w) U) P = Λ_w, with Λ_w = diag(λ′_1, ..., λ′_{M′}), an (M′ × M′) diagonal matrix.
Based on the derivation presented above, a set of optimal discriminant feature basis vectors can be derived through Γ = U P Λ_w^{-1/2}. To facilitate comparison, it should be mentioned at this point that the D-LDA method of [6] uses the conventional Fisher's criterion of (1), whose denominator involves S_w alone. However, since the subspace spanned by U contains the intersection space (A′ ∩ B), it is possible that there exist zero eigenvalues in U^T S_w U. To prevent this from happening, a heuristic threshold was introduced in [6]: a small threshold value ε was set, and any value below ε was adjusted to ε. Obviously, performance heavily depends on the proper choice of the value for the artificial threshold ε, which is done in a heuristic manner [6]. Unlike the method in [6], due to the modified Fisher's criterion of (3), the nonsingularity of U^T (Ŝ_b + S_w) U can be guaranteed by the following lemma.
Lemma 1: Suppose B is a real matrix of size (n × n) that can be represented as B = Φ Φ^T, where Φ is a real matrix of size (n × m). Then, the matrix (I + B) is positive definite, i.e., I + B > 0, where I is the (n × n) identity matrix.

Proof: Since B = Φ Φ^T, B is a real symmetric matrix. Let x be any nonzero real vector; we have x^T (I + B) x = x^T x + (Φ^T x)^T (Φ^T x) = ||x||^2 + ||Φ^T x||^2 > 0. According to [12], a real symmetric matrix that satisfies this condition is positive definite, i.e., I + B > 0.

Similar to Ŝ_b, S_w can be expressed as S_w = Φ_w Φ_w^T, and then U^T S_w U = (Φ_w^T U)^T (Φ_w^T U). Since U^T (Ŝ_b + S_w) U = I + U^T S_w U, and U^T S_w U is real symmetric and of the form required by Lemma 1, it can easily be seen that U^T (Ŝ_b + S_w) U is positive definite, and thus is nonsingular.
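The whole enhanced D-LDA step then chains the two eigenproblems together. The following is a minimal sketch under stated assumptions: Ŝ_b is supplied through a factor Phi with Ŝ_b = Phi Phi^T, and S_w is formed densely for simplicity; in the real 10 304-dimensional setting S_w would likewise be handled through its factors. All names are illustrative.

import numpy as np

def dlda_variant(Phi, S_w, M):
    """Phi: (D, C) factor of the weighted Sb_hat; S_w: (D, D); M: output dim."""
    lam, e = np.linalg.eigh(Phi.T @ Phi)      # small (C x C) eigenproblem
    keep = lam > 1e-10
    lam, e = lam[keep], e[:, keep]            # null space of Sb_hat removed
    U = (Phi @ e) / lam                       # whitening: U.T @ Sb_hat @ U = I
    W = np.eye(U.shape[1]) + U.T @ S_w @ U    # = U.T (Sb_hat + S_w) U, > 0
    w, P = np.linalg.eigh(W)                  # ascending eigenvalues
    P = P[:, :M]                              # smallest eigenvalues maximize (3)
    return (U @ P) / np.sqrt(w[:M])           # Gamma whitens (Sb_hat + S_w)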

Fig. 2. Some sample images of three persons randomly chosen from the two databases. (Left) ORL. (Right) UMIST.

Fig. 3. Distribution of 170 face images of five subjects (classes) randomly selected from the UMIST database in (a) the PCA-based subspace, (b) the D-LDA-based subspace, and (c) the DF-LDA-based subspace.
C. Rotation and Reorientation of the D-LDA Subspace

Through the enhanced D-LDA step discussed above, a low-dimensional, SSS-free subspace spanned by Γ has been derived without losing the most important information for discrimination purposes. In this subspace, (Ŝ_b + S_w) is nonsingular and has been whitened, since Γ^T (Ŝ_b + S_w) Γ = I. Thus, an F-LDA step can now be safely applied to further reduce the dimensionality from M′ to the required M.
To this end, we first project the original face images into the M′-dimensional subspace, obtaining the representation x_i = Γ^T z_i, i = 1, ..., N. Let S̄_b be the between-class scatter matrix of {x_i}, and let v̄ be the eigenvector of S̄_b corresponding to its smallest eigenvalue. This eigenvector will be discarded when the dimensionality is reduced from M′ to M′ − 1. A problem may be encountered during the dimensionality reduction procedure. If classes Z_i and Z_j are well separated in the M′-dimensional input space, this will produce a very small weight w(d_ij). As a result, the two classes may heavily overlap in the (M′ − 1)-dimensional output space, which is orthogonal to v̄. To avoid the problem, a kind of "automatic gain control" is introduced to the weighting procedure in F-LDA [7], where the dimensionality is reduced from M′ to M′ − 1 in r small fractional steps instead of one step directly. In each step, S̄_b and its eigenvectors are recomputed based on the changes of w(d_ij) in the output space, so that the (M′ − 1)-dimensional subspace is reoriented and severe overlap between classes in the output space is avoided. v̄ is not discarded until all r iterations are done.
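A hedged sketch of one such reduction (from n to n − 1 dimensions) is given below; it reuses the weighted_Sb helper sketched after (2). The linear attenuation schedule is one simple choice among several possible ones, not the paper's exact recipe.

import numpy as np

def fstep_reduce(X, y, r=20, p=4):
    """X: (N, n) features; y: labels; returns (N, n - 1) features."""
    classes = np.unique(y)
    for k in range(r, 0, -1):
        means = np.stack([X[y == c].mean(axis=0) for c in classes])
        counts = np.array([(y == c).sum() for c in classes])
        Sb = weighted_Sb(means, counts, p)    # recomputed at every step
        w, V = np.linalg.eigh(Sb)             # ascending eigenvalues
        X = X @ V                             # rotate: column 0 = weakest
        X[:, 0] *= (k - 1) / k                # attenuate it gradually
    return X[:, 1:]                           # discard it only after r steps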
It should be noted at this point that the approach of [7] has only been applied in small dimensionality pattern spaces. To the best of the authors' knowledge, the work reported here constitutes the first attempt to introduce fractional reorientation in a realistic application involving large dimensionality spaces. This becomes possible due to the integrated structure of the DF-LDA algorithm, the pseudocode implementation of which can be found in Fig. 1.
The effect of the above rotation strategy of the D-LDA subspace is illustrated in Fig. 3, where the first two most significant features of each image extracted by PCA, D-LDA (the variant proposed in Section II-B), and DF-LDA, respectively, are visualized. The PCA-based representation shown in Fig. 3(a) is optimal in terms of image reconstruction and thereby provides some insight into the original structure of the image distribution, which is highly complex and nonseparable. Although the separability of subjects is greatly improved in the D-LDA-based subspace, some classes still overlap, as shown in Fig. 3(b). It can be seen from Fig. 3(c) that the separability is further enhanced, and different classes tend to be equally spaced, after a few fractional (reorientation) steps.
III. EXPERIMENTAL RESULTS
Two popular face databases, the ORL [8] and the UMIST [13], are used to demonstrate the effectiveness of the proposed DF-LDA framework. The ORL database contains 40 distinct persons, with ten images per person. The images were taken at different time instances, with varying lighting conditions, facial expressions, and facial details (glasses/no glasses). All persons are in the upright, frontal position, with tolerance for some side movement. The UMIST repository is a multiview database consisting of 575 images of 20 people, each covering a wide range of poses from profile to frontal views. Fig. 2 depicts some samples contained in the two databases, where each image is scaled to (112 × 92), resulting in an input dimensionality of D = 10 304.

Fig. 4. Comparison of error rates obtained by the four FR methods as functions of the number of feature vectors, where a weighting function of the form w(d) = d^{-2p} is used in DF-LDA (with a database-specific exponent for the ORL and for the UMIST), and r = 20 for both.
To start the FR experiments, each of the two databases is randomly partitioned into a training set and a test set with no overlap between the two. The partition of the ORL database is done following the recommendation of [14], [15], which calls for five images per person randomly chosen for training and the other five for testing. Thus, a training set of 200 images and a test set of 200 images are created. For the UMIST database, eight images per person are randomly chosen to produce a training set of 160 images. The remaining 415 images are used to form the test set. In the following experiments, the figures of merit are error rates averaged over five runs (four runs in [14] and three runs in [15]), each run being performed on such random partitions of the two databases. It is worth mentioning here that both experimental setups introduce SSS conditions, since the number of training samples is in both cases much smaller than the dimensionality of the input space. Also, we did observe some partition cases where zero eigenvalues occurred in U^T S_w U, as discussed in Section II-B. In these cases, in contrast with the failure of D-LDA [6], DF-LDA was still able to perform well.
In addition to D-LDA [6], DF-LDA is compared against two popular feature selection methods, namely Eigenfaces [3] and Fisherfaces [4]. For each of the four methods, the FR procedure consists of 1) a feature extraction step, where the feature representation of each training or test sample is extracted by projecting the sample onto the feature space generated by Eigenfaces, Fisherfaces, D-LDA, or DF-LDA, respectively, and 2) a classification step, in which each feature representation obtained in the first step is fed into a simple nearest neighbor classifier. It should be noted at this point that, since the focus of this short paper is on feature extraction, a very simple classifier, namely nearest neighbor, is used in step 2). We anticipate that the classification accuracy of all four methods compared here would improve if a more sophisticated classifier were used instead of the nearest neighbor. However, such an experiment is beyond the scope of this short paper.
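The two steps amount to a projection followed by a distance comparison. A minimal sketch, in which the feature basis Gamma, the image vectors, and the split stand in for the ORL/UMIST setups described above:

import numpy as np

def error_rate(Gamma, X_train, y_train, X_test, y_test):
    """Gamma: (D, M) feature basis; X_*: (n, D) vectorized face images."""
    F_train = X_train @ Gamma                 # step 1: feature extraction
    F_test = X_test @ Gamma
    # step 2: one-nearest-neighbor classification in the feature space
    d2 = ((F_test[:, None, :] - F_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d2, axis=1)]
    return float((pred != y_test).mean())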
The error rate curves obtained by the four methods are shown in Fig. 4 as functions of the number of feature vectors. The number of fractional steps used in DF-LDA is r = 20, and the weighting functions utilized are those given in the caption of Fig. 4. From Fig. 4, it can be seen that the performance of DF-LDA is overall superior to that of the other three methods on both databases.

TABLE I: AVERAGE PERCENTAGE OF THE ERROR RATES OF DF-LDA OVER THOSE OF THE OTHER METHODS

Let E_DF(M) and E_other(M) be the error rates of DF-LDA and one of the other three methods, respectively, where M is the number of feature vectors. We obtain the average percentage of the error rate of DF-LDA over that of each other method by averaging the ratio E_DF(M)/E_other(M) over the evaluated values of M, for the ORL and the UMIST databases respectively. The results summarized in Table I indicate that the average error rate of DF-LDA is approximately 50.5%, 43%, and 80% of that of Eigenfaces, Fisherfaces, and D-LDA, respectively. It is of interest to observe the performance of Eigenfaces versus that of Fisherfaces. Not surprisingly, Eigenfaces outperform Fisherfaces on the ORL database, because Fisherfaces may lose significant discriminant information due to the intermediate PCA step. A similar observation has also been made in [10], [16].
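The figure of merit behind Table I is a one-line computation; the arrays below are hypothetical error-rate curves, not the paper's numbers.

import numpy as np

err_dflda = np.array([0.06, 0.05, 0.04])   # hypothetical E_DF(M) curve
err_other = np.array([0.10, 0.09, 0.08])   # hypothetical E_other(M) curve
avg_pct = 100.0 * np.mean(err_dflda / err_other)
print(f"DF-LDA's error rate is on average {avg_pct:.1f}% of the competitor's")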
The weighting function w(d) influences the performance of the DF-LDA method. For different feature extraction tasks, appropriate values for the weighting function exponent should be determined through experimentation using the available training set. However, it appears that there is a set of values for which good results can be obtained for a wide range of applications. Following the recommendation in [7], we examined the performance of the DF-LDA method for several weighting functions of the form w(d) = d^{-2p}. Results obtained through the utilization of these weighting functions are depicted in Fig. 5, where error rates are plotted against the number of feature vectors selected (the output space dimensionality). The lowest error rate on the ORL database is approximately 4.0% and is obtained with one of these weighting functions, a result comparable to the best results previously reported in the literature [14], [15].

References

R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge University Press, 1985.

M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.

A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.

S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, 1997.