IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 1, JANUARY 2003

Brief Papers

Face Recognition Using LDA-Based Algorithms

Juwei Lu, Kostantinos N. Plataniotis, and Anastasios N. Venetsanopoulos

Manuscript received January 15, 2001; revised April 16, 2002. The authors are with the Multimedia Laboratory, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: kostas@dsp.toronto.edu). Digital Object Identifier 10.1109/TNN.2002.806647.
Abstract—Low-dimensional feature representation with enhanced discriminatory power is of paramount importance to face recognition (FR) systems. Most traditional linear discriminant analysis (LDA)-based methods suffer from the disadvantage that their optimality criteria are not directly related to the classification ability of the obtained feature representation. Moreover, their classification accuracy is affected by the "small sample size" (SSS) problem which is often encountered in FR tasks. In this short paper, we propose a new algorithm that deals with both of these shortcomings in an efficient and cost-effective manner. The method proposed here is compared, in terms of classification accuracy, to other commonly used FR methods on two face databases. Results indicate that the performance of the proposed method is overall superior to that of traditional FR approaches, such as the Eigenfaces, Fisherfaces, and D-LDA methods.

Index Terms—Direct LDA, Eigenfaces, face recognition, Fisherfaces, fractional-step LDA, linear discriminant analysis (LDA), principal component analysis (PCA).
I. INTRODUCTION

Feature selection for face representation is one of the central issues in face recognition (FR) systems. Among the various solutions to the problem (see [1], [2] for a survey), the most successful seem to be the appearance-based approaches, which generally operate directly on images or appearances of face objects and process the images as two-dimensional (2-D) holistic patterns, thereby avoiding the difficulties associated with three-dimensional (3-D) modeling and shape or landmark detection [2]. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two powerful tools used for data reduction and feature extraction in the appearance-based approaches. Two state-of-the-art FR methods, Eigenfaces [3] and Fisherfaces [4], built on these two techniques, respectively, have proved to be very successful.
It is generally believed that, when it comes to solving problems of pattern classification, LDA-based algorithms outperform PCA-based ones, since the former optimizes the low-dimensional representation of the objects with a focus on the most discriminant feature extraction, while the latter simply achieves object reconstruction [4]-[6]. However, the classification performance of traditional LDA is often degraded by the fact that its separability criteria are not directly related to its classification accuracy in the output space [7]. A solution to the problem is to introduce weighting functions into LDA. Object classes that are closer together in the output space, and thus can potentially result in misclassification, should be more heavily weighted in the input space. This idea has been further extended in [7] with the introduction of the fractional-step linear discriminant analysis algorithm (F-LDA), where the dimensionality reduction is implemented in a few small fractional steps, allowing the relevant distances to be more accurately weighted. Although the method has been successfully tested on low-dimensional patterns, it cannot be directly applied to high-dimensional patterns, such as the face images used in this paper [it should be noted at this point that a typical image pattern of size (112 × 92) (Fig. 2) results in a vector of dimension D = 10 304], due to two factors: 1) the computational difficulty of the eigendecomposition of matrices in the high-dimensional image space; and 2) the degenerated scatter matrices caused by the so-called "small sample size" (SSS) problem, which widely exists in FR tasks where the number of training samples is smaller than the dimensionality of the samples [4]-[6].
The traditional solution to the SSS problem requires the incorporation of a PCA step into the LDA framework. In this approach, PCA is used as a preprocessing step for dimensionality reduction so as to discard the null space of the within-class scatter matrix of the training data set. Then LDA is performed in the lower dimensional PCA subspace [4]. However, it has been shown that the discarded null space may contain significant discriminatory information [5], [6]. To prevent this from happening, solutions without a separate PCA step, called direct LDA (D-LDA) methods, have been presented recently [5], [6]. In the D-LDA framework, data are processed directly in the original high-dimensional input space, avoiding the loss of significant discriminatory information due to the PCA preprocessing step.
In this paper, we introduce a new feature representation method for FR tasks. The method combines the strengths of the D-LDA and F-LDA approaches, while at the same time overcoming their shortcomings and limitations. In the proposed framework, hereafter DF-LDA, we first lower the dimensionality of the original input space by introducing a new variant of D-LDA that results in a low-dimensional, SSS-free subspace where the most discriminatory features are preserved. The variant of D-LDA developed here utilizes a modified Fisher's criterion to avoid a problem resulting from the use of the zero eigenvalues of the within-class scatter matrix as possible divisors in [6]. Also, a weighting function is introduced into the proposed variant of D-LDA, so that a subsequent F-LDA step can be applied to carefully reorient the SSS-free subspace, resulting in a set of optimal discriminant features for face representation.
II. DIRECT FRACTIONAL-STEP LDA (DF-LDA)
The problem of low-dimensional feature representation in FR systems can be stated as follows. Given a set of N training face images {z_i}_{i=1}^{N}, each of which is represented as a vector of length D (i.e., z_i ∈ R^D), belonging to one of C classes {Z_i}_{i=1}^{C}, where D = I_w × I_h is the image size and R^D denotes a D-dimensional real space, the objective is to find a transformation φ, based on the optimization of certain separability criteria, to produce a representation y_i = φ(z_i), where y_i ∈ R^M with M ≪ D. The representation should enhance the separability of the different face objects under consideration.
A. Where are the Optimal Discriminant Features?

Let S_b and S_w denote the between- and within-class scatter matrices of the training image set, respectively. LDA-like approaches such as the Fisherface method [4] find a set of basis vectors, denoted by Ψ, that maximizes the ratio between S_b and S_w:

Ψ = arg max_Ψ |Ψ^T S_b Ψ| / |Ψ^T S_w Ψ|.   (1)
Assuming that S_w is nonsingular, the basis vectors Ψ correspond to the first M eigenvectors with the largest eigenvalues of S_w^{-1} S_b. The M-dimensional representation is then obtained by projecting the original face images onto the subspace spanned by the M eigenvectors. However, a degenerated S_w in (1) may be generated due to the SSS problem widely existing in most FR tasks. It was noted in the introduction that a possible solution is to apply a PCA step in order to remove the null space of S_w prior to the maximization in (1). Nevertheless, it has recently been shown that the null space of S_w may contain significant discriminatory information [5], [6]. As a consequence, some significant discriminatory information may be lost due to this preprocessing PCA step.
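For concreteness, the two scatter matrices and the maximization in (1) can be written in a few lines of NumPy. This is a minimal sketch on synthetic low-dimensional data, where S_w is nonsingular so the classical solution applies; all names and toy sizes are illustrative, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))            # 30 samples of dimensionality 5
y = np.repeat(np.arange(3), 10)         # C = 3 classes, 10 samples each

mean_all = X.mean(axis=0)
S_b = np.zeros((5, 5))                  # between-class scatter
S_w = np.zeros((5, 5))                  # within-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_b += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    S_w += (Xc - mc).T @ (Xc - mc)

# With S_w nonsingular, the basis maximizing (1) consists of the leading
# eigenvectors of inv(S_w) S_b; at most C - 1 = 2 eigenvalues are nonzero.
evals, evecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(evals.real)[::-1]
Psi = evecs.real[:, order[:2]]          # the two most discriminant directions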
The basic premise of the D-LDA methods, which attempt to solve the SSS problem without a PCA step, is that the null space of S_w contains significant discriminant information if the projection of S_b is not zero in that direction, and that no significant information will be lost if the null space of S_b is discarded. Assuming that A and B represent the null spaces of S_b and S_w, while A′ = R^D − A and B′ = R^D − B are the complement spaces of A and B, respectively, the optimal discriminant subspace sought by D-LDA is the intersection space (A′ ∩ B). The method in [6] first diagonalizes S_b to find A′ when seeking the solution of (1), while [5] first diagonalizes S_w to find B. Although it appears that the two methods are not significantly different, it may be intractable to calculate B when the size of S_w is large, which is the case in most FR applications. For example, a typical face pattern of size (112 × 92) results in S_b and S_w matrices with dimensionality (10 304 × 10 304). Fortunately, the rank of S_b is determined by rank(S_b) = C − 1, with C the number of image classes, which is usually a small value in most FR tasks; e.g., C = 40 in the ORL database, resulting in rank(S_b) = 39. A′ can easily be found by solving for the eigenvectors of a (39 × 39) matrix rather than the original (10 304 × 10 304) matrix, through an algebraic transformation [3], [6]. Then (A′ ∩ B) can be obtained by solving for the null space of the projection of S_w into A′, while the projection is a small matrix of size (39 × 39).
Based on the analysis given above, it can be seen that the most significant discriminant information exists in the intersection subspace (A′ ∩ B), which is usually low-dimensional, so that it becomes possible to further apply sophisticated techniques, such as the rotation strategy of the LDA subspace used in F-LDA, to derive the optimal discriminant features from the intersection subspace.
B. Variant of D-LDA

The maximization process in (1) is not directly linked to the classification error, which is the criterion of performance used to measure the success of the FR procedure. Modified versions of the method, such as the F-LDA approach, use a weighting function in the input space to penalize those classes that are close together and can potentially lead to misclassifications in the output space. Thus, the weighted between-class scatter matrix can be expressed as

Ŝ_b = Σ_{i=1}^{C} Φ_i Φ_i^T   (2)

where Φ_i = (C_i/N)^{1/2} Σ_{j=1}^{C} (w(d_ij))^{1/2} (z̄_i − z̄_j), z̄_i is the mean of class Z_i, C_i is the number of elements in Z_i, and d_ij = ||z̄_i − z̄_j|| is the Euclidean distance between the means of class i and class j. The weighting function w(d_ij) is a monotonically decreasing function of the distance d_ij. The only constraint is that the weight should drop faster than the Euclidean distance between the means of class i and class j, with the authors in [7] recommending weighting functions of the form w(d) = d^{-2p} with p = 2, 3, ....
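As an illustration, the weighted between-class scatter of (2) can be assembled directly from the class means and class sizes. The helper below follows the reconstruction given above with w(d) = d^{-2p}; the function name and the default exponent are assumptions of this sketch, not values fixed by the paper.

import numpy as np

def weighted_Sb(means, counts, p=4):
    """means: (C, D) class means; counts: (C,) class sizes; w(d) = d**(-2p)."""
    C, D = means.shape
    N = counts.sum()
    Sb = np.zeros((D, D))
    for i in range(C):
        phi = np.zeros(D)
        for j in range(C):
            if i == j:
                continue
            diff = means[i] - means[j]
            d = np.linalg.norm(diff)      # assumed nonzero for distinct means
            phi += d ** (-p) * diff       # sqrt(w(d)) = d**(-p)
        Sb += (counts[i] / N) * np.outer(phi, phi)
    return Sb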
Most LDA-based algorithms, including Fisherfaces [4] and D-LDA [6], utilize the conventional Fisher's criterion given by (1). In this work, we propose the utilization of a variant of the conventional metric. The proposed metric can be expressed as follows:

Ψ = arg max_Ψ |Ψ^T Ŝ_b Ψ| / |Ψ^T (Ŝ_b + S_w) Ψ|   (3)

where S_w is the within-class scatter matrix and Ŝ_b is the weighted between-class scatter matrix defined in (2). This modified Fisher's criterion can be proven to be equivalent to the conventional one by introducing the analysis of [11], where it was shown that, for u ≥ 0, v > 0, and u + v > 0, the function h_1(u, v) = u/v attains its maximum (including positive infinity) at the same point at which the function h_2(u, v) = u/(u + v) attains its maximum; intuitively, h_2 = h_1/(1 + h_1) is a strictly increasing function of h_1, so maximizing one maximizes the other.
For the reasons explained in Section II-A, we start by solving the eigenvalue problem of Ŝ_b. It is intractable to directly compute the eigenvectors of Ŝ_b, which is a large (D × D) matrix. Fortunately, the first m (≤ C − 1) most significant eigenvectors of Ŝ_b, which correspond to nonzero eigenvalues, can be indirectly derived from the eigenvectors of the matrix (Φ_b^T Φ_b), with size (C × C), where Φ_b = [Φ_1, ..., Φ_C] so that Ŝ_b = Φ_b Φ_b^T [3]. Let λ_i and e_i be the ith eigenvalue and its corresponding eigenvector of (Φ_b^T Φ_b), i = 1, ..., C, sorted in decreasing eigenvalue order. Since (Φ_b Φ_b^T)(Φ_b e_i) = λ_i (Φ_b e_i), v_i = λ_i^{-1/2} Φ_b e_i is a unit-norm eigenvector of Ŝ_b.

[Fig. 1. Pseudocode for the computation of the DF-LDA algorithm.]

To remove the null space of Ŝ_b, only the first m eigenvectors V = [v_1, ..., v_m], whose corresponding eigenvalues are greater than 0, are used, where m ≤ C − 1. It is not difficult to see that V^T Ŝ_b V = Λ_b, with Λ_b = diag(λ_1, ..., λ_m), an (m × m) diagonal matrix. Let U = V Λ_b^{-1/2}. Projecting Ŝ_b and S_w into the subspace spanned by U, we have U^T Ŝ_b U = I and U^T S_w U. Then, we diagonalize U^T (Ŝ_b + S_w) U = I + U^T S_w U, which is a tractable matrix of size (m × m). Let p_i be its ith eigenvector, i = 1, ..., m, sorted in increasing order according to the corresponding eigenvalues λ′_i. In the set of ordered eigenvectors, those that correspond to the smallest eigenvalues maximize the ratio in (1) and should be considered the most discriminatory features. We can discard the eigenvectors with the largest eigenvalues, and denote the M′ (≤ m) selected eigenvectors as P = [p_1, ..., p_{M′}]. Defining the matrix P in this way, we obtain P^T (U^T (Ŝ_b + S_w) U) P = Λ_w, with Λ_w = diag(λ′_1, ..., λ′_{M′}), an (M′ × M′) diagonal matrix.
Based on the derivation presented above, a set of optimal discriminant feature basis vectors can be derived through Γ = U P Λ_w^{-1/2}. To facilitate comparison, it should be mentioned at this point that the D-LDA method of [6] uses the conventional Fisher's criterion of (1), whose denominator involves S_w alone. However, since the subspace spanned by U contains the intersection space (A′ ∩ B), it is possible that there exist zero eigenvalues in U^T S_w U. To prevent this from happening, a heuristic threshold was introduced in [6]: a small threshold value ε was set, and any value below ε was adjusted to ε. Obviously, performance heavily depends on the proper choice of the value for the artificial threshold ε, which is done in a heuristic manner [6]. Unlike the method in [6], due to the modified Fisher's criterion of (3), the nonsingularity of U^T (Ŝ_b + S_w) U can be guaranteed by the following lemma.
Lemma 1: Suppose B is a real matrix of size (n × n) that can be represented as B = Φ Φ^T, where Φ is a real matrix of size (n × m). Then, the matrix (I + B) is positive definite, i.e., I + B > 0, where I is the (n × n) identity matrix.

Proof: Since B = Φ Φ^T, B is a real symmetric matrix. Let x be any nonzero real vector; we have x^T (I + B) x = x^T x + (Φ^T x)^T (Φ^T x) = ||x||^2 + ||Φ^T x||^2 > 0. According to [12], a real symmetric matrix that satisfies this condition is positive definite, i.e., I + B > 0.

Similar to Ŝ_b, S_w can be expressed as S_w = Φ_w Φ_w^T, and then U^T S_w U = (Φ_w^T U)^T (Φ_w^T U). Since U^T (Ŝ_b + S_w) U = I + U^T S_w U, and U^T S_w U is real symmetric and of the form required by Lemma 1, it can easily be seen that U^T (Ŝ_b + S_w) U is positive definite, and thus is nonsingular.
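The whole enhanced D-LDA step then chains the two eigenproblems together. The following is a minimal sketch under stated assumptions: Ŝ_b is supplied through a factor Phi with Ŝ_b = Phi Phi^T, and S_w is formed densely for simplicity; in the real 10 304-dimensional setting S_w would likewise be handled through its factors. All names are illustrative.

import numpy as np

def dlda_variant(Phi, S_w, M):
    """Phi: (D, C) factor of the weighted Sb_hat; S_w: (D, D); M: output dim."""
    lam, e = np.linalg.eigh(Phi.T @ Phi)      # small (C x C) eigenproblem
    keep = lam > 1e-10
    lam, e = lam[keep], e[:, keep]            # null space of Sb_hat removed
    U = (Phi @ e) / lam                       # whitening: U.T @ Sb_hat @ U = I
    W = np.eye(U.shape[1]) + U.T @ S_w @ U    # = U.T (Sb_hat + S_w) U, > 0
    w, P = np.linalg.eigh(W)                  # ascending eigenvalues
    P = P[:, :M]                              # smallest eigenvalues maximize (3)
    return (U @ P) / np.sqrt(w[:M])           # Gamma whitens (Sb_hat + S_w)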

Fig. 2. Some sample images of three persons randomly chosen from the two databases. (Left) ORL. (Right) UMIST.

Fig. 3. Distribution of 170 face images of five subjects (classes) randomly selected from the UMIST database in (a) the PCA-based subspace, (b) the D-LDA-based subspace, and (c) the DF-LDA-based subspace.
C. Rotation and Reorientation of the D-LDA Subspace

Through the enhanced D-LDA step discussed above, a low-dimensional, SSS-free subspace spanned by Γ has been derived without losing the most important information for discrimination purposes. In this subspace, (Ŝ_b + S_w) is nonsingular and has been whitened, since Γ^T (Ŝ_b + S_w) Γ = I. Thus, an F-LDA step can now be safely applied to further reduce the dimensionality from M′ to the required M.
To this end, we first project the original face images into the M′-dimensional subspace, obtaining the representation x_i = Γ^T z_i, i = 1, ..., N. Let S̄_b be the between-class scatter matrix of {x_i}, and let v̄ be the eigenvector of S̄_b corresponding to its smallest eigenvalue. This eigenvector will be discarded when the dimensionality is reduced from M′ to M′ − 1. A problem may be encountered during the dimensionality reduction procedure. If classes Z_i and Z_j are well separated in the M′-dimensional input space, this will produce a very small weight w(d_ij). As a result, the two classes may heavily overlap in the (M′ − 1)-dimensional output space, which is orthogonal to v̄. To avoid the problem, a kind of "automatic gain control" is introduced to the weighting procedure in F-LDA [7], where the dimensionality is reduced from M′ to M′ − 1 in r small fractional steps instead of one step directly. In each step, S̄_b and its eigenvectors are recomputed based on the changes of w(d_ij) in the output space, so that the (M′ − 1)-dimensional subspace is reoriented and severe overlap between classes in the output space is avoided. v̄ is not discarded until all r iterations are done.
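A hedged sketch of one such reduction (from n to n − 1 dimensions) is given below; it reuses the weighted_Sb helper sketched after (2). The linear attenuation schedule is one simple choice among several possible ones, not the paper's exact recipe.

import numpy as np

def fstep_reduce(X, y, r=20, p=4):
    """X: (N, n) features; y: labels; returns (N, n - 1) features."""
    classes = np.unique(y)
    for k in range(r, 0, -1):
        means = np.stack([X[y == c].mean(axis=0) for c in classes])
        counts = np.array([(y == c).sum() for c in classes])
        Sb = weighted_Sb(means, counts, p)    # recomputed at every step
        w, V = np.linalg.eigh(Sb)             # ascending eigenvalues
        X = X @ V                             # rotate: column 0 = weakest
        X[:, 0] *= (k - 1) / k                # attenuate it gradually
    return X[:, 1:]                           # discard it only after r steps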
It should be noted at this point that the approach of [7] has only been applied in small dimensionality pattern spaces. To the best of the authors' knowledge, the work reported here constitutes the first attempt to introduce fractional reorientation in a realistic application involving large dimensionality spaces. This becomes possible due to the integrated structure of the DF-LDA algorithm, the pseudocode implementation of which can be found in Fig. 1.
The effect of the above rotation strategy of the D-LDA subspace is illustrated in Fig. 3, where the first two most significant features of each image extracted by PCA, D-LDA (the variant proposed in Section II-B), and DF-LDA, respectively, are visualized. The PCA-based representation shown in Fig. 3(a) is optimal in terms of image reconstruction and thereby provides some insight into the original structure of the image distribution, which is highly complex and nonseparable. Although the separability of subjects is greatly improved in the D-LDA-based subspace, some classes still overlap, as shown in Fig. 3(b). It can be seen from Fig. 3(c) that the separability is further enhanced, and different classes tend to be equally spaced, after a few fractional (reorientation) steps.
III. EXPERIMENTAL RESULTS
Two popular face databases, the ORL [8] and the UMIST [13], are used to demonstrate the effectiveness of the proposed DF-LDA framework. The ORL database contains 40 distinct persons, with ten images per person. The images were taken at different time instances, with varying lighting conditions, facial expressions, and facial details (glasses/no glasses). All persons are in the upright, frontal position, with tolerance for some side movement. The UMIST repository is a multiview database consisting of 575 images of 20 people, each covering a wide range of poses from profile to frontal views. Fig. 2 depicts some samples contained in the two databases, where each image is scaled to (112 × 92), resulting in an input dimensionality of D = 10 304.

Fig. 4. Comparison of error rates obtained by the four FR methods as functions of the number of feature vectors, where a weighting function of the form w(d) = d^{-2p} is used in DF-LDA (with a database-specific exponent for the ORL and for the UMIST), and r = 20 for both.
To start the FR experiments, each of the two databases is randomly partitioned into a training set and a test set with no overlap between the two. The partition of the ORL database is done following the recommendation of [14], [15], which calls for five images per person randomly chosen for training and the other five for testing. Thus, a training set of 200 images and a test set of 200 images are created. For the UMIST database, eight images per person are randomly chosen to produce a training set of 160 images. The remaining 415 images are used to form the test set. In the following experiments, the figures of merit are error rates averaged over five runs (four runs in [14] and three runs in [15]), each run being performed on such random partitions of the two databases. It is worth mentioning here that both experimental setups introduce SSS conditions, since the number of training samples is in both cases much smaller than the dimensionality of the input space. Also, we did observe some partition cases where zero eigenvalues occurred in U^T S_w U, as discussed in Section II-B. In these cases, in contrast with the failure of D-LDA [6], DF-LDA was still able to perform well.
In addition to D-LDA [6], DF-LDA is compared against two popular feature selection methods, namely Eigenfaces [3] and Fisherfaces [4]. For each of the four methods, the FR procedure consists of 1) a feature extraction step, where the feature representation of each training or test sample is extracted by projecting the sample onto the feature space generated by Eigenfaces, Fisherfaces, D-LDA, or DF-LDA, respectively, and 2) a classification step, in which each feature representation obtained in the first step is fed into a simple nearest neighbor classifier. It should be noted at this point that, since the focus of this short paper is on feature extraction, a very simple classifier, namely nearest neighbor, is used in step 2). We anticipate that the classification accuracy of all four methods compared here would improve if a more sophisticated classifier were used instead of the nearest neighbor. However, such an experiment is beyond the scope of this short paper.
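The two steps amount to a projection followed by a distance comparison. A minimal sketch, in which the feature basis Gamma, the image vectors, and the split stand in for the ORL/UMIST setups described above:

import numpy as np

def error_rate(Gamma, X_train, y_train, X_test, y_test):
    """Gamma: (D, M) feature basis; X_*: (n, D) vectorized face images."""
    F_train = X_train @ Gamma                 # step 1: feature extraction
    F_test = X_test @ Gamma
    # step 2: one-nearest-neighbor classification in the feature space
    d2 = ((F_test[:, None, :] - F_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d2, axis=1)]
    return float((pred != y_test).mean())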
The error rate curves obtained by the four methods are shown in Fig. 4 as functions of the number of feature vectors. The number of fractional steps used in DF-LDA is r = 20, and the weighting functions utilized are those given in the caption of Fig. 4. From Fig. 4, it can be seen that the performance of DF-LDA is overall superior to that of the other three methods on both databases.

TABLE I: AVERAGE PERCENTAGE OF THE ERROR RATES OF DF-LDA OVER THOSE OF THE OTHER METHODS

Let E_DF(M) and E_other(M) be the error rates of DF-LDA and one of the other three methods, respectively, where M is the number of feature vectors. We obtain the average percentage of the error rate of DF-LDA over that of each other method by averaging the ratio E_DF(M)/E_other(M) over the evaluated values of M, for the ORL and the UMIST databases respectively. The results summarized in Table I indicate that the average error rate of DF-LDA is approximately 50.5%, 43%, and 80% of that of Eigenfaces, Fisherfaces, and D-LDA, respectively. It is of interest to observe the performance of Eigenfaces versus that of Fisherfaces. Not surprisingly, Eigenfaces outperform Fisherfaces on the ORL database, because Fisherfaces may lose significant discriminant information due to the intermediate PCA step. A similar observation has also been made in [10], [16].
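The figure of merit behind Table I is a one-line computation; the arrays below are hypothetical error-rate curves, not the paper's numbers.

import numpy as np

err_dflda = np.array([0.06, 0.05, 0.04])   # hypothetical E_DF(M) curve
err_other = np.array([0.10, 0.09, 0.08])   # hypothetical E_other(M) curve
avg_pct = 100.0 * np.mean(err_dflda / err_other)
print(f"DF-LDA's error rate is on average {avg_pct:.1f}% of the competitor's")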
The weighting function w(d) influences the performance of the DF-LDA method. For different feature extraction tasks, appropriate values for the weighting function exponent should be determined through experimentation using the available training set. However, it appears that there is a set of values for which good results can be obtained for a wide range of applications. Following the recommendation in [7], we examined the performance of the DF-LDA method for several weighting functions of the form w(d) = d^{-2p}. Results obtained through the utilization of these weighting functions are depicted in Fig. 5, where error rates are plotted against the number of feature vectors selected (the output space dimensionality). The lowest error rate on the ORL database is approximately 4.0% and is obtained with one of these weighting functions, a result comparable to the best results previously reported in the literature [14], [15].

References

R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge University Press, 1985.

M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.

A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.

S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, 1997.