Globally Maximizing, Locally Minimizing:
Unsupervised Discriminant Projection with
Applications to Face and Palm Biometrics
Jian Yang, David Zhang, Senior Member, IEEE, Jing-yu Yang, and Ben Niu
Abstract—This paper develops an unsupervised discriminant projection (UDP) technique for dimensionality reduction of high-
dimensional data in small sample size cases. UDP can be seen as a linear approximation of a multimanifolds-based learning framework
which takes into account both the local and nonlocal quantities. UDP characterizes the local scatter as well as the nonlocal scatter, seeking
to find a projection that simultaneously maximizes the nonlocal scatter and minimizes the local scatter. This characteristic makes UDP
more intuitive and more powerful than the most up-to-date method, Locality Preserving Projection (LPP), which considers only the local
scatter for clustering or classification tasks. The proposed method is applied to face and palm biometrics and is examined using the Yale,
FERET, and AR face image databases and the PolyU palmprint database. The experimental results show that UDP consistently
outperforms LPP and PCA and outperforms LDA when the training sample size per class is small. This demonstrates that UDP is a good
choice for real-world biometrics applications.
Index Terms—Dimensionality reduction, feature extraction, subspace learning, Fisher linear discriminant analysis (LDA), manifold
learning, biometrics, face recognition, palmprint recognition.
1 INTRODUCTION
Dimensionality reduction is the construction of a meaningful low-dimensional representation of high-dimensional data. Since there are large volumes of high-dimensional data in numerous real-world applications, dimensionality reduction is a fundamental problem in many scientific fields. From the perspective of pattern recognition, dimensionality reduction is an effective means of avoiding the "curse of dimensionality" [1] and improving the computational efficiency of pattern matching.
Researchers have developed many useful dimensionality
reduction techniques. These techniques can be broadly
categorized into two classes: linear and nonlinear. Linear
dimensionality reduction seeks to find a meaningful low-
dimensional subspace in a high-dimensional input space.
This subspace can provide a compact representation of
higher-dimensional data when the structure of data
embedded in the input space is linear. PCA and LDA are
two well-known linear subspace learning methods which
have been extensively used in pattern recognition and
computer vision areas and have become the most popular
techniques for face recognition and other biometrics [2], [3],
[4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [39].
Linear models, however, may fail to discover essential data
structures that are nonlinear. A number of nonlinear
dimensionality reduction techniques have been developed
to address this problem, with two in particular attracting wide
attention: kernel-based techniques and manifold learning-
based techniques. The basic idea of kernel-based techniques is
to implicitly map observed patterns into potentially much
higher dimensional feature vectors by using a nonlinear
mapping determined by a kernel. This makes it possible for
the nonlinear structure of data in observation space to become
linear in feature space, allowing the use of linear techniques to
deal with the data. The representative techniques are kernel
principal component analysis (KPCA) [15] and kernel Fisher
discriminant (KFD) [16], [17]. Both have proven to be effective
in many real-world applications [18], [19], [20].
In contrast with kernel-based techniques, the motivation
of manifold learning is straightforward as it seeks to
directly find the intrinsic low-dimensional nonlinear data
structures hidden in observation space. In the past few years, many manifold-based learning algorithms for discovering intrinsic low-dimensional embeddings of data have been proposed. Among the most well-known are isometric
feature mapping (ISOMAP) [22], local linear embedding
(LLE) [23], and Laplacian Eigenmap [24]. Some experiments
have shown that these methods can find perceptually
meaningful embeddings for face or digit images. They also
yielded impressive results on other artificial and real-world
data sets. Recently, Yan et al. [33] proposed a general
dimensionality reduction framework called graph embed-
ding. LLE, ISOMAP, and Laplacian Eigenmap can all be
reformulated as a unified model in this framework.
. J. Yang is with the Biometric Research Centre, Department of Computing,
Hong Kong Polytechnic University, Kowloon, Hong Kong, and the
Department of Computer Science, Nanjing University of Science and
Technology, Nanjing 210094, P.R. China.
E-mail: csjyang@comp.polyu.edu.hk.
. D. Zhang and B. Niu are with the Biometric Research Centre, Department
of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong.
E-mail: {csdzhang, csniuben}@comp.polyu.edu.hk.
. J.-y. Yang is with the Department of Computer Science, Nanjing
University of Science and Technology, Nanjing 210094, P.R. China.
E-mail: yangjy@mail.njust.edu.cn.
Manuscript received 17 Jan. 2006; revised 5 June 2006; accepted 26 Sept.
2006; published online 18 Jan. 2007.
Recommended for acceptance by S. Prabhakar, J. Kittler, D. Maltoni,
L. O’Gorman, and T. Tan.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number TPAMISI-0021-0106.
Digital Object Identifier no. 10.1109/TPAMI.2007.1008.

One problem with current manifold learning techniques
is that they might be unsuitable for pattern recognition
tasks. There are two reasons for this. First, as it is currently
conceived, manifold learning is limited in that it is modeled
based on a characterization of “locality,” a modeling that has
no direct connection to classification. This is unproblematic
for existing manifold learning algorithms as they seek to
model a simple manifold, for example, to recover an
embedding of one person’s face images [21], [22], [23].
However, if face images do exist on a manifold, different
persons’ face images could lie on different manifolds. To
recognize faces, it would be necessary to distinguish
between images from different manifolds. For achieving
an optimal recognition result, the recovered embeddings
corresponding to different face manifolds should be as
separate as possible in the final embedding space. This poses
a problem that we might call “classification-oriented multi-
manifolds learning.” This problem cannot be addressed by
current manifold learning algorithms, including some
supervised versions [25], [26], [27] because they are all
based on the characterization of “locality.” The local
quantity suffices for modeling a single manifold, but does
not suffice for modeling multimanifolds for classification
purposes. To make different embeddings corresponding to
different classes mutually separate, however, it is crucial to
have the “nonlocal” quantity, which embodies the distance
between embeddings. In short, it is necessary to characterize
the “nonlocality” when modeling multimanifolds.
The second reason why most manifold learning algo-
rithms, for example, ISOMAP, LLE, and Laplacian Eigenmap,
are unsuitable for pattern recognition tasks is that they can
yield an embedding directly based on the training data set
but, because of the implicitness of the nonlinear map, when
applied to a new sample, they cannot find the sample’s image
in the embedding space. This limits the applications of these
algorithms to pattern recognition problems. Although some
research has shown that it is possible to construct an explicit
map from input space to embedding space [28], [29], [30], the
effectiveness of these kinds of maps on real-world classifica-
tion problems still needs to be demonstrated.
Recently, He et al. [31], [32] proposed Locality Preserving
Projections (LPP), which is a linear subspace learning
method derived from Laplacian Eigenmap. In contrast to
most manifold learning algorithms, LPP possesses the
remarkable advantage that it can generate an explicit map.
This map is linear and easily computable, like that of PCA or
LDA. It is also effective, yielding encouraging results on face
recognition tasks. Yet, as it is modeled on the basis of
“locality,” LPP, like most manifold learning algorithms, has
the weakness of having no direct connection to classification.
The objective function of LPP is to minimize the local
quantity, i.e., the local scatter of the projected data. In some
cases, this criterion cannot be guaranteed to yield a good
projection for classification purposes. Assume, for example,
that there exist two clusters of two-dimensional samples
scattering uniformly in two ellipses $C_1$ and $C_2$, as shown in Fig. 1. If the locality radius is set as the length of the semimajor axis of the larger ellipse, the direction $w_1$ is a nice projection according to the criterion of LPP since, after all samples are projected onto $w_1$, the local scatter is minimal. But, it is obvious that $w_1$ is not good in terms of classification;
the projected samples overlap in this direction. This example
also shows that the nonlocal quantity, i.e., the intercluster
scatter, may provide crucial information for discrimination.
In this paper, we will address this issue and explore more
effective projections for classification purposes.
Motivated by the idea of classification-oriented multi-
manifolds learning, we consider two quantities, local and
nonlocal, at the same time in the modeling process. It should
be pointed out that we don’t attempt to build a framework
for multimanifolds-based learning in this paper (although it
is very interesting). We are more interested in its linear
approximation, i.e., finding a simple and practical linear map
for biometrics applications. To this end, we first present the
techniques to characterize the local and nonlocal scatters of
data. Then, based on this characterization, we propose a
criterion which seeks to maximize the ratio of the nonlocal
scatter to the local scatter. This criterion, similar to the
classical Fisher criterion, is a Rayleigh quotient in form. Thus,
it is not hard to find its optimal solutions by solving a
generalized eigen-equation. Since the proposed method does
not use the class-label information of samples in the learning
process, this method is called the unsupervised discriminant
projection (UDP), in contrast with the supervised discrimi-
nant projection of LDA.
In contrast with LPP, UDP has direct relations to
classification since it utilizes the information of the
“nonlocality.” Provided that each cluster of samples in the
observation space lies exactly within a local neighborhood, UDP can yield an optimal projection for clustering in the projected space, while LPP cannot. As shown in Fig. 1, $w_2$ is a good projection direction according to the criterion of UDP and is more discriminative than $w_1$. In addition, UDP will be
demonstrated to be more effective than LPP in real-world
biometrics applications, based on our experiments with
three face image databases and one palmprint database.
In the literature, besides LPP, there are two methods most
relevant to ours. One is Marginal Fisher Analysis (MFA)
presented by Yan et al. [33] and the other is Local
Discriminant Embedding (LDE) suggested by Chen et al.
[34]. The two methods are very similar in formulation. Both of
them combine locality and class label information to represent
the intraclass compactness and interclass separability. So,
MFA and LDE can be viewed as supervised variants of LPP or
as localized variants of LDA since both methods focus on the
Fig. 1. Illustration of two clusters of samples in two-dimensional space
and the projection directions.

characterization of intraclass locality and interclass locality. In
contrast, the proposed UDP retains the unsupervised char-
acteristic of LPP and seeks to combine locality and globality
information for discriminator design.
The remainder of this paper is organized as follows:
Section 2 outlines PCA and LDA. Section 3 develops the idea
of UDP and the relevant theory and algorithm. Section 4
describes a kernel weighted version of UDP. Section 5
discusses the relations between UDP and LDA/LPP. Sec-
tion 6 describes some biometrics applications and the related
experiments. Section 7 offers our conclusions.
2 OUTLINE OF PCA AND LDA
2.1 PCA
PCA seeks to find a projection axis such that the global scatter
is maximized after the projection of samples. The global
scatter can be characterized by the mean square of the
Euclidean distance between any pair of the projected sample
points. Specifically, given a set of M training samples
(pattern vectors) $x_1, x_2, \ldots, x_M$ in $\mathbb{R}^n$, we get their images $y_1, y_2, \ldots, y_M$ after the projection onto the projection axis $w$. The global scatter is defined by

$$J_T(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(y_i - y_j)^2. \qquad (1)$$

It follows that

$$J_T(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}\left(w^T x_i - w^T x_j\right)^2 = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T\right]w. \qquad (2)$$
Let us denote
$$S_T = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T \qquad (3)$$

and the mean vector $m_0 = \frac{1}{M}\sum_{j=1}^{M} x_j$. Then, it can be proven that

$$S_T = \frac{1}{MM}\left[M\sum_{i=1}^{M} x_i x_i^T - \left(\sum_{i=1}^{M} x_i\right)\left(\sum_{j=1}^{M} x_j^T\right)\right] = \frac{1}{M}\sum_{i=1}^{M}(x_i - m_0)(x_i - m_0)^T. \qquad (4)$$
Equation (4) indicates that $S_T$ is essentially the covariance matrix of the data. So, the projection axis $w$ that maximizes (2) can be selected as the eigenvector of $S_T$ corresponding to the largest eigenvalue. Similarly, we can obtain a set of projection axes of PCA by selecting the $d$ eigenvectors of $S_T$ corresponding to the $d$ largest eigenvalues.
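To make (1)-(4) concrete, the following minimal NumPy sketch (illustrative code, not from the paper; the function name is our own) forms the covariance matrix $S_T$ of (4) and takes its $d$ leading eigenvectors as the PCA projection axes.

```python
import numpy as np

def pca_axes(X, d):
    """Return the d leading PCA projection axes for the rows of X.

    X : (M, n) data matrix, one sample per row.
    The total scatter S_T of (3)-(4) equals the (biased) sample covariance
    matrix, so its top eigenvectors maximize the global scatter J_T(w) of (1).
    """
    M = X.shape[0]
    m0 = X.mean(axis=0)                  # mean vector m_0
    Xc = X - m0                          # centered data
    S_T = Xc.T @ Xc / M                  # covariance matrix S_T, (4)
    evals, evecs = np.linalg.eigh(S_T)   # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d]  # indices of the d largest eigenvalues
    return evecs[:, order]               # (n, d) projection matrix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    W = pca_axes(X, d=2)
    Y = X @ W                            # projected samples y_i = W^T x_i
    print(W.shape, Y.shape)
```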
2.2 LDA
LDA seeks to find a projection axis such that the Fisher
criterion (i.e., the ratio of the between-class scatter to the
within-class scatter) is maximized after the projection of
samples. The between-class and within-class scatter matrices $S_B$ and $S_W$ are defined by

$$S_B = \frac{1}{M}\sum_{i=1}^{c} l_i (m_i - m_0)(m_i - m_0)^T, \qquad (5)$$

$$S_W = \sum_{i=1}^{c}\frac{l_i}{M} S_W^{(i)} = \frac{1}{M}\sum_{i=1}^{c}\sum_{j=1}^{l_i}(x_{ij} - m_i)(x_{ij} - m_i)^T, \qquad (6)$$

where $x_{ij}$ denotes the $j$th training sample in class $i$, $c$ is the number of classes, $l_i$ is the number of training samples in class $i$, $m_i$ is the mean of the training samples in class $i$, and $S_W^{(i)}$ denotes the covariance matrix of samples in class $i$.
It is easy to show that $S_B$ and $S_W$ are both nonnegative definite matrices and satisfy $S_T = S_B + S_W$.
The Fisher criterion is defined by
$$J_F(w) = \frac{w^T S_B w}{w^T S_W w}. \qquad (7)$$

The stationary points of $J_F(w)$ are the generalized eigenvectors $w_1, w_2, \ldots, w_d$ of $S_B w = \lambda S_W w$ corresponding to the $d$ largest eigenvalues. These stationary points form the coordinate system of LDA.
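Analogously, a minimal sketch of (5)-(7) (again illustrative, not the authors' code) builds $S_B$ and $S_W$ from labeled data and solves the generalized eigenproblem $S_B w = \lambda S_W w$, assuming $S_W$ is nonsingular.

```python
import numpy as np
from scipy.linalg import eigh

def lda_axes(X, labels, d):
    """Return d LDA axes maximizing the Fisher criterion (7).

    X : (M, n) samples; labels : (M,) class labels.
    Assumes the within-class scatter S_W is nonsingular.
    """
    M, n = X.shape
    m0 = X.mean(axis=0)
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        li = Xc.shape[0]
        mi = Xc.mean(axis=0)
        S_B += li / M * np.outer(mi - m0, mi - m0)   # between-class term, (5)
        S_W += (Xc - mi).T @ (Xc - mi) / M           # within-class term, (6)
    evals, evecs = eigh(S_B, S_W)                    # generalized eigenproblem
    order = np.argsort(evals)[::-1][:d]              # d largest eigenvalues
    return evecs[:, order]
```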
3 UNSUPERVISED DISCRIMINANT PROJECTION (UDP)
3.1 Basic Idea of UDP
As discussed in Section 1, the locality characterization-
based model does not guarantee a good projection for
classification purposes. To address this, we will introduce
the concept of nonlocality and give the characterizations of
the nonlocal scatter and the local scatter. This will allow us
to obtain a concise criterion for feature extraction by
maximizing the ratio of nonlocal scatter to local scatter.
3.1.1 Characterize the Local Scatter
Recall that, in PCA, in order to preserve the global geometric
structure of data in a transformed low-dimensional space,
account is taken of the global scatter of samples. Correspond-
ingly, if we aim to discover the local structure of data, we
should take account of the local scatter (or intralocality scatter)
of samples. The local scatter can be characterized by the mean square of the Euclidean distance between any pair of the projected sample points that are within any local $\delta$-neighborhood ($\delta > 0$). Specifically, two samples $x_i$ and $x_j$ are viewed as being within a local $\delta$-neighborhood provided that $\|x_i - x_j\|^2 < \delta$. Let us denote the set $U = \{(i, j) \mid \|x_i - x_j\|^2 < \delta\}$. After the projection of $x_i$ and $x_j$ onto a direction $w$, we get their images $y_i$ and $y_j$. The local scatter is then defined by

$$J_L(w) = \frac{1}{2}\,\frac{1}{M_L}\sum_{(i,j)\in U}(y_i - y_j)^2 \;\propto\; \frac{1}{2}\,\frac{1}{MM}\sum_{(i,j)\in U}(y_i - y_j)^2, \qquad (8)$$

where $M_L$ is the number of sample pairs satisfying $\|x_i - x_j\|^2 < \delta$.
Let us define the adjacency matrix $H$, whose elements are given below:

$$H_{ij} = \begin{cases} 1, & \|x_i - x_j\|^2 < \delta \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$

It is obvious that the adjacency matrix $H$ is a symmetric matrix. By virtue of the adjacency matrix $H$, (8) can be rewritten as^1

$$J_L(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(y_i - y_j)^2. \qquad (10)$$

It follows from (10) that

$$J_L(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\left(w^T x_i - w^T x_j\right)^2 = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T\right]w = w^T S_L w, \qquad (11)$$

where

$$S_L = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T. \qquad (12)$$

$S_L$ is called the local scatter (covariance) matrix.
Due to the symmetry of $H$, we have

$$S_L = \frac{1}{2}\,\frac{1}{MM}\left(\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_i^T + \sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_j x_j^T - 2\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_j^T\right) = \frac{1}{MM}\left(\sum_{i=1}^{M} D_{ii}\,x_i x_i^T - \sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_j^T\right) = \frac{1}{MM}\left(XDX^T - XHX^T\right) = \frac{1}{MM}\,XLX^T, \qquad (13)$$

where $X = (x_1, x_2, \ldots, x_M)$ and $D$ is a diagonal matrix whose diagonal elements are the column (or row, since $H$ is a symmetric matrix) sums of $H$, i.e., $D_{ii} = \sum_{j=1}^{M} H_{ij}$. $L = D - H$ is called the local scatter kernel (LSK) matrix in this paper (this matrix is called the Laplacian matrix in [24]).
It is obvious that $L$ and $S_L$ are both real symmetric matrices. From (11) and (13), we know that $w^T S_L w \geq 0$ for any nonzero vector $w$. So, the local scatter matrix $S_L$ must be nonnegative definite.
In the above discussion, we use $\delta$-neighborhoods to characterize the "locality" and the local scatter. This way is geometrically intuitive but unpopular because, in practice, it is hard to choose a proper neighborhood radius $\delta$. To avoid this difficulty, the method of K-nearest neighbors is commonly used instead in real-world applications. The K-nearest neighbors method determines the following adjacency matrix $H$, with elements given by:

$$H_{ij} = \begin{cases} 1, & \text{if } x_j \text{ is among the } K \text{ nearest neighbors of } x_i \text{ and } x_i \text{ is among the } K \text{ nearest neighbors of } x_j \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$$

The local scatter can be characterized similarly by a K-nearest neighbor adjacency matrix if (9) is replaced by (14).
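As an illustration of (12)-(14), the sketch below (our own NumPy code, not from the paper) builds the mutual K-nearest-neighbor adjacency matrix $H$, the degree matrix $D$, the local scatter kernel $L = D - H$, and finally $S_L = \frac{1}{MM} XLX^T$ as in (13).

```python
import numpy as np

def local_scatter(X, K):
    """Local scatter matrix S_L of (12)-(13) built from a mutual K-NN graph.

    X : (M, n) data matrix (rows are the samples x_i).
    """
    M = X.shape[0]
    # pairwise squared Euclidean distances between samples
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)             # a sample is not its own neighbor
    knn = np.argsort(sq, axis=1)[:, :K]      # K nearest neighbors of each sample
    A = np.zeros((M, M), dtype=bool)
    rows = np.repeat(np.arange(M), K)
    A[rows, knn.ravel()] = True
    H = (A & A.T).astype(float)              # mutual K-NN adjacency matrix, (14)
    D = np.diag(H.sum(axis=1))               # degree matrix, D_ii = sum_j H_ij
    L = D - H                                # local scatter kernel (Laplacian)
    return (X.T @ L @ X) / (M * M)           # S_L = X L X^T / (M M), (13)
```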
3.1.2 Characterize the Nonlocal Scatter
The nonlocal scatter (i.e., the interlocality scatter) can be characterized by the mean square of the Euclidean distance between any pair of the projected sample points that are outside any local $\delta$-neighborhood ($\delta > 0$). The nonlocal scatter is defined by

$$J_N(w) = \frac{1}{2}\,\frac{1}{M_N}\sum_{(i,j)\notin U}(y_i - y_j)^2 \;\propto\; \frac{1}{2}\,\frac{1}{MM}\sum_{(i,j)\notin U}(y_i - y_j)^2, \qquad (15)$$

where $M_N$ is the number of sample pairs satisfying $\|x_i - x_j\|^2 \geq \delta$.
By virtue of the adjacency matrix $H$ in (9) or (14), the nonlocal scatter can be rewritten as

$$J_N(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(y_i - y_j)^2. \qquad (16)$$

It follows from (16) that

$$J_N(w) = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T\right]w = w^T S_N w, \qquad (17)$$

where

$$S_N = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T. \qquad (18)$$

$S_N$ is called the nonlocal scatter (covariance) matrix. It is easy to show that $S_N$ is also a nonnegative definite matrix. And, it follows that
$$S_N = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T - \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T = S_T - S_L.$$

That is, $S_T = S_L + S_N$. Thus, we have $J_T(w) = J_L(w) + J_N(w)$.
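In code, the relation $S_N = S_T - S_L$ means the nonlocal scatter matrix never has to be accumulated pair by pair as in (18). A short illustrative sketch (assuming the local_scatter helper sketched above):

```python
import numpy as np

def nonlocal_scatter(X, S_L):
    """Nonlocal scatter matrix via S_N = S_T - S_L (Section 3.1.2).

    X : (M, n) data matrix; S_L : local scatter matrix from the K-NN graph.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    S_T = Xc.T @ Xc / M        # total scatter (covariance), (4)
    return S_T - S_L           # complement relation S_N = S_T - S_L
```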
3.1.3 Determine a Criterion: Maximizing the Ratio of
Nonlocal Scatter to Local Scatter
The technique of Locality Preserving Projection (LPP) [31]
seeks to find a linear subspace which can preserve the local
structure of data. The objective of LPP is actually to minimize
the local scatter $J_L(w)$. Obviously, the projection direction determined by LPP can ensure that, if samples $x_i$ and $x_j$ are close, their projections $y_i$ and $y_j$ are close as well. But, LPP cannot guarantee that, if samples $x_i$ and $x_j$ are not close, their projections $y_i$ and $y_j$ are not close either. This means that it may happen that two mutually distant samples belonging to
1. In (8), the only difference between the expressions in the middle and on the right is a coefficient. This difference is meaningless for the characterization of the scatter. For convenience, we use the expression on the right. The same operation is used in (15).

different classes may result in close images after the
projection of LPP. Therefore, LPP does not necessarily yield
a good projection suitable for classification.
For the purpose of classification, we try to find a projection which will draw the close samples closer together while simultaneously making the mutually distant samples even more distant from each other. From this point of view, a desirable projection should be the one that, at the same time, minimizes the local scatter $J_L(w)$ and maximizes the nonlocal scatter $J_N(w)$. As it happens, we can obtain just
such a projection by maximizing the following criterion:
$$J(w) = \frac{J_N(w)}{J_L(w)} = \frac{w^T S_N w}{w^T S_L w}. \qquad (19)$$
Since $J_T(w) = J_L(w) + J_N(w)$ and $S_T = S_L + S_N$, the above criterion is equivalent to

$$J_e(w) = \frac{J_T(w)}{J_L(w)} = \frac{w^T S_T w}{w^T S_L w}. \qquad (20)$$
The criterion in (20) indicates that we can find the projection
by at the same time globally maximizing (maximizing the
global scatter) and locally minimizing (minimizing the local
scatter).
The criterion in (19) or (20) is formally similar to the Fisher criterion in (7) since they are both Rayleigh quotients. The difference is that the matrices $S_L$ and $S_N$ in (19) can be constructed without knowing the class labels of samples, while $S_B$ and $S_W$ in (7) cannot be so constructed. This means that the Fisher discriminant projection is supervised, while the projection determined by $J(w)$ can be obtained in an unsupervised manner. In this paper, then, this projection is called an Unsupervised Discriminant Projection (UDP).
3.2 Algorithmic Derivations of UDP in Small Sample
Size Cases
If the local scatter matrix $S_L$ is nonsingular, the criterion in (19) can be maximized directly by calculating the generalized eigenvectors of the following generalized eigen-equation:

$$S_N w = \lambda S_L w. \qquad (21)$$
The projection axes of UDP can be selected as the generalized eigenvectors $w_1, w_2, \ldots, w_d$ of $S_N w = \lambda S_L w$ corresponding to the $d$ largest positive eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$.
In real-world biometrics applications such as face and palm recognition, however, $S_L$ is always singular due to the limited number of training samples. In such cases, the classical algorithm cannot be used directly to solve the generalized eigen-equation. In addition, from (12) and (18), we know that $S_L$ and $S_N$ are both $n \times n$ matrices (where $n$ is the dimension of the image vector space). It is computationally very expensive to construct these large-sized matrices in the high-dimensional input space. Fortunately, we can avoid these difficulties by virtue of the theory we built for LDA (or KFD) in small sample size cases [9], [20]. Based on this theory, the local and nonlocal scatter matrices can be constructed using the PCA-transformed low-dimensional data and the singularity difficulty can be avoided. The relevant theory is given below.
Suppose $\beta_1, \beta_2, \ldots, \beta_n$ are the $n$ orthonormal eigenvectors of $S_T$ and the first $m$ ($m = \mathrm{rank}(S_T)$) eigenvectors correspond to positive eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Define the subspace $\Phi_T = \mathrm{span}\{\beta_1, \ldots, \beta_m\}$ and denote its orthogonal complement by $\Phi_T^{\perp} = \mathrm{span}\{\beta_{m+1}, \ldots, \beta_n\}$. Obviously, $\Phi_T$ is the range space of $S_T$ and $\Phi_T^{\perp}$ is the corresponding null space.
Lemma 1 [4], [36]. Suppose that $A$ is an $n \times n$ nonnegative definite matrix and $\varphi$ is an $n$-dimensional vector; then $\varphi^T A \varphi = 0$ if and only if $A\varphi = 0$.
Since $S_L$, $S_N$, and $S_T$ are all nonnegative definite and $S_T = S_L + S_N$, it is easy to get:
Lemma 2. If $S_T$ is singular, then $\varphi^T S_T \varphi = 0$ if and only if $\varphi^T S_L \varphi = 0$ and $\varphi^T S_N \varphi = 0$.
Since $\mathbb{R}^n = \mathrm{span}\{\beta_1, \beta_2, \ldots, \beta_n\}$, an arbitrary $\varphi \in \mathbb{R}^n$ can be denoted by

$$\varphi = k_1\beta_1 + \cdots + k_m\beta_m + k_{m+1}\beta_{m+1} + \cdots + k_n\beta_n. \qquad (22)$$

Let $w = k_1\beta_1 + \cdots + k_m\beta_m$ and $u = k_{m+1}\beta_{m+1} + \cdots + k_n\beta_n$; then, from the definition of $\Phi_T$ and $\Phi_T^{\perp}$, $\varphi$ can be denoted by $\varphi = w + u$, where $w \in \Phi_T$ and $u \in \Phi_T^{\perp}$.
Definition 1. For an arbitrary $\varphi \in \mathbb{R}^n$, $\varphi$ can be denoted by $\varphi = w + u$, where $w \in \Phi_T$ and $u \in \Phi_T^{\perp}$. The compression mapping $\mathcal{L}: \mathbb{R}^n \to \Phi_T$ is defined by $\varphi = w + u \mapsto w$.
It is easy to verify that $\mathcal{L}$ is a linear transformation from $\mathbb{R}^n$ to its subspace $\Phi_T$.
Theorem 1. Under the compression mapping $\mathcal{L}: \mathbb{R}^n \to \Phi_T$ determined by $\varphi = w + u \mapsto w$, the UDP criterion satisfies $J(\varphi) = J(w)$.
Proof. Since $\Phi_T^{\perp}$ is the null space of $S_T$, for any $u \in \Phi_T^{\perp}$, we have $u^T S_T u = 0$. From Lemma 2, it follows that $u^T S_L u = 0$. Since $S_L$ is a nonnegative definite matrix, we have $S_L u = 0$ by Lemma 1. Hence,

$$\varphi^T S_L \varphi = w^T S_L w + 2 w^T S_L u + u^T S_L u = w^T S_L w.$$

Similarly, it can be derived that

$$\varphi^T S_N \varphi = w^T S_N w + 2 w^T S_N u + u^T S_N u = w^T S_N w.$$

Therefore, $J(\varphi) = J(w)$. □
According to Theorem 1, we can conclude that the optimal projection axes can be derived from $\Phi_T$ without any loss of effective discriminatory information with respect to the UDP criterion. From linear algebra theory, $\Phi_T$ is isomorphic to an $m$-dimensional Euclidean space $\mathbb{R}^m$, and the corresponding isomorphic mapping is

$$w = Pv, \quad \text{where } P = (\beta_1, \beta_2, \ldots, \beta_m),\; v \in \mathbb{R}^m, \qquad (23)$$

which is a one-to-one mapping from $\mathbb{R}^m$ onto $\Phi_T$.
From the isomorphic mapping $w = Pv$, the UDP criterion function $J(w)$ becomes

$$J(w) = \frac{v^T (P^T S_N P) v}{v^T (P^T S_L P) v} = \frac{v^T \tilde{S}_N v}{v^T \tilde{S}_L v} = \tilde{J}(v), \qquad (24)$$

where $\tilde{S}_N = P^T S_N P$ and $\tilde{S}_L = P^T S_L P$. It is easy to prove that $\tilde{S}_N$ and $\tilde{S}_L$ are both $m \times m$ nonnegative definite matrices. This means $\tilde{J}(v)$ is a function of a generalized Rayleigh quotient like $J(w)$.
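Combining Theorem 1 with the mapping in (23) and (24) suggests the following small-sample-size procedure, sketched below with our own helper names (an illustrative composition of the pieces above, not the authors' code): project onto the range space of $S_T$ via $P$, form the reduced matrices $\tilde{S}_L$ and $\tilde{S}_N$, and solve the reduced generalized eigenproblem.

```python
import numpy as np
from scipy.linalg import eigh

def udp_small_sample(X, K, d, tol=1e-10):
    """UDP axes in the small sample size case (sketch of Section 3.2).

    X : (M, n) data matrix with M << n; K : neighborhood size; d : number of axes.
    Uses the local_scatter() and nonlocal_scatter() helpers assumed earlier.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    S_T = Xc.T @ Xc / M
    evals, evecs = np.linalg.eigh(S_T)
    keep = evals > tol * evals.max()          # eigenvectors spanning the range space of S_T
    P = evecs[:, keep]                        # P = (beta_1, ..., beta_m), (23)
    S_L = local_scatter(X, K)
    S_N = nonlocal_scatter(X, S_L)
    S_L_t = P.T @ S_L @ P                     # reduced local scatter, (24)
    S_N_t = P.T @ S_N @ P                     # reduced nonlocal scatter, (24)
    vals, vecs = eigh(S_N_t, S_L_t)           # assumes the reduced S_L is nonsingular
    order = np.argsort(vals)[::-1][:d]
    return P @ vecs[:, order]                 # w = P v, mapped back to the input space
```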
By the property of isomorphic mapping and (24), the
following theorem holds:
Theorem 2. Let $w = Pv$ be an isomorphic mapping from $\mathbb{R}^m$ onto $\Phi_T$. Then, $w^* = Pv^*$ is the stationary point of the UDP
