Globally Maximizing, Locally Minimizing:
Unsupervised Discriminant Projection with
Applications to Face and Palm Biometrics
Jian Yang, David Zhang, Senior Member, IEEE, Jing-yu Yang, and Ben Niu
Abstract—This paper develops an unsupervised discriminant projection (UDP) technique for dimensionality reduction of high-
dimensional data in small sample size cases. UDP can be seen as a linear approximation of a multimanifolds-based learning framework
which takes into account both the local and nonlocal quantities. UDP characterizes the local scatter as well as the nonlocal scatter, seeking
to find a projection that simultaneously maximizes the nonlocal scatter and minimizes the local scatter. This characteristic makes UDP
more intuitive and more powerful than the most up-to-date method, Locality Preserving Projection (LPP), which considers only the local
scatter for clustering or classification tasks. The proposed method is applied to face and palm biometrics and is examined using the Yale,
FERET, and AR face image databases and the PolyU palmprint database. The experimental results show that UDP consistently
outperforms LPP and PCA and outperforms LDA when the training sample size per class is small. This demonstrates that UDP is a good
choice for real-world biometrics applications.
Index Terms—Dimensionality reduction, feature extraction, subspace learning, Fisher linear discriminant analysis (LDA), manifold
learning, biometrics, face recognition, palmprint recognition.
1 INTRODUCTION
Dimensionality reduction is the construction of a meaningful low-dimensional representation of high-dimensional data. Since there are large volumes of high-dimensional data in numerous real-world applications, dimensionality reduction is a fundamental problem in many scientific fields. From the perspective of pattern recognition, dimensionality reduction is an effective means of avoiding the "curse of dimensionality" [1] and improving the computational efficiency of pattern matching.
Researchers have developed many useful dimensionality
reduction techniques. These techniques can be broadly
categorized into two classes: linear and nonlinear. Linear
dimensionality reduction seeks to find a meaningful low-
dimensional subspace in a high-dimensional input space.
This subspace can provide a compact representation of
higher-dimensional data when the structure of data
embedded in the input space is linear. PCA and LDA are
two well-known linear subspace learning methods which
have been extensively used in pattern recognition and
computer vision areas and have become the most popular
techniques for face recognition and other biometrics [2], [3],
[4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [39].
Linear models, however, may fail to discover essential data
structures that are nonlinear. A number of nonlinear
dimensionality reduction techniques have been developed
to address this problem, with two in particular attracting wide
attention: kernel-based techniques and manifold learning-
based techniques. The basic idea of kernel-based techniques is
to implicitly map observed patterns into potentially much
higher dimensional feature vectors by using a nonlinear
mapping determined by a kernel. This makes it possible for
the nonlinear structure of data in observation space to become
linear in feature space, allowing the use of linear techniques to
deal with the data. The representative techniques are kernel
principal component analysis (KPCA) [15] and kernel Fisher
discriminant (KFD) [16], [17]. Both have proven to be effective
in many real-world applications [18], [19], [20].
In contrast with kernel-based techniques, the motivation
of manifold learning is straightforward as it seeks to
directly find the intrinsic low-dimensional nonlinear data
structures hidden in observation space. In the past few years, many manifold-based learning algorithms for discovering intrinsic low-dimensional embeddings of data have been proposed. Among the most well-known are isometric
feature mapping (ISOMAP) [22], local linear embedding
(LLE) [23], and Laplacian Eigenmap [24]. Some experiments
have shown that these methods can find perceptually
meaningful embeddings for face or digit images. They also
yielded impressive results on other artificial and real-world
data sets. Recently, Yan et al. [33] proposed a general
dimensionality reduction framework called graph embed-
ding. LLE, ISOMAP, and Laplacian Eigenmap can all be
reformulated as a unified model in this framework.
. J. Yang is with the Biometric Research Centre, Department of Computing,
Hong Kong Polytechnic University, Kowloon, Hong Kong, and the
Department of Computer Science, Nanjing University of Science and
Technology, Nanjing 210094, P.R. China.
E-mail: csjyang@comp.polyu.edu.hk.
. D. Zhang and B. Niu are with the Biometric Research Centre, Department
of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong.
E-mail: {csdzhang, csniuben}@comp.polyu.edu.hk.
. J.-y. Yang is with the Department of Computer Science, Nanjing
University of Science and Technology, Nanjing 210094, P.R. China.
E-mail: yangjy@mail.njust.edu.cn.
Manuscript received 17 Jan. 2006; revised 5 June 2006; accepted 26 Sept.
2006; published online 18 Jan. 2007.
Recommended for acceptance by S. Prabhakar, J. Kittler, D. Maltoni,
L. O’Gorman, and T. Tan.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number TPAMISI-0021-0106.
Digital Object Identifier no. 10.1109/TPAMI.2007.1008.

One problem with current manifold learning techniques
is that they might be unsuitable for pattern recognition
tasks. There are two reasons for this. First, as it is currently
conceived, manifold learning is limited in that it is modeled
based on a characterization of “locality,” a modeling that has
no direct connection to classification. This is unproblematic
for existing manifold learning algorithms as they seek to
model a simple manifold, for example, to recover an
embedding of one person’s face images [21], [22], [23].
However, if face images do exist on a manifold, different
persons’ face images could lie on different manifolds. To
recognize faces, it would be necessary to distinguish
between images from different manifolds. For achieving
an optimal recognition result, the recovered embeddings
corresponding to different face manifolds should be as
separate as possible in the final embedding space. This poses
a problem that we might call “classification-oriented multi-
manifolds learning.” This problem cannot be addressed by
current manifold learning algorithms, including some
supervised versions [25], [26], [27] because they are all
based on the characterization of “locality.” The local
quantity suffices for modeling a single manifold, but does
not suffice for modeling multimanifolds for classification
purposes. To make different embeddings corresponding to
different classes mutually separate, however, it is crucial to
have the “nonlocal” quantity, which embodies the distance
between embeddings. In short, it is necessary to characterize
the “nonlocality” when modeling multimanifolds.
The second reason why most manifold learning algo-
rithms, for example, ISOMAP, LLE, and Laplacian Eigenmap,
are unsuitable for pattern recognition tasks is that they can
yield an embedding directly based on the training data set
but, because of the implicitness of the nonlinear map, when
applied to a new sample, they cannot find the sample’s image
in the embedding space. This limits the applications of these
algorithms to pattern recognition problems. Although some
research has shown that it is possible to construct an explicit
map from input space to embedding space [28], [29], [30], the
effectiveness of these kinds of maps on real-world classifica-
tion problems still needs to be demonstrated.
Recently, He et al. [31], [32] proposed Locality Preserving
Projections (LPP), which is a linear subspace learning
method derived from Laplacian Eigenmap. In contrast to
most manifold learning algorithms, LPP possesses the
remarkable advantage that it can generate an explicit map.
This map is linear and easily computable, like that of PCA or
LDA. It is also effective, yielding encouraging results on face
recognition tasks. Yet, as it is modeled on the basis of
“locality,” LPP, like most manifold learning algorithms, has
the weakness of having no direct connection to classification.
The objective function of LPP is to minimize the local
quantity, i.e., the local scatter of the projected data. In some
cases, this criterion cannot be guaranteed to yield a good
projection for classification purposes. Assume, for example,
that there exist two clusters of two-dimensional samples
scattering uniformly in two ellipses $C_1$ and $C_2$, as shown in Fig. 1. If the locality radius is set as the length of the semimajor axis of the larger ellipse, the direction $w_1$ is a nice projection according to the criterion of LPP since, after all samples are projected onto $w_1$, the local scatter is minimal. But, it is obvious that $w_1$ is not good in terms of classification;
the projected samples overlap in this direction. This example
also shows that the nonlocal quantity, i.e., the intercluster
scatter, may provide crucial information for discrimination.
In this paper, we will address this issue and explore more
effective projections for classification purposes.
Motivated by the idea of classification-oriented multi-
manifolds learning, we consider two quantities, local and
nonlocal, at the same time in the modeling process. It should
be pointed out that we don’t attempt to build a framework
for multimanifolds-based learning in this paper (although it
is very interesting). We are more interested in its linear
approximation, i.e., finding a simple and practical linear map
for biometrics applications. To this end, we first present the
techniques to characterize the local and nonlocal scatters of
data. Then, based on this characterization, we propose a
criterion which seeks to maximize the ratio of the nonlocal
scatter to the local scatter. This criterion, similar to the
classical Fisher criterion, is a Rayleigh quotient in form. Thus,
it is not hard to find its optimal solutions by solving a
generalized eigen-equation. Since the proposed method does
not use the class-label information of samples in the learning
process, this method is called the unsupervised discriminant
projection (UDP), in contrast with the supervised discrimi-
nant projection of LDA.
In contrast with LPP, UDP has direct relations to
classification since it utilizes the information of the
“nonlocality.” Provided that each cluster of samples in the
observation space lies exactly within a local neighborhood, UDP can yield an optimal projection for clustering in the projected space, while LPP cannot. As shown in Fig. 1, $w_2$ is a good projection direction according to the criterion of UDP and is more discriminative than $w_1$. In addition, UDP will be
demonstrated to be more effective than LPP in real-world
biometrics applications, based on our experiments with
three face image databases and one palmprint database.
In the literature, besides LPP, there are two methods most
relevant to ours. One is Marginal Fisher Analysis (MFA)
presented by Yan et al. [33] and the other is Local
Discriminant Embedding (LDE) suggested by Chen et al.
[34]. The two methods are very similar in formulation. Both of
them combine locality and class label information to represent
the intraclass compactness and interclass separability. So,
MFA and LDE can be viewed as supervised variants of LPP or
as localized variants of LDA since both methods focus on the
Fig. 1. Illustration of two clusters of samples in two-dimensional space
and the projection directions.

characterization of intraclass locality and interclass locality. In
contrast, the proposed UDP retains the unsupervised char-
acteristic of LPP and seeks to combine locality and globality
information for discriminator design.
The remainder of this paper is organized as follows:
Section 2 outlines PCA and LDA. Section 3 develops the idea
of UDP and the relevant theory and algorithm. Section 4
describes a kernel weighted version of UDP. Section 5
discusses the relations between UDP and LDA/LPP. Sec-
tion 6 describes some biometrics applications and the related
experiments. Section 7 offers our conclusions.
2 OUTLINE OF PCA AND LDA
2.1 PCA
PCA seeks to find a projection axis such that the global scatter
is maximized after the projection of samples. The global
scatter can be characterized by the mean square of the
Euclidean distance between any pair of the projected sample
points. Specifically, given a set of M training samples
(pattern vectors) $x_1, x_2, \ldots, x_M$ in $\mathbb{R}^n$, we get their images $y_1, y_2, \ldots, y_M$ after the projection onto the projection axis $w$. The global scatter is defined by

$$J_T(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(y_i - y_j)^2. \qquad (1)$$

It follows that

$$J_T(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}\left(w^T x_i - w^T x_j\right)^2 = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T\right]w. \qquad (2)$$
Let us denote
$$S_T = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T \qquad (3)$$

and the mean vector $m_0 = \frac{1}{M}\sum_{j=1}^{M} x_j$. Then, it can be proven that

$$S_T = \frac{1}{MM}\left[M\sum_{i=1}^{M} x_i x_i^T - \left(\sum_{i=1}^{M} x_i\right)\left(\sum_{j=1}^{M} x_j^T\right)\right] = \frac{1}{M}\sum_{i=1}^{M}(x_i - m_0)(x_i - m_0)^T. \qquad (4)$$
Equation (4) indicates that $S_T$ is essentially the covariance matrix of the data. So, the projection axis $w$ that maximizes (2) can be selected as the eigenvector of $S_T$ corresponding to the largest eigenvalue. Similarly, we can obtain a set of projection axes of PCA by selecting the $d$ eigenvectors of $S_T$ corresponding to the $d$ largest eigenvalues.
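To make (1)-(4) concrete, the following minimal NumPy sketch (illustrative code, not from the paper; the function name is our own) forms the covariance matrix $S_T$ of (4) and takes its $d$ leading eigenvectors as the PCA projection axes.

```python
import numpy as np

def pca_axes(X, d):
    """Return the d leading PCA projection axes for the rows of X.

    X : (M, n) data matrix, one sample per row.
    The total scatter S_T of (3)-(4) equals the (biased) sample covariance
    matrix, so its top eigenvectors maximize the global scatter J_T(w) of (1).
    """
    M = X.shape[0]
    m0 = X.mean(axis=0)                  # mean vector m_0
    Xc = X - m0                          # centered data
    S_T = Xc.T @ Xc / M                  # covariance matrix S_T, (4)
    evals, evecs = np.linalg.eigh(S_T)   # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d]  # indices of the d largest eigenvalues
    return evecs[:, order]               # (n, d) projection matrix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    W = pca_axes(X, d=2)
    Y = X @ W                            # projected samples y_i = W^T x_i
    print(W.shape, Y.shape)
```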
2.2 LDA
LDA seeks to find a projection axis such that the Fisher
criterion (i.e., the ratio of the between-class scatter to the
within-class scatter) is maximized after the projection of
samples. The between-class and within-class scatter matrices $S_B$ and $S_W$ are defined by

$$S_B = \frac{1}{M}\sum_{i=1}^{c} l_i (m_i - m_0)(m_i - m_0)^T, \qquad (5)$$

$$S_W = \sum_{i=1}^{c}\frac{l_i}{M} S_W^{(i)} = \frac{1}{M}\sum_{i=1}^{c}\sum_{j=1}^{l_i}(x_{ij} - m_i)(x_{ij} - m_i)^T, \qquad (6)$$

where $x_{ij}$ denotes the $j$th training sample in class $i$, $c$ is the number of classes, $l_i$ is the number of training samples in class $i$, $m_i$ is the mean of the training samples in class $i$, and $S_W^{(i)}$ denotes the covariance matrix of samples in class $i$.
It is easy to show that $S_B$ and $S_W$ are both nonnegative definite matrices and satisfy $S_T = S_B + S_W$.
The Fisher criterion is defined by
$$J_F(w) = \frac{w^T S_B w}{w^T S_W w}. \qquad (7)$$

The stationary points of $J_F(w)$ are the generalized eigenvectors $w_1, w_2, \ldots, w_d$ of $S_B w = \lambda S_W w$ corresponding to the $d$ largest eigenvalues. These stationary points form the coordinate system of LDA.
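Analogously, a minimal sketch of (5)-(7) (again illustrative, not the authors' code) builds $S_B$ and $S_W$ from labeled data and solves the generalized eigenproblem $S_B w = \lambda S_W w$, assuming $S_W$ is nonsingular.

```python
import numpy as np
from scipy.linalg import eigh

def lda_axes(X, labels, d):
    """Return d LDA axes maximizing the Fisher criterion (7).

    X : (M, n) samples; labels : (M,) class labels.
    Assumes the within-class scatter S_W is nonsingular.
    """
    M, n = X.shape
    m0 = X.mean(axis=0)
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        li = Xc.shape[0]
        mi = Xc.mean(axis=0)
        S_B += li / M * np.outer(mi - m0, mi - m0)   # between-class term, (5)
        S_W += (Xc - mi).T @ (Xc - mi) / M           # within-class term, (6)
    evals, evecs = eigh(S_B, S_W)                    # generalized eigenproblem
    order = np.argsort(evals)[::-1][:d]              # d largest eigenvalues
    return evecs[:, order]
```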
3 UNSUPERVISED DISCRIMINANT PROJECTION (UDP)
3.1 Basic Idea of UDP
As discussed in Section 1, the locality characterization-
based model does not guarantee a good projection for
classification purposes. To address this, we will introduce
the concept of nonlocality and give the characterizations of
the nonlocal scatter and the local scatter. This will allow us
to obtain a concise criterion for feature extraction by
maximizing the ratio of nonlocal scatter to local scatter.
3.1.1 Characterize the Local Scatter
Recall that, in PCA, in order to preserve the global geometric
structure of data in a transformed low-dimensional space,
account is taken of the global scatter of samples. Correspond-
ingly, if we aim to discover the local structure of data, we
should take account of the local scatter (or intralocality scatter)
of samples. The local scatter can be characterized by the mean square of the Euclidean distance between any pair of the projected sample points that are within any local $\delta$-neighborhood ($\delta > 0$). Specifically, two samples $x_i$ and $x_j$ are viewed as being within a local $\delta$-neighborhood provided that $\|x_i - x_j\|^2 < \delta$. Let us denote the set $U = \{(i, j) \mid \|x_i - x_j\|^2 < \delta\}$. After the projection of $x_i$ and $x_j$ onto a direction $w$, we get their images $y_i$ and $y_j$. The local scatter is then defined by

$$J_L(w) = \frac{1}{2}\,\frac{1}{M_L}\sum_{(i,j)\in U}(y_i - y_j)^2 \;\propto\; \frac{1}{2}\,\frac{1}{MM}\sum_{(i,j)\in U}(y_i - y_j)^2, \qquad (8)$$

where $M_L$ is the number of sample pairs satisfying $\|x_i - x_j\|^2 < \delta$.
Let us define the adjacency matrix $H$, whose elements are given below:

$$H_{ij} = \begin{cases} 1, & \|x_i - x_j\|^2 < \delta \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$

It is obvious that the adjacency matrix $H$ is a symmetric matrix. By virtue of the adjacency matrix $H$, (8) can be rewritten as^1

$$J_L(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(y_i - y_j)^2. \qquad (10)$$

It follows from (10) that

$$J_L(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\left(w^T x_i - w^T x_j\right)^2 = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T\right]w = w^T S_L w, \qquad (11)$$

where

$$S_L = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T. \qquad (12)$$

$S_L$ is called the local scatter (covariance) matrix.
Due to the symmetry of $H$, we have

$$S_L = \frac{1}{2}\,\frac{1}{MM}\left(\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_i^T + \sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_j x_j^T - 2\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_j^T\right) = \frac{1}{MM}\left(\sum_{i=1}^{M} D_{ii}\,x_i x_i^T - \sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}\,x_i x_j^T\right) = \frac{1}{MM}\left(XDX^T - XHX^T\right) = \frac{1}{MM}\,XLX^T, \qquad (13)$$

where $X = (x_1, x_2, \ldots, x_M)$ and $D$ is a diagonal matrix whose diagonal elements are the column (or row, since $H$ is a symmetric matrix) sums of $H$, i.e., $D_{ii} = \sum_{j=1}^{M} H_{ij}$. $L = D - H$ is called the local scatter kernel (LSK) matrix in this paper (this matrix is called the Laplacian matrix in [24]).
It is obvious that $L$ and $S_L$ are both real symmetric matrices. From (11) and (13), we know that $w^T S_L w \geq 0$ for any nonzero vector $w$. So, the local scatter matrix $S_L$ must be nonnegative definite.
In the above discussion, we use $\delta$-neighborhoods to characterize the "locality" and the local scatter. This way is geometrically intuitive but unpopular because, in practice, it is hard to choose a proper neighborhood radius $\delta$. To avoid this difficulty, the method of K-nearest neighbors is commonly used instead in real-world applications. The K-nearest neighbors method determines the following adjacency matrix $H$, with elements given by:

$$H_{ij} = \begin{cases} 1, & \text{if } x_j \text{ is among the } K \text{ nearest neighbors of } x_i \text{ and } x_i \text{ is among the } K \text{ nearest neighbors of } x_j \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$$

The local scatter can be characterized similarly by a K-nearest neighbor adjacency matrix if (9) is replaced by (14).
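As an illustration of (12)-(14), the sketch below (our own NumPy code, not from the paper) builds the mutual K-nearest-neighbor adjacency matrix $H$, the degree matrix $D$, the local scatter kernel $L = D - H$, and finally $S_L = \frac{1}{MM} XLX^T$ as in (13).

```python
import numpy as np

def local_scatter(X, K):
    """Local scatter matrix S_L of (12)-(13) built from a mutual K-NN graph.

    X : (M, n) data matrix (rows are the samples x_i).
    """
    M = X.shape[0]
    # pairwise squared Euclidean distances between samples
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)             # a sample is not its own neighbor
    knn = np.argsort(sq, axis=1)[:, :K]      # K nearest neighbors of each sample
    A = np.zeros((M, M), dtype=bool)
    rows = np.repeat(np.arange(M), K)
    A[rows, knn.ravel()] = True
    H = (A & A.T).astype(float)              # mutual K-NN adjacency matrix, (14)
    D = np.diag(H.sum(axis=1))               # degree matrix, D_ii = sum_j H_ij
    L = D - H                                # local scatter kernel (Laplacian)
    return (X.T @ L @ X) / (M * M)           # S_L = X L X^T / (M M), (13)
```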
3.1.2 Characterize the Nonlocal Scatter
The nonlocal scatter (i.e., the interlocality scatter) can be characterized by the mean square of the Euclidean distance between any pair of the projected sample points that are outside any local $\delta$-neighborhood ($\delta > 0$). The nonlocal scatter is defined by

$$J_N(w) = \frac{1}{2}\,\frac{1}{M_N}\sum_{(i,j)\notin U}(y_i - y_j)^2 \;\propto\; \frac{1}{2}\,\frac{1}{MM}\sum_{(i,j)\notin U}(y_i - y_j)^2, \qquad (15)$$

where $M_N$ is the number of sample pairs satisfying $\|x_i - x_j\|^2 \geq \delta$.
By virtue of the adjacency matrix $H$ in (9) or (14), the nonlocal scatter can be rewritten as

$$J_N(w) = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(y_i - y_j)^2. \qquad (16)$$

It follows from (16) that

$$J_N(w) = w^T\left[\frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T\right]w = w^T S_N w, \qquad (17)$$

where

$$S_N = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T. \qquad (18)$$

$S_N$ is called the nonlocal scatter (covariance) matrix. It is easy to show that $S_N$ is also a nonnegative definite matrix. And, it follows that
$$S_N = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(1 - H_{ij})(x_i - x_j)(x_i - x_j)^T = \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M}(x_i - x_j)(x_i - x_j)^T - \frac{1}{2}\,\frac{1}{MM}\sum_{i=1}^{M}\sum_{j=1}^{M} H_{ij}(x_i - x_j)(x_i - x_j)^T = S_T - S_L.$$

That is, $S_T = S_L + S_N$. Thus, we have $J_T(w) = J_L(w) + J_N(w)$.
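In code, the relation $S_N = S_T - S_L$ means the nonlocal scatter matrix never has to be accumulated pair by pair as in (18). A short illustrative sketch (assuming the local_scatter helper sketched above):

```python
import numpy as np

def nonlocal_scatter(X, S_L):
    """Nonlocal scatter matrix via S_N = S_T - S_L (Section 3.1.2).

    X : (M, n) data matrix; S_L : local scatter matrix from the K-NN graph.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    S_T = Xc.T @ Xc / M        # total scatter (covariance), (4)
    return S_T - S_L           # complement relation S_N = S_T - S_L
```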
3.1.3 Determine a Criterion: Maximizing the Ratio of
Nonlocal Scatter to Local Scatter
The technique of Locality Preserving Projection (LPP) [31]
seeks to find a linear subspace which can preserve the local
structure of data. The objective of LPP is actually to minimize
the local scatter $J_L(w)$. Obviously, the projection direction determined by LPP can ensure that, if samples $x_i$ and $x_j$ are close, their projections $y_i$ and $y_j$ are close as well. But, LPP cannot guarantee that, if samples $x_i$ and $x_j$ are not close, their projections $y_i$ and $y_j$ are not close either. This means that it may happen that two mutually distant samples belonging to
1. In (8), the only difference between the expressions in the middle and on the right is a coefficient. This difference is meaningless for the characterization of the scatter. For convenience, we use the expression on the right. The same operation is used in (15).

different classes may result in close images after the
projection of LPP. Therefore, LPP does not necessarily yield
a good projection suitable for classification.
For the purpose of classification, we try to find a projection which will draw the close samples closer together while simultaneously making the mutually distant samples even more distant from each other. From this point of view, a desirable projection should be the one that, at the same time, minimizes the local scatter $J_L(w)$ and maximizes the nonlocal scatter $J_N(w)$. As it happens, we can obtain just
such a projection by maximizing the following criterion:
$$J(w) = \frac{J_N(w)}{J_L(w)} = \frac{w^T S_N w}{w^T S_L w}. \qquad (19)$$
Since $J_T(w) = J_L(w) + J_N(w)$ and $S_T = S_L + S_N$, the above criterion is equivalent to

$$J_e(w) = \frac{J_T(w)}{J_L(w)} = \frac{w^T S_T w}{w^T S_L w}. \qquad (20)$$
The criterion in (20) indicates that we can find the projection
by at the same time globally maximizing (maximizing the
global scatter) and locally minimizing (minimizing the local
scatter).
The criterion in (19) or (20) is formally similar to the Fisher criterion in (7) since they are both Rayleigh quotients. The difference is that the matrices $S_L$ and $S_N$ in (19) can be constructed without knowing the class labels of samples, while $S_B$ and $S_W$ in (7) cannot be so constructed. This means that the Fisher discriminant projection is supervised, while the projection determined by $J(w)$ can be obtained in an unsupervised manner. In this paper, then, this projection is called an Unsupervised Discriminant Projection (UDP).
3.2 Algorithmic Derivations of UDP in Small Sample
Size Cases
If the local scatter matrix $S_L$ is nonsingular, the criterion in (19) can be maximized directly by calculating the generalized eigenvectors of the following generalized eigen-equation:

$$S_N w = \lambda S_L w. \qquad (21)$$
The projection axes of UDP can be selected as the generalized eigenvectors $w_1, w_2, \ldots, w_d$ of $S_N w = \lambda S_L w$ corresponding to the $d$ largest positive eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$.
In real-world biometrics applications such as face and palm recognition, however, $S_L$ is always singular due to the limited number of training samples. In such cases, the classical algorithm cannot be used directly to solve the generalized eigen-equation. In addition, from (12) and (18), we know that $S_L$ and $S_N$ are both $n \times n$ matrices (where $n$ is the dimension of the image vector space). It is computationally very expensive to construct these large-sized matrices in the high-dimensional input space. Fortunately, we can avoid these difficulties by virtue of the theory we built for LDA (or KFD) in small sample size cases [9], [20]. Based on this theory, the local and nonlocal scatter matrices can be constructed using the PCA-transformed low-dimensional data and the singularity difficulty can be avoided. The relevant theory is given below.
Suppose $\beta_1, \beta_2, \ldots, \beta_n$ are the $n$ orthonormal eigenvectors of $S_T$ and the first $m$ ($m = \mathrm{rank}(S_T)$) eigenvectors correspond to positive eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$. Define the subspace $\Phi_T = \mathrm{span}\{\beta_1, \ldots, \beta_m\}$ and denote its orthogonal complement by $\Phi_T^{\perp} = \mathrm{span}\{\beta_{m+1}, \ldots, \beta_n\}$. Obviously, $\Phi_T$ is the range space of $S_T$ and $\Phi_T^{\perp}$ is the corresponding null space.
Lemma 1 [4], [36]. Suppose that $A$ is an $n \times n$ nonnegative definite matrix and $\varphi$ is an $n$-dimensional vector; then $\varphi^T A \varphi = 0$ if and only if $A\varphi = 0$.
Since $S_L$, $S_N$, and $S_T$ are all nonnegative definite and $S_T = S_L + S_N$, it is easy to get:
Lemma 2. If $S_T$ is singular, then $\varphi^T S_T \varphi = 0$ if and only if $\varphi^T S_L \varphi = 0$ and $\varphi^T S_N \varphi = 0$.
Since $\mathbb{R}^n = \mathrm{span}\{\beta_1, \beta_2, \ldots, \beta_n\}$, an arbitrary $\varphi \in \mathbb{R}^n$ can be denoted by

$$\varphi = k_1\beta_1 + \cdots + k_m\beta_m + k_{m+1}\beta_{m+1} + \cdots + k_n\beta_n. \qquad (22)$$

Let $w = k_1\beta_1 + \cdots + k_m\beta_m$ and $u = k_{m+1}\beta_{m+1} + \cdots + k_n\beta_n$; then, from the definition of $\Phi_T$ and $\Phi_T^{\perp}$, $\varphi$ can be denoted by $\varphi = w + u$, where $w \in \Phi_T$ and $u \in \Phi_T^{\perp}$.
Definition 1. For an arbitrary $\varphi \in \mathbb{R}^n$, $\varphi$ can be denoted by $\varphi = w + u$, where $w \in \Phi_T$ and $u \in \Phi_T^{\perp}$. The compression mapping $\mathcal{L}: \mathbb{R}^n \to \Phi_T$ is defined by $\varphi = w + u \mapsto w$.
It is easy to verify that $\mathcal{L}$ is a linear transformation from $\mathbb{R}^n$ to its subspace $\Phi_T$.
Theorem 1. Under the compression mapping $\mathcal{L}: \mathbb{R}^n \to \Phi_T$ determined by $\varphi = w + u \mapsto w$, the UDP criterion satisfies $J(\varphi) = J(w)$.
Proof. Since $\Phi_T^{\perp}$ is the null space of $S_T$, for any $u \in \Phi_T^{\perp}$, we have $u^T S_T u = 0$. From Lemma 2, it follows that $u^T S_L u = 0$. Since $S_L$ is a nonnegative definite matrix, we have $S_L u = 0$ by Lemma 1. Hence,

$$\varphi^T S_L \varphi = w^T S_L w + 2 w^T S_L u + u^T S_L u = w^T S_L w.$$

Similarly, it can be derived that

$$\varphi^T S_N \varphi = w^T S_N w + 2 w^T S_N u + u^T S_N u = w^T S_N w.$$

Therefore, $J(\varphi) = J(w)$. □
According to Theorem 1, we can conclude that the optimal projection axes can be derived from $\Phi_T$ without any loss of effective discriminatory information with respect to the UDP criterion. From linear algebra theory, $\Phi_T$ is isomorphic to an $m$-dimensional Euclidean space $\mathbb{R}^m$, and the corresponding isomorphic mapping is

$$w = Pv, \quad \text{where } P = (\beta_1, \beta_2, \ldots, \beta_m),\; v \in \mathbb{R}^m, \qquad (23)$$

which is a one-to-one mapping from $\mathbb{R}^m$ onto $\Phi_T$.
From the isomorphic mapping $w = Pv$, the UDP criterion function $J(w)$ becomes

$$J(w) = \frac{v^T (P^T S_N P) v}{v^T (P^T S_L P) v} = \frac{v^T \tilde{S}_N v}{v^T \tilde{S}_L v} = \tilde{J}(v), \qquad (24)$$

where $\tilde{S}_N = P^T S_N P$ and $\tilde{S}_L = P^T S_L P$. It is easy to prove that $\tilde{S}_N$ and $\tilde{S}_L$ are both $m \times m$ nonnegative definite matrices. This means $\tilde{J}(v)$ is a function of a generalized Rayleigh quotient like $J(w)$.
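Combining Theorem 1 with the mapping in (23) and (24) suggests the following small-sample-size procedure, sketched below with our own helper names (an illustrative composition of the pieces above, not the authors' code): project onto the range space of $S_T$ via $P$, form the reduced matrices $\tilde{S}_L$ and $\tilde{S}_N$, and solve the reduced generalized eigenproblem.

```python
import numpy as np
from scipy.linalg import eigh

def udp_small_sample(X, K, d, tol=1e-10):
    """UDP axes in the small sample size case (sketch of Section 3.2).

    X : (M, n) data matrix with M << n; K : neighborhood size; d : number of axes.
    Uses the local_scatter() and nonlocal_scatter() helpers assumed earlier.
    """
    M = X.shape[0]
    Xc = X - X.mean(axis=0)
    S_T = Xc.T @ Xc / M
    evals, evecs = np.linalg.eigh(S_T)
    keep = evals > tol * evals.max()          # eigenvectors spanning the range space of S_T
    P = evecs[:, keep]                        # P = (beta_1, ..., beta_m), (23)
    S_L = local_scatter(X, K)
    S_N = nonlocal_scatter(X, S_L)
    S_L_t = P.T @ S_L @ P                     # reduced local scatter, (24)
    S_N_t = P.T @ S_N @ P                     # reduced nonlocal scatter, (24)
    vals, vecs = eigh(S_N_t, S_L_t)           # assumes the reduced S_L is nonsingular
    order = np.argsort(vals)[::-1][:d]
    return P @ vecs[:, order]                 # w = P v, mapped back to the input space
```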
By the property of isomorphic mapping and (24), the
following theorem holds:
Theorem 2. Let $w = Pv$ be an isomorphic mapping from $\mathbb{R}^m$ onto $\Phi_T$. Then, $w^* = Pv^*$ is the stationary point of the UDP
