Multiple Kernel Learning
for Dimensionality Reduction
Yen-Yu Lin, Tyng-Luh Liu, Member, IEEE, and Chiou-Shann Fuh, Member, IEEE
Abstract—In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a
feasible way for improving performance. The resulting data representations are typically high-dimensional and assume diverse forms.
Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as
object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel
learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the
convenience of using diverse image descriptors to describe useful characteristics of various aspects about the underlying data.
Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently
improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a
new class of applications with the multiple kernel learning framework to address not only the supervised learning problems but also the
unsupervised and semi-supervised ones.
Index Terms—Dimensionality reduction, multiple kernel learning, object categorization, image clustering, face recognition.
1 INTRODUCTION
The fact that most visual learning problems deal with
high-dimensional data has made dimensionality reduc-
tion an inherent part of the current research. Besides having
the potential for a more efficient approach, working with a
new space of lower dimension can often gain the advantage
of better analyzing the intrinsic structures in the data for
various applications. For example, dimensionality reduc-
tion can be performed to compress data for a compact
representation [25], [56], to visualize high-dimensional data
[40], [47], to exclude unfavorable data variations [8], or to
improve the classification power of the nearest neighbor
rule [9], [54].
Despite the great applicability, existing dimensionality
reduction methods often suffer from two main restrictions.
First, many of them, especially the linear ones, require data
to be represented in the form of feature vectors. The
limitation may eventually reduce the effectiveness of the
overall algorithms when the data of interest could be more
precisely characterized in other forms, e.g., bag-of-features
[2], [33], matrices, or high-order tensors [54], [57]. Second,
there seems to be a lack of a systematic way of integrating
multiple image features for dimensionality reduction. When
addressing applications where no single descriptor can
appropriately depict the whole data set, this shortcoming
becomes even more evident. Alas, it is usually the case for
addressing today’s vision applications, such as the recogni-
tion task in the Caltech-101 data set [14] or the classification
and detection tasks in the Pascal VOC challenge [13]. On the
other hand, the advantage of using multiple features has
indeed been consistently pointed out in a number of recent
research efforts, e.g., [7], [18], [31], [50], [51].
Aiming to overcome the above-mentioned restrictions,
we introduce a framework called MKL-DR that incorpo-
rates multiple kernel learning (MKL) into the training process
of dimensionality reduction (DR) methods. It works with
multiple base kernels, each of which is created based on a
specific kind of data descriptor, and fuses the descriptors in
the domain of kernel matrices. We will illustrate the
formulation of MKL-DR with graph embedding [54], which
provides a unified view for a large family of DR methods.
Any DR technique expressible by graph embedding can
therefore be generalized by MKL-DR to boost its power
by simultaneously taking account of data characteristics
captured in different descriptors. It follows that the
proposed approach can extend the MKL framework to
address, as the corresponding DR methods would do, not
only the supervised learning problems but also the unsuper-
vised and semi-supervised ones.
2 RELATED WORK
Since the relevant literature is quite extensive, our survey
instead emphasizes the key concepts crucial to the establish-
ment of the proposed framework.
2.1 Dimensionality Reduction
Techniques to perform dimensionality reduction for high-
dimensional data can vary considerably from each other
due to, e.g., different assumptions about the data distribu-
tion or the availability of the data labeling. We categorize
them as follows:

2.1.1 Unsupervised DR
Principal component analysis (PCA) [25] is the most well-known technique; it finds a linear mapping by maximizing the projected variances. For nonlinear DR techniques, isometric feature mapping (Isomap) [47] and locally linear embedding (LLE) [40] both exploit the manifold assumption to yield the embeddings. To resolve the out-of-sample problem in Isomap and LLE, locality preserving projections (LPP) [23] were proposed to uncover the data manifold by a linear relaxation.
2.1.2 Supervised DR
Linear discriminant analysis (LDA) assumes that the data of
each class have a Gaussian distribution, and derives a
projection from simultaneously maximizing the between-
class scatter and minimizing the within-class scatter.
Alternatively, marginal Fisher analysis (MFA) [54] and local
discriminant embedding (LDE) [9] adopt the assumption that
the data of each class spread as a submanifold, and seek a
discriminant embedding over these submanifolds.
2.1.3 Semi-Supervised DR
If the observed data are partially labeled, dimensionality
reduction can be performed by carrying out discriminant
analysis over the labeled ones while preserving the intrinsic
geometric structures of the remaining unlabeled data. Such techniques are
useful, say, for vision applications where user interactions are
involved, e.g., semi-supervised discriminant analysis (SDA) [6]
for content-based image retrieval with relevance feedback.
2.1.4 Kernelization
It is possible to kernelize a certain type of linear DR
techniques into nonlinear ones. As shown in [6], [9], [23],
[34], [41], [54], the kernelized versions generally can achieve
significant improvements. In addition, kernelization pro-
vides a convenient way for DR methods to handle data not
in vector form by specifying an associated kernel, e.g., the
pyramid matching kernel [21] for data in the form of bag-of-
features or the dissimilarity kernel [38] based on the pairwise
distances.
2.2 Graph Embedding
A number of dimensionality reduction methods focus on
modeling the pairwise relationships among data and utilize
graph-based structures. In particular, the framework of
graph embedding [54] provides a unified formulation for a
broad set of such DR algorithms. Let $\mathcal{X} = \{x_i \in \mathbb{R}^d\}_{i=1}^{N}$ be the data set. A DR scheme accounted for by graph embedding involves a complete graph $G$ whose vertices are over $\mathcal{X}$. A corresponding affinity matrix $W = [w_{ij}] \in \mathbb{R}^{N \times N}$ is used to record the edge weights that characterize the similarity relationships between pairs of training samples. Then, the optimal linear embedding $v^{*} \in \mathbb{R}^{d}$ can be obtained by solving

$$v^{*} = \arg\min_{\substack{v^{\top} X D X^{\top} v = 1, \ \text{or} \\ v^{\top} X L' X^{\top} v = 1}} \ v^{\top} X L X^{\top} v, \qquad (1)$$

where $X = [x_1 \ x_2 \ \cdots \ x_N]$ is the data matrix and $L = \mathrm{diag}(W\mathbf{1}) - W$ is the graph Laplacian of $G$ ($\mathbf{1}$ denotes the all-ones vector). Depending on the property of a problem, one of the two constraints in (1) will be used in the optimization. If the first constraint is chosen, a diagonal matrix $D = [d_{ij}] \in \mathbb{R}^{N \times N}$ is included for scale normalization. Otherwise, another complete graph $G'$ over $\mathcal{X}$ is required for the second constraint, where $L'$ and $W' = [w'_{ij}] \in \mathbb{R}^{N \times N}$ are, respectively, the graph Laplacian and affinity matrix of $G'$. The optimization problem (1) has an intuitive interpretation: $v^{\top} X = [v^{\top} x_1 \ \cdots \ v^{\top} x_N]$ represents the projected data; the graph Laplacian $L$ (or $L'$) is to explore the pairwise distances of the projected data, while the diagonal matrix $D$ is to weightedly combine their distances to the origin. More precisely, the meaning of (1) can be better understood with the following equivalent problem:

$$\min_{v} \ \sum_{i,j=1}^{N} \| v^{\top} x_i - v^{\top} x_j \|^2 \, w_{ij} \qquad (2)$$

subject to

$$\sum_{i=1}^{N} \| v^{\top} x_i \|^2 \, d_{ii} = 1, \quad \text{or} \qquad (3)$$

$$\sum_{i,j=1}^{N} \| v^{\top} x_i - v^{\top} x_j \|^2 \, w'_{ij} = 1. \qquad (4)$$

The constrained optimization problem (2) implies that only distances to the origin or pairwise distances of projected data (in the form of $v^{\top} x$) are modeled by the framework. By specifying $W$ and $D$ (or $W$ and $W'$), Yan et al. [54] show that a set of dimensionality reduction methods, such as PCA [25], LPP [23], LDA, and MFA [54], can be expressed by (1). Clearly, LDE [9] and SDA [6] are also in the class of graph embedding.
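For readers who prefer code to notation, the following is a minimal numpy/scipy sketch (ours, not part of the original paper) of linear graph embedding under constraint (4); the function name, the matrix layout, and the small diagonal ridge added for numerical stability are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, W_prime, dim):
    """Linear graph embedding, eq. (1) with constraint (4):
    minimize v' X L X' v subject to v' X L' X' v = 1.
    X is d x N (one sample per column); W and W_prime are N x N affinity matrices."""
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian of G
    L_prime = np.diag(W_prime.sum(axis=1)) - W_prime   # graph Laplacian of G'
    S = X @ L @ X.T                                    # objective matrix
    S_prime = X @ L_prime @ X.T                        # constraint matrix
    # Generalized eigenvalue problem S v = lambda S' v; keep the smallest eigenvalues.
    # The tiny ridge keeps S' positive definite, which scipy's eigh requires.
    _, V = eigh(S, S_prime + 1e-8 * np.eye(S_prime.shape[0]))
    return V[:, :dim]                                  # d x dim projection matrix
```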
2.3 Multiple Kernel Learning
MKL refers to the process of learning a kernel machine with
multiple kernel functions or kernel matrices. Recent
research efforts on MKL, e.g., [1], [20], [29], [39], [45], have
shown that learning SVMs with multiple kernels not only
increases the accuracy but also enhances the interpretability
of the resulting classifiers. Our MKL formulation is to find
an optimal way to linearly combine the given kernels.
Suppose we have a set of base kernel functions $\{k_m\}_{m=1}^{M}$ (or base kernel matrices $\{K_m\}_{m=1}^{M}$). An ensemble kernel function $k$ (or an ensemble kernel matrix $K$) is then defined by

$$k(x_i, x_j) = \sum_{m=1}^{M} \beta_m \, k_m(x_i, x_j), \quad \beta_m \geq 0, \qquad (5)$$

$$K = \sum_{m=1}^{M} \beta_m K_m, \quad \beta_m \geq 0. \qquad (6)$$

Consequently, an often-used MKL model learned from binary-class data $\{(x_i, y_i \in \{\pm 1\})\}_{i=1}^{N}$ is

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b \qquad (7)$$

$$\;\;\; = \sum_{i=1}^{N} \alpha_i y_i \sum_{m=1}^{M} \beta_m \, k_m(x_i, x) + b. \qquad (8)$$

Optimizing over both the coefficients $\{\alpha_i\}_{i=1}^{N}$ and $\{\beta_m\}_{m=1}^{M}$ is one particular form of the MKL problems. Our approach utilizes such an MKL optimization to yield more flexible dimensionality reduction schemes for data in different feature representations.
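As a small illustration (ours, not the authors'), forming the ensemble kernel matrix of (6) from base kernel matrices is simply a weighted sum with nonnegative weights:

```python
import numpy as np

def ensemble_kernel(base_kernels, beta):
    """Eq. (6): nonnegative combination K = sum_m beta_m K_m of the base kernel matrices.
    base_kernels is a list of M arrays of shape (N, N); beta is a length-M weight vector."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights beta_m must be nonnegative")
    return sum(b * K for b, K in zip(beta, base_kernels))
```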
2.4 Dimensionality Reduction with Multiple Kernels
Our approach is related to the work of Kim et al. [27], where
learning an optimal kernel over a given convex set of
kernels is coupled with kernel Fisher discriminant analysis
(KFDA) for binary-class data. Motivated by their idea of
learning an optimal kernel for improving the KFDA
performance, we instead consider establishing a general
framework of dimensionality reduction for data in various
feature representations via multiple kernel learning [32]. As
we will show later, MKL-DR can be used to conveniently
deal with image data depicted by different descriptors, and
effectively tackle not only supervised but also semi-
supervised and unsupervised learning tasks. To the best
of our knowledge, such a generalization of multiple kernel
learning is novel.
3 THE MKL-DR FRAMEWORK
We first discuss the construction of base kernels from
multiple descriptors, and then explain how to integrate
them for dimensionality reduction. Finally, we present an
optimization procedure to complete the framework.
3.1 Kernel as a Unified Feature Representation
Consider again a data set of N samples, and M kinds of
descriptors to characterize each sample. Let $\mathcal{X} = \{x_i\}_{i=1}^{N}$, $x_i = \{x_{i,m} \in \mathcal{X}_m\}_{m=1}^{M}$, and $d_m : \mathcal{X}_m \times \mathcal{X}_m \to \{0\} \cup \mathbb{R}^{+}$ be the distance function for data representation under the $m$th descriptor. In general, the domains resulting from distinct descriptors, e.g., feature vectors, histograms, or bags of features, are different. To eliminate such variances in representation, we express data under each descriptor as a kernel matrix. There are several ways to accomplish this goal, such as using the RBF kernel for data in the form of vectors or the pyramid match kernel [21] for data in the form of bag-of-features. We may also convert pairwise distances between data samples to a kernel matrix [50], [58]. By coupling each representation with its corresponding distance function, we obtain a set of M dissimilarity-based kernel matrices $\{K_m\}_{m=1}^{M}$, where

$$K_m(i, j) = k_m(x_i, x_j) = \exp\!\left( \frac{-\, d_m^2(x_{i,m}, x_{j,m})}{\sigma_m^2} \right) \qquad (9)$$

and $\sigma_m$ is a positive constant. Our use of dissimilarity-based kernels is convenient and advantageous in solving visual learning tasks, especially due to the fact that a number of well-designed descriptors and their associated distance functions have been introduced over the years. However, $K_m$ in (9) is not always guaranteed to be positive semidefinite. Following [58], we resolve this issue by first computing the smallest eigenvalue of $K_m$; if it is negative, we add its absolute value to the diagonal of $K_m$. With (5), (6), and (9), determining a set of optimal ensemble coefficients $\{\beta_1, \beta_2, \ldots, \beta_M\}$ can now be interpreted as finding appropriate weights for best fusing the M feature representations.
Note that in our formulation, accessing the data is
restricted to referencing the resulting M kernels defined in
(9). The main advantage of doing so is that it enables our
approach to work with different descriptors and distance
functions, without the need to explicitly handle the
variations among the representations.
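A minimal sketch of this kernel construction (the helper name and the user-chosen sigma are our own) could read:

```python
import numpy as np

def dissimilarity_kernel(D, sigma):
    """Eq. (9): turn an N x N pairwise distance matrix D of one descriptor into a kernel,
    then apply the diagonal shift of [58] if the result is not positive semidefinite."""
    K = np.exp(-(D ** 2) / (sigma ** 2))
    lam_min = np.linalg.eigvalsh(K).min()   # smallest eigenvalue of K_m
    if lam_min < 0:
        K = K + abs(lam_min) * np.eye(K.shape[0])
    return K
```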
3.2 The MKL-DR Algorithm
Instead of designing a specific dimensionality reduction
algorithm, we choose to describe MKL-DR upon graph
embedding. This way we can emphasize the flexibility of
the proposed approach: If a dimensionality reduction
scheme is explained by graph embedding, then it will
also be extendible by MKL-DR to handle data in multiple
feature representations. Recall that there are two possible
types of constraints in graph embedding. For ease of
presentation, we discuss how to develop MKL-DR subject
to constraint (4). However, the derivation can be analo-
gously applied when using constraint (3).
Kernelization in MKL-DR is accomplished in a similar way
to that in kernel PCA [41], but with the key difference in using
multiple kernels $\{K_m\}_{m=1}^{M}$. Suppose the ensemble kernel $K$ in MKL-DR is generated by linearly combining the base kernels $\{K_m\}_{m=1}^{M}$ as in (6). Let $\phi : \mathcal{X} \to \mathcal{F}$ denote the feature mapping induced by $K$. Via $\phi$, the training data can be implicitly mapped to a high-dimensional Hilbert space, i.e.,

$$x_i \mapsto \phi(x_i), \quad \text{for } i = 1, 2, \ldots, N. \qquad (10)$$

Since optimizing (1) or (2) can be reduced to solving the eigenvalue problem $X L X^{\top} v = \lambda X L' X^{\top} v$, it implies that an optimal $v$ lies in the span of the training data, i.e.,

$$v = \sum_{n=1}^{N} \alpha_n \phi(x_n). \qquad (11)$$

To show that the underlying algorithm can be reformulated in the form of inner products and accomplished in the new feature space $\mathcal{F}$, we observe that by plugging each mapped sample $\phi(x_i)$ into (2), the projection $v$ would appear exclusively in the form of $v^{\top} \phi(x_i)$. Hence, it suffices to show that in MKL-DR, $v^{\top} \phi(x_i)$ can be evaluated via the kernel trick:

$$v^{\top} \phi(x_i) = \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_n \beta_m \, k_m(x_n, x_i) = \alpha^{\top} \mathbb{K}^{(i)} \beta, \qquad (12)$$

where

$$\alpha = [\alpha_1 \ \cdots \ \alpha_N]^{\top} \in \mathbb{R}^{N}, \qquad (13)$$

$$\beta = [\beta_1 \ \cdots \ \beta_M]^{\top} \in \mathbb{R}^{M}, \qquad (14)$$

$$\mathbb{K}^{(i)} = \begin{bmatrix} K_1(1, i) & \cdots & K_M(1, i) \\ \vdots & \ddots & \vdots \\ K_1(N, i) & \cdots & K_M(N, i) \end{bmatrix} \in \mathbb{R}^{N \times M}. \qquad (15)$$

With (2) and (12), we define the constrained optimization problem for 1D MKL-DR as follows:

$$\min_{\alpha, \beta} \ \sum_{i,j=1}^{N} \| \alpha^{\top} \mathbb{K}^{(i)} \beta - \alpha^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w_{ij} \qquad (16)$$

subject to

$$\sum_{i,j=1}^{N} \| \alpha^{\top} \mathbb{K}^{(i)} \beta - \alpha^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w'_{ij} = 1, \qquad (17)$$

$$\beta_m \geq 0, \quad m = 1, 2, \ldots, M. \qquad (18)$$
The additional constraints in (18) arise from the use of the
ensemble kernel in (5) or (6), and are to ensure that the
resulting kernel K in MKL-DR is a nonnegative combina-
tion of base kernels.
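To make (12)-(15) concrete, here is a short numpy sketch (ours; the names K_i and project_1d are hypothetical) that assembles the N x M matrix of (15) and evaluates the 1D projection via the kernel trick:

```python
import numpy as np

def K_i(base_kernels, i):
    """Eq. (15): the N x M matrix whose (n, m) entry is K_m(n, i)."""
    return np.stack([K[:, i] for K in base_kernels], axis=1)

def project_1d(base_kernels, alpha, beta, i):
    """Eq. (12): v' phi(x_i) = alpha' K^(i) beta, evaluated with kernel values only."""
    return alpha @ K_i(base_kernels, i) @ beta
```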
Observe from (12) that the one-dimensional projection $v$ of MKL-DR is specified by a sample coefficient vector $\alpha$ and a kernel weight vector $\beta$. The two vectors, respectively, account for the relative importance among the samples and the base kernels in the construction of the projection. To generalize the formulation to uncover a multidimensional projection, we consider a set of P sample coefficient vectors, denoted by

$$A = [\alpha_1 \ \alpha_2 \ \cdots \ \alpha_P]. \qquad (19)$$

With $A$ and $\beta$, each 1D projection $v_i$ is determined by a specific sample coefficient vector $\alpha_i$ and the (shared) kernel weight vector $\beta$. The resulting projection $V = [v_1 \ v_2 \ \cdots \ v_P]$ will map samples to a P-dimensional euclidean space. Analogously to the 1D case, a projected sample $x_i$ can be written as

$$V^{\top} \phi(x_i) = A^{\top} \mathbb{K}^{(i)} \beta \in \mathbb{R}^{P}. \qquad (20)$$

The optimization problem (16) can now be extended to accommodate the multidimensional projection:

$$\min_{A, \beta} \ \sum_{i,j=1}^{N} \| A^{\top} \mathbb{K}^{(i)} \beta - A^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w_{ij} \qquad (21)$$

subject to

$$\sum_{i,j=1}^{N} \| A^{\top} \mathbb{K}^{(i)} \beta - A^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w'_{ij} = 1, \qquad (22)$$

$$\beta_m \geq 0, \quad m = 1, 2, \ldots, M. \qquad (23)$$

Before specifying the details of how to solve the constrained optimization problem (21) in the next section, we give an illustration of the four kinds of spaces related to MKL-DR and the connections among them in Fig. 1. The four spaces, in order, are the input space of each feature representation, the reproducing kernel Hilbert space (RKHS) induced by each base kernel and the ensemble kernel, and the projected euclidean space.

Fig. 1. Four kinds of spaces in MKL-DR: (a) the input space of each feature representation, (b) the RKHS induced by each base kernel, (c) the RKHS induced by the ensemble kernel, and (d) the projected euclidean space.
3.3 Optimization
Since directly optimizing (21) is difficult, we instead adopt an iterative, two-step strategy to alternately optimize $A$ and $\beta$. At each iteration, one of $A$ and $\beta$ is optimized while the other is fixed, and then the roles of $A$ and $\beta$ are switched. Iterations are repeated until convergence or until a maximum number of iterations is reached.
On optimizing A. By fixing $\beta$ and using the property $\|u\|^2 = \mathrm{trace}(u u^{\top})$ for a column vector $u$, the optimization problem (21) is reduced to

$$\min_{A} \ \mathrm{trace}(A^{\top} S_W A) \quad \text{subject to} \quad \mathrm{trace}(A^{\top} S_{W'} A) = 1, \qquad (24)$$

where

$$S_W = \sum_{i,j=1}^{N} w_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}) \, \beta \beta^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top}, \qquad (25)$$

$$S_{W'} = \sum_{i,j=1}^{N} w'_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}) \, \beta \beta^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top}. \qquad (26)$$

The optimization problem (24) is a trace ratio problem, i.e., $\min_{A} \mathrm{trace}(A^{\top} S_W A)/\mathrm{trace}(A^{\top} S_{W'} A)$. Following [9] and [52], one can obtain a closed-form solution by transforming (24) into the corresponding ratio trace problem, i.e., $\min_{A} \mathrm{trace}[(A^{\top} S_{W'} A)^{-1} (A^{\top} S_W A)]$. Consequently, the columns of the optimal $A^{*} = [\alpha_1^{*} \ \alpha_2^{*} \ \cdots \ \alpha_P^{*}]$ are the eigenvectors corresponding to the first P smallest eigenvalues of the generalized eigenvalue problem

$$S_W \, \alpha = \lambda \, S_{W'} \, \alpha. \qquad (27)$$
On optimizing $\beta$. By fixing $A$ and using $\|u\|^2 = u^{\top} u$, the optimization problem (21) becomes

$$\min_{\beta} \ \beta^{\top} S_W^{A} \beta \quad \text{subject to} \quad \beta^{\top} S_{W'}^{A} \beta = 1 \ \text{ and } \ \beta \geq 0, \qquad (28)$$

where

$$S_W^{A} = \sum_{i,j=1}^{N} w_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top} A A^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}), \qquad (29)$$

$$S_{W'}^{A} = \sum_{i,j=1}^{N} w'_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top} A A^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}). \qquad (30)$$

The additional constraints $\beta \geq 0$ cause the optimization problem (28) to no longer be formulatable as a generalized eigenvalue problem. Indeed, it now becomes a nonconvex quadratically constrained quadratic programming (QCQP) problem, which is known to be hard to solve. We instead consider solving its convex relaxation by adding an auxiliary variable $B$ of size $M \times M$:

$$\min_{\beta, B} \ \mathrm{trace}(S_W^{A} B) \qquad (31)$$

subject to

$$\mathrm{trace}(S_{W'}^{A} B) = 1, \qquad (32)$$

$$e_m^{\top} \beta \geq 0, \quad m = 1, 2, \ldots, M, \qquad (33)$$

$$\begin{bmatrix} 1 & \beta^{\top} \\ \beta & B \end{bmatrix} \succeq 0, \qquad (34)$$

where $e_m$ in (33) is a column vector whose elements are 0 except that its $m$th element is 1, and the constraint in (34) means that the square matrix is positive semidefinite. The optimization problem (31) is a semidefinite programming (SDP) relaxation of the nonconvex QCQP problem (28), and can be efficiently solved by an SDP solver. One can verify the equivalence between the two optimization problems (28) and (31) by replacing the constraint (34) with $B = \beta \beta^{\top}$. In view of the fact that the constraint $B = \beta \beta^{\top}$ is nonconvex, it is relaxed to $B \succeq \beta \beta^{\top}$. Applying the Schur complement lemma, $B \succeq \beta \beta^{\top}$ can be equivalently expressed by the constraint in (34). (Refer to [49] for the details.) Concerning the computational complexity, we note that the numbers of constraints and variables in (31) are, respectively, linear and quadratic in M, the number of adopted descriptors. In practice, the value of M is often small ($M = 4$ to $10$ in our experiments). Thus, like most of the other DR methods, the computational bottleneck of MKL-DR is still in solving the generalized eigenvalue problems, whose complexity is $O(N^3)$.
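A possible realization of the relaxed beta-step with an off-the-shelf convex modeling tool is sketched below; the paper does not prescribe a solver, so the use of cvxpy and the variable Z (which plays the role of the PSD block in (34)) are our assumptions, not part of the original method description.

```python
import numpy as np
import cvxpy as cp

def optimize_beta(S_W_A, S_Wp_A):
    """One beta-step of MKL-DR: the SDP relaxation (31)-(34) of the nonconvex QCQP (28).
    S_W_A and S_Wp_A are the M x M matrices of eqs. (29)-(30)."""
    M = S_W_A.shape[0]
    # Z encodes the PSD block [[1, beta'], [beta, B]] of constraint (34),
    # so that B >= beta beta' holds by the Schur complement.
    Z = cp.Variable((M + 1, M + 1), PSD=True)
    beta, B = Z[1:, 0], Z[1:, 1:]
    constraints = [
        Z[0, 0] == 1,
        cp.trace(S_Wp_A @ B) == 1,   # (32)
        beta >= 0,                   # (33)
    ]
    prob = cp.Problem(cp.Minimize(cp.trace(S_W_A @ B)), constraints)
    prob.solve()
    return np.asarray(beta.value).ravel()
```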
Listed in Algorithm 1 (Fig. 2), the procedure of MKL-DR
requires an initial guess to either $A$ or $\beta$ in the alternating optimization. We have tried two possibilities: 1) $\beta$ is initialized by setting all of its elements to 1 to equally weight the base kernels; 2) $A$ is initialized by assuming $AA^{\top} = I$. In our empirical testing, the second initialization
strategy gives more stable performances and is thus
adopted in the experiments. Pertaining to the convergence
of the optimization procedure, since SDP relaxation has
been used, the values of the objective function are not
guaranteed to monotonically decrease throughout the
iterations. Still, the optimization procedures rapidly con-
verge after only a few iterations in all of our experiments.
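Putting the two steps together, the alternating procedure of Algorithm 1 can be sketched as follows. This reuses the hypothetical helpers optimize_A and optimize_beta from the sketches above and adds a builder for the M x M matrices of (29)-(30); it is an illustration under those assumptions, not the authors' code.

```python
import numpy as np

def build_S_A(base_kernels, W, A):
    """The M x M matrix S_W^A of eq. (29) (or S_W'^A of (30)) for a fixed A."""
    N, M = base_kernels[0].shape[0], len(base_kernels)
    Ks = [np.stack([K[:, i] for K in base_kernels], axis=1) for i in range(N)]  # K^(i)
    S = np.zeros((M, M))
    for i in range(N):
        for j in range(N):
            D = (Ks[i] - Ks[j]).T @ A      # (K^(i) - K^(j))' A, an M x P block
            S += W[i, j] * (D @ D.T)
    return S

def mkl_dr(base_kernels, W, W_prime, P, iters=5):
    """Alternating optimization of MKL-DR (Algorithm 1): a beta-step, then an A-step.
    The first beta-step uses A = I so that A A' = I, the initialization the paper
    reports to be more stable."""
    N = base_kernels[0].shape[0]
    A = np.eye(N)
    for _ in range(iters):
        beta = optimize_beta(build_S_A(base_kernels, W, A),
                             build_S_A(base_kernels, W_prime, A))
        A = optimize_A(base_kernels, W, W_prime, beta, P)
    return A, beta
```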
3.4 Novel Sample Embedding
After accomplishing the training procedure of MKL-DR, we
are ready to project a testing sample, say z, into the learned
space of lower dimension by
$$z \mapsto A^{\top} \mathbb{K}^{(z)} \beta, \qquad (35)$$

where

$$\mathbb{K}^{(z)} \in \mathbb{R}^{N \times M} \quad \text{and} \quad \mathbb{K}^{(z)}(n, m) = k_m(x_n, z). \qquad (36)$$
Depending on the applications, some postprocessing, such
as the nearest neighbor rule for classification or k-means
clustering for data grouping, is then applied to the projected
sample(s) to complete the task. In the remainder of this
paper, we specifically discuss three sets of experimental
results to demonstrate the effectiveness of MKL-DR,
including supervised learning for object categorization,
unsupervised learning for image clustering, and semi-
supervised learning for face recognition.
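A direct transcription of (35)-(36) for projecting a novel sample (with our own helper names) is:

```python
import numpy as np

def embed_novel_sample(kernel_funcs, training_data, A, beta, z):
    """Eqs. (35)-(36): project a test sample z with the learned A and beta.
    kernel_funcs[m](x, z) evaluates the m-th base kernel between a training sample x
    and z; training_data holds the N training samples."""
    K_z = np.array([[k_m(x_n, z) for k_m in kernel_funcs]
                    for x_n in training_data])   # N x M matrix of eq. (36)
    return A.T @ K_z @ beta                      # point in the P-dimensional space
```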
4 EXPERIMENTAL RESULTS: SUPERVISED
LEARNING FOR OBJECT CATEGORIZATION
Applying MKL-DR to object categorization is appropriate
as the complexity of the task often requires the use of
multiple feature descriptors. And in our experiments, the
effectiveness of MKL-DR will be investigated through a
supervised learning formulation.
4.1 Data Set
The Caltech-101 data set [14], collected by Fei-Fei et al., is
used in our experiments for object categorization. It consists
of 101 object categories and one additional class of
background images. The total number of categories is 102,
and each category contains roughly 40 to 800 images.
Although each target object often appears in the central
region of an image, the large class number and the
substantial intraclass variations still make the data set very
challenging. Indeed, the data set provides a good test bed to
demonstrate the advantage of using multiple image
descriptors for complex recognition tasks. Note that as the
Fig. 2. Algorithm 1.
