Multiple Kernel Learning
for Dimensionality Reduction
Yen-Yu Lin, Tyng-Luh Liu, Member, IEEE, and Chiou-Shann Fuh, Member, IEEE
Abstract—In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a
feasible way for improving performance. The resulting data representations are typically high-dimensional and assume diverse forms.
Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as
object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel
learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the
convenience of using diverse image descriptors to describe useful characteristics of various aspects about the underlying data.
Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently
improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a
new class of applications with the multiple kernel learning framework to address not only the supervised learning problems but also the
unsupervised and semi-supervised ones.
Index Terms—Dimensionality reduction, multiple kernel learning, object categorization, image clustering, face recognition.
1 INTRODUCTION
The fact that most visual learning problems deal with
high-dimensional data has made dimensionality reduc-
tion an inherent part of the current research. Besides having
the potential for a more efficient approach, working with a
new space of lower dimension can often gain the advantage
of better analyzing the intrinsic structures in the data for
various applications. For example, dimensionality reduc-
tion can be performed to compress data for a compact
representation [25], [56], to visualize high-dimensional data
[40], [47], to exclude unfavorable data variations [8], or to
improve the classification power of the nearest neighbor
rule [9], [54].
Despite the great applicability, existing dimensionality
reduction methods often suffer from two main restrictions.
First, many of them, especially the linear ones, require data
to be represented in the form of feature vectors. The
limitation may eventually reduce the effectiveness of the
overall algorithms when the data of interest could be more
precisely characterized in other forms, e.g., bag-of-features
[2], [33], matrices, or high-order tensors [54], [57]. Second,
there seems to be a lack of a systematic way of integrating
multiple image features for dimensionality reduction. When
addressing applications where no single descriptor can
appropriately depict the whole data set, this shortcoming
becomes even more evident. Alas, it is usually the case for
addressing today’s vision applications, such as the recogni-
tion task in the Caltech-101 data set [14] or the classification
and detection tasks in the Pascal VOC challenge [13]. On the
other hand, the advantage of using multiple features has
indeed been consistently pointed out in a number of recent
research efforts, e.g., [7], [18], [31], [50], [51].
Aiming to overcome the above-mentioned restrictions,
we introduce a framework called MKL-DR that incorpo-
rates multiple kernel learning (MKL) into the training process
of dimensionality reduction (DR) methods. It works with
multiple base kernels, each of which is created based on a
specific kind of data descriptor, and fuses the descriptors in
the domain of kernel matrices. We will illustrate the
formulation of MKL-DR with graph embedding [54], which
provides a unified view for a large family of DR methods.
Any DR technique expressible by graph embedding can
therefore be generalized by MKL-DR to boost its power
by simultaneously taking account of data characteristics
captured in different descriptors. It follows that the
proposed approach can extend the MKL framework to
address, as the corresponding DR methods would do, not
only the supervised learning problems but also the unsuper-
vised and semi-supervised ones.
2 RELATED WORK
Since the relevant literature is quite extensive, our survey
instead emphasizes the key concepts crucial to the establish-
ment of the proposed framework.
2.1 Dimensionality Reduction
Techniques to perform dimensionality reduction for high-
dimensional data can vary considerably from each other
due to, e.g., different assumptions about the data distribu-
tion or the availability of the data labeling. We categorize
them as follows:

2.1.1 Unsupervised DR
Principal component analysis (PCA) [25] is the most well-known technique; it finds a linear mapping by maximizing the projected variances. For nonlinear DR techniques, isometric feature mapping (Isomap) [47] and locally linear embedding (LLE) [40] both exploit the manifold assumption to yield the embeddings. To resolve the out-of-sample problem in Isomap and LLE, locality preserving projections (LPP) [23] were proposed to uncover the data manifold by a linear relaxation.
2.1.2 Supervised DR
Linear discriminant analysis (LDA) assumes that the data of
each class have a Gaussian distribution, and derives a
projection from simultaneously maximizing the between-
class scatter and minimizing the within-class scatter.
Alternatively, marginal Fisher analysis (MFA) [54] and local
discriminant embedding (LDE) [9] adopt the assumption that
the data of each class spread as a submanifold, and seek a
discriminant embedding over these submanifolds.
2.1.3 Semi-Supervised DR
If the observed data are partially labeled, dimensionality
reduction can be performed by carrying out discriminant
analysis over the labeled ones while preserving the intrinsic
geometric structures of the remaining unlabeled data. Such techniques are
useful, say, for vision applications where user interactions are
involved, e.g., semi-supervised discriminant analysis (SDA) [6]
for content-based image retrieval with relevance feedback.
2.1.4 Kernelization
It is possible to kernelize a certain type of linear DR
techniques into nonlinear ones. As shown in [6], [9], [23],
[34], [41], [54], the kernelized versions generally can achieve
significant improvements. In addition, kernelization pro-
vides a convenient way for DR methods to handle data not
in vector form by specifying an associated kernel, e.g., the
pyramid matching kernel [21] for data in the form of bag-of-
features or the dissimilarity kernel [38] based on the pairwise
distances.
2.2 Graph Embedding
A number of dimensionality reduction methods focus on
modeling the pairwise relationships among data and utilize
graph-based structures. In particular, the framework of
graph embedding [54] provides a unified formulation for a
broad set of such DR algorithms. Let $\mathcal{X} = \{x_i \in \mathbb{R}^d\}_{i=1}^{N}$ be the data set. A DR scheme accounted for by graph embedding involves a complete graph $G$ whose vertices are over $\mathcal{X}$. A corresponding affinity matrix $W = [w_{ij}] \in \mathbb{R}^{N \times N}$ is used to record the edge weights that characterize the similarity relationships between pairs of training samples. Then, the optimal linear embedding $v^{*} \in \mathbb{R}^{d}$ can be obtained by solving

$$v^{*} = \arg\min_{\substack{v^{\top} X D X^{\top} v = 1, \ \text{or} \\ v^{\top} X L' X^{\top} v = 1}} \ v^{\top} X L X^{\top} v, \qquad (1)$$

where $X = [x_1 \ x_2 \ \cdots \ x_N]$ is the data matrix and $L = \mathrm{diag}(W\mathbf{1}) - W$ is the graph Laplacian of $G$ ($\mathbf{1}$ denotes the all-ones vector). Depending on the property of a problem, one of the two constraints in (1) will be used in the optimization. If the first constraint is chosen, a diagonal matrix $D = [d_{ij}] \in \mathbb{R}^{N \times N}$ is included for scale normalization. Otherwise, another complete graph $G'$ over $\mathcal{X}$ is required for the second constraint, where $L'$ and $W' = [w'_{ij}] \in \mathbb{R}^{N \times N}$ are, respectively, the graph Laplacian and affinity matrix of $G'$. The optimization problem (1) has an intuitive interpretation: $v^{\top} X = [v^{\top} x_1 \ \cdots \ v^{\top} x_N]$ represents the projected data; the graph Laplacian $L$ (or $L'$) is to explore the pairwise distances of the projected data, while the diagonal matrix $D$ is to weightedly combine their distances to the origin. More precisely, the meaning of (1) can be better understood with the following equivalent problem:

$$\min_{v} \ \sum_{i,j=1}^{N} \| v^{\top} x_i - v^{\top} x_j \|^2 \, w_{ij} \qquad (2)$$

subject to

$$\sum_{i=1}^{N} \| v^{\top} x_i \|^2 \, d_{ii} = 1, \quad \text{or} \qquad (3)$$

$$\sum_{i,j=1}^{N} \| v^{\top} x_i - v^{\top} x_j \|^2 \, w'_{ij} = 1. \qquad (4)$$

The constrained optimization problem (2) implies that only distances to the origin or pairwise distances of projected data (in the form of $v^{\top} x$) are modeled by the framework. By specifying $W$ and $D$ (or $W$ and $W'$), Yan et al. [54] show that a set of dimensionality reduction methods, such as PCA [25], LPP [23], LDA, and MFA [54], can be expressed by (1). Clearly, LDE [9] and SDA [6] are also in the class of graph embedding.
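For readers who prefer code to notation, the following is a minimal numpy/scipy sketch (ours, not part of the original paper) of linear graph embedding under constraint (4); the function name, the matrix layout, and the small diagonal ridge added for numerical stability are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, W_prime, dim):
    """Linear graph embedding, eq. (1) with constraint (4):
    minimize v' X L X' v subject to v' X L' X' v = 1.
    X is d x N (one sample per column); W and W_prime are N x N affinity matrices."""
    L = np.diag(W.sum(axis=1)) - W                     # graph Laplacian of G
    L_prime = np.diag(W_prime.sum(axis=1)) - W_prime   # graph Laplacian of G'
    S = X @ L @ X.T                                    # objective matrix
    S_prime = X @ L_prime @ X.T                        # constraint matrix
    # Generalized eigenvalue problem S v = lambda S' v; keep the smallest eigenvalues.
    # The tiny ridge keeps S' positive definite, which scipy's eigh requires.
    _, V = eigh(S, S_prime + 1e-8 * np.eye(S_prime.shape[0]))
    return V[:, :dim]                                  # d x dim projection matrix
```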
2.3 Multiple Kernel Learning
MKL refers to the process of learning a kernel machine with
multiple kernel functions or kernel matrices. Recent
research efforts on MKL, e.g., [1], [20], [29], [39], [45], have
shown that learning SVMs with multiple kernels not only
increases the accuracy but also enhances the interpretability
of the resulting classifiers. Our MKL formulation is to find
an optimal way to linearly combine the given kernels.
Suppose we have a set of base kernel functions $\{k_m\}_{m=1}^{M}$ (or base kernel matrices $\{K_m\}_{m=1}^{M}$). An ensemble kernel function $k$ (or an ensemble kernel matrix $K$) is then defined by

$$k(x_i, x_j) = \sum_{m=1}^{M} \beta_m \, k_m(x_i, x_j), \quad \beta_m \geq 0, \qquad (5)$$

$$K = \sum_{m=1}^{M} \beta_m K_m, \quad \beta_m \geq 0. \qquad (6)$$

Consequently, an often-used MKL model learned from binary-class data $\{(x_i, y_i \in \{\pm 1\})\}_{i=1}^{N}$ is

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b \qquad (7)$$

$$\;\;\; = \sum_{i=1}^{N} \alpha_i y_i \sum_{m=1}^{M} \beta_m \, k_m(x_i, x) + b. \qquad (8)$$

Optimizing over both the coefficients $\{\alpha_i\}_{i=1}^{N}$ and $\{\beta_m\}_{m=1}^{M}$ is one particular form of the MKL problems. Our approach utilizes such an MKL optimization to yield more flexible dimensionality reduction schemes for data in different feature representations.
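As a small illustration (ours, not the authors'), forming the ensemble kernel matrix of (6) from base kernel matrices is simply a weighted sum with nonnegative weights:

```python
import numpy as np

def ensemble_kernel(base_kernels, beta):
    """Eq. (6): nonnegative combination K = sum_m beta_m K_m of the base kernel matrices.
    base_kernels is a list of M arrays of shape (N, N); beta is a length-M weight vector."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights beta_m must be nonnegative")
    return sum(b * K for b, K in zip(beta, base_kernels))
```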
2.4 Dimensionality Reduction with Multiple Kernels
Our approach is related to the work of Kim et al. [27], where
learning an optimal kernel over a given convex set of
kernels is coupled with kernel Fisher discriminant analysis
(KFDA) for binary-class data. Motivated by their idea of
learning an optimal kernel for improving the KFDA
performance, we instead consider establishing a general
framework of dimensionality reduction for data in various
feature representations via multiple kernel learning [32]. As
we will show later, MKL-DR can be used to conveniently
deal with image data depicted by different descriptors, and
effectively tackle not only supervised but also semi-
supervised and unsupervised learning tasks. To the best
of our knowledge, such a generalization of multiple kernel
learning is novel.
3 THE MKL-DR FRAMEWORK
We first discuss the construction of base kernels from
multiple descriptors, and then explain how to integrate
them for dimensionality reduction. Finally, we present an
optimization procedure to complete the framework.
3.1 Kernel as a Unified Feature Representation
Consider again a data set of N samples, and M kinds of
descriptors to characterize each sample. Let $\mathcal{X} = \{x_i\}_{i=1}^{N}$, $x_i = \{x_{i,m} \in \mathcal{X}_m\}_{m=1}^{M}$, and $d_m : \mathcal{X}_m \times \mathcal{X}_m \to \{0\} \cup \mathbb{R}^{+}$ be the distance function for data representation under the $m$th descriptor. In general, the domains resulting from distinct descriptors, e.g., feature vectors, histograms, or bags of features, are different. To eliminate such variances in representation, we express data under each descriptor as a kernel matrix. There are several ways to accomplish this goal, such as using the RBF kernel for data in the form of vectors or the pyramid match kernel [21] for data in the form of bag-of-features. We may also convert pairwise distances between data samples to a kernel matrix [50], [58]. By coupling each representation with its corresponding distance function, we obtain a set of M dissimilarity-based kernel matrices $\{K_m\}_{m=1}^{M}$, where

$$K_m(i, j) = k_m(x_i, x_j) = \exp\!\left( \frac{-\, d_m^2(x_{i,m}, x_{j,m})}{\sigma_m^2} \right) \qquad (9)$$

and $\sigma_m$ is a positive constant. Our use of dissimilarity-based kernels is convenient and advantageous in solving visual learning tasks, especially due to the fact that a number of well-designed descriptors and their associated distance functions have been introduced over the years. However, $K_m$ in (9) is not always guaranteed to be positive semidefinite. Following [58], we resolve this issue by first computing the smallest eigenvalue of $K_m$; if it is negative, we add its absolute value to the diagonal of $K_m$. With (5), (6), and (9), determining a set of optimal ensemble coefficients $\{\beta_1, \beta_2, \ldots, \beta_M\}$ can now be interpreted as finding appropriate weights for best fusing the M feature representations.
Note that in our formulation, accessing the data is
restricted to referencing the resulting M kernels defined in
(9). The main advantage of doing so is that it enables our
approach to work with different descriptors and distance
functions, without the need to explicitly handle the
variations among the representations.
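A minimal sketch of this kernel construction (the helper name and the user-chosen sigma are our own) could read:

```python
import numpy as np

def dissimilarity_kernel(D, sigma):
    """Eq. (9): turn an N x N pairwise distance matrix D of one descriptor into a kernel,
    then apply the diagonal shift of [58] if the result is not positive semidefinite."""
    K = np.exp(-(D ** 2) / (sigma ** 2))
    lam_min = np.linalg.eigvalsh(K).min()   # smallest eigenvalue of K_m
    if lam_min < 0:
        K = K + abs(lam_min) * np.eye(K.shape[0])
    return K
```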
3.2 The MKL-DR Algorithm
Instead of designing a specific dimensionality reduction
algorithm, we choose to describe MKL-DR upon graph
embedding. This way we can emphasize the flexibility of
the proposed approach: If a dimensionality reduction
scheme is explained by graph embedding, then it will
also be extendible by MKL-DR to handle data in multiple
feature representations. Recall that there are two possible
types of constraints in graph embedding. For ease of
presentation, we discuss how to develop MKL-DR subject
to constraint (4). However, the derivation can be analo-
gously applied when using constraint (3).
Kernelization in MKL-DR is accomplished in a similar way
to that in kernel PCA [41], but with the key difference in using
multiple kernels $\{K_m\}_{m=1}^{M}$. Suppose the ensemble kernel $K$ in MKL-DR is generated by linearly combining the base kernels $\{K_m\}_{m=1}^{M}$ as in (6). Let $\phi : \mathcal{X} \to \mathcal{F}$ denote the feature mapping induced by $K$. Via $\phi$, the training data can be implicitly mapped to a high-dimensional Hilbert space, i.e.,

$$x_i \mapsto \phi(x_i), \quad \text{for } i = 1, 2, \ldots, N. \qquad (10)$$

Since optimizing (1) or (2) can be reduced to solving the eigenvalue problem $X L X^{\top} v = \lambda X L' X^{\top} v$, it implies that an optimal $v$ lies in the span of the training data, i.e.,

$$v = \sum_{n=1}^{N} \alpha_n \phi(x_n). \qquad (11)$$

To show that the underlying algorithm can be reformulated in the form of inner products and accomplished in the new feature space $\mathcal{F}$, we observe that by plugging each mapped sample $\phi(x_i)$ into (2), the projection $v$ would appear exclusively in the form of $v^{\top} \phi(x_i)$. Hence, it suffices to show that in MKL-DR, $v^{\top} \phi(x_i)$ can be evaluated via the kernel trick:

$$v^{\top} \phi(x_i) = \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_n \beta_m \, k_m(x_n, x_i) = \alpha^{\top} \mathbb{K}^{(i)} \beta, \qquad (12)$$

where

$$\alpha = [\alpha_1 \ \cdots \ \alpha_N]^{\top} \in \mathbb{R}^{N}, \qquad (13)$$

$$\beta = [\beta_1 \ \cdots \ \beta_M]^{\top} \in \mathbb{R}^{M}, \qquad (14)$$

$$\mathbb{K}^{(i)} = \begin{bmatrix} K_1(1, i) & \cdots & K_M(1, i) \\ \vdots & \ddots & \vdots \\ K_1(N, i) & \cdots & K_M(N, i) \end{bmatrix} \in \mathbb{R}^{N \times M}. \qquad (15)$$

With (2) and (12), we define the constrained optimization problem for 1D MKL-DR as follows:

$$\min_{\alpha, \beta} \ \sum_{i,j=1}^{N} \| \alpha^{\top} \mathbb{K}^{(i)} \beta - \alpha^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w_{ij} \qquad (16)$$

subject to

$$\sum_{i,j=1}^{N} \| \alpha^{\top} \mathbb{K}^{(i)} \beta - \alpha^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w'_{ij} = 1, \qquad (17)$$

$$\beta_m \geq 0, \quad m = 1, 2, \ldots, M. \qquad (18)$$
The additional constraints in (18) arise from the use of the
ensemble kernel in (5) or (6), and are to ensure that the
resulting kernel K in MKL-DR is a nonnegative combina-
tion of base kernels.
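To make (12)-(15) concrete, here is a short numpy sketch (ours; the names K_i and project_1d are hypothetical) that assembles the N x M matrix of (15) and evaluates the 1D projection via the kernel trick:

```python
import numpy as np

def K_i(base_kernels, i):
    """Eq. (15): the N x M matrix whose (n, m) entry is K_m(n, i)."""
    return np.stack([K[:, i] for K in base_kernels], axis=1)

def project_1d(base_kernels, alpha, beta, i):
    """Eq. (12): v' phi(x_i) = alpha' K^(i) beta, evaluated with kernel values only."""
    return alpha @ K_i(base_kernels, i) @ beta
```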
Observe from (12) that the one-dimensional projection $v$ of MKL-DR is specified by a sample coefficient vector $\alpha$ and a kernel weight vector $\beta$. The two vectors, respectively, account for the relative importance among the samples and the base kernels in the construction of the projection. To generalize the formulation to uncover a multidimensional projection, we consider a set of P sample coefficient vectors, denoted by

$$A = [\alpha_1 \ \alpha_2 \ \cdots \ \alpha_P]. \qquad (19)$$

With $A$ and $\beta$, each 1D projection $v_i$ is determined by a specific sample coefficient vector $\alpha_i$ and the (shared) kernel weight vector $\beta$. The resulting projection $V = [v_1 \ v_2 \ \cdots \ v_P]$ will map samples to a P-dimensional euclidean space. Analogously to the 1D case, a projected sample $x_i$ can be written as

$$V^{\top} \phi(x_i) = A^{\top} \mathbb{K}^{(i)} \beta \in \mathbb{R}^{P}. \qquad (20)$$

The optimization problem (16) can now be extended to accommodate the multidimensional projection:

$$\min_{A, \beta} \ \sum_{i,j=1}^{N} \| A^{\top} \mathbb{K}^{(i)} \beta - A^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w_{ij} \qquad (21)$$

subject to

$$\sum_{i,j=1}^{N} \| A^{\top} \mathbb{K}^{(i)} \beta - A^{\top} \mathbb{K}^{(j)} \beta \|^2 \, w'_{ij} = 1, \qquad (22)$$

$$\beta_m \geq 0, \quad m = 1, 2, \ldots, M. \qquad (23)$$

Before specifying the details of how to solve the constrained optimization problem (21) in the next section, we give an illustration of the four kinds of spaces related to MKL-DR and the connections among them in Fig. 1. The four spaces, in order, are the input space of each feature representation, the reproducing kernel Hilbert space (RKHS) induced by each base kernel and the ensemble kernel, and the projected euclidean space.

Fig. 1. Four kinds of spaces in MKL-DR: (a) the input space of each feature representation, (b) the RKHS induced by each base kernel, (c) the RKHS induced by the ensemble kernel, and (d) the projected euclidean space.
3.3 Optimization
Since directly optimizing (21) is difficult, we instead adopt an iterative, two-step strategy to alternately optimize $A$ and $\beta$. At each iteration, one of $A$ and $\beta$ is optimized while the other is fixed, and then the roles of $A$ and $\beta$ are switched. Iterations are repeated until convergence or until a maximum number of iterations is reached.
On optimizing A. By fixing $\beta$ and using the property $\|u\|^2 = \mathrm{trace}(u u^{\top})$ for a column vector $u$, the optimization problem (21) is reduced to

$$\min_{A} \ \mathrm{trace}(A^{\top} S_W A) \quad \text{subject to} \quad \mathrm{trace}(A^{\top} S_{W'} A) = 1, \qquad (24)$$

where

$$S_W = \sum_{i,j=1}^{N} w_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}) \, \beta \beta^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top}, \qquad (25)$$

$$S_{W'} = \sum_{i,j=1}^{N} w'_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}) \, \beta \beta^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top}. \qquad (26)$$

The optimization problem (24) is a trace ratio problem, i.e., $\min_{A} \mathrm{trace}(A^{\top} S_W A)/\mathrm{trace}(A^{\top} S_{W'} A)$. Following [9] and [52], one can obtain a closed-form solution by transforming (24) into the corresponding ratio trace problem, i.e., $\min_{A} \mathrm{trace}[(A^{\top} S_{W'} A)^{-1} (A^{\top} S_W A)]$. Consequently, the columns of the optimal $A^{*} = [\alpha_1^{*} \ \alpha_2^{*} \ \cdots \ \alpha_P^{*}]$ are the eigenvectors corresponding to the first P smallest eigenvalues of the generalized eigenvalue problem

$$S_W \, \alpha = \lambda \, S_{W'} \, \alpha. \qquad (27)$$
On optimizing $\beta$. By fixing $A$ and using $\|u\|^2 = u^{\top} u$, the optimization problem (21) becomes

$$\min_{\beta} \ \beta^{\top} S_W^{A} \beta \quad \text{subject to} \quad \beta^{\top} S_{W'}^{A} \beta = 1 \ \text{ and } \ \beta \geq 0, \qquad (28)$$

where

$$S_W^{A} = \sum_{i,j=1}^{N} w_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top} A A^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}), \qquad (29)$$

$$S_{W'}^{A} = \sum_{i,j=1}^{N} w'_{ij} \, (\mathbb{K}^{(i)} - \mathbb{K}^{(j)})^{\top} A A^{\top} (\mathbb{K}^{(i)} - \mathbb{K}^{(j)}). \qquad (30)$$

The additional constraints $\beta \geq 0$ cause the optimization problem (28) to no longer be formulatable as a generalized eigenvalue problem. Indeed, it now becomes a nonconvex quadratically constrained quadratic programming (QCQP) problem, which is known to be hard to solve. We instead consider solving its convex relaxation by adding an auxiliary variable $B$ of size $M \times M$:

$$\min_{\beta, B} \ \mathrm{trace}(S_W^{A} B) \qquad (31)$$

subject to

$$\mathrm{trace}(S_{W'}^{A} B) = 1, \qquad (32)$$

$$e_m^{\top} \beta \geq 0, \quad m = 1, 2, \ldots, M, \qquad (33)$$

$$\begin{bmatrix} 1 & \beta^{\top} \\ \beta & B \end{bmatrix} \succeq 0, \qquad (34)$$

where $e_m$ in (33) is a column vector whose elements are 0 except that its $m$th element is 1, and the constraint in (34) means that the square matrix is positive semidefinite. The optimization problem (31) is a semidefinite programming (SDP) relaxation of the nonconvex QCQP problem (28), and can be efficiently solved by an SDP solver. One can verify the equivalence between the two optimization problems (28) and (31) by replacing the constraint (34) with $B = \beta \beta^{\top}$. In view of the fact that the constraint $B = \beta \beta^{\top}$ is nonconvex, it is relaxed to $B \succeq \beta \beta^{\top}$. Applying the Schur complement lemma, $B \succeq \beta \beta^{\top}$ can be equivalently expressed by the constraint in (34). (Refer to [49] for the details.) Concerning the computational complexity, we note that the numbers of constraints and variables in (31) are, respectively, linear and quadratic in M, the number of adopted descriptors. In practice, the value of M is often small ($M = 4$ to $10$ in our experiments). Thus, like most of the other DR methods, the computational bottleneck of MKL-DR is still in solving the generalized eigenvalue problems, whose complexity is $O(N^3)$.
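A possible realization of the relaxed beta-step with an off-the-shelf convex modeling tool is sketched below; the paper does not prescribe a solver, so the use of cvxpy and the variable Z (which plays the role of the PSD block in (34)) are our assumptions, not part of the original method description.

```python
import numpy as np
import cvxpy as cp

def optimize_beta(S_W_A, S_Wp_A):
    """One beta-step of MKL-DR: the SDP relaxation (31)-(34) of the nonconvex QCQP (28).
    S_W_A and S_Wp_A are the M x M matrices of eqs. (29)-(30)."""
    M = S_W_A.shape[0]
    # Z encodes the PSD block [[1, beta'], [beta, B]] of constraint (34),
    # so that B >= beta beta' holds by the Schur complement.
    Z = cp.Variable((M + 1, M + 1), PSD=True)
    beta, B = Z[1:, 0], Z[1:, 1:]
    constraints = [
        Z[0, 0] == 1,
        cp.trace(S_Wp_A @ B) == 1,   # (32)
        beta >= 0,                   # (33)
    ]
    prob = cp.Problem(cp.Minimize(cp.trace(S_W_A @ B)), constraints)
    prob.solve()
    return np.asarray(beta.value).ravel()
```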
Listed in Algorithm 1 (Fig. 2), the procedure of MKL-DR
requires an initial guess to either $A$ or $\beta$ in the alternating optimization. We have tried two possibilities: 1) $\beta$ is initialized by setting all of its elements to 1 to equally weight the base kernels; 2) $A$ is initialized by assuming $AA^{\top} = I$. In our empirical testing, the second initialization
strategy gives more stable performances and is thus
adopted in the experiments. Pertaining to the convergence
of the optimization procedure, since SDP relaxation has
been used, the values of the objective function are not
guaranteed to monotonically decrease throughout the
iterations. Still, the optimization procedures rapidly con-
verge after only a few iterations in all of our experiments.
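Putting the two steps together, the alternating procedure of Algorithm 1 can be sketched as follows. This reuses the hypothetical helpers optimize_A and optimize_beta from the sketches above and adds a builder for the M x M matrices of (29)-(30); it is an illustration under those assumptions, not the authors' code.

```python
import numpy as np

def build_S_A(base_kernels, W, A):
    """The M x M matrix S_W^A of eq. (29) (or S_W'^A of (30)) for a fixed A."""
    N, M = base_kernels[0].shape[0], len(base_kernels)
    Ks = [np.stack([K[:, i] for K in base_kernels], axis=1) for i in range(N)]  # K^(i)
    S = np.zeros((M, M))
    for i in range(N):
        for j in range(N):
            D = (Ks[i] - Ks[j]).T @ A      # (K^(i) - K^(j))' A, an M x P block
            S += W[i, j] * (D @ D.T)
    return S

def mkl_dr(base_kernels, W, W_prime, P, iters=5):
    """Alternating optimization of MKL-DR (Algorithm 1): a beta-step, then an A-step.
    The first beta-step uses A = I so that A A' = I, the initialization the paper
    reports to be more stable."""
    N = base_kernels[0].shape[0]
    A = np.eye(N)
    for _ in range(iters):
        beta = optimize_beta(build_S_A(base_kernels, W, A),
                             build_S_A(base_kernels, W_prime, A))
        A = optimize_A(base_kernels, W, W_prime, beta, P)
    return A, beta
```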
3.4 Novel Sample Embedding
After accomplishing the training procedure of MKL-DR, we
are ready to project a testing sample, say z, into the learned
space of lower dimension by
$$z \mapsto A^{\top} \mathbb{K}^{(z)} \beta, \qquad (35)$$

where

$$\mathbb{K}^{(z)} \in \mathbb{R}^{N \times M} \quad \text{and} \quad \mathbb{K}^{(z)}(n, m) = k_m(x_n, z). \qquad (36)$$
Depending on the applications, some postprocessing, such
as the nearest neighbor rule for classification or k-means
clustering for data grouping, is then applied to the projected
sample(s) to complete the task. In the remainder of this
paper, we specifically discuss three sets of experimental
results to demonstrate the effectiveness of MKL-DR,
including supervised learning for object categorization,
unsupervised learning for image clustering, and semi-
supervised learning for face recognition.
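A direct transcription of (35)-(36) for projecting a novel sample (with our own helper names) is:

```python
import numpy as np

def embed_novel_sample(kernel_funcs, training_data, A, beta, z):
    """Eqs. (35)-(36): project a test sample z with the learned A and beta.
    kernel_funcs[m](x, z) evaluates the m-th base kernel between a training sample x
    and z; training_data holds the N training samples."""
    K_z = np.array([[k_m(x_n, z) for k_m in kernel_funcs]
                    for x_n in training_data])   # N x M matrix of eq. (36)
    return A.T @ K_z @ beta                      # point in the P-dimensional space
```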
4 EXPERIMENTAL RESULTS: SUPERVISED
LEARNING FOR OBJECT CATEGORIZATION
Applying MKL-DR to object categorization is appropriate
as the complexity of the task often requires the use of
multiple feature descriptors. And in our experiments, the
effectiveness of MKL-DR will be investigated through a
supervised learning formulation.
4.1 Data Set
The Caltech-101 data set [14], collected by Fei-Fei et al., is
used in our experiments for object categorization. It consists
of 101 object categories and one additional class of
background images. The total number of categories is 102,
and each category contains roughly 40 to 800 images.
Although each target object often appears in the central
region of an image, the large class number and the
substantial intraclass variations still make the data set very
challenging. Indeed, the data set provides a good test bed to
demonstrate the advantage of using multiple image
descriptors for complex recognition tasks. Note that as the
Fig. 2. Algorithm 1.
