Multiple kernel learning, conic duality, and the SMO algorithm

TL;DR: Experimental results show that the proposed SMO-based algorithm, built on a novel dual formulation of the QCQP as a second-order cone programming problem, is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
Abstract: While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.

Summary (3 min read)

1. Introduction

  • One of the major reasons for the rise to prominence of the support vector machine (SVM) is its ability to cast nonlinear classification as a convex optimization problem, in particular a quadratic program (QP).
  • Convexity implies that the solution is unique and brings a suite of standard numerical software to bear in finding the solution.
  • Recent developments in the literature on the SVM and other kernel methods have emphasized the need to consider multiple kernels, or parameterizations of kernels, and not a single fixed kernel.
  • One class of solutions to non-smooth optimization problems involves constructing a smooth approximate problem out of a non-smooth problem.
  • In this paper the authors show how these problems can be resolved by considering a novel dual formulation of the QCQP as a second-order cone programming (SOCP) problem.

2.2. Support kernel machine

  • The authors now introduce a novel classification algorithm that they refer to as the "support kernel machine" (SKM).
  • But their underlying motivation is the fact that the dual of the SKM is exactly the problem (L).
  • The authors establish this equivalence in the following section.

2.2.1. Linear classification

  • In the spirit of the soft margin SVM, the authors achieve this by minimizing a linear combination of the inverse of the margin and the training error.
  • Various norms can be used to combine the two terms, and indeed many different algorithms have been explored for various combinations of ℓ1-norms and ℓ2-norms.

2.2.2. Conic duality and optimality conditions

  • For a given optimization problem there are many ways of deriving a dual problem.
  • Equations (a) and (b) are the same as in the classical SVM, where they define the notion of a "support vector."
  • While the KKT conditions (a) and (b) refer to the index i over data points, the KKT conditions (c) and (d) refer to the index j over components of the input vector.
  • These conditions thus imply a form of sparsity not over data points but over "input dimensions."
  • Sparsity thus emerges from the optimization problem.

2.2.3. Kernelization

  • The authors now "kernelize" the problem (P ) using this kernel function.
  • The sparsity that emerges via the KKT conditions (c) and (d) now refers to the kernels K_j, and the authors refer to the kernels with nonzero η_j as "support kernels."

2.3. Equivalence of the two formulations

  • Care must be taken here, though: the weights η_j are defined for (L) as Lagrange multipliers and for (D_K) through the anti-proportionality of orthogonal elements of a second-order cone, and a priori they might not coincide: although (D_K) and (L) are equivalent, their dual problems have different formulations.
  • It is straightforward, however, to write the KKT optimality conditions for (α, η) for both problems and verify that they are indeed equivalent.

3. Optimality conditions

  • The authors formulate their problem (in either of its two equivalent forms) as the minimization of a non-differentiable convex function subject to linear constraints.
  • Exact and approximate optimality conditions are then readily derived using subdifferentials.
  • In later sections the authors will show how these conditions lead to an MY-regularized algorithmic formulation that will be amenable to SMO techniques.

3.2. Optimality conditions and subdifferential

  • Elements of the subdifferential ∂J(α) are called subgradients.
  • The notion of subdifferential is especially useful for characterizing optimality conditions of non-smooth problems (Bertsekas, 1995).

3.3. Approximate optimality conditions

  • Note that for one kernel, i.e., when the SKM reduces to the SVM, this corresponds to the approximate KKT conditions usually employed for the standard SVM (Platt, 1998; Keerthi et al., 2001; Joachims, 1998).
  • Indeed, the iterative algorithm that the authors present in Section 4 outputs a pair (α, η) and only these sufficient optimality conditions need to be checked.

3.4. Improving sparsity

  • Indeed, if some of the kernels are close to identical, then some of the η's can potentially be removed: for a general SVM, the optimal α is not unique if data points coincide, and for a general SKM, the optimal α and η are not unique if data points or kernels coincide.
  • When searching for the minimum ℓ0-norm η which satisfies the constraints (OPT3), the authors can thus consider a simple heuristic approach where they loop through all the nonzero η_j and check whether each such component can be removed.

4. Regularized support kernel machine

  • The function J(α) is convex but not differentiable.
  • It is well known that in this situation, steepest descent and coordinate descent methods do not necessarily converge to the global optimum (Bertsekas, 1995).
  • SMO unfortunately falls into this class of methods.
  • Therefore, in order to develop an SMO-like algorithm for the SKM, the authors make use of Moreau-Yosida regularization.

4.2. Solving the MY-regularized SKM using SMO

  • Since the objective function G(α) is differentiable, the authors can now safely envisage an SMO-like approach, which consists in a sequence of local optimizations over only two components of α.
  • In addition, caching and shrinking techniques (Joachims, 1998) that prevent redundant computations of kernel matrix values can also be employed.
  • A difference between their setting and the SVM setting is the line search, which cannot be performed in closed form for the MY-regularized SKM.
  • Since each line search is the minimization of a convex function, one can use efficient one-dimensional root finding, such as Brent's method (Brent, 1973); a sketch of such a line search is given after this list.
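As a concrete illustration of this inner step, here is a minimal sketch (not the authors' implementation) of a two-coordinate SMO-style update in which the one-dimensional minimization is handed to a Brent-type bounded scalar minimizer from SciPy. The objective G, the working-set pair (i, j), and the helper name smo_line_search are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def smo_line_search(G, alpha, y, i, j, C):
    """One SMO-like step: minimize t -> G(alpha + t*d) over the feasible box,
    where d touches only coordinates i and j and preserves alpha^T y = 0."""
    d = np.zeros_like(alpha)
    d[i], d[j] = 1.0, -y[i] * y[j]        # direction keeps the equality constraint

    # Feasible step interval from the box constraints 0 <= alpha_k + t*d_k <= C.
    t_lo, t_hi = -np.inf, np.inf
    for k in (i, j):
        if d[k] > 0:
            t_lo, t_hi = max(t_lo, -alpha[k] / d[k]), min(t_hi, (C - alpha[k]) / d[k])
        else:
            t_lo, t_hi = max(t_lo, (C - alpha[k]) / d[k]), min(t_hi, -alpha[k] / d[k])

    # The restriction of a convex G to a line is convex, so a Brent-type bounded
    # scalar minimizer is enough when no closed form is available.
    res = minimize_scalar(lambda t: G(alpha + t * d), bounds=(t_lo, t_hi), method="bounded")
    return alpha + res.x * d
```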

4.4. A minimization algorithm

  • In their simulations, the kernel matrices are all normalized, i.e., have unit diagonal, so the authors can choose all d_j equal (a sketch of this unit-diagonal normalization follows this list).
  • Once the approximate optimality conditions are satisfied, the algorithm stops.
  • Since each SMO optimization is performed on a differentiable function with Lipschitz gradient and SMO is equivalent to steepest descent for the ℓ1-norm (Joachims, 1998), classical optimization results show that each of those SMO optimizations is finitely convergent (Bertsekas, 1995).
  • Additional speed-ups can be easily achieved here.
  • If for successive values of κ, some kernels have a zero weight, the authors might as well remove them from the algorithm and check after convergence if they can be safely kept out.
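A minimal sketch of the unit-diagonal normalization mentioned above, assuming each K is a symmetric positive semidefinite Gram matrix with a strictly positive diagonal (the function name is a placeholder, not the authors' code):

```python
import numpy as np

def normalize_unit_diagonal(K, eps=1e-12):
    """Rescale a Gram matrix so that every diagonal entry equals 1,
    i.e. K_ij <- K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.clip(np.diag(K), eps, None))
    return K / np.outer(d, d)
```

With this normalization every kernel has tr K_j = n, so the weights d_j = sqrt(tr K_j / c) of Section 2.3 can indeed all be chosen equal.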

5. Simulations

  • The authors compare the algorithm presented in Section 4.4 with solving the QCQP (L) using Mosek for two datasets, ionosphere and breast cancer, from the UCI repository, and nested subsets of the adult dataset from Platt (1998) .
  • The basis kernels are Gaussian kernels on random subsets of features, with varying widths (a sketch of this basis-kernel construction is given after this list).
  • The authors vary the number of kernels m for fixed number of data points n, and vice versa.
  • Thus the algorithm presented in this paper appears to provide a significant improvement over Mosek in computational complexity, both in terms of the number of kernels and the number of data points.
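For illustration, a minimal sketch of such basis kernels (assumed choices for the subset size and the list of widths; the helper names are hypothetical, not the authors' experimental code):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X, width, features):
    """Gaussian (RBF) kernel computed on a subset of the columns of X."""
    Z = X[:, features]
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * width**2))

def random_basis_kernels(X, m, widths, subset_size):
    """Build m basis kernels on random feature subsets with varying widths."""
    n_features = X.shape[1]
    kernels = []
    for j in range(m):
        feats = rng.choice(n_features, size=subset_size, replace=False)
        kernels.append(gaussian_kernel(X, widths[j % len(widths)], feats))
    return kernels
```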


Multiple Kernel Learning, Conic Duality, and the SMO Algorithm
Francis R. Bach & Gert R. G. Lanckriet {fbach,gert}@cs.berkeley.edu
Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720, USA
Michael I. Jordan jordan@cs.berkeley.edu
Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA
Abstract

While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
1. Introduction

One of the major reasons for the rise to prominence of the support vector machine (SVM) is its ability to cast nonlinear classification as a convex optimization problem, in particular a quadratic program (QP). Convexity implies that the solution is unique and brings a suite of standard numerical software to bear in finding the solution. Convexity alone, however, does not imply that the available algorithms scale well to problems of interest. Indeed, off-the-shelf algorithms do not suffice in large-scale applications of the SVM, and a second major reason for the rise to prominence of the SVM is the development of special-purpose algorithms for solving the QP (Platt, 1998; Joachims, 1998; Keerthi et al., 2001).

Recent developments in the literature on the SVM and other kernel methods have emphasized the need to consider multiple kernels, or parameterizations of kernels, and not a single fixed kernel. This provides needed flexibility and also reflects the fact that practical learning problems often involve multiple, heterogeneous data sources. While this so-called "multiple kernel learning" problem can in principle be solved via cross-validation, several recent papers have focused on more efficient methods for kernel learning (Chapelle et al., 2002; Grandvalet & Canu, 2003; Lanckriet et al., 2004; Ong et al., 2003). In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients in a conic combination of kernel matrices and the coefficients of a discriminative classifier. In the SVM setting, this problem turns out to again be a convex optimization problem—a quadratically-constrained quadratic program (QCQP). This problem is more challenging than a QP, but it can also be solved in principle by general-purpose optimization toolboxes such as Mosek (Andersen & Andersen, 2000). Again, however, this existing algorithmic solution suffices only for small problems (small numbers of kernels and data points), and improved algorithmic solutions akin to sequential minimal optimization (SMO) are needed.

While the multiple kernel learning problem is convex, it is also non-smooth—it can be cast as the minimization of a non-differentiable function subject to linear constraints (see Section 3.1). Unfortunately, as is well known in the non-smooth optimization literature, this means that simple local descent algorithms such as SMO may fail to converge or may converge to incorrect values (Bertsekas, 1995). Indeed, in preliminary attempts to solve the QCQP using SMO we ran into exactly these convergence problems.

One class of solutions to non-smooth optimization problems involves constructing a smooth approximate problem out of a non-smooth problem. In particular, Moreau-Yosida (MY) regularization is an effective general solution methodology that is based on inf-convolution (Lemarechal & Sagastizabal, 1997). It can be viewed in terms of the dual problem as simply adding a quadratic regularization term to the dual objective function. Unfortunately, in our setting, this creates a new difficulty—we lose the sparsity that makes the SVM amenable to SMO optimization. In particular, the QCQP formulation of Lanckriet et al. (2004) does not lead to an MY-regularized problem that can be solved efficiently by SMO techniques.

In this paper we show how these problems can be resolved by considering a novel dual formulation of the QCQP as a second-order cone programming (SOCP) problem. This new formulation is of interest on its own merit, because of various connections to existing algorithms. In particular, it is closely related to the classical maximum margin formulation of the SVM, differing only by the choice of the norm of the inverse margin. Moreover, the KKT conditions arising in the new formulation not only lead to support vectors as in the classical SVM, but also to a dual notion of "support kernels"—those kernels that are active in the conic combination. We thus refer to the new formulation as the support kernel machine (SKM).

As we will show, the conic dual problem defining the SKM is exactly the multiple kernel learning problem of Lanckriet et al. (2004).¹ Moreover, given this new formulation, we can design a Moreau-Yosida regularization which preserves the sparse SVM structure, and therefore we can apply SMO techniques. Making this circle of ideas precise requires a number of tools from convex analysis. In particular, Section 3 defines appropriate approximate optimality conditions for the SKM in terms of subdifferentials and approximate subdifferentials. These conditions are then used in Section 4 in the design of an MY regularization for the SKM and an SMO-based algorithm. We present the results of numerical experiments with the new method in Section 5.

¹ It is worth noting that this dual problem cannot be obtained directly as the Lagrangian dual of the QCQP problem—Lagrangian duals of QCQPs are semidefinite programming problems.

(Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the first author.)
2. Learning the kernel matrix

In this section, we (1) begin with a brief review of the multiple kernel learning problem of Lanckriet et al. (2004), (2) introduce the support kernel machine (SKM), and (3) show that the dual of the SKM is equivalent to the multiple kernel learning primal.

2.1. Multiple kernel learning problem

In the multiple kernel learning problem, we assume that we are given n data points (x_i, y_i), where x_i ∈ X for some input space X, and where y_i ∈ {−1, 1}. We also assume that we are given m matrices K_j ∈ ℝ^{n×n}, which are assumed to be symmetric positive semidefinite (and might or might not be obtained from evaluating a kernel function on the data {x_i}). We consider the problem of learning the best linear combination $\sum_{j=1}^m \eta_j K_j$ of the kernels K_j with nonnegative coefficients η_j ≥ 0 and with a trace constraint $\operatorname{tr} \sum_{j=1}^m \eta_j K_j = \sum_{j=1}^m \eta_j \operatorname{tr} K_j = c$, where c > 0 is fixed. Lanckriet et al. (2004) show that this setup yields the following optimization problem:

$$
\begin{aligned}
(L)\quad \min\ \ & \zeta - 2 e^\top \alpha \\
\text{w.r.t.}\ \ & \zeta \in \mathbb{R},\ \alpha \in \mathbb{R}^n \\
\text{s.t.}\ \ & 0 \le \alpha \le C, \quad \alpha^\top y = 0 \\
& \alpha^\top D(y) K_j D(y)\, \alpha \le \frac{\operatorname{tr} K_j}{c}\, \zeta, \quad j \in \{1,\dots,m\},
\end{aligned}
$$

where D(y) is the diagonal matrix with diagonal y, e ∈ ℝⁿ the vector of all ones, and C a positive constant. The coefficients η_j are recovered as Lagrange multipliers for the constraints $\alpha^\top D(y) K_j D(y)\, \alpha \le \frac{\operatorname{tr} K_j}{c}\, \zeta$.
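To make the objects in (L) concrete, here is a minimal NumPy sketch (an illustration under assumed inputs, not the authors' code) that forms the combination K(η) = Σ_j η_j K_j after rescaling η so that the trace constraint Σ_j η_j tr K_j = c holds, and that evaluates the quadratic constraint terms α^⊤D(y)K_jD(y)α appearing in (L):

```python
import numpy as np

def combine_kernels(kernels, eta, c=1.0):
    """Form K(eta) = sum_j eta_j K_j after rescaling eta >= 0 so that
    sum_j eta_j * tr(K_j) = c (the trace constraint of the MKL problem)."""
    eta = np.asarray(eta, dtype=float)
    traces = np.array([np.trace(K) for K in kernels])
    eta = c * eta / np.dot(eta, traces)          # enforce the trace constraint
    return sum(e * K for e, K in zip(eta, kernels)), eta

def qcqp_constraint_values(kernels, alpha, y):
    """Left-hand sides alpha^T D(y) K_j D(y) alpha of the constraints in (L)."""
    v = y * alpha                                 # D(y) alpha
    return np.array([v @ K @ v for K in kernels])
```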
2.2. Support kernel machine

We now introduce a novel classification algorithm that we refer to as the "support kernel machine" (SKM). It will be motivated as a block-based variant of the SVM and related margin-based classification algorithms. But our underlying motivation is the fact that the dual of the SKM is exactly the problem (L). We establish this equivalence in the following section.
2.2.1. Linear classification

In this section we let X = ℝ^k. We also assume we are given a decomposition of ℝ^k as a product of m blocks: ℝ^k = ℝ^{k_1} × ⋯ × ℝ^{k_m}, so that each data point x_i can be decomposed into m block components, i.e., x_i = (x_{1i}, …, x_{mi}), where each x_{ji} is in general a vector. The goal is to find a linear classifier of the form y = sign(w^⊤x + b) where w has the same block decomposition w = (w_1, …, w_m) ∈ ℝ^{k_1 + ⋯ + k_m}. In the spirit of the soft margin SVM, we achieve this by minimizing a linear combination of the inverse of the margin and the training error. Various norms can be used to combine the two terms, and indeed many different algorithms have been explored for various combinations of ℓ1-norms and ℓ2-norms. In this paper, our goal is to encourage the sparsity of the vector w at the level of blocks; in particular, we want most of its (multivariate) components w_i to be zero. A natural way to achieve this is to penalize the ℓ1-norm of w. Since w is defined by blocks, we minimize the square of a weighted block ℓ1-norm, $\big(\sum_{j=1}^m d_j \|w_j\|_2\big)^2$, where within every block, an ℓ2-norm is used. Note that a standard ℓ2-based SVM is obtained if we minimize the square of a block ℓ2-norm, $\sum_{j=1}^m \|w_j\|_2^2$, which corresponds to $\|w\|_2^2$, i.e., ignoring the block structure. On the other hand, if m = k and d_j = 1, we minimize the square of the ℓ1-norm of w, which is very similar to the LP-SVM proposed by Bradley and Mangasarian (1998). The primal problem for the SKM is thus:

$$
\begin{aligned}
(P)\quad \min\ \ & \tfrac{1}{2}\Big(\sum_{j=1}^m d_j \|w_j\|_2\Big)^2 + C \sum_{i=1}^n \xi_i \\
\text{w.r.t.}\ \ & w \in \mathbb{R}^{k_1} \times \cdots \times \mathbb{R}^{k_m},\ \xi \in \mathbb{R}^n_+,\ b \in \mathbb{R} \\
\text{s.t.}\ \ & y_i\Big(\sum_j w_j^\top x_{ji} + b\Big) \ge 1 - \xi_i, \quad \forall i \in \{1,\dots,n\}.
\end{aligned}
$$
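For illustration, a minimal sketch (assumed shapes, not the authors' implementation) of the SKM primal objective in (P): the squared weighted block ℓ1-norm of w plus the hinge slack term, with w given as a list of per-block vectors w_j and the data given as a list of per-block design matrices X_j whose rows are x_ji^⊤:

```python
import numpy as np

def skm_primal_objective(w_blocks, b, X_blocks, y, d, C):
    """Objective of (P): 0.5*(sum_j d_j*||w_j||_2)^2 + C*sum_i xi_i,
    where xi_i = max(0, 1 - y_i*(sum_j w_j^T x_ji + b)) is the hinge slack."""
    block_l1 = sum(dj * np.linalg.norm(wj) for dj, wj in zip(d, w_blocks))
    scores = sum(Xj @ wj for Xj, wj in zip(X_blocks, w_blocks)) + b
    xi = np.maximum(0.0, 1.0 - y * scores)
    return 0.5 * block_l1**2 + C * np.sum(xi)
```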
2.2.2. Conic duality and optimality conditions

For a given optimization problem there are many ways of deriving a dual problem. In our particular case, we treat problem (P) as a second-order cone program (SOCP) (Lobo et al., 1998), which yields the following dual (see Appendix A for the derivation):

$$
\begin{aligned}
(D)\quad \min\ \ & \tfrac{1}{2}\gamma^2 - \alpha^\top e \\
\text{w.r.t.}\ \ & \gamma \in \mathbb{R},\ \alpha \in \mathbb{R}^n \\
\text{s.t.}\ \ & 0 \le \alpha \le C, \quad \alpha^\top y = 0 \\
& \Big\|\sum_i \alpha_i y_i x_{ji}\Big\|_2 \le d_j\,\gamma, \quad \forall j \in \{1,\dots,m\}.
\end{aligned}
$$

In addition, the Karush-Kuhn-Tucker (KKT) optimality conditions give the following complementary slackness equations (here t_j denotes the auxiliary cone variable of the SOCP formulation of (P), equal to ‖w_j‖₂ at the optimum):

$$
\begin{aligned}
\text{(a)}\quad & \alpha_i\,\Big(y_i\Big(\sum_j w_j^\top x_{ji} + b\Big) - 1 + \xi_i\Big) = 0, \quad \forall i \\
\text{(b)}\quad & (C - \alpha_i)\,\xi_i = 0, \quad \forall i \\
\text{(c)}\quad & \begin{pmatrix} w_j \\ \|w_j\|_2 \end{pmatrix}^{\!\top} \begin{pmatrix} -\sum_i \alpha_i y_i x_{ji} \\ d_j\,\gamma \end{pmatrix} = 0, \quad \forall j \\
\text{(d)}\quad & \gamma\,\Big(\sum_j d_j\, t_j - \gamma\Big) = 0.
\end{aligned}
$$
Equations (a) and (b) are the same as in the classical SVM, where they define the notion of a "support vector." That is, at the optimum, we can divide the data points into three disjoint sets: I_0 = {i : α_i = 0}, I_M = {i : α_i ∈ (0, C)}, and I_C = {i : α_i = C}, such that points belonging to I_0 are correctly classified points not on the margin and such that ξ_i = 0; points in I_M are correctly classified points on the margin such that ξ_i = 0 and $y_i(\sum_j w_j^\top x_{ji} + b) = 1$; and points in I_C are points on the "wrong" side of the margin for which ξ_i ≥ 0 (incorrectly classified if ξ_i > 1) and $y_i(\sum_j w_j^\top x_{ji} + b) = 1 - \xi_i$. The points whose indices i are in I_M or I_C are the support vectors.

[Figure 1. Orthogonality of elements of the second-order cone K₂ = {w = (u, v) : u ∈ ℝ², v ∈ ℝ, ‖u‖₂ ≤ v}: two elements w, w′ of K₂ are orthogonal and nonzero if and only if they belong to the boundary and are anti-proportional.]
While the KKT conditions (a) and (b) refer to the index i over data points, the KKT conditions (c) and (d) refer to the index j over components of the input vector. These conditions thus imply a form of sparsity not over data points but over "input dimensions." Indeed, two non-zero elements (u, v) and (u′, v′) of a second-order cone K_d = {(u, v) ∈ ℝ^d × ℝ : ‖u‖₂ ≤ v} are orthogonal if and only if they both belong to the boundary, and they are "anti-proportional" (Lobo et al., 1998); that is, there exists η > 0 such that $\|u\|_2 = v$, $\|u'\|_2 = v'$, and $(u, v) = \eta\,(-u',\, v')$ (see Figure 1).

Thus, if γ > 0, we have:

  • if $\big\|\sum_i \alpha_i y_i x_{ji}\big\|_2 < d_j\,\gamma$, then w_j = 0;
  • if $\big\|\sum_i \alpha_i y_i x_{ji}\big\|_2 = d_j\,\gamma$, then there exists η_j ≥ 0 such that $w_j = \eta_j \sum_i \alpha_i y_i x_{ji}$ and $\|w_j\|_2 = \eta_j\, d_j\, \gamma$.

Sparsity thus emerges from the optimization problem. Let J denote the set of active dimensions, i.e., $J(\alpha, \gamma) = \{\, j : \|\sum_i \alpha_i y_i x_{ji}\|_2 = d_j\,\gamma \,\}$. We can rewrite the optimality conditions as $\forall j,\ w_j = \eta_j \sum_i \alpha_i y_i x_{ji}$, with η_j = 0 if j ∉ J. Equation (d) implies that $\gamma = \sum_j d_j \|w_j\|_2 = \sum_j d_j (\eta_j d_j \gamma)$, which in turn implies $\sum_{j \in J} d_j^2\, \eta_j = 1$.
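The following NumPy sketch (illustrative only; shapes, helper names and the tolerance are assumptions) computes the quantities appearing in these conditions for a candidate dual solution: the block norms ‖Σ_i α_i y_i x_ji‖₂, the set of active dimensions J(α, γ), and the reconstruction w_j = η_j Σ_i α_i y_i x_ji given block weights η_j:

```python
import numpy as np

def active_dimensions(X_blocks, alpha, y, gamma, d, tol=1e-8):
    """Return the block norms ||sum_i alpha_i y_i x_ji||_2 and the index set
    J(alpha, gamma) = {j : block norm equals d_j * gamma (up to tol)}."""
    v = alpha * y
    norms = np.array([np.linalg.norm(Xj.T @ v) for Xj in X_blocks])
    active = [j for j, nj in enumerate(norms) if nj >= d[j] * gamma - tol]
    return norms, active

def recover_blocks(X_blocks, alpha, y, eta):
    """Recover w_j = eta_j * sum_i alpha_i y_i x_ji for each block j."""
    v = alpha * y
    return [ej * (Xj.T @ v) for ej, Xj in zip(eta, X_blocks)]
```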
2.2.3. Kernelization

We now remove the assumption that X is a Euclidean space, and consider embeddings of the data points x_i in a Euclidean space via a mapping φ : X → ℝ^f. In correspondence with our block-based formulation of the classification problem, we assume that φ(x) has m distinct block components φ(x) = (φ₁(x), …, φ_m(x)). Following the usual recipe for kernel methods, we assume that this embedding is performed implicitly, by specifying the inner product in ℝ^f using a kernel function, which in this case is the sum of individual kernel functions on each block component:

$$
k(x_i, x_j) = \varphi(x_i)^\top \varphi(x_j) = \sum_{s=1}^m \varphi_s(x_i)^\top \varphi_s(x_j) = \sum_{s=1}^m k_s(x_i, x_j).
$$

We now "kernelize" the problem (P) using this kernel function. In particular, we consider the dual of (P) and substitute the kernel function for the inner products in (D):

$$
\begin{aligned}
(D_K)\quad \min\ \ & \tfrac{1}{2}\gamma^2 - e^\top \alpha \\
\text{w.r.t.}\ \ & \gamma \in \mathbb{R},\ \alpha \in \mathbb{R}^n \\
\text{s.t.}\ \ & 0 \le \alpha \le C, \quad \alpha^\top y = 0 \\
& \big(\alpha^\top D(y) K_j D(y)\, \alpha\big)^{1/2} \le \gamma\, d_j, \quad \forall j,
\end{aligned}
$$

where K_j is the j-th Gram matrix of the points {x_i} corresponding to the j-th kernel.

The sparsity that emerges via the KKT conditions (c) and (d) now refers to the kernels K_j, and we refer to the kernels with nonzero η_j as "support kernels." The resulting classifier has the same form as the SVM classifier, but is based on the kernel matrix combination $K = \sum_j \eta_j K_j$, which is a sparse combination of "support kernels."
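To illustrate the resulting classifier, here is a minimal sketch (hypothetical names, assumed inputs) of the SVM-style decision function based on the sparse combination K = Σ_j η_j K_j, where K_test_list[j] holds the j-th kernel evaluated between test and training points:

```python
import numpy as np

def skm_decision_function(K_test_list, eta, alpha, y, b):
    """f(x) = sum_i alpha_i y_i sum_j eta_j k_j(x, x_i) + b, evaluated for a
    batch of test points; K_test_list[j] has shape (n_test, n_train)."""
    K_comb = sum(ej * Kt for ej, Kt in zip(eta, K_test_list) if ej > 0)
    return K_comb @ (alpha * y) + b
```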
2.3. Equivalence of the two formulations

By simply taking $d_j = \sqrt{\operatorname{tr} K_j / c}$, we see that problems (D_K) and (L) are indeed equivalent—thus the dual of the SKM is the multiple kernel learning primal. Care must be taken here though—the weights η_j are defined for (L) as Lagrange multipliers and for (D_K) through the anti-proportionality of orthogonal elements of a second-order cone, and a priori they might not coincide: although (D_K) and (L) are equivalent, their dual problems have different formulations. It is straightforward, however, to write the KKT optimality conditions for (α, η) for both problems and verify that they are indeed equivalent. One direct consequence is that for an optimal pair (α, η), α is an optimal solution of the SVM with kernel matrix $\sum_j \eta_j K_j$.
3. Optimality conditions

In this section, we formulate our problem (in either of its two equivalent forms) as the minimization of a non-differentiable convex function subject to linear constraints. Exact and approximate optimality conditions are then readily derived using subdifferentials. In later sections we will show how these conditions lead to an MY-regularized algorithmic formulation that will be amenable to SMO techniques.
3.1. Max-function formulation

A rearrangement of the problem (D_K) yields an equivalent formulation in which the quadratic constraints are moved into the objective function:

$$
\begin{aligned}
(S)\quad \min\ \ & \max_j\ \Big\{ \tfrac{1}{2 d_j^2}\, \alpha^\top D(y) K_j D(y)\, \alpha - \alpha^\top e \Big\} \\
\text{w.r.t.}\ \ & \alpha \in \mathbb{R}^n \\
\text{s.t.}\ \ & 0 \le \alpha \le C, \quad \alpha^\top y = 0.
\end{aligned}
$$

We let $J_j(\alpha)$ denote $\tfrac{1}{2 d_j^2}\, \alpha^\top D(y) K_j D(y)\, \alpha - \alpha^\top e$ and $J(\alpha) = \max_j J_j(\alpha)$. Problem (S) is the minimization of the non-differentiable convex function J(α) subject to linear constraints. Let $\mathcal{J}(\alpha)$ be the set of active kernels, i.e., the set of indices j such that $J_j(\alpha) = J(\alpha)$. We let $F_j(\alpha) \in \mathbb{R}^n$ denote the gradient of $J_j$, that is, $F_j = \frac{\partial J_j}{\partial \alpha} = \frac{1}{d_j^2}\, D(y) K_j D(y)\, \alpha - e$.
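A minimal NumPy sketch of these quantities (illustrative helper names; kernels, alpha, y and d are assumed inputs):

```python
import numpy as np

def Jj_values(kernels, alpha, y, d):
    """J_j(alpha) = (1/(2 d_j^2)) * alpha^T D(y) K_j D(y) alpha - alpha^T e."""
    v = alpha * y
    return np.array([(v @ K @ v) / (2.0 * dj**2) - alpha.sum()
                     for K, dj in zip(kernels, d)])

def J_and_gradients(kernels, alpha, y, d, eps=0.0):
    """Return J(alpha) = max_j J_j(alpha), the (eps-)active kernel set, and
    the gradients F_j = (1/d_j^2) D(y) K_j D(y) alpha - e of each J_j."""
    vals = Jj_values(kernels, alpha, y, d)
    J = vals.max()
    active = np.flatnonzero(vals >= J - eps)
    v = alpha * y
    F = [y * (K @ v) / dj**2 - 1.0 for K, dj in zip(kernels, d)]
    return J, active, F
```

With eps = 0 the active set is the set of active kernels used here; with eps > 0 the same computation yields the ε-active set used in Section 3.3.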
3.2. Optimality conditions and subdifferential

Given any function J(α), the subdifferential of J at α, denoted ∂J(α), is defined as (Bertsekas, 1995):

$$
\partial J(\alpha) = \{\, g \in \mathbb{R}^n : \forall \alpha',\ J(\alpha') \ge J(\alpha) + g^\top(\alpha' - \alpha) \,\}.
$$

Elements of the subdifferential ∂J(α) are called subgradients. When J is convex and differentiable at α, then the subdifferential is a singleton and reduces to the gradient. The notion of subdifferential is especially useful for characterizing optimality conditions of non-smooth problems (Bertsekas, 1995).

The function J(α) defined in the earlier section is a pointwise maximum of convex differentiable functions, and using subgradient calculus we can easily see that the subdifferential ∂J(α) of J at α is equal to the convex hull of the gradients F_j of J_j for the active kernels. That is:

$$
\partial J(\alpha) = \text{convex hull}\{\, F_j(\alpha),\ j \in \mathcal{J}(\alpha) \,\}.
$$

The Lagrangian for (S) is equal to $L(\alpha) = J(\alpha) - \delta^\top \alpha + \xi^\top(\alpha - Ce) + b\,\alpha^\top y$, where $b \in \mathbb{R}$ and $\xi, \delta \in \mathbb{R}^n_+$, and the global minimum of L(α, δ, ξ, b) with respect to α is characterized by the equation

$$
0 \in \partial L(\alpha) \ \Longleftrightarrow\ \delta - \xi - b\,y \in \partial J(\alpha).
$$

The optimality conditions are thus the following: α, (b, δ, ξ) is a pair of optimal primal/dual variables if and only if:

$$
(\mathrm{OPT}_0)\qquad
\begin{aligned}
& \delta - \xi - b\,y \in \partial J(\alpha) \\
& \forall i,\ \ \delta_i\,\alpha_i = 0, \quad \xi_i\,(C - \alpha_i) = 0 \\
& \alpha^\top y = 0, \quad 0 \le \alpha \le C.
\end{aligned}
$$
As before, we define I_M(α) = {i : α_i ∈ (0, C)}, I_0(α) = {i : α_i = 0}, and I_C(α) = {i : α_i = C}. We also define, following Keerthi et al. (2001), I_{0+} = I_0 ∩ {i : y_i = 1}, I_{0−} = I_0 ∩ {i : y_i = −1}, I_{C+} = I_C ∩ {i : y_i = 1}, and I_{C−} = I_C ∩ {i : y_i = −1}. We can then rewrite the optimality conditions as

$$
(\mathrm{OPT}_1)\qquad
\begin{aligned}
& \nu - b\,e = D(y) \sum_{j \in \mathcal{J}(\alpha)} d_j^2\, \eta_j\, F_j(\alpha) \\
& \eta \ge 0, \qquad \sum_j d_j^2\, \eta_j = 1 \\
& \forall i \in I_M \cup I_{0+} \cup I_{C-},\ \ \nu_i \ge 0 \\
& \forall i \in I_M \cup I_{0-} \cup I_{C+},\ \ \nu_i \le 0.
\end{aligned}
$$
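A small sketch (assumed tolerance handling, hypothetical helper name) of the index sets used above, computed from a candidate α:

```python
import numpy as np

def svm_index_sets(alpha, y, C, tol=1e-8):
    """Index sets I_M, I_0+, I_0-, I_C+, I_C- in the style of Keerthi et al. (2001)."""
    at_zero = alpha <= tol
    at_C = alpha >= C - tol
    I_M = np.flatnonzero(~at_zero & ~at_C)
    I_0p, I_0m = np.flatnonzero(at_zero & (y > 0)), np.flatnonzero(at_zero & (y < 0))
    I_Cp, I_Cm = np.flatnonzero(at_C & (y > 0)), np.flatnonzero(at_C & (y < 0))
    return I_M, I_0p, I_0m, I_Cp, I_Cm
```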
3.3. Approximate optimality conditions

Exact optimality conditions such as (OPT0) or (OPT1) are generally not suitable for numerical optimization. In non-smooth optimization theory, one instead formulates optimality criteria in terms of the ε-subdifferential, which is defined as

$$
\partial_\varepsilon J(\alpha) = \{\, g \in \mathbb{R}^n : \forall \alpha',\ J(\alpha') \ge J(\alpha) - \varepsilon + g^\top(\alpha' - \alpha) \,\}.
$$

When $J(\alpha) = \max_j J_j(\alpha)$, the ε-subdifferential contains (potentially strictly) the convex hull of the gradients F_j(α) for all ε-active functions, i.e., for all j such that $\max_i J_i(\alpha) - \varepsilon \le J_j(\alpha)$. We let $\mathcal{J}_\varepsilon(\alpha)$ denote the set of all such kernels. So we have

$$
C_\varepsilon(\alpha) = \text{convex hull}\{\, F_j(\alpha),\ j \in \mathcal{J}_\varepsilon(\alpha) \,\} \subset \partial_\varepsilon J(\alpha).
$$

Our stopping criterion, referred to as (ε₁, ε₂)-optimality, requires that the ε₁-subdifferential is within ε₂ of zero, and that the usual KKT conditions are met. That is, we stop whenever there exist ν, b, g such that

$$
(\mathrm{OPT}_2)\qquad
\begin{aligned}
& g \in \partial_{\varepsilon_1} J(\alpha) \\
& \forall i \in I_M \cup I_{0+} \cup I_{C-},\ \ \nu_i \ge 0 \\
& \forall i \in I_M \cup I_{0-} \cup I_{C+},\ \ \nu_i \le 0 \\
& \|\nu - b\,e - D(y)\,g\| \le \varepsilon_2.
\end{aligned}
$$

Note that for one kernel, i.e., when the SKM reduces to the SVM, this corresponds to the approximate KKT conditions usually employed for the standard SVM (Platt, 1998; Keerthi et al., 2001; Joachims, 1998). For a given α, checking optimality is hard, since even computing $\partial_{\varepsilon_1} J(\alpha)$ is hard in closed form. However, a sufficient condition for optimality can be obtained by using the inner approximation $C_{\varepsilon_1}(\alpha)$ of this ε₁-subdifferential, i.e., the convex hull of gradients of ε₁-active kernels. Checking this sufficient condition is a linear programming (LP) existence problem, i.e., find η such that:

$$
(\mathrm{OPT}_3)\qquad
\begin{aligned}
& \eta \ge 0, \qquad \eta_j = 0 \ \text{ if } j \notin \mathcal{J}_{\varepsilon_1}(\alpha), \qquad \sum_j d_j^2\, \eta_j = 1 \\
& \max_{i \in I_M \cup I_{0-} \cup I_{C+}} \big\{ (K(\eta) D(y)\,\alpha)_i - y_i \big\}
  \ \le\ \min_{i \in I_M \cup I_{0+} \cup I_{C-}} \big\{ (K(\eta) D(y)\,\alpha)_i - y_i \big\} + 2\varepsilon_2,
\end{aligned}
$$

where $K(\eta) = \sum_{j \in \mathcal{J}_{\varepsilon_1}(\alpha)} \eta_j K_j$. Given α, we can determine whether it is (ε₁, ε₂)-optimal by solving the potentially large LP (OPT3). If in addition to having α, we know a potential candidate for η, then a sufficient condition for optimality is that this η verifies (OPT3), which doesn't require solving the LP. Indeed, the iterative algorithm that we present in Section 4 outputs a pair (α, η) and only these sufficient optimality conditions need to be checked.
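The following self-contained sketch (hypothetical function name, assumed array inputs) checks the sufficient (ε₁, ε₂)-optimality condition (OPT3) for a given candidate pair (α, η) without solving an LP:

```python
import numpy as np

def check_opt3(kernels, alpha, y, eta, d, C, eps1, eps2, tol=1e-8):
    """Sufficient (eps1, eps2)-optimality check of (OPT3) for a candidate
    pair (alpha, eta); all inputs are NumPy arrays of matching sizes."""
    d = np.asarray(d, dtype=float)
    eta = np.asarray(eta, dtype=float)
    v = alpha * y                                       # D(y) alpha

    # eps1-active kernels: J_j(alpha) within eps1 of max_j J_j(alpha).
    vals = np.array([(v @ K @ v) / (2 * dj**2) - alpha.sum()
                     for K, dj in zip(kernels, d)])
    active = vals >= vals.max() - eps1

    # eta must be nonnegative, supported on active kernels, with sum_j d_j^2 eta_j = 1.
    if np.any(eta < -tol) or np.any((eta > tol) & ~active):
        return False
    if abs(np.dot(d**2, eta) - 1.0) > tol:
        return False

    # Index sets of Keerthi et al. (2001), as boolean masks.
    at0, atC = alpha <= tol, alpha >= C - tol
    I_M = ~at0 & ~atC
    upper = I_M | (at0 & (y < 0)) | (atC & (y > 0))     # I_M + I_0- + I_C+
    lower = I_M | (at0 & (y > 0)) | (atC & (y < 0))     # I_M + I_0+ + I_C-

    G = sum(ej * K for ej, K in zip(eta, kernels)) @ v - y   # (K(eta)D(y)alpha)_i - y_i
    return G[upper].max() <= G[lower].min() + 2 * eps2
```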
3.4. Improving sparsity

Once we have an approximate solution, i.e., values α and η that satisfy (OPT3), we can ask whether η can be made sparser. Indeed, if some of the kernels are close to identical, then some of the η's can potentially be removed—for a general SVM, the optimal α is not unique if data points coincide, and for a general SKM, the optimal α and η are not unique if data points or kernels coincide. When searching for the minimum ℓ0-norm η which satisfies the constraints (OPT3), we can thus consider a simple heuristic approach where we loop through all the nonzero η_j and check whether each such component can be removed. That is, for all $j \in \mathcal{J}_{\varepsilon_1}(\alpha)$, we force η_j to zero and solve the LP. If it is feasible, then the j-th kernel can be removed.
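A sketch of this heuristic under stated assumptions (hypothetical function names; `upper` and `lower` are the boolean masks for I_M ∪ I_{0−} ∪ I_{C+} and I_M ∪ I_{0+} ∪ I_{C−}, assumed nonempty; SciPy's linprog is used only as a generic LP feasibility oracle, and a more careful implementation would also keep the η returned by the LP):

```python
import numpy as np
from scipy.optimize import linprog

def removable(kernels, alpha, y, d, keep, upper, lower, eps2):
    """Feasibility of the (OPT3) LP restricted to the kernels flagged in `keep`."""
    d = np.asarray(d, dtype=float)
    cols = np.flatnonzero(keep)
    if cols.size == 0:
        return False
    v = alpha * y
    M = np.column_stack([kernels[j] @ v for j in cols])   # columns: (K_j D(y) alpha)_i
    p = cols.size
    # Variables z = (eta_restricted, t); the auxiliary t plays the role of -b.
    A_ub = np.vstack([np.hstack([M[upper], -np.ones((upper.sum(), 1))]),
                      np.hstack([-M[lower], np.ones((lower.sum(), 1))])])
    b_ub = np.concatenate([y[upper] + eps2, -y[lower] + eps2])
    A_eq = np.append(d[cols] ** 2, 0.0)[None, :]           # sum_j d_j^2 eta_j = 1
    res = linprog(np.zeros(p + 1), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * p + [(None, None)])
    return res.success

def prune_support_kernels(kernels, alpha, y, d, eta, upper, lower, eps2):
    """Heuristic of Section 3.4: try to force each nonzero eta_j to zero and
    keep the removal whenever the (OPT3) LP remains feasible."""
    keep = np.asarray(eta) > 0
    for j in np.flatnonzero(keep):
        trial = keep.copy()
        trial[j] = False
        if removable(kernels, alpha, y, d, trial, upper, lower, eps2):
            keep = trial
    return keep
```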
4. Regularized support kernel machine

The function J(α) is convex but not differentiable. It is well known that in this situation, steepest descent and coordinate descent methods do not necessarily converge to the global optimum (Bertsekas, 1995). SMO unfortunately falls into this class of methods. Therefore, in order to develop an SMO-like algorithm for the SKM, we make use of Moreau-Yosida regularization. In our specific case, this simply involves adding a second regularization term to the objective function of the SKM, as follows:

$$
\begin{aligned}
(R)\quad \min\ \ & \tfrac{1}{2}\Big(\sum_j d_j \|w_j\|_2\Big)^2 + \tfrac{1}{2}\sum_j a_j^2\, \|w_j\|_2^2 + C \sum_i \xi_i \\
\text{w.r.t.}\ \ & w \in \mathbb{R}^{k_1} \times \cdots \times \mathbb{R}^{k_m},\ \xi \in \mathbb{R}^n_+,\ b \in \mathbb{R} \\
\text{s.t.}\ \ & y_i\Big(\sum_j w_j^\top x_{ji} + b\Big) \ge 1 - \xi_i, \quad \forall i \in \{1,\dots,n\},
\end{aligned}
$$

where (a_j) are the MY-regularization parameters.
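For reference, a minimal sketch of the regularized primal objective in (R), reusing the block layout assumed in the earlier sketch of (P) (hypothetical helper name; the choice of the parameters a_j is not addressed here):

```python
import numpy as np

def my_regularized_objective(w_blocks, b, X_blocks, y, d, a, C):
    """Objective of (R): the SKM block-l1 term plus the Moreau-Yosida
    quadratic term 0.5 * sum_j a_j^2 * ||w_j||_2^2 and the hinge slacks."""
    norms = np.array([np.linalg.norm(wj) for wj in w_blocks])
    scores = sum(Xj @ wj for Xj, wj in zip(X_blocks, w_blocks)) + b
    xi = np.maximum(0.0, 1.0 - y * scores)
    return (0.5 * np.dot(d, norms) ** 2
            + 0.5 * np.dot(np.asarray(a) ** 2, norms ** 2)
            + C * np.sum(xi))
```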

Citations
BookDOI
31 Mar 2010
TL;DR: Semi-supervised learning (SSL) as discussed by the authors is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given).
Abstract: In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction. Adaptive Computation and Machine Learning series

3,773 citations

Journal ArticleDOI
10 Jul 2015-PLOS ONE
TL;DR: This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers by introducing a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.
Abstract: Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows to verify the reasoning of the system and provides additional information to the human expert. Although machine learning methods are solving very successfully a plethora of tasks, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers. We introduce a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks. These pixel contributions can be visualized as heatmaps and are provided to a human expert who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest. We evaluate our method for classifiers trained on PASCAL VOC 2009 images, synthetic image data containing geometric shapes, the MNIST handwritten digits data set and for the pre-trained ImageNet model available as part of the Caffe open source package.

3,330 citations

Proceedings ArticleDOI
16 Dec 2008
TL;DR: Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% for the combination of all features.
Abstract: We investigate to what extent combinations of features can improve classification performance on a large dataset of similar classes. To this end we introduce a 103 class flower dataset. We compute four different features for the flowers, each describing different aspects, namely the local shape/texture, the shape of the boundary, the overall spatial distribution of petals, and the colour. We combine the features using a multiple kernel framework with a SVM classifier. The weights for each class are learnt using the method of Varma and Ray, which has achieved state of the art performance on other large dataset, such as Caltech 101/256. Our dataset has a similar challenge in the number of classes, but with the added difficulty of large between class similarity and small within class similarity. Results show that learning the optimum kernel combination of multiple features vastly improves the performance, from 55.1% for the best single feature to 72.8% for the combination of all features.

2,619 citations


Cites methods from "Multiple kernel learning, conic duality, and the SMO algorithm":

  • ...The classifier is a SVM [15] using multiple kernels [1]....

Journal Article
TL;DR: Overall, using multiple kernels instead of a single one is useful and it is believed that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

1,762 citations


Cites background or methods or results from "Multiple kernel learning, conic duality, and the SMO algorithm":

  • ...They show that their formulation is the multiclass generalization of the previously developed binary classification methods of Bach et al. (2004) and Sonnenburg et al. (2006b)....

  • ...Özen et al. (2009) use the formulation of Bach et al. (2004) in order to combine different feature subsets for the protein stability prediction problem and extract information about the importance of these subsets by looking at the learned kernel weights....

  • ...This method gives similar performance results when compared to the SMO-like algorithm of Bach et al. (2004) for a protein-protein interaction prediction problem using much less time and memory....

  • ...Sonnenburg et al. (2006a,b) rewrite the QCQP formulation of Bach et al. (2004): minimize γ with respect to γ ∈ ℝ, α ∈ ℝ^N_+, subject to Σ_{i=1}^N α_i y_i = 0, C ≥ α_i ≥ 0 ∀i, and γ ≥ (1/2) Σ_{i=1}^N Σ_{j=1}^N α_i α_j y_i y_j k_m(x_i, x_j) − Σ_{i=1}^N α_i =: S_m(α) for all m, and convert this problem into the following SILP problem: maximize…

  • ...Yan et al. (2009) compare the l1-norm and l2-norm for image and video classification tasks, and conclude that the l2-norm should be used when the combined kernels carry complementary information....

Proceedings ArticleDOI
09 Jul 2007
TL;DR: This work introduces a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel that is designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel.
Abstract: The objective of this paper is classifying images by the object categories they contain, for example motorbikes or dolphins. There are three areas of novelty. First, we introduce a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel. These are designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel. Second, we generalize the spatial pyramid kernel, and learn its level weighting parameters (on a validation set). This significantly improves classification performance. Third, we show that shape and appearance kernels may be combined (again by learning parameters on a validation set). Results are reported for classification on Caltech-101 and retrieval on the TRECVID 2006 data sets. For Caltech-101 it is shown that the class specific optimization that we introduce exceeds the state of the art performance by more than 10%.

1,496 citations

References
Book
01 Jan 1995

12,671 citations

01 Jan 1999
TL;DR: SMO breaks this large quadratic programming problem into a series of smallest possible QP problems, which avoids using a time-consuming numerical QP optimization as an inner loop and hence SMO is fastest for linear SVMs and sparse data sets.

5,350 citations

"Multiple kernel learning, conic duality, and the SMO algorithm" refers to methods in this paper:

  • ...4 with solving the QCQP (L) using Mosek for two datasets, ionosphere and breast cancer, from the UCI repository, and nested subsets of the adult dataset from Platt (1998). The basis kernels are Gaussian kernels on random subsets of features, with varying widths....

  • ...Since the ε-optimality conditions for the MY-regularized SKM are exactly the same as for the SVM, but with a different objective function (Platt, 1998; Keerthi et al., 2001):...

  • ..., when the SKM reduces to the SVM, this corresponds to the approximate KKT conditions usually employed for the standard SVM (Platt, 1998; Keerthi et al., 2001; Joachims, 1998)....

  • ...Indeed, off-the-shelf algorithms do not suffice in large-scale applications of the SVM, and a second major reason for the rise to prominence of the SVM is the development of special-purpose algorithms for solving the QP (Platt, 1998; Joachims, 1998; Keerthi et al., 2001)....

Book
John Platt1
08 Feb 1999
TL;DR: In this article, the authors proposed a new algorithm for training Support Vector Machines (SVM) called SMO (Sequential Minimal Optimization), which breaks this large QP problem into a series of smallest possible QP problems.
Abstract: This chapter describes a new algorithm for training Support Vector Machines: Sequential Minimal Optimization, or SMO. Training a Support Vector Machine (SVM) requires the solution of a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem into a series of smallest possible QP problems. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Because large matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems, while a standard projected conjugate gradient (PCG) chunking algorithm scales somewhere between linear and cubic in the training set size. SMO's computation time is dominated by SVM evaluation, hence SMO is fastest for linear SVMs and sparse data sets. For the MNIST database, SMO is as fast as PCG chunking; while for the UCI Adult database and linear SVMs, SMO can be more than 1000 times faster than the PCG chunking algorithm.

5,019 citations

Book
01 Jan 1972
TL;DR: This monograph describes and analyzes some practical methods for finding approximate zeros and minima of functions.
Abstract: This monograph describes and analyzes some practical methods for finding approximate zeros and minima of functions.

2,477 citations

Journal ArticleDOI
TL;DR: This paper shows how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques and leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.
Abstract: Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space---classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data this gives a powerful transductive algorithm---using the labeled part of the data one can learn an embedding also for the unlabeled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.

2,419 citations

"Multiple kernel learning, conic duality, and the SMO algorithm" refers to background or methods in this paper:

  • ...As we will show, the conic dual problem defining the SKM is exactly the multiple kernel learning problem of Lanckriet et al. (2004). Moreover, given this new formulation, we can design a Moreau-Yosida regularization which preserves the sparse SVM structure, and therefore we can apply SMO…

  • ...Lanckriet et al. (2004) show that this setup yields the following optimization problem: min ζ − 2e^⊤α (L) w.r.t. ζ ∈ ℝ, α ∈ ℝⁿ, s.t. 0 ≤ α ≤ C, α^⊤y = 0, α^⊤D(y)K_jD(y)α ≤ (tr K_j / c) ζ, j ∈ {1, …, m}, where D(y) is the diagonal matrix with diagonal y, e ∈ ℝⁿ the vector of all ones, and C a positive…

  • ...In particular, the QCQP formulation of Lanckriet et al. (2004) does not lead to an MY-regularized problem that can be solved efficiently by SMO techniques....

  • ...Unfortunately, in our setting, this creates a new difficulty—we lose the sparsity that makes the SVM amenable to SMO optimization....

  • ...In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients in a conic combination of kernel matrices and the coefficients of a discriminative classifier....

Frequently Asked Questions (11)
Q1. What are the contributions in "Multiple kernel learning, conic duality, and the SMO algorithm"?

The authors propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. The authors present experimental results that show that their SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes. 

The good scaling with respect to the number of data points makes it possible to learn kernels for large-scale problems, while the good scaling with respect to the number of basis kernels opens up the possibility of application to large-scale feature selection, in which the algorithm selects kernels that define non-linear mappings on subsets of input features.

Their algorithm is based on applying sequential minimization techniques to a smoothed version of a convex nonsmooth optimization problem. 

One of the major reasons for the rise to prominence of the support vector machine (SVM) is its ability to cast nonlinear classification as a convex optimization problem, in particular a quadratic program (QP). 

Their stopping criterion, referred to as (ε1, ε2)-optimality, requires that the ε1-subdifferential is within ε2 of zero, and that the usual KKT conditions are met.

Checking this sufficient condition is a linear programming (LP) existence problem, i.e., find η such that: η ≥ 0, η_j = 0 if j ∉ 𝒥_{ε1}(α), Σ_j d_j² η_j = 1, and (OPT3) max_{i∈I_M∪I_{0−}∪I_{C+}} {(K(η)D(y)α)_i − y_i} ≤ min_{i∈I_M∪I_{0+}∪I_{C−}} {(K(η)D(y)α)_i − y_i} + 2ε2, where K(η) = Σ_{j∈𝒥_{ε1}(α)} η_j K_j.

In this section, the authors show that if (a_j) are small enough, then an ε2/2-optimal solution of the MY-regularized SKM α, together with η̃(α), is an (ε1, ε2)-optimal solution of the SKM, and an a priori bound on (a_j) is obtained that does not depend on the solution α. Theorem 1: Let ε ∈ (0, 1). Let y ∈ {−1, 1}ⁿ and K_j, j = 1, …, m, be m positive semidefinite kernel matrices.

Convexity implies that the solution is unique and brings a suite of standard numerical software to bear in finding the solution.

If in addition to having α, the authors know a potential candidate for η, then a sufficient condition for optimality is that this η verifies (OPT3), which doesn’t require solving the LP. 

While the multiple kernel learning problem is convex, it is also non-smooth—it can be cast as the minimization of a non-differentiable function subject to linear constraints (see Section 3.1).

If the authors define the function G(α) as G(α) = min_{γ∈ℝ₊, μ∈ℝᵐ} { (1/2)γ² + (1/2) Σ_j (μ_j − γ d_j)²/a_j² − Σ_i α_i : ‖Σ_i α_i y_i x_{ji}‖₂ ≤ μ_j, ∀j }, then the dual problem is equivalent to minimizing G(α) subject to 0 ≤ α ≤ C and α^⊤y = 0.