# Multiple kernel learning, conic duality, and the SMO algorithm

## Summary (3 min read)

### 1. Introduction

- One of the major reasons for the rise to prominence of the support vector machine (SVM) is its ability to cast nonlinear classification as a convex optimization problem, in particular a quadratic program (QP).
- Convexity implies that the solution is unique and brings a suite of standard numerical software to bear in finding the solution.
- Recent developments in the literature on the SVM and other kernel methods have emphasized the need to consider multiple kernels, or parameterizations of kernels, and not a single fixed kernel.
- One class of solutions to non-smooth optimization problems involves constructing a smooth approximate problem out of a non-smooth problem.
- In this paper the authors show how these problems can be resolved by considering a novel dual formulation of the QCQP as a second-order cone programming (SOCP) problem.
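As background (a standard definition from convex optimization, not specific to this paper), an SOCP is a convex program whose constraints restrict affine images of the variables to second-order cones:

$$
\|Ax + b\|_2 \;\le\; c^\top x + d,
$$

i.e., the pair $(Ax + b,\; c^\top x + d)$ must lie in the second-order (Lorentz) cone $\mathcal{K} = \{(u, t) : \|u\|_2 \le t\}$.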

### 2.2. Support kernel machine

- The authors now introduce a novel classification algorithm that they refer to as the "support kernel machine" (SKM).
- Their underlying motivation is the fact that the dual of the SKM is exactly the problem (L).
- The authors establish this equivalence in the following section.
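For reference, the problem (L) is the multiple kernel learning QCQP of Lanckriet et al. (2004), which the paper states as:

$$
\begin{aligned}
\min_{\zeta,\,\alpha}\quad & \zeta - 2\,e^\top \alpha \qquad (L)\\
\text{w.r.t.}\quad & \zeta \in \mathbb{R},\ \alpha \in \mathbb{R}^n\\
\text{s.t.}\quad & 0 \le \alpha \le C,\qquad \alpha^\top y = 0,\\
& \alpha^\top D(y)\, K_j\, D(y)\,\alpha \;\le\; \frac{\operatorname{tr} K_j}{c}\,\zeta,\qquad j \in \{1, \dots, m\},
\end{aligned}
$$

where $D(y)$ is the diagonal matrix with diagonal $y$, $e \in \mathbb{R}^n$ is the vector of all ones, and $C$ is a positive regularization constant.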

### 2.2.1. Linear classification

- In the spirit of the soft-margin SVM, the authors learn the linear classifier by minimizing a linear combination of the inverse of the margin and the training error.
- Various norms can be used to combine the two terms, and indeed many different algorithms have been explored for various combinations of ℓ1-norms and ℓ2-norms.
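Concretely (a sketch following the paper's notation, with the input partitioned into blocks $x_{ji}$, per-block weight vectors $w_j$, and fixed positive coefficients $d_j$), the SKM primal combines a weighted block ℓ1/ℓ2-norm of $w$ with hinge-loss slacks:

$$
\min_{w,\,b,\,\xi}\ \ \tfrac{1}{2}\Big(\sum_{j=1}^{m} d_j \|w_j\|_2\Big)^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.}\quad y_i\Big(\sum_{j} w_j^\top x_{ji} + b\Big) \ge 1 - \xi_i,\ \ \xi_i \ge 0.
$$

The ℓ1-norm across blocks is what drives entire blocks $w_j$ to zero, while the ℓ2-norm within each block does not penalize individual components toward sparsity.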

### 2.2.2. Conic duality and optimality conditions

- For a given optimization problem there are many ways of deriving a dual problem.
- Equations (a) and (b) are the same as in the classical SVM, where they define the notion of a "support vector".
- While the KKT conditions (a) and (b) refer to the index i over data points, the KKT conditions (c) and (d) refer to the index j over components of the input vector.
- These conditions thus imply a form of sparsity not over data points but over "input dimensions".
- Sparsity thus emerges from the optimization problem.

### 2.2.3. Kernelization

- The authors now "kernelize" the problem (P ) using this kernel function.
- The sparsity that emerges via the KKT conditions (c) and (d) now refers to the kernels K_j, and the authors refer to the kernels with nonzero η_j as "support kernels".
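A minimal sketch (plain Python for illustration; the helper name is hypothetical) of forming the conic combination K(η) = Σ_j η_j K_j of Gram matrices that underlies the kernelized problem:

```python
def combine_kernels(kernels, eta):
    """Conic combination K(eta) = sum_j eta_j * K_j of Gram matrices.

    `kernels` is a list of n x n matrices (lists of lists); `eta` holds
    nonnegative weights -- a conic combination of positive semidefinite
    kernels is again positive semidefinite, hence a valid kernel.
    """
    assert all(e >= 0 for e in eta), "conic combination needs eta >= 0"
    n = len(kernels[0])
    return [[sum(e * Kj[a][b] for e, Kj in zip(eta, kernels))
             for b in range(n)]
            for a in range(n)]

# Two toy 2x2 basis kernels; a weight of zero would mean the corresponding
# kernel is not a "support kernel".
K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[2.0, 1.0], [1.0, 2.0]]
K = combine_kernels([K1, K2], [1.0, 0.5])
```
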

### 2.3. Equivalence of the two formulations

- Care must be taken here, though. The weights η_j are defined for (L) as Lagrange multipliers, and for (D_K) through the anti-proportionality of orthogonal elements of a second-order cone; a priori they might not coincide: although (D_K) and (L) are equivalent, their dual problems have different formulations.
- It is straightforward, however, to write the KKT optimality conditions for (α, η) for both problems and verify that they are indeed equivalent.

### 3. Optimality conditions

- The authors formulate their problem (in either of its two equivalent forms) as the minimization of a non-differentiable convex function subject to linear constraints.
- Exact and approximate optimality conditions are then readily derived using subdifferentials.
- In later sections the authors will show how these conditions lead to an MY-regularized algorithmic formulation that will be amenable to SMO techniques.

### 3.2. Optimality conditions and subdifferential

- Elements of the subdifferential ∂J(α) are called subgradients.
- The notion of subdifferential is especially useful for characterizing optimality conditions of nonsmooth problems (Bertsekas, 1995).
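In standard convex-analysis terms (Bertsekas, 1995), a point $\alpha^\star$ minimizes a convex function $J$ over a convex set $\mathcal{C}$ if and only if some subgradient rules out all feasible descent directions:

$$
\exists\, g \in \partial J(\alpha^\star)\ \text{such that}\ g^\top(\alpha - \alpha^\star) \ge 0 \quad \forall\, \alpha \in \mathcal{C};
$$

in the unconstrained case this reduces to $0 \in \partial J(\alpha^\star)$.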

### 3.3. Approximate optimality conditions

- Note that for one kernel, i.e., when the SKM reduces to the SVM, this corresponds to the approximate KKT conditions usually employed for the standard SVM (Platt, 1998; Keerthi et al., 2001; Joachims, 1998).
- Indeed, the iterative algorithm that the authors present in Section 4 outputs a pair (α, η) and only these sufficient optimality conditions need to be checked.

### 3.4. Improving sparsity

- Indeed, if some of the kernels are close to identical, then some of the η's can potentially be removed-for a general SVM, the optimal α is not unique if data points coincide, and for a general SKM, the optimal α and η are not unique if data points or kernels coincide.
- When searching for the minimum ℓ0-norm η which satisfies the constraints (OPT3), the authors can thus consider a simple heuristic approach where they loop through all the nonzero η_j and check whether each such component can be removed.
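The heuristic can be sketched as follows (an illustrative sketch, not the authors' code; `still_feasible` is a hypothetical oracle standing in for the (OPT3)-style feasibility check):

```python
def prune_weights(eta, still_feasible):
    """Greedy sparsification of kernel weights (illustrative sketch).

    `still_feasible(candidate)` returns True when a candidate weight
    vector still satisfies the optimality constraints.  We loop over the
    nonzero components and drop each one whose removal keeps the
    candidate feasible, reducing the number of active kernels.
    """
    eta = list(eta)
    for j in range(len(eta)):
        if eta[j] == 0.0:
            continue
        trial = eta[:j] + [0.0] + eta[j + 1:]
        if still_feasible(trial):
            eta = trial  # component j can be safely removed
    return eta

# Toy oracle: "feasible" here just means the weights still sum to >= 1.
pruned = prune_weights([0.6, 0.6, 0.6], lambda e: sum(e) >= 1.0)
```

With this toy oracle the first component is dropped, after which neither remaining component can be removed without violating feasibility.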

### 4. Regularized support kernel machine

- The function J(α) is convex but not differentiable.
- It is well known that in this situation, steepest descent and coordinate descent methods do not necessarily converge to the global optimum (Bertsekas, 1995).
- SMO unfortunately falls into this class of methods.
- Therefore, in order to develop an SMO-like algorithm for the SKM, the authors make use of Moreau-Yosida regularization.
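Generically (the paper uses a weighted variant with per-block parameters $a_j$), the Moreau-Yosida envelope of a convex $J$ is

$$
J_\mu(\beta) \;=\; \min_{\alpha} \Big\{ J(\alpha) + \tfrac{1}{2\mu}\,\|\alpha - \beta\|_2^2 \Big\}, \qquad \mu > 0,
$$

which is convex and differentiable with $\tfrac{1}{\mu}$-Lipschitz gradient and has the same minimizers as $J$: it smooths the objective without moving its solutions.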

### 4.2. Solving the MY-regularized SKM using SMO

- Since the objective function G(α) is differentiable, the authors can now safely envisage an SMO-like approach, which consists in a sequence of local optimizations over only two components of α.
- In addition, caching and shrinking techniques (Joachims, 1998) that prevent redundant computations of kernel matrix values can also be employed.
- A difference between their setting and the SVM setting is the line search, which cannot be performed in closed form for the MY-regularized SKM.
- Since each line search is the minimization of a convex function, one can use efficient one-dimensional root finding, such as Brent's method (Brent, 1973).
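Any derivative-free one-dimensional minimizer works here. As a stand-in for Brent's method (which adds parabolic-interpolation steps for faster convergence), a minimal golden-section search on a convex function (illustrative sketch, not the authors' code):

```python
import math

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a convex (unimodal) function f on [lo, hi] by
    golden-section search: shrink the bracket by the inverse golden
    ratio each step, keeping the side with the smaller function value.
    """
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi ~ 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]; old c becomes the new d.
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            # Minimum lies in [c, b]; old d becomes the new c.
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2.0

# Toy line search along one direction, with a convex 1-D objective.
t_star = golden_section_min(lambda t: (t - 0.3) ** 2, 0.0, 1.0)
```
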

### 4.4. A minimization algorithm

- In their simulations, the kernel matrices are all normalized, i.e., have unit diagonal, so the authors can choose all d_j equal.
- Once they are satisfied, the algorithm stops.
- Since each SMO optimization is performed on a differentiable function with Lipschitz gradient, and SMO is equivalent to steepest descent for the ℓ1-norm (Joachims, 1998), classical optimization results show that each of those SMO optimizations is finitely convergent (Bertsekas, 1995).
- Additional speed-ups can be easily achieved here.
- If for successive values of κ, some kernels have a zero weight, the authors might as well remove them from the algorithm and check after convergence if they can be safely kept out.

### 5. Simulations

- The authors compare the algorithm presented in Section 4.4 with solving the QCQP (L) using Mosek for two datasets, ionosphere and breast cancer, from the UCI repository, and nested subsets of the adult dataset from Platt (1998) .
- The basis kernels are Gaussian kernels on random subsets of features, with varying widths.
- The authors vary the number of kernels m for fixed number of data points n, and vice versa.
- Thus the algorithm presented in this paper appears to provide a significant improvement over Mosek in computational complexity, both in terms of the number of kernels and the number of data points.



##### Frequently Asked Questions

###### Q2. What have the authors stated for future works in "Multiple kernel learning, conic duality, and the smo algorithm" ?

The good scaling with respect to the number of data points makes it possible to learn kernels for large scale problems, while the good scaling with respect to the number of basis kernels opens up the possibility of application to largescale feature selection, in which the algorithm selects kernels that define non-linear mappings on subsets of input features.

###### Q3. What is the algorithm for learning kernels?

Their algorithm is based on applying sequential minimization techniques to a smoothed version of a convex nonsmooth optimization problem.

###### Q4. What is the main reason for the rise to prominence of the support vector machine?

One of the major reasons for the rise to prominence of the support vector machine (SVM) is its ability to cast nonlinear classification as a convex optimization problem, in particular a quadratic program (QP).

###### Q5. What is the optimality of the function J()?

Their stopping criterion, referred to as (ε1, ε2)-optimality, requires that the ε1-subdifferential is within ε2 of zero, and that the usual KKT conditions are met.

###### Q6. What is the simplest way to check the optimality of a given?

Checking this sufficient condition is a linear programming (LP) existence problem, i.e., find η such that

$$
\eta \ge 0,\qquad \eta_j = 0\ \text{if}\ j \notin \mathcal{J}_{\varepsilon_1}(\alpha),\qquad \sum_j d_j^2\,\eta_j = 1,\qquad (\mathrm{OPT3})
$$

$$
\max_{i \in I_M \cup I_{0-} \cup I_{C+}} \big\{ (K(\eta)\,D(y)\,\alpha)_i - y_i \big\} \;\le\; \min_{i \in I_M \cup I_{0+} \cup I_{C-}} \big\{ (K(\eta)\,D(y)\,\alpha)_i - y_i \big\} + 2\varepsilon_2,
$$

where $K(\eta) = \sum_{j \in \mathcal{J}_{\varepsilon_1}(\alpha)} \eta_j K_j$.

###### Q7. what is the a priori bound on aj?

In this section, the authors show that if the (a_j) are small enough, then an ε2/2-optimal solution α of the MY-regularized SKM, together with η̃(α), is an (ε1, ε2)-optimal solution of the SKM; an a priori bound on the (a_j) is obtained that does not depend on the solution α. Theorem 1: let 0 < ε < 1, let y ∈ {−1, 1}^n, and let K_j, j = 1, …, m, be m positive semidefinite kernel matrices.

###### Q8. What does the author mean by the title of the paper?

The title names the paper's three central ingredients: the multiple kernel learning problem, the conic (second-order cone) duality through which the authors reformulate it, and the SMO algorithm they adapt, via Moreau-Yosida regularization, to solve it efficiently.

###### Q9. What is the way to check the optimality of a given LP?

If in addition to having α, the authors know a potential candidate for η, then a sufficient condition for optimality is that this η verifies (OPT3), which doesn’t require solving the LP.

###### Q10. What is the difference between a multiple kernel learning problem and a quadratic program?

While the multiple kernel learning problem is convex, it is also non-smooth: it can be cast as the minimization of a non-differentiable function subject to linear constraints (see Section 3.1).

###### Q11. What is the inverse of the conic dual problem?

If the authors define the function G(α) as

$$
G(\alpha) = \min_{\gamma \in \mathbb{R}_+,\ \mu \in \mathbb{R}^m} \Big\{ \tfrac{1}{2}\gamma^2 + \tfrac{1}{2}\sum_j \frac{(\mu_j - \gamma d_j)^2}{a_j^2} - \sum_i \alpha_i \;:\; \Big\| \sum_i \alpha_i y_i x_{ji} \Big\|_2 \le \mu_j,\ \forall j \Big\},
$$

then the dual problem is equivalent to minimizing G(α) subject to 0 ≤ α ≤ C and α^⊤y = 0.