
Showing papers on "Multiple kernel learning" published in 2008


Journal ArticleDOI
TL;DR: This paper derives necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, and proposes an adaptive scheme that yields a consistent model estimate even when the necessary condition required for the nonadaptive scheme is not satisfied.
Abstract: We consider the least-squares regression problem with regularization by a block l1-norm, that is, a sum of Euclidean norms over spaces of dimension larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the l1-norm, where all spaces have dimension one and the problem is commonly referred to as the Lasso. In this paper, we study the asymptotic group selection consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the nonadaptive scheme is not satisfied.
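For readers who want to see the mechanics of the block l1-norm, here is a minimal numpy sketch of the block soft-thresholding step that proximal-gradient solvers for the group Lasso rely on; the group structure, step size, and data are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def block_soft_threshold(w, groups, lam):
    """Proximal operator of the block l1-norm: shrink each group's
    Euclidean norm by lam; groups whose norm falls below lam become 0."""
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        w[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return w

# Proximal-gradient loop for min_w ||y - X w||^2 / (2n) + lam * sum_g ||w_g||
rng = np.random.default_rng(0)
n = 100
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
X = rng.standard_normal((n, 10))
w_true = np.concatenate([rng.standard_normal(3), np.zeros(7)])  # last groups inactive
y = X @ w_true + 0.1 * rng.standard_normal(n)

step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
lam, w = 0.1, np.zeros(10)
for _ in range(200):
    grad = X.T @ (X @ w - y) / n
    w = block_soft_threshold(w - step * grad, groups, step * lam)
print([round(np.linalg.norm(w[g]), 3) for g in groups])  # inactive groups -> 0
```

The whole-group zeroing is exactly what makes the block l1-norm perform group selection, the property whose consistency the paper analyzes.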

687 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A localized multiple kernel learning (LMKL) algorithm uses a gating model to select the appropriate kernel function locally; the gating model and the kernel-based classifier are coupled and optimized jointly.
Abstract: Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed, which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally. The localizing gating model and the kernel-based classifier are coupled, and their optimization is done jointly. Empirical results on ten benchmark and two bioinformatics data sets validate the applicability of our approach. LMKL achieves accuracy statistically similar to MKL while storing fewer support vectors. LMKL can also combine multiple copies of the same kernel function, each localized in a different part of the input space. For example, LMKL with multiple linear kernels gives better accuracy than a single linear kernel on the bioinformatics data sets.
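A minimal sketch of the locally combined kernel at the heart of this idea, assuming a linear softmax gating; the gating parameters are fixed at random here, whereas LMKL optimizes them jointly with the classifier.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def locally_combined_kernel(X, kernels, V, b):
    """k(xi, xj) = sum_m eta_m(xi) * k_m(xi, xj) * eta_m(xj), where
    eta(x) = softmax(x V + b) gates each kernel per data point, so
    different kernels can dominate in different regions of the input."""
    eta = softmax(X @ V + b)                  # (n, M) gate values
    K = np.zeros_like(kernels[0])
    for m, Km in enumerate(kernels):
        K += np.outer(eta[:, m], eta[:, m]) * Km
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
K_lin = X @ X.T
K_rbf = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 2.0)
V, b = rng.standard_normal((4, 2)), np.zeros(2)   # gating parameters
K = locally_combined_kernel(X, [K_lin, K_rbf], V, b)
```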

293 citations


Posted Content
TL;DR: The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

233 citations


Proceedings Article
08 Dec 2008
TL;DR: The level method, originally designed for optimizing non-smooth objective functions, is extended to convex-concave optimization and applied to multiple kernel learning; the extended method overcomes the drawbacks of SILP and SD.
Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method.
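All three wrappers (SD, the SILP cutting-plane model, and the level method) are built from the same two quantities: the SVM dual value for fixed kernel weights and its gradient with respect to those weights. Below is a hedged sketch of that computation using scikit-learn's precomputed-kernel SVC; the level method's projection machinery itself is not shown, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

def mkl_objective_and_grad(kernels, d, y, C=1.0):
    """For fixed weights d, solve the SVM dual on K = sum_m d_m K_m and
    return the dual value J(d) and its gradient dJ/dd_m = -0.5 a^T K_m a,
    where a_i = y_i * alpha_i comes from the fitted SVM."""
    K = sum(dm * Km for dm, Km in zip(d, kernels))
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    a = np.zeros(len(y))
    a[svm.support_] = svm.dual_coef_.ravel()        # a_i = y_i * alpha_i
    obj = a @ y - 0.5 * a @ K @ a                   # sum_i alpha_i - 0.5 a^T K a
    grad = np.array([-0.5 * a @ Km @ a for Km in kernels])
    return obj, grad

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = np.where(X[:, 0] + 0.1 * rng.standard_normal(60) > 0, 1, -1)
K_lin = X @ X.T
K_rbf = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 5.0)
obj, grad = mkl_objective_and_grad([K_lin, K_rbf], np.array([0.5, 0.5]), y)
print(obj, grad)
```

SD uses only the latest gradient; SILP and the level method keep all past (J, grad) pairs to build a cutting-plane model, and the level method additionally regularizes each step by projecting onto a level set.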

179 citations


Journal ArticleDOI
TL;DR: A novel hyperbolic framework for large-scale image visualization and interactive hypothesis assessment is developed, along with a novel hierarchical boosting algorithm that learns ensemble classifiers hierarchically.
Abstract: In this paper, we have developed a new scheme for achieving multilevel annotations of large-scale images automatically. To achieve more sufficient representation of various visual properties of the images, both the global visual features and the local visual features are extracted for image content representation. To tackle the problem of huge intraconcept visual diversity, multiple types of kernels are integrated to characterize the diverse visual similarity relationships between the images more precisely, and a multiple kernel learning algorithm is developed for SVM image classifier training. To address the problem of huge interconcept visual similarity, a novel multitask learning algorithm is developed to learn the correlated classifiers for the sibling image concepts under the same parent concept and enhance their discrimination and adaptation power significantly. To tackle the problem of huge intraconcept visual diversity for the image concepts at the higher levels of the concept ontology, a novel hierarchical boosting algorithm is developed to learn their ensemble classifiers hierarchically. In order to assist users on selecting more effective hypotheses for image classifier training, we have developed a novel hyperbolic framework for large-scale image visualization and interactive hypotheses assessment. Our experiments on large-scale image collections have also obtained very positive results.

152 citations


Proceedings Article
08 Dec 2008
TL;DR: In this article, the kernel is assumed to decompose into a large sum of individual basis kernels that can be embedded in a directed acyclic graph, making it possible to perform kernel selection through a hierarchical multiple kernel learning framework in polynomial time in the number of selected kernels.
Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

137 citations


Journal ArticleDOI
TL;DR: A new effective multiple kernel learning algorithm borrows from Canonical Correlation Analysis to maximally correlate the m views in the transformed coordinates, introducing a special term, the Inter-Function Similarity Loss R_IFSI, into the existing regularization framework to guarantee the agreement of multiview outputs.
Abstract: In this paper, we develop a new effective multiple kernel learning algorithm. First, we map the input data into m different feature spaces by m empirical kernels, where each generated feature space is taken as one view of the input space. Then, borrowing the motivating argument from Canonical Correlation Analysis (CCA), which can maximally correlate the m views in the transformed coordinates, we introduce a special term called the Inter-Function Similarity Loss R_IFSI into the existing regularization framework so as to guarantee the agreement of multiview outputs. In implementation, we select the Modification of Ho-Kashyap algorithm with Squared approximation of the misclassification errors (MHKS) as the incorporated paradigm, and the experimental results on benchmark data sets demonstrate the feasibility and effectiveness of the proposed algorithm, named MultiK-MHKS.
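A minimal sketch of an inter-function similarity loss of the kind described, assuming it is the sum of squared disagreements between per-view outputs; the exact weighting and normalization inside MultiK-MHKS may differ.

```python
import numpy as np

def inter_function_similarity_loss(outputs):
    """Sum of squared disagreements between the decision outputs of the
    m views on the same samples; adding it to the regularized risk pushes
    the per-view classifiers toward consistent (agreeing) outputs."""
    m = len(outputs)
    return sum(np.sum((outputs[p] - outputs[q]) ** 2)
               for p in range(m) for q in range(p + 1, m))

rng = np.random.default_rng(0)
views = [rng.standard_normal(30) for _ in range(3)]   # outputs of 3 views
print(inter_function_similarity_loss(views))
```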

128 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy, and the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
Abstract: Effective diagnosis of Alzheimer's disease (AD) is of primary importance in biomedical research. Recent studies have demonstrated that neuroimaging parameters are sensitive and consistent measures of AD. In addition, genetic and demographic information have also been successfully used for detecting the onset and progression of AD. The research so far has mainly focused on studying one type of data source only. It is expected that the integration of heterogeneous data (neuroimages, demographic, and genetic measures) will improve the prediction accuracy and enhance knowledge discovery from the data, such as the detection of biomarkers. In this paper, we propose to integrate heterogeneous data for AD prediction based on a kernel method. We further extend the kernel framework for selecting features (biomarkers) from heterogeneous data sources. The proposed method is applied to a collection of MRI data from 59 normal healthy controls and 59 AD patients. The MRI data are pre-processed using tensor factorization. In this study, we treat the complementary voxel-based data and region of interest (ROI) data from MRI as two data sources, and attempt to integrate the complementary information by the proposed method. Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy. Results also show that the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
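A hedged sketch of the basic integration idea: give each data source its own kernel, fuse them as a weighted sum, and score the combination by cross-validation. The data shapes and weights below are invented placeholders, not the paper's pipeline or results.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two hypothetical sources for the same subjects: voxel-based features
# and region-of-interest (ROI) features. Each source gets its own kernel.
rng = np.random.default_rng(0)
X_voxel = rng.standard_normal((118, 200))
X_roi = rng.standard_normal((118, 30))
y = rng.choice([-1, 1], size=118)

K_voxel = rbf_kernel(X_voxel)
K_roi = linear_kernel(X_roi)
for beta in (0.0, 0.5, 1.0):                     # weight on the voxel kernel
    K = beta * K_voxel + (1 - beta) * K_roi
    score = cross_val_score(SVC(kernel="precomputed"), K, y, cv=5).mean()
    print(f"beta={beta:.1f}  cv accuracy={score:.3f}")
```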

117 citations


Journal Article
TL;DR: It is shown that the kernel learning problem in RKDA can be formulated as convex programs, and SDP formulations are proposed for the multi-class case, which leads naturally to QCQP and SILP formulations.
Abstract: Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as a convex program. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least-squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper: the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case; the decomposition also leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.

104 citations


Proceedings Article
01 Oct 2008
TL;DR: The results show two things: for many datasets there is no benefit in using MKL/IKL instead of the SVM classifier, so the flexibility of using more than one kernel seems to be of no use; yet on some datasets IKL yields massive increases in accuracy over SVM/MKL thanks to the possibility of using a largely increased kernel set.
Abstract: In this paper we build upon the Multiple Kernel Learning (MKL) framework, and in particular on [2], which generalized it to infinitely many kernels. We rewrite the problem in the standard MKL formulation, which leads to a Semi-Infinite Program. We devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to both the finite and infinite case, and we find it to be faster and more stable than SimpleMKL [8]. Furthermore, we present the first large-scale comparison of SVMs to MKL on a variety of benchmark datasets, also comparing IKL. The results show two things: a) for many datasets there is no benefit in using MKL/IKL instead of the SVM classifier, thus the flexibility of using more than one kernel seems to be of no use; b) on some datasets IKL yields massive increases in accuracy over SVM/MKL due to the possibility of using a largely increased kernel set. For those cases parameter selection through Cross-Validation or MKL is not applicable.

88 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: This paper considers a regularized SVM formulation, in which the indefinite kernel matrix is treated as a noisy observation of some unknown positive semidefinite one (proxy kernel) and the support vectors and the proxy kernel can be computed simultaneously.
Abstract: Similarity matrices generated from many applications may not be positive semidefinite, and hence cannot fit into the kernel machine framework. In this paper, we study the problem of training support vector machines with an indefinite kernel. We consider a regularized SVM formulation, in which the indefinite kernel matrix is treated as a noisy observation of some unknown positive semidefinite one (the proxy kernel), and the support vectors and the proxy kernel can be computed simultaneously. We propose a semi-infinite quadratically constrained linear program formulation for the optimization, which can be solved iteratively to find a global optimum solution. We further propose to employ an additional pruning strategy, which significantly improves the efficiency of the algorithm while retaining its convergence property. In addition, we show the close relationship between the proposed formulation and multiple kernel learning. Experiments on a collection of benchmark data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
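For contrast with the paper's joint optimization, here is the simple one-shot baseline it improves upon: projecting the indefinite similarity matrix onto the PSD cone by clipping negative eigenvalues (the example matrix is illustrative).

```python
import numpy as np

def nearest_psd(S):
    """Frobenius-norm projection of a symmetric indefinite similarity
    matrix onto the PSD cone: clip negative eigenvalues at zero. A simple
    one-shot proxy kernel, unlike the paper's jointly learned one."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T

S = np.array([[1.0, 0.9, -0.4],
              [0.9, 1.0, 0.6],
              [-0.4, 0.6, 1.0]])       # indefinite: has a negative eigenvalue
K = nearest_psd(S)
print(np.linalg.eigvalsh(K))           # all eigenvalues now >= 0
```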

Proceedings Article
08 Dec 2008
TL;DR: The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP) and an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem.
Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

Proceedings ArticleDOI
05 Jul 2008
TL;DR: This work proposes Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels, and characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions.
Abstract: The Support Vector Machine (SVM) is an acknowledged powerful tool for building classifiers, but it lacks flexibility, in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning (MKL) enables learning the kernel from an ensemble of basis kernels, whose combination is optimized in the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multi-channel data where groups correspond to channels.

Book ChapterDOI
10 Jun 2008
TL;DR: A method is proposed that combines the efficiency of single-class localization with a subsequent decision process that works jointly for all given object classes by following a multiple kernel learning (MKL) approach; experiments show that the subsequent joint decision step clearly improves accuracy compared to single-class detection.
Abstract: Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection.

Book ChapterDOI
15 Sep 2008
TL;DR: This work utilizes the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems.
Abstract: Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. Here we utilize the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. We further propose a general class of protein sequence kernels which considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we utilize a recent extension of SVMs that optimizes over multiple kernels simultaneously. This way, we automatically search over families of possible amino acid motifs. We compare our automated approach to three other predictors on four different datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological reasoning. Data files, kernel matrices and open source software are available at http://www.fml.mpg.de/raetsch/projects/protsubloc.
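A sketch of the gap-free special case of such sequence kernels, the k-spectrum kernel, which counts shared k-mers between two sequences; the paper's kernel family additionally allows motifs with gaps, which this toy version does not.

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: the inner product of k-mer count vectors.
    Gap-free special case of the motif kernels described above."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * ct[m] for m, c in cs.items())   # ct[m] is 0 if m absent

# Two toy amino acid sequences differing at one position
print(spectrum_kernel("MKVLAAGICK", "MKVLSAGICK", k=3))
```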

Proceedings ArticleDOI
13 Jul 2008
TL;DR: Two simple but effective methods are proposed for determining the weights of a conic combination of multiple kernels: FSM-MKL, which learns optimal weights using the feature-space-based kernel matrix evaluation measure (FSM), and PWMK, which weights each kernel in proportion to its quality as determined by direct cross-validation.
Abstract: Complex biological data generated from various experiments are stored in diverse data types in multiple datasets. By appropriately representing each biological dataset as a kernel matrix and then combining them in solving problems, the kernel-based approach has become a spotlight in data integration and its applications in bioinformatics and other fields. While the linear combination of unweighted multiple kernels (UMK) is popular, there has been effort on multiple kernel learning (MKL), where optimal weights are learned by semi-definite programming or sequential minimal optimization (SMO-MKL). These methods provide high accuracy on biological prediction problems, but are very complicated and hard to use, especially for non-experts in optimization. They are also usually of high computational cost and not suitable for large data sets. In this paper, we propose two simple but effective methods for determining weights for a conic combination of multiple kernels. The former learns optimal weights formulated by our measure FSM for kernel matrix evaluation (feature space-based kernel matrix evaluation measure), and is denoted FSM-MKL. The latter assigns each kernel a weight proportional to its quality, determined by direct cross-validation, and is named proportionally weighted multiple kernels (PWMK). Experimental comparative evaluation of the four methods UMK, SMO-MKL, FSM-MKL and PWMK on the problem of protein-protein interactions shows that our proposed methods are simpler and more efficient but still effective: they achieve performance almost as high as that of MKL and higher than that of UMK.
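A hedged sketch of the PWMK idea, weighting each kernel by its own cross-validated accuracy and normalizing; the paper's exact quality measure and normalization may differ, and the data below are synthetic.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pwmk_weights(kernels, y, cv=5):
    """PWMK-style weighting: each kernel's weight is proportional to its
    own cross-validated accuracy, normalized to sum to one."""
    acc = np.array([cross_val_score(SVC(kernel="precomputed"), K, y, cv=cv).mean()
                    for K in kernels])
    return acc / acc.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 6))
y = np.where(X[:, 0] > 0, 1, -1)
kernels = [X @ X.T,
           np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 6.0)]
w = pwmk_weights(kernels, y)
K_combined = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(w)
```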

Posted Content
TL;DR: The authors used text from news articles to predict intraday price movements of financial assets using support vector machines and developed an analytic center cutting plane method to solve the kernel learning problem efficiently.
Abstract: We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.

Proceedings Article
08 Dec 2008
TL;DR: An approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR) is described, which is flexible in simultaneously tackling data in various feature representations and general in that it is established upon graph embedding.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

Proceedings ArticleDOI
12 May 2008
TL;DR: Several refinements to the standard maximum-margin scheme for speaker verification systems, including a regularisation term, are examined, allowing the appropriate level of sparsity to be selected.
Abstract: Many speaker verification (SV) systems combine multiple classifiers using score-fusion to improve system performance. For SVM classifiers, an alternative strategy is to combine at the kernel level. This involves finding a suitable kernel weighting, known as multiple kernel learning (MKL). Recently, an efficient maximum-margin scheme for MKL has been proposed. This work examines several refinements to this scheme for SV. The standard scheme has a known tendency towards sparse weightings, which may not be optimal for SV. A regularisation term is proposed, allowing the appropriate level of sparsity to be selected. Cross-speaker tying of kernel weights is also applied to improve robustness. Various combinations of dynamic kernels were evaluated, including derivative and parametric kernels based upon different model structures. The performance achieved on the NIST 2002 SRE when combining five kernels was 4.83% EER.

Proceedings ArticleDOI
24 Aug 2008
TL;DR: It is shown that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem and an equivalent semi-infinite linear program (SILP) formulation which can be solved efficiently by the column generation technique.
Abstract: Kernel methods have been applied successfully in many data mining tasks. Subspace kernel learning was recently proposed to discover an effective low-dimensional subspace of a kernel feature space for improved classification. In this paper, we propose to construct a subspace kernel using the Hilbert-Schmidt Independence Criterion (HSIC). We show that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem. One limitation of the existing subspace kernel learning formulations is that the kernel learning and classification are independent and the subspace kernel may not be optimally adapted for classification. To overcome this limitation, we propose a joint optimization framework, in which we learn the subspace kernel and subsequent classifiers simultaneously. In addition, we propose a novel learning formulation that extracts an uncorrelated subspace kernel to reduce the redundant information in a subspace kernel. Following the idea from multiple kernel learning, we extend the proposed formulations to the case when multiple kernels are available and need to be combined. We show that the integration of subspace kernels can be formulated as a semidefinite program (SDP) which is computationally expensive. To improve the efficiency of the SDP formulation, we propose an equivalent semi-infinite linear program (SILP) formulation which can be solved efficiently by the column generation technique. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed algorithms.
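A minimal sketch of the (biased) empirical HSIC between two kernel matrices, the dependence criterion used to score the subspace kernel; the label-kernel pairing below is an illustrative choice.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC between two kernel matrices:
    HSIC = tr(K H L H) / (n - 1)^2, with centering matrix
    H = I - (1/n) 11^T. Larger values mean stronger dependence."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = np.where(X[:, 0] > 0, 1, -1)
K, L = X @ X.T, np.outer(y, y)        # data kernel vs. label kernel
print(hsic(K, L))
```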

Journal Article
TL;DR: In this paper, a multiple kernel learning (MKL) approach is used to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes.
Abstract: Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection.

Journal ArticleDOI
TL;DR: A new multiclass classification method reduces the multiclass problem to a single binary classifier (SBC); experiments indicate that it outperforms one-vs-all, all-pairs and the error-correcting output coding scheme, at least when the number of classes is small.

Journal ArticleDOI
TL;DR: This paper addresses the issue of learning an optimal diffusion kernel, in the form of a convex combination of a set of pre-specified kernels constructed from biological networks, for protein function prediction, and shows that the linearly combined diffusion kernel performs better than every single candidate diffusion kernel.
Abstract: Machine-learning tools have gained considerable attention during the last few years for analyzing biological networks for protein function prediction. Kernel methods are suitable for learning from graph-based data such as biological networks, as they only require the abstraction of the similarities between objects into the kernel matrix. One key issue in kernel methods is the selection of a good kernel function. Diffusion kernels, the discretization of the familiar Gaussian kernel of Euclidean space, are commonly used for graph-based data. In this paper, we address the issue of learning an optimal diffusion kernel, in the form of a convex combination of a set of pre-specified kernels constructed from biological networks, for protein function prediction. Most prior work on this kernel learning task focuses on variants of the loss function based on Support Vector Machines (SVM). Extensions to other loss functions, such as the one based on Kullback-Leibler (KL) divergence, which is more suitable for mining biological networks, lead to expensive optimization problems. By exploiting the special structure of the diffusion kernel, we show that this KL-divergence-based kernel learning problem can be formulated as a simple optimization problem, which can then be solved efficiently. It is further extended to the multi-task case, where we predict multiple functions of a protein simultaneously. We evaluate the efficiency and effectiveness of the proposed algorithms using two benchmark data sets. Results show that the performance of the linearly combined diffusion kernel is better than that of every single candidate diffusion kernel. When the number of tasks is large, the algorithms based on multiple tasks are favored due to their competitive recognition performance and small computational costs.
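A minimal sketch of a diffusion kernel on a small graph and a convex combination over several diffusion parameters, standing in for the paper's pre-specified candidate set; the graph and weights are illustrative, not learned.

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta):
    """Diffusion kernel K = exp(-beta * L) on a graph with adjacency A,
    where L = D - A is the graph Laplacian; beta controls how far
    similarity diffuses along the edges."""
    L = np.diag(A.sum(axis=1)) - A
    return expm(-beta * L)

# Tiny 4-node path graph; kernels at several betas form the candidate set.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
mus = [0.5, 0.3, 0.2]                          # hypothetical convex weights
K = sum(mu * diffusion_kernel(A, b) for mu, b in zip(mus, [0.1, 1.0, 5.0]))
print(np.round(K, 3))
```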

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This paper shows that the MKL problem with an enhanced spatial pyramid match kernel can be solved efficiently using the projected gradient method, and demonstrates the algorithm on classification tasks based on a linear combination of the proposed kernels computed at multiple pyramid levels of image encoding.
Abstract: Recent publications and developments based on SVM have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve classifier performance, which motivates researchers to explore homogeneous models obtained as linear combinations of kernels. However, the use of multiple kernels faces the challenge of choosing the kernel weights, and the increased number of parameters may lead to overfitting. In this paper we show that the MKL problem with an enhanced spatial pyramid match kernel can be solved efficiently using the projected gradient method. Weights on each kernel matrix (level) are included in the standard SVM empirical risk minimization problem with an L2 constraint to encourage sparsity. We demonstrate our algorithm on classification tasks based on a linear combination of the proposed kernels computed at multiple pyramid levels of image encoding, and we show that the proposed method is accurate and significantly more efficient than current approaches.
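A sketch of a projected gradient step on kernel weights, shown with the Euclidean projection onto the probability simplex that is standard in MKL; note the paper's variant constrains the weights in L2-norm instead, and the gradient values below are illustrative placeholders.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {d : d >= 0, sum(d) = 1}
    via the standard sorting algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# One projected gradient step on kernel-level weights d; in a full MKL
# solver g would come from the SVM dual at the current weights.
d = np.array([0.25, 0.25, 0.25, 0.25])
g = np.array([-0.8, -0.1, -0.3, -0.2])        # illustrative gradient values
d = project_simplex(d - 0.5 * g)
print(d, d.sum())                              # feasible weights, sum to 1
```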

Proceedings ArticleDOI
18 Jun 2008
TL;DR: A two stage multiple kernel learning algorithm is developed by incorporating sequential minimal optimization (SMO) with the gradient projection method, and experimental results show that the proposed approach outperforms single-kernel support vector clustering.
Abstract: Support vector clustering (SVC) has been successfully applied to solve multi-class classification problems. However, it is usually hard to determine the hyper-parameters of RBF kernel functions. A multiple kernel learning (MKL) algorithm is developed to solve this problem, by which the kernel matrix weights and Lagrange multipliers can be simultaneously obtained with semidefinite programming. However, the amount of time and space required is very demanding. We develop a two stage multiple kernel learning algorithm by incorporating sequential minimal optimization (SMO) with the gradient projection method. Experimental results on data sets from UCI and Statlog show that the proposed approach outperforms single-kernel support vector clustering.

Book ChapterDOI
09 Jan 2008
TL;DR: A novel hierarchical boosting algorithm is proposed by incorporating concept ontology and multi-task learning to achieve hierarchical image classifier training to enable automatic multi-level image annotation.
Abstract: In this paper, we have proposed a novel algorithm to achieve automatic multi-level image annotation by incorporating concept ontology and multitask learning for hierarchical image classifier training. To achieve more reliable image classifier training in high-dimensional heterogeneous feature space, a new algorithm is proposed by incorporating multiple kernels for diverse image similarity characterization, and a multiple kernel learning algorithm is developed to train the SVM classifiers for the atomic image concepts at the first level of the concept ontology. To enable automatic multi-level image annotation, a novel hierarchical boosting algorithm is proposed by incorporating concept ontology and multi-task learning to achieve hierarchical image classifier training.

Book ChapterDOI
06 May 2008
TL;DR: Empirical results on real biological data demonstrate that multiple kernel regression can improve accuracy and decrease model complexity by reducing the number of support vectors.
Abstract: The cell defense mechanism of RNA interference has applications in gene function analysis and human disease therapy. To effectively silence a target gene, it is desirable to select the initiator siRNA molecules having satisfactory silencing capabilities. Computational prediction of the silencing efficacy of siRNAs can assist this screening process before using them in biological experiments. String kernel functions, which operate directly on the string objects representing siRNAs and target mRNAs, have been applied to support vector regression for this prediction and improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully utilize the information provided by string and numerical kernels, we propose to unify the two in the kernel feature space by devising a multiple kernel regression framework where a linear combination of the kernels is used. We formulate the multiple kernel learning as a quadratically constrained quadratic programming (QCQP) problem, which, although it yields the global optimal solution, is computationally inefficient and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results on real biological data demonstrate that multiple kernel regression can improve accuracy and decrease model complexity by reducing the number of support vectors. In addition, multiple kernel regression gives insights into the kernel combination, which, for siRNA efficacy prediction, evaluates the relative significance of the design rules.
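A minimal sketch of the kernel-target alignment score underlying the heuristics mentioned above; turning alignments into combination weights by simple normalization, as done at the end, is one possible heuristic, not necessarily the paper's.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Alignment A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F);
    higher alignment means the kernel's similarity structure matches
    the target better."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
kernels = [X @ X.T,
           np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 5.0)]
alignments = np.array([kernel_target_alignment(K, y) for K in kernels])
weights = alignments / alignments.sum()        # one simple weighting heuristic
print(alignments, weights)
```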

31 Dec 2008
TL;DR: This method attempts to find the most compact ball amongst the 11 different feature representations of the CBIR task with relevance feedback, using a novel 1- and 2-norm regularisation technique for the 1-class SVM under the MKL framework.
Abstract: This report presents a novel Multiple Kernel Learning (MKL) algorithm for the 1-class support vector machine. The emphasis is placed on viewing the CBIR task with relevance feedback as a metric learning problem, where each image has 11 different feature extraction methods applied to it. Our method attempts to find the most compact ball amongst the 11 different feature representations using a novel 1- and 2-norm regularisation technique for the 1-class SVM under the MKL framework. We also devise a simple way of including the set of negative examples whilst still utilising the 1-class SVM implementation.

Journal Article
TL;DR: The experimental results reveal that training and testing times are decreased and classification performance is maintained at the original level after applying cooperative clustering to multiple kernel SVM.
Abstract: Support vector machines based on multiple kernel learning have been proposed because learning problems often involve multiple, heterogeneous data sources; however, increasing the number of kernels inevitably increases the computation of multiple kernel learning. To solve this problem, a new clustering method, called cooperative clustering, is presented. Applying cooperative clustering to multiple kernel SVM reduces the number of support vectors and thus the time complexity of the computation. The experimental results reveal that training and testing times are decreased while classification performance is maintained at the same level as the original after applying our method.

Proceedings ArticleDOI
12 Jul 2008
TL;DR: This study reformulates the SDP problem to reduce the time and space requirements, and introduces strategies for reducing the search space in solving the SDP problem.
Abstract: Support vector machines (SVMs) have been successfully applied to classification problems. Practical issues involve how to determine the right type and suitable hyperparameters of kernel functions. Recently, multiple-kernel learning (MKL) algorithms have been developed to handle these issues by combining different kernels, where the weight of each kernel in the combination is obtained through learning. One of the most popular methods is to learn the weights with semidefinite programming (SDP). However, the amount of time and space required by this method is demanding. In this study, we reformulate the SDP problem to reduce the time and space requirements, and introduce strategies for reducing the search space in solving the SDP problem. Experimental results obtained on synthetic datasets and benchmark datasets from UCI and Statlog show that the proposed approach improves the efficiency of the SDP method without degrading performance.