
Showing papers on "Multiple kernel learning" published in 2008


Journal ArticleDOI
TL;DR: This paper derives necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, and proposes an adaptive scheme that yields a consistent model estimate even when the necessary condition required for the nonadaptive scheme is not satisfied.
Abstract: We consider the least-squares regression problem with regularization by a block l1-norm, that is, a sum of Euclidean norms over spaces of dimension larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the l1-norm, where all spaces have dimension one and the problem is commonly referred to as the Lasso. In this paper, we study the asymptotic group selection consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the nonadaptive scheme is not satisfied.
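For readers who want to see the mechanics of the block l1-norm, here is a minimal numpy sketch of the block soft-thresholding step that proximal-gradient solvers for the group Lasso rely on; the group structure, step size, and data are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def block_soft_threshold(w, groups, lam):
    """Proximal operator of the block l1-norm: shrink each group's
    Euclidean norm by lam; groups whose norm falls below lam become 0."""
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        w[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return w

# Proximal-gradient loop for min_w ||y - X w||^2 / (2n) + lam * sum_g ||w_g||
rng = np.random.default_rng(0)
n = 100
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]
X = rng.standard_normal((n, 10))
w_true = np.concatenate([rng.standard_normal(3), np.zeros(7)])  # last groups inactive
y = X @ w_true + 0.1 * rng.standard_normal(n)

step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
lam, w = 0.1, np.zeros(10)
for _ in range(200):
    grad = X.T @ (X @ w - y) / n
    w = block_soft_threshold(w - step * grad, groups, step * lam)
print([round(np.linalg.norm(w[g]), 3) for g in groups])  # inactive groups -> 0
```

The whole-group zeroing is exactly what makes the block l1-norm perform group selection, the property whose consistency the paper analyzes.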

687 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A localized multiple kernel learning (LMKL) algorithm uses a gating model to select the appropriate kernel function locally; the gating model and the kernel-based classifier are coupled and optimized jointly.
Abstract: Recently, instead of selecting a single kernel, multiple kernel learning (MKL) has been proposed, which uses a convex combination of kernels, where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. In this paper, we develop a localized multiple kernel learning (LMKL) algorithm using a gating model for selecting the appropriate kernel function locally. The localizing gating model and the kernel-based classifier are coupled, and their optimization is done jointly. Empirical results on ten benchmark and two bioinformatics data sets validate the applicability of our approach. LMKL achieves accuracy statistically similar to MKL while storing fewer support vectors. LMKL can also combine multiple copies of the same kernel function, each localized in a different part of the input space. For example, LMKL with multiple linear kernels gives better accuracy than a single linear kernel on the bioinformatics data sets.
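A minimal sketch of the locally combined kernel at the heart of this idea, assuming a linear softmax gating; the gating parameters are fixed at random here, whereas LMKL optimizes them jointly with the classifier.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def locally_combined_kernel(X, kernels, V, b):
    """k(xi, xj) = sum_m eta_m(xi) * k_m(xi, xj) * eta_m(xj), where
    eta(x) = softmax(x V + b) gates each kernel per data point, so
    different kernels can dominate in different regions of the input."""
    eta = softmax(X @ V + b)                  # (n, M) gate values
    K = np.zeros_like(kernels[0])
    for m, Km in enumerate(kernels):
        K += np.outer(eta[:, m], eta[:, m]) * Km
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
K_lin = X @ X.T
K_rbf = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 2.0)
V, b = rng.standard_normal((4, 2)), np.zeros(2)   # gating parameters
K = locally_combined_kernel(X, [K_lin, K_rbf], V, b)
```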

293 citations


Posted Content
TL;DR: The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

233 citations


Proceedings Article
08 Dec 2008
TL;DR: The level method, originally designed for optimizing non-smooth objective functions, is extended to convex-concave optimization and applied to multiple kernel learning; the extended method overcomes the drawbacks of SILP and SD.
Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method.
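All three wrappers (SD, the SILP cutting-plane model, and the level method) are built from the same two quantities: the SVM dual value for fixed kernel weights and its gradient with respect to those weights. Below is a hedged sketch of that computation using scikit-learn's precomputed-kernel SVC; the level method's projection machinery itself is not shown, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC

def mkl_objective_and_grad(kernels, d, y, C=1.0):
    """For fixed weights d, solve the SVM dual on K = sum_m d_m K_m and
    return the dual value J(d) and its gradient dJ/dd_m = -0.5 a^T K_m a,
    where a_i = y_i * alpha_i comes from the fitted SVM."""
    K = sum(dm * Km for dm, Km in zip(d, kernels))
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    a = np.zeros(len(y))
    a[svm.support_] = svm.dual_coef_.ravel()        # a_i = y_i * alpha_i
    obj = a @ y - 0.5 * a @ K @ a                   # sum_i alpha_i - 0.5 a^T K a
    grad = np.array([-0.5 * a @ Km @ a for Km in kernels])
    return obj, grad

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = np.where(X[:, 0] + 0.1 * rng.standard_normal(60) > 0, 1, -1)
K_lin = X @ X.T
K_rbf = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 5.0)
obj, grad = mkl_objective_and_grad([K_lin, K_rbf], np.array([0.5, 0.5]), y)
print(obj, grad)
```

SD uses only the latest gradient; SILP and the level method keep all past (J, grad) pairs to build a cutting-plane model, and the level method additionally regularizes each step by projecting onto a level set.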

179 citations


Journal ArticleDOI
TL;DR: A novel hyperbolic framework for large-scale image visualization and interactive hypothesis assessment is developed, along with a novel hierarchical boosting algorithm that learns ensemble classifiers hierarchically.
Abstract: In this paper, we have developed a new scheme for achieving multilevel annotations of large-scale images automatically. To achieve more sufficient representation of various visual properties of the images, both the global visual features and the local visual features are extracted for image content representation. To tackle the problem of huge intraconcept visual diversity, multiple types of kernels are integrated to characterize the diverse visual similarity relationships between the images more precisely, and a multiple kernel learning algorithm is developed for SVM image classifier training. To address the problem of huge interconcept visual similarity, a novel multitask learning algorithm is developed to learn the correlated classifiers for the sibling image concepts under the same parent concept and enhance their discrimination and adaptation power significantly. To tackle the problem of huge intraconcept visual diversity for the image concepts at the higher levels of the concept ontology, a novel hierarchical boosting algorithm is developed to learn their ensemble classifiers hierarchically. In order to assist users on selecting more effective hypotheses for image classifier training, we have developed a novel hyperbolic framework for large-scale image visualization and interactive hypotheses assessment. Our experiments on large-scale image collections have also obtained very positive results.

152 citations


Proceedings Article
08 Dec 2008
TL;DR: In this article, the kernel is assumed to decompose into a large sum of individual basis kernels that can be embedded in a directed acyclic graph, making it possible to perform kernel selection through a hierarchical multiple kernel learning framework in polynomial time in the number of selected kernels.
Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

137 citations


Journal ArticleDOI
TL;DR: A new effective multiple kernel learning algorithm borrows from Canonical Correlation Analysis to maximally correlate the m views in the transformed coordinates, introducing a special term, the Inter-Function Similarity Loss R_IFSI, into the existing regularization framework to guarantee the agreement of multiview outputs.
Abstract: In this paper, we develop a new effective multiple kernel learning algorithm. First, we map the input data into m different feature spaces by m empirical kernels, where each generated feature space is taken as one view of the input space. Then, borrowing the motivating argument from Canonical Correlation Analysis (CCA), which can maximally correlate the m views in the transformed coordinates, we introduce a special term called the Inter-Function Similarity Loss R_IFSI into the existing regularization framework so as to guarantee the agreement of multiview outputs. In implementation, we select the Modification of Ho-Kashyap algorithm with Squared approximation of the misclassification errors (MHKS) as the incorporated paradigm, and the experimental results on benchmark data sets demonstrate the feasibility and effectiveness of the proposed algorithm, named MultiK-MHKS.
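A minimal sketch of an inter-function similarity loss of the kind described, assuming it is the sum of squared disagreements between per-view outputs; the exact weighting and normalization inside MultiK-MHKS may differ.

```python
import numpy as np

def inter_function_similarity_loss(outputs):
    """Sum of squared disagreements between the decision outputs of the
    m views on the same samples; adding it to the regularized risk pushes
    the per-view classifiers toward consistent (agreeing) outputs."""
    m = len(outputs)
    return sum(np.sum((outputs[p] - outputs[q]) ** 2)
               for p in range(m) for q in range(p + 1, m))

rng = np.random.default_rng(0)
views = [rng.standard_normal(30) for _ in range(3)]   # outputs of 3 views
print(inter_function_similarity_loss(views))
```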

128 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy, and the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
Abstract: Effective diagnosis of Alzheimer's disease (AD) is of primary importance in biomedical research. Recent studies have demonstrated that neuroimaging parameters are sensitive and consistent measures of AD. In addition, genetic and demographic information have also been successfully used for detecting the onset and progression of AD. The research so far has mainly focused on studying one type of data source only. It is expected that the integration of heterogeneous data (neuroimages, demographic, and genetic measures) will improve the prediction accuracy and enhance knowledge discovery from the data, such as the detection of biomarkers. In this paper, we propose to integrate heterogeneous data for AD prediction based on a kernel method. We further extend the kernel framework for selecting features (biomarkers) from heterogeneous data sources. The proposed method is applied to a collection of MRI data from 59 normal healthy controls and 59 AD patients. The MRI data are pre-processed using tensor factorization. In this study, we treat the complementary voxel-based data and region of interest (ROI) data from MRI as two data sources, and attempt to integrate the complementary information by the proposed method. Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy. Results also show that the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
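A hedged sketch of the basic integration idea: give each data source its own kernel, fuse them as a weighted sum, and score the combination by cross-validation. The data shapes and weights below are invented placeholders, not the paper's pipeline or results.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two hypothetical sources for the same subjects: voxel-based features
# and region-of-interest (ROI) features. Each source gets its own kernel.
rng = np.random.default_rng(0)
X_voxel = rng.standard_normal((118, 200))
X_roi = rng.standard_normal((118, 30))
y = rng.choice([-1, 1], size=118)

K_voxel = rbf_kernel(X_voxel)
K_roi = linear_kernel(X_roi)
for beta in (0.0, 0.5, 1.0):                     # weight on the voxel kernel
    K = beta * K_voxel + (1 - beta) * K_roi
    score = cross_val_score(SVC(kernel="precomputed"), K, y, cv=5).mean()
    print(f"beta={beta:.1f}  cv accuracy={score:.3f}")
```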

117 citations


Journal Article
TL;DR: It is shown that the kernel learning problem in RKDA can be formulated as convex programs, and SDP formulations are proposed for the multi-class case, which leads naturally to QCQP and SILP formulations.
Abstract: Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as a convex program. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least-squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper: the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case; the decomposition also leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.

104 citations


Proceedings Article
01 Oct 2008
TL;DR: The results show two things: for many datasets there is no benefit in using MKL/IKL instead of the SVM classifier, so the flexibility of using more than one kernel seems to be of no use; yet on some datasets IKL yields massive increases in accuracy over SVM/MKL thanks to the possibility of using a largely increased kernel set.
Abstract: In this paper we build upon the Multiple Kernel Learning (MKL) framework, and in particular on [2], which generalized it to infinitely many kernels. We rewrite the problem in the standard MKL formulation, which leads to a Semi-Infinite Program. We devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to both the finite and infinite case, and we find it to be faster and more stable than SimpleMKL [8]. Furthermore, we present the first large-scale comparison of SVMs to MKL on a variety of benchmark datasets, also comparing IKL. The results show two things: a) for many datasets there is no benefit in using MKL/IKL instead of the SVM classifier, thus the flexibility of using more than one kernel seems to be of no use; b) on some datasets IKL yields massive increases in accuracy over SVM/MKL due to the possibility of using a largely increased kernel set. For those cases parameter selection through Cross-Validation or MKL is not applicable.

88 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: This paper considers a regularized SVM formulation, in which the indefinite kernel matrix is treated as a noisy observation of some unknown positive semidefinite one (proxy kernel) and the support vectors and the proxy kernel can be computed simultaneously.
Abstract: Similarity matrices generated from many applications may not be positive semidefinite, and hence cannot fit into the kernel machine framework. In this paper, we study the problem of training support vector machines with an indefinite kernel. We consider a regularized SVM formulation, in which the indefinite kernel matrix is treated as a noisy observation of some unknown positive semidefinite one (the proxy kernel), and the support vectors and the proxy kernel can be computed simultaneously. We propose a semi-infinite quadratically constrained linear program formulation for the optimization, which can be solved iteratively to find a global optimum solution. We further propose to employ an additional pruning strategy, which significantly improves the efficiency of the algorithm while retaining its convergence property. In addition, we show the close relationship between the proposed formulation and multiple kernel learning. Experiments on a collection of benchmark data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
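For contrast with the paper's joint optimization, here is the simple one-shot baseline it improves upon: projecting the indefinite similarity matrix onto the PSD cone by clipping negative eigenvalues (the example matrix is illustrative).

```python
import numpy as np

def nearest_psd(S):
    """Frobenius-norm projection of a symmetric indefinite similarity
    matrix onto the PSD cone: clip negative eigenvalues at zero. A simple
    one-shot proxy kernel, unlike the paper's jointly learned one."""
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T

S = np.array([[1.0, 0.9, -0.4],
              [0.9, 1.0, 0.6],
              [-0.4, 0.6, 1.0]])       # indefinite: has a negative eigenvalue
K = nearest_psd(S)
print(np.linalg.eigvalsh(K))           # all eigenvalues now >= 0
```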

Proceedings Article
08 Dec 2008
TL;DR: The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP) and an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem.
Abstract: We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.

Proceedings ArticleDOI
05 Jul 2008
TL;DR: This work proposes Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels, and characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions.
Abstract: The Support Vector Machine (SVM) is an acknowledged powerful tool for building classifiers, but it lacks flexibility, in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning (MKL) enables learning the kernel from an ensemble of basis kernels, whose combination is optimized in the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem, and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multi-channel data where groups correspond to channels.

Book ChapterDOI
10 Jun 2008
TL;DR: A method is proposed that combines the efficiency of single-class localization with a subsequent decision process that works jointly for all given object classes by following a multiple kernel learning (MKL) approach; experiments show that the subsequent joint decision step clearly improves accuracy compared to single-class detection.
Abstract: Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection.

Book ChapterDOI
15 Sep 2008
TL;DR: This work utilizes the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems.
Abstract: Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. Here we utilize the multiclass support vector machine (m-SVM) method to directly solve protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. We further propose a general class of protein sequence kernels which considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we utilize a recent extension of SVMs that optimizes over multiple kernels simultaneously. This way, we automatically search over families of possible amino acid motifs. We compare our automated approach to three other predictors on four different datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological reasoning. Data files, kernel matrices and open source software are available at http://www.fml.mpg.de/raetsch/projects/protsubloc.
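A sketch of the gap-free special case of such sequence kernels, the k-spectrum kernel, which counts shared k-mers between two sequences; the paper's kernel family additionally allows motifs with gaps, which this toy version does not.

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: the inner product of k-mer count vectors.
    Gap-free special case of the motif kernels described above."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(c * ct[m] for m, c in cs.items())   # ct[m] is 0 if m absent

# Two toy amino acid sequences differing at one position
print(spectrum_kernel("MKVLAAGICK", "MKVLSAGICK", k=3))
```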

Proceedings ArticleDOI
13 Jul 2008
TL;DR: Two simple but effective methods are proposed for determining the weights of a conic combination of multiple kernels: FSM-MKL, which learns optimal weights using the feature-space-based kernel matrix evaluation measure (FSM), and PWMK, which weights each kernel in proportion to its quality as determined by direct cross-validation.
Abstract: Complex biological data generated from various experiments are stored in diverse data types in multiple datasets. By appropriately representing each biological dataset as a kernel matrix and then combining them in solving problems, the kernel-based approach has become a spotlight in data integration and its applications in bioinformatics and other fields. While the linear combination of unweighted multiple kernels (UMK) is popular, there has been effort on multiple kernel learning (MKL), where optimal weights are learned by semi-definite programming or sequential minimal optimization (SMO-MKL). These methods provide high accuracy on biological prediction problems, but are very complicated and hard to use, especially for non-experts in optimization. They are also usually of high computational cost and not suitable for large data sets. In this paper, we propose two simple but effective methods for determining weights for a conic combination of multiple kernels. The former learns optimal weights formulated by our measure FSM for kernel matrix evaluation (feature space-based kernel matrix evaluation measure), and is denoted FSM-MKL. The latter assigns each kernel a weight proportional to its quality, determined by direct cross-validation, and is named proportionally weighted multiple kernels (PWMK). Experimental comparative evaluation of the four methods UMK, SMO-MKL, FSM-MKL and PWMK on the problem of protein-protein interactions shows that our proposed methods are simpler and more efficient but still effective: they achieve performance almost as high as that of MKL and higher than that of UMK.
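A hedged sketch of the PWMK idea, weighting each kernel by its own cross-validated accuracy and normalizing; the paper's exact quality measure and normalization may differ, and the data below are synthetic.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pwmk_weights(kernels, y, cv=5):
    """PWMK-style weighting: each kernel's weight is proportional to its
    own cross-validated accuracy, normalized to sum to one."""
    acc = np.array([cross_val_score(SVC(kernel="precomputed"), K, y, cv=cv).mean()
                    for K in kernels])
    return acc / acc.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 6))
y = np.where(X[:, 0] > 0, 1, -1)
kernels = [X @ X.T,
           np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 6.0)]
w = pwmk_weights(kernels, y)
K_combined = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(w)
```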

Posted Content
TL;DR: The authors used text from news articles to predict intraday price movements of financial assets using support vector machines and developed an analytic center cutting plane method to solve the kernel learning problem efficiently.
Abstract: We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.

Proceedings Article
08 Dec 2008
TL;DR: An approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR) is described, which is flexible in simultaneously tackling data in various feature representations and general in that it is established upon graph embedding.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

Proceedings ArticleDOI
12 May 2008
TL;DR: Several refinements to the standard maximum-margin scheme for speaker verification systems, including a regularisation term, are examined, allowing the appropriate level of sparsity to be selected.
Abstract: Many speaker verification (SV) systems combine multiple classifiers using score-fusion to improve system performance. For SVM classifiers, an alternative strategy is to combine at the kernel level. This involves finding a suitable kernel weighting, known as multiple kernel learning (MKL). Recently, an efficient maximum-margin scheme for MKL has been proposed. This work examines several refinements to this scheme for SV. The standard scheme has a known tendency towards sparse weightings, which may not be optimal for SV. A regularisation term is proposed, allowing the appropriate level of sparsity to be selected. Cross-speaker tying of kernel weights is also applied to improve robustness. Various combinations of dynamic kernels were evaluated, including derivative and parametric kernels based upon different model structures. The performance achieved on the NIST 2002 SRE when combining five kernels was 4.83% EER.

Proceedings ArticleDOI
24 Aug 2008
TL;DR: It is shown that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem and an equivalent semi-infinite linear program (SILP) formulation which can be solved efficiently by the column generation technique.
Abstract: Kernel methods have been applied successfully in many data mining tasks. Subspace kernel learning was recently proposed to discover an effective low-dimensional subspace of a kernel feature space for improved classification. In this paper, we propose to construct a subspace kernel using the Hilbert-Schmidt Independence Criterion (HSIC). We show that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem. One limitation of the existing subspace kernel learning formulations is that the kernel learning and classification are independent and the subspace kernel may not be optimally adapted for classification. To overcome this limitation, we propose a joint optimization framework, in which we learn the subspace kernel and subsequent classifiers simultaneously. In addition, we propose a novel learning formulation that extracts an uncorrelated subspace kernel to reduce the redundant information in a subspace kernel. Following the idea from multiple kernel learning, we extend the proposed formulations to the case when multiple kernels are available and need to be combined. We show that the integration of subspace kernels can be formulated as a semidefinite program (SDP) which is computationally expensive. To improve the efficiency of the SDP formulation, we propose an equivalent semi-infinite linear program (SILP) formulation which can be solved efficiently by the column generation technique. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed algorithms.
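A minimal sketch of the (biased) empirical HSIC between two kernel matrices, the dependence criterion used to score the subspace kernel; the label-kernel pairing below is an illustrative choice.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC between two kernel matrices:
    HSIC = tr(K H L H) / (n - 1)^2, with centering matrix
    H = I - (1/n) 11^T. Larger values mean stronger dependence."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = np.where(X[:, 0] > 0, 1, -1)
K, L = X @ X.T, np.outer(y, y)        # data kernel vs. label kernel
print(hsic(K, L))
```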

Journal Article
TL;DR: In this paper, a multiple kernel learning (MKL) approach is used to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes.
Abstract: Most current methods for multi-class object classification and localization work as independent 1-vs-rest classifiers. They decide whether and where an object is visible in an image purely on a per-class basis. Joint learning of more than one object class would generally be preferable, since this would allow the use of contextual information such as co-occurrence between classes. However, this approach is usually not employed because of its computational cost. In this paper we propose a method to combine the efficiency of single class localization with a subsequent decision process that works jointly for all given object classes. By following a multiple kernel learning (MKL) approach, we automatically obtain a sparse dependency graph of relevant object classes on which to base the decision. Experiments on the PASCAL VOC 2006 and 2007 datasets show that the subsequent joint decision step clearly improves the accuracy compared to single class detection.

Journal ArticleDOI
TL;DR: A new multiclass classification method reduces the multiclass problem to a single binary classifier (SBC); experiments indicate that it outperforms one-vs-all, all-pairs and the error-correcting output coding scheme, at least when the number of classes is small.

Journal ArticleDOI
TL;DR: This paper addresses the issue of learning an optimal diffusion kernel, in the form of a convex combination of a set of pre-specified kernels constructed from biological networks, for protein function prediction, and shows that the linearly combined diffusion kernel performs better than every single candidate diffusion kernel.
Abstract: Machine-learning tools have gained considerable attention during the last few years for analyzing biological networks for protein function prediction. Kernel methods are suitable for learning from graph-based data such as biological networks, as they only require the abstraction of the similarities between objects into the kernel matrix. One key issue in kernel methods is the selection of a good kernel function. Diffusion kernels, the discretization of the familiar Gaussian kernel of Euclidean space, are commonly used for graph-based data. In this paper, we address the issue of learning an optimal diffusion kernel, in the form of a convex combination of a set of pre-specified kernels constructed from biological networks, for protein function prediction. Most prior work on this kernel learning task focuses on variants of the loss function based on Support Vector Machines (SVM). Extensions to other loss functions, such as the one based on Kullback-Leibler (KL) divergence, which is more suitable for mining biological networks, lead to expensive optimization problems. By exploiting the special structure of the diffusion kernel, we show that this KL-divergence-based kernel learning problem can be formulated as a simple optimization problem, which can then be solved efficiently. It is further extended to the multi-task case, where we predict multiple functions of a protein simultaneously. We evaluate the efficiency and effectiveness of the proposed algorithms using two benchmark data sets. Results show that the performance of the linearly combined diffusion kernel is better than that of every single candidate diffusion kernel. When the number of tasks is large, the algorithms based on multiple tasks are favored due to their competitive recognition performance and small computational costs.
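A minimal sketch of a diffusion kernel on a small graph and a convex combination over several diffusion parameters, standing in for the paper's pre-specified candidate set; the graph and weights are illustrative, not learned.

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta):
    """Diffusion kernel K = exp(-beta * L) on a graph with adjacency A,
    where L = D - A is the graph Laplacian; beta controls how far
    similarity diffuses along the edges."""
    L = np.diag(A.sum(axis=1)) - A
    return expm(-beta * L)

# Tiny 4-node path graph; kernels at several betas form the candidate set.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
mus = [0.5, 0.3, 0.2]                          # hypothetical convex weights
K = sum(mu * diffusion_kernel(A, b) for mu, b in zip(mus, [0.1, 1.0, 5.0]))
print(np.round(K, 3))
```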

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This paper shows that the MKL problem with an enhanced spatial pyramid match kernel can be solved efficiently using the projected gradient method, and demonstrates the algorithm on classification tasks based on a linear combination of the proposed kernels computed at multiple pyramid levels of image encoding.
Abstract: Recent publications and developments based on SVM have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve classifier performance, which motivates researchers to explore homogeneous models obtained as linear combinations of kernels. However, the use of multiple kernels faces the challenge of choosing the kernel weights, and the increased number of parameters may lead to overfitting. In this paper we show that the MKL problem with an enhanced spatial pyramid match kernel can be solved efficiently using the projected gradient method. Weights on each kernel matrix (level) are included in the standard SVM empirical risk minimization problem with an L2 constraint to encourage sparsity. We demonstrate our algorithm on classification tasks based on a linear combination of the proposed kernels computed at multiple pyramid levels of image encoding, and we show that the proposed method is accurate and significantly more efficient than current approaches.
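A sketch of a projected gradient step on kernel weights, shown with the Euclidean projection onto the probability simplex that is standard in MKL; note the paper's variant constrains the weights in L2-norm instead, and the gradient values below are illustrative placeholders.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {d : d >= 0, sum(d) = 1}
    via the standard sorting algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# One projected gradient step on kernel-level weights d; in a full MKL
# solver g would come from the SVM dual at the current weights.
d = np.array([0.25, 0.25, 0.25, 0.25])
g = np.array([-0.8, -0.1, -0.3, -0.2])        # illustrative gradient values
d = project_simplex(d - 0.5 * g)
print(d, d.sum())                              # feasible weights, sum to 1
```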

Proceedings ArticleDOI
18 Jun 2008
TL;DR: A two stage multiple kernel learning algorithm is developed by incorporating sequential minimal optimization (SMO) with the gradient projection method, and experimental results show that the proposed approach outperforms single-kernel support vector clustering.
Abstract: Support vector clustering (SVC) has been successfully applied to solve multi-class classification problems. However, it is usually hard to determine the hyper-parameters of RBF kernel functions. A multiple kernel learning (MKL) algorithm is developed to solve this problem, by which the kernel matrix weights and Lagrange multipliers can be simultaneously obtained with semidefinite programming. However, the amount of time and space required is very demanding. We develop a two stage multiple kernel learning algorithm by incorporating sequential minimal optimization (SMO) with the gradient projection method. Experimental results on data sets from UCI and Statlog show that the proposed approach outperforms single-kernel support vector clustering.

Book ChapterDOI
09 Jan 2008
TL;DR: A novel hierarchical boosting algorithm is proposed by incorporating concept ontology and multi-task learning to achieve hierarchical image classifier training to enable automatic multi-level image annotation.
Abstract: In this paper, we have proposed a novel algorithm to achieve automatic multi-level image annotation by incorporating concept ontology and multitask learning for hierarchical image classifier training. To achieve more reliable image classifier training in high-dimensional heterogeneous feature space, a new algorithm is proposed by incorporating multiple kernels for diverse image similarity characterization, and a multiple kernel learning algorithm is developed to train the SVM classifiers for the atomic image concepts at the first level of the concept ontology. To enable automatic multi-level image annotation, a novel hierarchical boosting algorithm is proposed by incorporating concept ontology and multi-task learning to achieve hierarchical image classifier training.

Book ChapterDOI
06 May 2008
TL;DR: Empirical results on real biological data demonstrate that multiple kernel regression can improve accuracy and decrease model complexity by reducing the number of support vectors.
Abstract: The cell defense mechanism of RNA interference has applications in gene function analysis and human disease therapy. To effectively silence a target gene, it is desirable to select the initiator siRNA molecules having satisfactory silencing capabilities. Computational prediction of the silencing efficacy of siRNAs can assist this screening process before using them in biological experiments. String kernel functions, which operate directly on the string objects representing siRNAs and target mRNAs, have been applied to support vector regression for this prediction and improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully utilize the information provided by string and numerical kernels, we propose to unify the two in the kernel feature space by devising a multiple kernel regression framework where a linear combination of the kernels is used. We formulate the multiple kernel learning as a quadratically constrained quadratic programming (QCQP) problem, which, although it yields the global optimal solution, is computationally inefficient and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results on real biological data demonstrate that multiple kernel regression can improve accuracy and decrease model complexity by reducing the number of support vectors. In addition, multiple kernel regression gives insights into the kernel combination, which, for siRNA efficacy prediction, evaluates the relative significance of the design rules.
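A minimal sketch of the kernel-target alignment score underlying the heuristics mentioned above; turning alignments into combination weights by simple normalization, as done at the end, is one possible heuristic, not necessarily the paper's.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Alignment A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F);
    higher alignment means the kernel's similarity structure matches
    the target better."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
kernels = [X @ X.T,
           np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 5.0)]
alignments = np.array([kernel_target_alignment(K, y) for K in kernels])
weights = alignments / alignments.sum()        # one simple weighting heuristic
print(alignments, weights)
```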

31 Dec 2008
TL;DR: This method attempts to find the most compact ball amongst the 11 different feature representations of the CBIR task with relevance feedback, using a novel 1- and 2-norm regularisation technique for the 1-class SVM under the MKL framework.
Abstract: This report presents a novel Multiple Kernel Learning (MKL) algorithm for the 1-class support vector machine. The emphasis is placed on viewing the CBIR task with relevance feedback as a metric learning problem, where each image has 11 different feature extraction methods applied to it. Our method attempts to find the most compact ball amongst the 11 different feature representations using a novel 1- and 2-norm regularisation technique for the 1-class SVM under the MKL framework. We also devise a simple way of including the set of negative examples whilst still utilising the 1-class SVM implementation.

Journal Article
TL;DR: The experimental results reveal that training and testing times are decreased and classification performance is maintained at the original level after applying cooperative clustering to multiple kernel SVM.
Abstract: Support vector machines based on multiple kernel learning have been proposed because learning problems often involve multiple, heterogeneous data sources; however, increasing the number of kernels inevitably increases the computation of multiple kernel learning. To solve this problem, a new clustering method, called cooperative clustering, is presented. Applying cooperative clustering to multiple kernel SVM reduces the number of support vectors and thus the time complexity of the computation. The experimental results reveal that training and testing times are decreased while classification performance is maintained at the same level as the original after applying our method.

Proceedings ArticleDOI
12 Jul 2008
TL;DR: This study reformulates the SDP problem to reduce the time and space requirements, and introduces strategies for reducing the search space in solving the SDP problem.
Abstract: Support vector machines (SVMs) have been successfully applied to classification problems. Practical issues involve how to determine the right type and suitable hyperparameters of kernel functions. Recently, multiple-kernel learning (MKL) algorithms have been developed to handle these issues by combining different kernels, where the weight of each kernel in the combination is obtained through learning. One of the most popular methods is to learn the weights with semidefinite programming (SDP). However, the amount of time and space required by this method is demanding. In this study, we reformulate the SDP problem to reduce the time and space requirements, and introduce strategies for reducing the search space in solving the SDP problem. Experimental results obtained on synthetic datasets and benchmark datasets from UCI and Statlog show that the proposed approach improves the efficiency of the SDP method without degrading performance.