
Showing papers on "Multiple kernel learning" published in 2007


Posted Content
TL;DR: In this paper, the authors consider the least-squares regression problem with regularization by a block 1-norm and derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification.
Abstract: We consider the least-squares regression problem with regularization by a block 1-norm, i.e., a sum of Euclidean norms over spaces of dimension larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1-norm, where all spaces have dimension one and the problem is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate even when the necessary condition required for the non-adaptive scheme is not satisfied.
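The computational core of most group-Lasso solvers is the group soft-thresholding (proximal) operator induced by the block 1-norm. The paper above is purely theoretical, but a minimal numpy sketch of that operator may help fix ideas:

import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the block 1-norm penalty lam * sum_g ||w_g||_2.

    Each group of coefficients is shrunk toward zero, and a whole group
    is zeroed out when its Euclidean norm falls below lam -- this is
    what produces group-level sparsity in the group Lasso.
    """
    w = w.copy()
    for g in groups:                       # g: array of indices for one block
        norm = np.linalg.norm(w[g])
        w[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return w

# One proximal-gradient step for  min_w 0.5*||y - Xw||^2 + lam * sum_g ||w_g||_2:
#   w = group_soft_threshold(w - step * X.T @ (X @ w - y), groups, step * lam)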

613 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper investigates the problem of learning optimal descriptors for a given classification task within the kernel learning framework, learning the optimal, domain-specific kernel as a combination of base kernels corresponding to base features that achieve different levels of trade-off.
Abstract: We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
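Once the MKL step has produced the weights, the learned kernel is used like any other: the base Gram matrices are summed and handed to a precomputed-kernel SVM. A minimal sketch, assuming scikit-learn and precomputed base kernels (the weight-learning step itself is omitted):

import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is available

def combine_kernels(base_kernels, weights):
    """Weighted sum K = sum_k d_k K_k of precomputed base Gram matrices,
    one per descriptor (e.g. rotation-, scale-, or affine-invariant features)."""
    return sum(d * K for d, K in zip(weights, base_kernels))

# Training with the learned combination (weights come from the MKL step):
#   K_train = combine_kernels(train_kernels, weights)
#   clf = SVC(kernel="precomputed").fit(K_train, y_train)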

566 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper proposes an algorithm for solving the MKL problem through an adaptive 2-norm regularization formulation and provides new insight into MKL algorithms based on block 1-norm regularization by showing that the two approaches are equivalent.
Abstract: An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. Weights on each kernel matrix are included in the standard SVM empirical risk minimization problem with an l1 constraint to encourage sparsity. We propose an algorithm for solving this problem and provide new insight into MKL algorithms based on block 1-norm regularization by showing that the two approaches are equivalent. Experimental results show that the resulting algorithm converges rapidly and that its efficiency compares favorably to other MKL algorithms.
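A hedged sketch of one such alternating iteration, assuming scikit-learn's SVC as the recycled SVM solver; the weight update shown is a common fixed-point heuristic for l1-constrained MKL, not necessarily the authors' exact step:

import numpy as np
from sklearn.svm import SVC  # the standard SVM solver being "recycled"

def mkl_weight_update(base_kernels, y, d, C=1.0):
    """One alternating step of l1-constrained MKL: train an SVM on the
    combined kernel K(d) = sum_k d_k K_k, then re-weight each kernel by
    its share ||f_k|| of the decision function's RKHS norm.
    """
    K = sum(dk * Kk for dk, Kk in zip(d, base_kernels))
    svm = SVC(C=C, kernel="precomputed").fit(K, y)
    coef = np.zeros(len(y))
    coef[svm.support_] = svm.dual_coef_[0]      # alpha_i * y_i on the SVs
    norms = np.array([dk * np.sqrt(coef @ Kk @ coef)
                      for dk, Kk in zip(d, base_kernels)])
    return norms / norms.sum()                  # project back onto the simplex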

310 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work proposes MKL for joint feature maps, which provides a convenient and principled way for MKL with multiclass problems, and shows the equivalence of several different primal formulations including different regularizers.
Abstract: In many applications it is desirable to learn from several kernels. "Multiple kernel learning" (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.
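For intuition, one standard joint feature map for multiclass problems is Phi(x, y) = phi(x) tensored with the y-th standard basis vector, whose kernel factorizes as below; this is an illustrative assumption, not necessarily the exact map used in the paper:

import numpy as np

def joint_feature_kernel(Kx, ya, yb):
    """Kernel between joint feature maps Phi(x, y) = phi(x) (x) e_y, giving
    K((x, y), (x', y')) = Kx(x, x') * 1[y == y'].  Kx is an input-space Gram
    matrix; ya, yb are the corresponding label vectors."""
    return Kx * (ya[:, None] == yb[None, :])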

308 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A family of kernels between images is proposed, defined as kernels between their respective segmentation graphs and based on soft matching of subtree-patterns of those graphs, leveraging the natural structure of images while remaining robust to the uncertainty of the associated segmentation process.
Abstract: We propose a family of kernels between images, defined as kernels between their respective segmentation graphs. The kernels are based on soft matching of subtree-patterns of the respective graphs, leveraging the natural structure of images while remaining robust to the uncertainty of the associated segmentation process. Indeed, the output of morphological segmentation is often represented by a labelled graph, with each vertex corresponding to a segmented region and edges joining neighboring regions. However, such image representations have mostly remained underused for learning tasks, partly because of the observed instability of the segmentation process and the inherent hardness of inexact matching between uncertain graphs. Our kernels count common virtual substructures amongst images, which makes it possible to perform efficient supervised classification of natural images with a support vector machine. Moreover, the kernel machinery allows us to take advantage of recent advances in kernel-based learning: (i) semi-supervised learning reduces the required number of labelled images, while (ii) multiple kernel learning algorithms efficiently select the most relevant similarity measures between images within our family.
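To make the soft-matching idea concrete, here is a deliberately simplified, vertex-only sketch (numpy assumed): it compares all pairs of region descriptors with a Gaussian kernel, whereas the paper matches whole subtree-patterns of the region adjacency graphs:

import numpy as np

def region_soft_match_kernel(feats_a, feats_b, gamma=1.0):
    """Depth-zero 'soft matching' between two segmentation graphs: sum a
    Gaussian similarity over every pair of region descriptors.  feats_a
    and feats_b are (n_regions, n_features) arrays, one row per vertex.
    """
    d2 = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-gamma * d2).sum())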

266 citations


Journal ArticleDOI
TL;DR: A novel rule extraction approach using the information provided by the separating hyperplane and the support vectors is proposed to improve the generalization capacity and comprehensibility of the rules and to reduce the computational complexity of the SVM.

113 citations


Book ChapterDOI
11 Apr 2007
TL;DR: This work proposes an evolutionary approach for finding the optimal weights of a combined kernel used by the Support Vector Machine (SVM) algorithm for particular classification problems, using a genetic algorithm to evolve these weights.
Abstract: Standard kernel-based classifiers use only a single kernel, but real-world applications and recent developments of various kernel methods have emphasized the need to consider a combination of multiple kernels. We propose an evolutionary approach for finding the optimal weights of a combined kernel used by the Support Vector Machine (SVM) algorithm for particular classification problems. We use a genetic algorithm (GA) for evolving these weights. The numerical experiments show that the evolved combined kernels (ECKs) perform better than the convex combined kernels (CCKs) for several classification problems.
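A minimal sketch of such a GA, assuming a user-supplied fitness function (e.g. the SVM cross-validation accuracy of the combined kernel); the selection and mutation operators here are generic stand-ins, not the paper's exact ones:

import numpy as np

def evolve_kernel_weights(fitness, n_kernels, pop=20, gens=50, sigma=0.1, seed=0):
    """Evolve weights w for a combined kernel sum_k w_k K_k with a simple
    GA: truncation selection plus Gaussian mutation.  `fitness(w)` must
    return a score to maximize, e.g. cross-validation accuracy.
    """
    rng = np.random.default_rng(seed)
    P = rng.random((pop, n_kernels))                   # initial population
    for _ in range(gens):
        scores = np.array([fitness(w) for w in P])
        elite = P[np.argsort(scores)[-(pop // 2):]]    # keep the best half
        children = elite + sigma * rng.standard_normal(elite.shape)
        P = np.clip(np.vstack([elite, children]), 0.0, None)  # weights >= 0
    return max(P, key=fitness)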

22 citations


Proceedings ArticleDOI
13 Jun 2007
TL;DR: It is shown that the MKL framework enables model selection and improves performance; three different applications are proposed, concerning combination of representations, automatic parameter setting, and feature selection.
Abstract: This paper presents a pedestrian detection method based on the multiple kernel framework. This approach enables us to select and combine different kinds of image representations. The combination is done through a linear combination of kernels, weighted according to the relevance of each kernel. After presenting some descriptors and detailing the multiple kernel framework, we propose three different applications: combination of representations, automatic parameter setting, and feature selection. We then show that the MKL framework enables us to perform model selection and improve performance.
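For the feature-selection application, a common construction is one base kernel per scalar feature, so that sparse MKL weights prune features directly; a sketch of that construction (numpy assumed, not the paper's exact descriptors):

import numpy as np

def per_feature_kernels(X, gamma=1.0):
    """One Gaussian Gram matrix per column of X, so that near-zero MKL
    weights on a kernel discard the corresponding feature."""
    return [np.exp(-gamma * (X[:, j:j + 1] - X[:, j][None, :]) ** 2)
            for j in range(X.shape[1])]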

9 citations


01 Oct 2007
TL;DR: This paper proposes an optimal way of integrating multiple features in the framework of multiple kernel learning, optimally combining seven kernels extracted from sequence, physico-chemical properties, pairwise alignment, and structural information, and significantly improving prediction performance compared with previous well-known methods.
Abstract: Phosphorylation is one of the most important post-translational modifications that regulate the activity of proteins. Predicting phosphorylation sites is the first step toward understanding the various biological processes that initiate the actual function of proteins in each signaling pathway. Although many prediction methods using single or multiple features extracted from protein sequences have been proposed, a systematic data-integration approach has not been applied to improve the accuracy of predicting general phosphorylation sites. In this paper, we propose an optimal way of integrating multiple features in the framework of multiple kernel learning. We optimally combine seven kernels extracted from sequence, physico-chemical properties, pairwise alignment, and structural information. On the Phospho.ELM data set, the accuracy evaluated by 5-fold cross-validation reaches 85% for serine, 85% for threonine, and 81% for tyrosine. Our computational experiments show a significant improvement in prediction performance relative to a single feature, or to a combined feature with equal weights. Moreover, our systematic integration method significantly improves prediction performance compared with previous well-known methods.
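A sketch of the evaluation loop, assuming scikit-learn and precomputed base Gram matrices; the kernel weights are taken as given here, whereas the paper learns them by MKL:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cv_accuracy(base_kernels, weights, y, C=1.0, folds=5):
    """5-fold cross-validated accuracy for a weighted sum of precomputed
    kernels, e.g. heterogeneous feature kernels built from sequence,
    physico-chemical, alignment, and structural information.
    """
    K = sum(w * Kk for w, Kk in zip(weights, base_kernels))
    accs = []
    for tr, te in StratifiedKFold(folds, shuffle=True, random_state=0).split(K, y):
        clf = SVC(C=C, kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
        accs.append(clf.score(K[np.ix_(te, tr)], y[te]))
    return float(np.mean(accs))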

Journal Article
TL;DR: The proposed multiple kernel learning algorithm shows that conic combinations of kernel matrices for classification lead to a convex quadratically constrained quadratic program, which can be efficiently solved by recycling standard SVM implementations.
Abstract: Text classification often involves multiple, heterogeneous data sources, and this paper puts forward a multiple kernel learning algorithm for this setting. It shows that conic combinations of kernel matrices for classification lead to a convex quadratically constrained quadratic program, which can be efficiently solved by recycling standard SVM implementations. Experimental results show that the proposed algorithm scales to hundreds of thousands of examples or hundreds of kernels to be combined, and that it achieves higher recall and precision when classifying e-mail text drawn from multiple, heterogeneous data sources.
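One common form of this QCQP (after normalizing each base kernel, e.g. to unit trace) is, schematically,

\begin{aligned}
\max_{\alpha,\, t} \quad & \sum_{i} \alpha_i - t \\
\text{s.t.} \quad & \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K_k(x_i, x_j) \le t, \qquad k = 1, \dots, m, \\
& 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0 .
\end{aligned}

Each candidate kernel contributes one quadratic constraint; semi-infinite reformulations of this program alternate between a linear program over the kernel weights and a standard SVM training run, which is how existing SVM implementations get recycled.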