Topic

Multiple kernel learning

About: Multiple kernel learning is a research topic. Over its lifetime, 1,630 publications have been published on this topic, receiving 56,082 citations.


Papers
Journal ArticleDOI
TL;DR: This paper redefines multiple kernels using deep multi-layer networks: a deep multiple kernel is a multi-layered combination of nonlinear activation functions, each of which combines several elementary or intermediate kernels and yields a positive semi-definite deep kernel.
Abstract: Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each of which involves a combination of several elementary or intermediate kernels and results in a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semi-supervised, and Laplacian-based semi-supervised. When plugged into support vector machines, the resulting deep kernel networks show a clear gain compared with several shallow kernels on the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database, and the Banana data set validate the effectiveness of the proposed method.
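To make the layering concrete, here is a minimal NumPy/scikit-learn sketch of the general idea: elementary kernels are combined with nonnegative weights and passed through an element-wise nonlinearity, layer after layer, so the result stays positive semi-definite. The kernel pool, the exp activation, and the random weights below are illustrative assumptions, not the paper's learned configuration (the paper learns the weights with the supervised, unsupervised, and semi-supervised criteria listed above).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, linear_kernel

def normalize(K):
    """Cosine-normalize a kernel matrix so its diagonal is 1 (preserves PSD)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def base_kernels(X):
    """A small pool of elementary kernels (bandwidth/degree chosen arbitrarily)."""
    return [normalize(K) for K in (rbf_kernel(X, gamma=0.5),
                                   polynomial_kernel(X, degree=2),
                                   linear_kernel(X))]

def deep_kernel(kernels, layer_weights):
    """Each layer forms nonnegative combinations of the previous layer's kernels
    and applies an element-wise exp; both operations keep the matrices positive
    semi-definite (Schur product theorem), so the final deep kernel is PSD."""
    current = kernels
    for W in layer_weights:                       # W: (units, len(current))
        current = [np.exp(sum(w * K for w, K in zip(row, current)))
                   for row in W]
    return sum(current) / len(current)            # simple average at the top

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))

def convex(shape):
    """Random nonnegative weights summing to one per unit (stand-in for learning)."""
    W = rng.random(shape)
    return W / W.sum(axis=1, keepdims=True)

K_deep = deep_kernel(base_kernels(X), [convex((4, 3)), convex((2, 4))])
print(K_deep.shape, np.linalg.eigvalsh(K_deep).min() >= -1e-8)  # (30, 30) True
```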

74 citations

Proceedings Article
16 Jun 2013
TL;DR: In this article, a fully conjugate probabilistic formulation of the kernelized matrix factorization problem is proposed, which enables an efficient variational approximation, whereas fully Bayesian treatments are not computationally feasible in the earlier approaches.
Abstract: We extend kernelized matrix factorization with a fully Bayesian treatment and with the ability to work with multiple side information sources expressed as different kernels. Kernel functions have been introduced to matrix factorization to integrate side information about the rows and columns (e.g., objects and users in recommender systems), which is necessary for making out-of-matrix (i.e., cold start) predictions. We discuss specifically bipartite graph inference, where the output matrix is binary, but extensions to more general matrices are straightforward. We extend the state of the art in two key aspects: (i) a fully conjugate probabilistic formulation of the kernelized matrix factorization problem enables an efficient variational approximation, whereas fully Bayesian treatments are not computationally feasible in earlier approaches; (ii) multiple side information sources are included, treated as different kernels in multiple kernel learning, which additionally reveals which side information sources are informative. Our method outperforms alternatives in predicting drug-protein interactions on two data sets. We then show that our framework can also be used for solving multilabel learning problems by considering samples and labels as the two domains on which matrix factorization operates. Our algorithm obtains the lowest Hamming loss values on 10 out of 14 multilabel classification data sets compared to five state-of-the-art multilabel learning algorithms.
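As a rough structural sketch (not the paper's variational Bayesian inference), the snippet below shows the two ingredients the abstract describes: several side-information kernels per domain are combined with kernel weights, and the combined kernel representations are projected into a shared low-rank latent space whose inner products score the drug-protein interaction matrix. The feature dimensions, kernel bandwidths, uniform weights, and random projection matrices are all illustrative placeholders for quantities the paper infers.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
# Toy side information: 40 drugs with 10 features, 30 proteins with 8 features.
X_drug, X_prot = rng.normal(size=(40, 10)), rng.normal(size=(30, 8))

# Multiple side-information kernels per domain (bandwidths are arbitrary).
K_drug = [rbf_kernel(X_drug, gamma=g) for g in (0.05, 0.5)]
K_prot = [rbf_kernel(X_prot, gamma=g) for g in (0.05, 0.5)]

# Kernel weights combine the sources (the multiple kernel learning part);
# the paper learns them and thereby reveals which sources are informative.
w_drug, w_prot = np.array([0.5, 0.5]), np.array([0.5, 0.5])
Kx = sum(w * K for w, K in zip(w_drug, K_drug))
Kz = sum(w * K for w, K in zip(w_prot, K_prot))

# Project each domain's kernel representation to a rank-R latent space and
# score interactions with inner products; A_x, A_z are random placeholders
# for the factors the paper treats in a fully Bayesian way.
R = 5
A_x, A_z = rng.normal(size=(40, R)), rng.normal(size=(30, R))
scores = (Kx @ A_x) @ (Kz @ A_z).T   # a sigmoid/probit link would binarize these
print(scores.shape)                  # (40, 30) drug-protein score matrix
```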

74 citations

Journal ArticleDOI
TL;DR: A novel MKL method, structure-preserving multiple kernel clustering (SPMKC), proposes a new kernel affine weight strategy to learn an optimal consensus kernel from a predefined kernel pool, automatically assigning a suitable weight to each base kernel.
Abstract: Multiple kernel learning (MKL) is generally recognized to perform better than single kernel learning (SKL) in handling nonlinear clustering problems, largely because MKL avoids selecting and tuning a single predefined kernel. By integrating the self-expression learning framework, graph-based MKL subspace clustering has recently attracted considerable attention. However, the graph structure of data in kernel space is largely ignored by previous MKL methods, even though it is a key ingredient of affinity graph construction for spectral clustering. To address this problem, a novel MKL method is proposed in this article, namely structure-preserving multiple kernel clustering (SPMKC). Specifically, SPMKC proposes a new kernel affine weight strategy to learn an optimal consensus kernel from a predefined kernel pool, which can assign a suitable weight to each base kernel automatically. Furthermore, SPMKC proposes a kernel group self-expressiveness term and a kernel adaptive local structure learning term to preserve the global and local structure of the input data in kernel space, respectively, rather than in the original space. In addition, an efficient algorithm is proposed to solve the resulting unified objective function, iteratively updating the consensus kernel and the affinity graph so that each collaboratively promotes the other toward the optimum. Experiments on both image and text clustering demonstrate that SPMKC outperforms state-of-the-art MKL clustering methods in terms of both clustering performance and computational cost.
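A minimal sketch of the consensus-kernel ingredient, under illustrative assumptions: base kernels from a predefined pool are combined with affine weights (nonnegative, summing to one), and the combined kernel serves as a precomputed affinity for spectral clustering. SPMKC learns these weights jointly with its self-expressiveness and local-structure terms; the uniform weights, bandwidths, and toy data below are not from the paper.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

# Toy data with three clusters.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Predefined kernel pool (bandwidths chosen arbitrarily).
pool = [rbf_kernel(X, gamma=g) for g in (0.01, 0.1, 1.0)]

# Affine kernel weights: nonnegative and summing to one. SPMKC assigns these
# automatically; uniform weights stand in for the learned values here.
w = np.full(len(pool), 1.0 / len(pool))
K_consensus = sum(wi * Ki for wi, Ki in zip(w, pool))

# Cluster using the consensus kernel as a precomputed affinity matrix.
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(K_consensus)
print(np.bincount(labels))   # cluster sizes
```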

73 citations

Proceedings Article
29 Apr 2018
TL;DR: In this article, the authors propose to automatically learn similarity information from data while enforcing the constraint that the similarity matrix has exactly c connected components when there are c clusters, and to transform the candidate solution into a new one that better approximates the discrete one.
Abstract: Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous label learning; and discretizing the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, the predefined similarity graph might not be optimal for subsequent clustering; it is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously enforce the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution, since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, these three subtasks are integrated into a unified framework, with each subtask iteratively boosted by the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernel. To tackle the practical problem of selecting the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning. Extensive experiments demonstrate the superiority of the proposed method compared to existing clustering approaches.
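For contrast, the conventional three-step pipeline the abstract criticizes looks roughly like the sketch below: build a fixed similarity graph, take the bottom eigenvectors of its normalized Laplacian as continuous labels, then discretize them with k-means. The data set and RBF bandwidth are illustrative; the paper's point is to replace these separate steps (and the fixed kernel) with a single jointly optimized, multiple-kernel formulation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
c = 2  # number of clusters

# Step 1: predefined similarity graph (fixed bandwidth, not learned).
S = rbf_kernel(X, gamma=20.0)

# Step 2: continuous labels = bottom c eigenvectors of the normalized
# Laplacian L = I - D^{-1/2} S D^{-1/2}.
d = S.sum(axis=1)
L = np.eye(len(X)) - (S / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
_, F = eigh(L, subset_by_index=[0, c - 1])

# Step 3: discretize by k-means (sensitive to initialization, the second
# flaw the abstract points out).
labels = KMeans(n_clusters=c, n_init=10, random_state=0).fit_predict(F)
print(np.bincount(labels))
```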

73 citations

Journal Article
TL;DR: In this paper, a data-dependent generalization bound is presented for a large class of regularized algorithms that implement structured sparsity constraints; the bound applies to standard squared-norm regularization, the Lasso, the group Lasso, and other regularization schemes.
Abstract: We present a data-dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning, and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite-dimensional setting, such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels.
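To ground the family of regularizers the bound covers, here is a tiny illustrative example of the group Lasso penalty, a sum of Euclidean norms over coordinate groups; with one group per candidate kernel's feature block, the same block-l1 penalty is closely related to the multiple kernel learning regularizer mentioned in the abstract. The vector and grouping are made up for illustration.

```python
import numpy as np

def group_lasso_penalty(w, groups):
    """Structured-sparsity penalty Omega(w) = sum_g ||w_g||_2 over
    non-overlapping coordinate groups (encourages zeroing whole groups)."""
    return sum(np.linalg.norm(w[g]) for g in groups)

w = np.array([0.0, 0.0, 0.0, 1.0, -2.0, 0.5])
groups = [np.arange(0, 3), np.arange(3, 6)]   # two blocks of three coordinates
print(group_lasso_penalty(w, groups))          # 0.0 + sqrt(5.25) ≈ 2.291
```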

73 citations


Network Information
Related Topics (5)
Convolutional neural network: 74.7K papers, 2M citations, 89% related
Deep learning: 79.8K papers, 2.1M citations, 89% related
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Feature (computer vision): 128.2K papers, 1.7M citations, 87% related
Image segmentation: 79.6K papers, 1.8M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    21
2022    44
2021    72
2020    101
2019    113
2018    114