
Showing papers on "Multiple kernel learning" published in 2011


Journal Article
TL;DR: Overall, using multiple kernels instead of a single one is useful; combining kernels in a nonlinear or data-dependent way seems more promising than linear combination for fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there are differences between them in complexity, as given by the number of stored support vectors; in the sparsity of the solution, as given by the number of kernels used; and in training time. We see that, overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination for fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.

1,762 citations
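
A minimal sketch of the simplest strategy in this taxonomy, a fixed linear combination of base kernels (synthetic data and hand-picked weights for illustration, not code from the paper; an MKL method would learn the weights eta):

```python
# Convex combination of precomputed base kernels fed to an SVM.
# Illustrative sketch only: the weights are fixed here, not learned.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel, polynomial_kernel

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = np.where(X[:, 0] + 0.5 * rng.randn(200) > 0, 1, -1)

# Base kernels encode different notions of similarity on the same data.
kernels = [linear_kernel(X), rbf_kernel(X, gamma=0.1), polynomial_kernel(X, degree=2)]
eta = np.array([0.2, 0.5, 0.3])                 # kernel weights (hand-picked)
K = sum(w * Km for w, Km in zip(eta, kernels))  # combined Gram matrix

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```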


Proceedings Article
14 Jul 2011
TL;DR: In this paper, a generalized Fisher score is proposed to jointly select features; it maximizes the lower bound of the traditional Fisher score by solving a quadratically constrained linear programming (QCLP) problem.
Abstract: Fisher score is one of the most widely used supervised feature selection methods. However, it selects each feature independently according to its score under the Fisher criterion, which leads to a suboptimal subset of features. In this paper, we present a generalized Fisher score to jointly select features. It aims at finding a subset of features that maximizes the lower bound of the traditional Fisher score. The resulting feature selection problem is a mixed integer program, which can be reformulated as a quadratically constrained linear program (QCLP). It is solved by a cutting plane algorithm, in each iteration of which a multiple kernel learning problem is solved alternately by multivariate ridge regression and projected gradient descent. Experiments on benchmark data sets indicate that the proposed method outperforms Fisher score as well as many other state-of-the-art feature selection methods.

472 citations
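
For reference, the classical per-feature Fisher score that the paper generalizes can be computed as below (a minimal sketch of the standard criterion, not the authors' QCLP-based joint selection):

```python
# Classical Fisher score: between-class scatter over within-class scatter,
# computed independently per feature, which is exactly the suboptimality
# the generalized Fisher score addresses.
import numpy as np

def fisher_score(X, y):
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / den  # rank features by this score, highest first
```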


Journal ArticleDOI
TL;DR: Empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art, and two efficient interleaved optimization strategies for arbitrary norms are developed.
Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this l1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary norms, that is lp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse and l∞-norm MKL in various scenarios. Importantly, empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and further information are available at http://doc.ml.tu-berlin.de/nonsparse_mkl/.

423 citations
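
The interleaved strategy alternates between solving a standard SVM for fixed kernel weights and updating the weights analytically. A sketch of a closed-form weight update consistent with this line of work (assuming the per-kernel weight norms ||w_m|| have been computed from the current SVM solution; the paper's exact update may differ):

```python
import numpy as np

def update_kernel_weights(w_norms, p):
    """eta_m proportional to ||w_m||^(2/(p+1)), normalized to unit lp-norm."""
    eta = w_norms ** (2.0 / (p + 1))
    return eta / np.linalg.norm(eta, ord=p)

# e.g. update_kernel_weights(np.array([0.5, 1.0, 2.0]), p=2)
```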


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new spatio-temporal context distribution feature of interest points is proposed for human action recognition, together with a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), that learns an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes.
Abstract: We first propose a new spatio-temporal context distribution feature of interest points for human action recognition. Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region. We learn a global GMM (referred to as Universal Background Model, UBM) using the relative coordinate features from all the training videos, and then represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM. In order to capture the spatio-temporal relationships at different levels, multiple GMMs are utilized to describe the context distributions of interest points over multi-scale local regions. To describe the appearance information of an action video, we also propose to use GMM to characterize the distribution of local appearance features from the cuboids centered around the interest points. Accordingly, an action video can be represented by two types of distribution features: 1) multiple GMM distributions of spatio-temporal context; 2) GMM distribution of local video appearance. To effectively fuse these two types of heterogeneous and complementary distribution features, we additionally propose a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), to learn an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes. Extensive experiments on KTH, multi-view IXMAS and complex UCF sports datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.

253 citations
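
A rough sketch of the UBM representation (assumptions: synthetic stand-in features, scikit-learn's GaussianMixture, and a simplified relevance-factor MAP adaptation of the means only; the paper's exact adaptation and normalization may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
all_features = rng.randn(5000, 3)  # stand-in for relative XYT coordinates pooled over videos

# Global GMM (the UBM) learned from all training videos.
ubm = GaussianMixture(n_components=32, covariance_type="diag", random_state=0).fit(all_features)

def video_descriptor(feats, ubm, r=16.0):
    """Represent one video by its MAP-adapted GMM means (flattened)."""
    resp = ubm.predict_proba(feats)                        # (n, K) responsibilities
    n_k = resp.sum(axis=0)                                 # soft counts per component
    ex = resp.T @ feats / np.maximum(n_k, 1e-10)[:, None]  # per-component feature means
    alpha = (n_k / (n_k + r))[:, None]                     # adaptation coefficients
    return (alpha * ex + (1 - alpha) * ubm.means_).ravel()

desc = video_descriptor(rng.randn(200, 3), ubm)  # descriptor for one hypothetical video
```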


Journal ArticleDOI
TL;DR: The proposed approach generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with three main contributions: first, the method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects of the underlying data, and consequently improves their effectiveness.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way to improve performance. The resulting data representations are typically high-dimensional and assume diverse forms. Hence, finding a way of transforming them into a unified space of lower dimension generally facilitates the underlying tasks such as object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects of the underlying data. Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a new class of applications with the multiple kernel learning framework to address not only the supervised learning problems but also the unsupervised and semi-supervised ones.

234 citations
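
A crude approximation of the pipeline with fixed rather than learned weights (kernel PCA stands in for the dimensionality reduction methods the paper generalizes; MKL-DR learns the kernel weights jointly with the projection, which this sketch does not):

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
desc_a = rng.randn(150, 20)   # hypothetical image descriptor 1
desc_b = rng.randn(150, 50)   # hypothetical image descriptor 2

# One base kernel per descriptor, combined with fixed weights.
K = 0.5 * rbf_kernel(desc_a) + 0.5 * rbf_kernel(desc_b)
Z = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)  # unified low-dim space
```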


Journal ArticleDOI
TL;DR: A two-stage multiple-kernel learning algorithm incorporating sequential minimal optimization and the gradient projection method is developed, by which advantages from different hyperparameter settings can be combined and overall system performance can be improved.
Abstract: Support vector regression has been applied to stock market forecasting problems. However, the hyperparameters of the kernel functions usually need to be tuned manually. Multiple-kernel learning was developed to deal with this problem, by which the kernel matrix weights and Lagrange multipliers can be simultaneously derived through semidefinite programming. However, the amount of time and space required is very demanding. We develop a two-stage multiple-kernel learning algorithm by incorporating sequential minimal optimization and the gradient projection method. With this algorithm, advantages from different hyperparameter settings can be combined and overall system performance can be improved. Moreover, the user need not specify the hyperparameter settings in advance, and trial-and-error for determining appropriate hyperparameter settings can be avoided. Experimental results, obtained by running on datasets taken from the Taiwan Capitalization Weighted Stock Index, show that our method performs better than other methods.

227 citations
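
The gradient projection stage keeps the kernel weights feasible after each gradient move. A sketch of the standard Euclidean projection onto the probability simplex that such a step relies on (a textbook routine, not the authors' exact implementation):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                 # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# A hypothetical projected-gradient step on kernel weights eta:
# eta = project_simplex(eta - step_size * gradient)
```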


Proceedings Article
12 Dec 2011
TL;DR: A variational Bayesian inference algorithm is introduced which can be widely applied to sparse linear models; it is based on the spike and slab prior, which is, from a Bayesian perspective, the gold standard for sparse inference.
Abstract: We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the gold standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relations between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multi-output Gaussian process regression, multi-class classification, image processing applications and collaborative filtering.

189 citations
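
For reference, the spike and slab prior places a point mass at zero (the spike) next to a Gaussian (the slab) on each weight; schematically:

```latex
s_j \sim \mathrm{Bernoulli}(\pi), \qquad
\tilde{w}_j \sim \mathcal{N}(0, \sigma^2), \qquad
w_j = s_j\,\tilde{w}_j,
\quad\text{so}\quad
p(w_j) = (1-\pi)\,\delta(w_j) + \pi\,\mathcal{N}(w_j \mid 0, \sigma^2).
```

The variational algorithm approximates the intractable posterior under this prior rather than resorting to sampling.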


Journal Article
TL;DR: To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, this work develops graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.
Abstract: In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, including nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

155 citations


Proceedings Article
12 Dec 2011
TL;DR: A novel generative model is proposed that is able to reason jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene, and that significantly increases the performance of state-of-the-art object detectors in their ability to estimate object orientation.
Abstract: We propose a novel generative model that is able to reason jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, we infer the scene topology, geometry as well as traffic activities from a short video sequence acquired with a single camera mounted on a moving car. Our generative model takes advantage of dynamic information in the form of vehicle tracklets as well as static information coming from semantic labels and geometry (i.e., vanishing points). Experiments show that our approach outperforms a discriminative baseline based on multiple kernel learning (MKL) which has access to the same image information. Furthermore, as we reason about objects in 3D, we are able to significantly increase the performance of state-of-the-art object detectors in their ability to estimate object orientation.

123 citations


Proceedings Article
14 Jun 2011
TL;DR: This paper investigates a framework of Multi-Layer Multiple Kernel Learning that aims to learn “deep” kernel machines by exploring the combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach.
Abstract: Multiple Kernel Learning (MKL) aims to learn kernel machines for solving a real machine learning problem (e.g. classification) by exploring combinations of multiple kernels. The traditional MKL approach is in general "shallow" in the sense that the target kernel is simply a linear (or convex) combination of some base kernels. In this paper, we investigate a framework of Multi-Layer Multiple Kernel Learning (MLMKL) that aims to learn "deep" kernel machines by exploring the combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach. Through a multiple-layer mapping, the proposed MLMKL framework offers higher flexibility than regular MKL for finding the optimal kernel for applications. As the first attempt at this new MKL framework, we present a Two-Layer Multiple Kernel Learning (2LMKL) method together with two efficient algorithms for classification tasks. We analyze their generalization performance and conduct an extensive set of experiments over 16 benchmark datasets, with encouraging results showing that our method performs better than conventional MKL methods.

97 citations
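
Schematically, a two-layer kernel applies an outer nonlinearity to an inner weighted combination of base kernels. The sketch below uses an exponential outer layer as one possible instantiation (an assumption for illustration; the weights mu would be learned by 2LMKL, here they are fixed):

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

X = np.random.RandomState(0).randn(100, 5)

base = [linear_kernel(X), rbf_kernel(X, gamma=0.5)]    # first layer: base kernels
mu = np.array([0.3, 0.7])                              # inner combination weights
K_deep = np.exp(sum(m * K for m, K in zip(mu, base)))  # second layer: elementwise exp
# K_deep can then be used as a precomputed kernel in any kernel machine.
```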


Journal ArticleDOI
TL;DR: Empirical evidence from a toy dataset and real-world datasets dealing with brain-computer interface single-trial electroencephalogram classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms.
Abstract: Recently, there has been much interest in the multitask learning (MTL) problem with the constraint that tasks should share a common sparsity profile. Such a problem can be addressed through a regularization framework where the regularizer induces a joint-sparsity pattern between task decision functions. We follow this principled framework and focus on lp-lq (with 0 ≤ p ≤ 1 and 1 ≤ q ≤ 2) mixed norms as sparsity-inducing penalties. Our motivation for addressing this larger class of penalties is to adapt the penalty to the problem at hand, thus leading to better performance and a better sparsity pattern. For solving the problem in the general multiple kernel case, we first derive a variational formulation of the l1-lq penalty, which helps us in proposing an alternate optimization algorithm. Although very simple, the latter algorithm provably converges to the global minimum of the l1-lq penalized problem. For the linear case, we extend existing work on accelerated proximal gradient methods to this penalty. Our contribution in this context is to provide an efficient scheme for computing the l1-lq proximal operator. Then, for the more general case, when 0 < p < 1, we solve the resulting nonconvex problem through a majorization-minimization approach. The resulting algorithm is an iterative scheme which, at each iteration, solves a weighted l1-lq sparse MTL problem. Empirical evidence from a toy dataset and real-world datasets dealing with brain-computer interface single-trial electroencephalogram classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms.
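
Schematically, the mixed-norm penalty on the matrix W of task weights has the form below (a paraphrase consistent with the abstract; the paper's exact indexing and scaling may differ):

```latex
\Omega(W) \;=\; \sum_{k}\Big(\sum_{t}\lvert w_{kt}\rvert^{q}\Big)^{p/q},
\qquad 0 \le p \le 1,\; 1 \le q \le 2,
```

where k indexes features (or kernels) and t indexes tasks; the outer power p ≤ 1 drives entire rows of W to zero, which is what yields the common sparsity profile across tasks.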

Proceedings Article
12 Dec 2011
TL;DR: A general approach for metric learning with multiple kernel learning is presented, which can be instantiated with different metric learning algorithms provided that they satisfy some constraints; experimental evidence suggests that it outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection.
Abstract: Metric learning has become a very active research field. The most popular representative, Mahalanobis metric learning, can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can also be readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection.
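
The object being learned underneath is the Mahalanobis metric, which, as the abstract notes, amounts to a linear transformation followed by the Euclidean metric:

```latex
d_M(x, x') = \sqrt{(x - x')^{\top} M\, (x - x')},
\qquad M = L^{\top} L \succeq 0,
\qquad\text{so}\quad d_M(x, x') = \lVert Lx - Lx' \rVert_2 .
```

Kernelized and multiple-kernel variants apply the same construction in a (learned combination of) feature spaces.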

Journal ArticleDOI
TL;DR: This paper proposes a generalized MKL model with a constraint on a linear combination of the l1-norm and the squared l2-norm on the kernel weights to seek the optimal kernel combination weights; the model enjoys the favorable sparsity property on the solution and also facilitates the grouping effect.
Abstract: Kernel methods have been successfully applied in various applications. To succeed in these applications, it is crucial to learn a good kernel representation, whose objective is to reveal the data similarity precisely. In this paper, we address the problem of multiple kernel learning (MKL), searching for the optimal kernel combination weights through maximizing a generalized performance measure. Most MKL methods employ the l1-norm simplex constraint on the kernel combination weights, which therefore involves a sparse but non-smooth solution for the kernel weights. Despite their efficiency, they tend to discard informative complementary or orthogonal base kernels and yield degenerated generalization performance. Alternatively, imposing the l2-norm constraint on the kernel weights keeps all the information in the base kernels. This leads to non-sparse solutions and brings the risk of being sensitive to noise and incorporating redundant information. To tackle these problems, we propose a generalized MKL (GMKL) model by introducing an elastic-net-type constraint on the kernel weights. More specifically, it is an MKL model with a constraint on a linear combination of the l1-norm and the squared l2-norm on the kernel weights to seek the optimal kernel combination weights. Therefore, previous MKL problems based on the l1-norm or the l2-norm constraints can be regarded as special cases. Furthermore, our GMKL enjoys the favorable sparsity property on the solution and also facilitates the grouping effect. Moreover, the optimization of our GMKL is a convex optimization problem, where a local solution is the global optimal solution. We further derive a level method to efficiently solve the optimization problem. A series of experiments on both synthetic and real-world datasets have been conducted to show the effectiveness and efficiency of our GMKL.
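
The elastic-net-type constraint can be written schematically as follows (a sketch consistent with the abstract; the paper's exact parameterization may differ):

```latex
\Big\{\, \eta \ge 0 \;:\; \lambda \lVert \eta \rVert_1 + (1-\lambda)\lVert \eta \rVert_2^2 \le 1 \,\Big\},
\qquad \lambda \in [0, 1],
```

with λ = 1 recovering the sparse l1-norm constraint and λ = 0 the non-sparse l2-norm constraint as the special cases mentioned above.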

Journal ArticleDOI
01 Jan 2011
TL;DR: Experiments with Taiwanese banknotes show that the proposed approach outperforms single-kernel SVMs, standard SVMs with SDP, and multiple-SVM classifiers.
Abstract: Finding an efficient method to detect counterfeit banknotes is an imperative task in business transactions. In this paper, we propose a system based on multiple-kernel support vector machines for counterfeit banknote recognition. A support vector machine (SVM) to minimize false rates is developed. Each banknote is divided into partitions and the luminance histograms of the partitions are taken as the input of the system. Each partition is associated with its own kernels. Linearly weighted combination is adopted to combine multiple kernels into a combined matrix. Optimal weights with kernel matrices in the combination are obtained through semi-definite programming (SDP) learning. Two strategies are adopted to reduce the amount of time and space required by the SDP method. One strategy assumes the non-negativity of the kernel weights, and the other one is to set the sum of the weights to be unity. Experiments with Taiwanese banknotes show that the proposed approach outperforms single-kernel SVMs, standard SVMs with SDP, and multiple-SVM classifiers.

Journal ArticleDOI
TL;DR: A new optimization algorithm for Multiple Kernel Learning called SpicyMKL is proposed, which is applicable to general convex loss functions and general types of regularization, and gives a general block-norm formulation of MKL that includes non-sparse regularizations, such as elastic-net and ℓp-norm regularizations.
Abstract: We propose a new optimization algorithm for Multiple Kernel Learning (MKL) called SpicyMKL, which is applicable to general convex loss functions and general types of regularization. The proposed SpicyMKL iteratively solves smooth minimization problems; thus, there is no need to solve an SVM, LP, or QP internally. SpicyMKL can be viewed as a proximal minimization method and converges super-linearly. The cost of inner minimization is roughly proportional to the number of active kernels; therefore, when we aim for a sparse kernel combination, our algorithm scales well as the number of kernels increases. Moreover, we give a general block-norm formulation of MKL that includes non-sparse regularizations, such as elastic-net and lp-norm regularizations. Extending SpicyMKL, we propose an efficient optimization method for the general regularization framework. Experimental results show that our algorithm is faster than existing methods, especially when the number of kernels is large (>1000).
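
For concreteness, the block-1-norm objective, the simplest instance of the block-norm view of MKL mentioned above, reads (a standard formulation; SpicyMKL generalizes both the loss and the regularizer):

```latex
\min_{w_1,\dots,w_M,\,b}\;
\sum_{i=1}^{n} \ell\Big(y_i,\; \sum_{m=1}^{M} \langle w_m, \phi_m(x_i)\rangle + b\Big)
\;+\; C \sum_{m=1}^{M} \lVert w_m \rVert_{\mathcal{H}_m},
```

and replacing the sum of RKHS norms by other block norms yields the elastic-net and lp-norm regularizations the abstract mentions.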

Journal ArticleDOI
TL;DR: The experimental results show that the proposed MK-SVM method not only leads to better global performance by taking advantage of multiple features but also has a low computational complexity.
Abstract: In this letter, we propose a multiple kernel support vector machine (MK-SVM) method for multiple-feature-based voice activity detection (VAD). To make the MK-SVM based VAD practical, we adapt the multiple kernel learning (MKL) idea to an efficient cutting-plane structural SVM solver. We further discuss the performance of the MK-SVM with two different optimization objectives, in terms of minimum classification error (MCE) and improvement of receiver operating characteristic (ROC) curves. Our experimental results show that the proposed method not only leads to better global performance by taking advantage of multiple features but also has a low computational complexity.

Journal ArticleDOI
TL;DR: Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization, and that the proposed mirror-descent algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency.
Abstract: This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1-norm regularization for promoting sparsity within the RKHS norms of each group and ls-norm (s ≥ 2) regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels; hence we call the formulations Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over a product of simplices and has a computational complexity of O(m^2 n_tot log(n_max)/ε^2), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels, respectively, and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency.
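
Schematically, the VSKL regularizer applies an l1-type sum of RKHS norms within each kernel group G_j and an ls norm across the g groups (a paraphrase of the abstract's description):

```latex
\Omega(w) \;=\;
\Bigg[\sum_{j=1}^{g}\Big(\sum_{k \in G_j} \lVert w_k \rVert_{\mathcal{H}_k}\Big)^{s}\Bigg]^{1/s},
\qquad s \ge 2,
```

so the inner sum promotes sparsity within a group while the outer ls norm keeps the combination across groups non-sparse.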

Posted Content
TL;DR: In this article, the authors derived an upper bound on the local Rademacher complexity of multiple kernel learning, which yields a tighter excess risk bound than global approaches, and derived consequences regarding excess loss, namely fast convergence rates of the order O(n^{-\frac{\alpha}{1+\alpha}) where α is the minimum eigenvalue decay rate of individual kernels.
Abstract: We derive an upper bound on the local Rademacher complexity of $\ell_p$-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed only the case $p=1$, while our analysis covers all cases $1\leq p\leq\infty$, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also show a lower bound demonstrating that the bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order $O(n^{-\frac{\alpha}{1+\alpha}})$, where $\alpha$ is the minimum eigenvalue decay rate of the individual kernels.

Journal ArticleDOI
TL;DR: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one, but these different kernels may correspond to using different notions of similarity or may...
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may...

Proceedings Article
17 Nov 2011
TL;DR: The unsupervised multiple kernel learning problem is formulated as an optimization task and an efficient alternating optimization algorithm is proposed to solve it; empirical results on both classification and dimension reduction tasks validate the efficacy of the proposed UMKL algorithm.
Abstract: Traditional multiple kernel learning (MKL) algorithms are essentially supervised learning in the sense that the kernel learning task requires the class labels of training data. However, class labels may not always be available prior to the kernel learning task in some real world scenarios, e.g., an early preprocessing step of a classification task or an unsupervised learning task such as dimension reduction. In this paper, we investigate a problem of Unsupervised Multiple Kernel Learning (UMKL), which does not require class labels of training data as needed in a conventional multiple kernel learning task. Since a kernel essentially defines pairwise similarity between any two examples, our unsupervised kernel learning method mainly follows two intuitive principles: (1) a good kernel should allow every example to be well reconstructed from its localized bases weighted by the kernel values; (2) a good kernel should induce kernel values that coincide with the local geometry of the data. We formulate the unsupervised multiple kernel learning problem as an optimization task and propose an efficient alternating optimization algorithm to solve it. Empirical results on both classification and dimension reduction tasks validate the efficacy of the proposed UMKL algorithm.
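
The two principles translate into an objective of roughly the following shape (a schematic paraphrase, not the paper's exact formulation): a reconstruction term with kernel-weighted localized bases plus a locality term that discourages large kernel values between distant points,

```latex
\min_{\mu \in \Delta,\; B}\;
\sum_{i}\Big\lVert x_i - \sum_{j} B_{ij}\, k_\mu(x_i, x_j)\, x_j \Big\rVert^2
\;+\; \gamma \sum_{i,j} k_\mu(x_i, x_j)\,\lVert x_i - x_j \rVert^2,
\qquad k_\mu = \sum_{t} \mu_t k_t,
```

where μ lies on the simplex Δ and B encodes each example's localized bases; the alternating algorithm switches between optimizing μ and B.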

Journal ArticleDOI
TL;DR: A weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature that uses a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, allowing each kernel to contribute incrementally to the performance improvement.

Journal ArticleDOI
TL;DR: Machine learning tools aid many Alzheimer's disease-related investigations by enabling multisource data fusion and biomarker identification as well as analysis of functional brain connectivity.
Abstract: Machine learning tools aid many Alzheimer's disease-related investigations by enabling multisource data fusion and biomarker identification as well as analysis of functional brain connectivity.

Proceedings ArticleDOI
21 Mar 2011
TL;DR: A new multi-resolution framework based on the recent multiple kernel algorithm is introduced, combining independent point detection and prior knowledge on the point distribution, which is robust to variable lighting conditions and facial expressions.
Abstract: In this paper we present a robust and accurate method to detect 17 facial landmarks in expressive face images. We introduce a new multi-resolution framework based on the recent multiple kernel algorithm. Low-resolution patches carry the global information of the face and give a coarse but robust detection of the desired landmark. High-resolution patches, using local details, refine this location. This process is combined with a bootstrap process and a statistical validation, both improving the system's robustness. Combining independent point detection and prior knowledge on the point distribution, the proposed detector is robust to variable lighting conditions and facial expressions. This detector is tested on several databases and the reported results compare favorably with current state-of-the-art point detectors.

Journal ArticleDOI
TL;DR: This paper investigates the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains and examines the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise.
Abstract: This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
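
The "dedicated kernels for sub-problems" idea can be sketched as follows (assumptions: synthetic data, one RBF kernel per feature subset, fixed weights instead of learned ones, and kernel ridge regression standing in for support vector regression):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(300, 5)                 # columns 0-1: spatial coordinates, 2-4: terrain features
y = np.sin(X[:, 0]) + 0.1 * rng.randn(300)

K_space = rbf_kernel(X[:, :2], gamma=1.0)    # kernel dedicated to the spatial sub-problem
K_terrain = rbf_kernel(X[:, 2:], gamma=0.5)  # kernel dedicated to the terrain sub-problem
K = 0.6 * K_space + 0.4 * K_terrain          # MKL would learn these weights from data

model = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, y)
# Prediction needs the cross-kernel between test and training points, built the same way.
```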

Proceedings Article
14 Jun 2011
TL;DR: This work proposes a family of online algorithms able to tackle variants of MKL and group-LASSO, for which regret, convergence, and generalization bounds are shown.
Abstract: Training structured predictors often requires a considerable time selecting features or tweaking the kernel. Multiple kernel learning (MKL) sidesteps this issue by embedding the kernel learning into the training procedure. Despite the recent progress towards efficiency of MKL algorithms, the structured output case remains an open research front. We propose a family of online algorithms able to tackle variants of MKL and group-LASSO, for which we show regret, convergence, and generalization bounds. Experiments on handwriting recognition and dependency parsing attest the success of the approach.

Journal ArticleDOI
TL;DR: A localized multiple kernel learning (L-MKL) algorithm is proposed that develops a locality gating model to partition the input space of heterogeneous representations into a set of localities of simpler data structure.
Abstract: Realistic human action recognition in videos has been a useful yet challenging task. Video shots of the same action may present huge intra-class variations in terms of visual appearance, kinetic patterns, video shooting, and editing styles. Heterogeneous feature representations of videos pose another challenge: how to effectively handle the redundancy, complementariness and disagreement in these features. This paper proposes a localized multiple kernel learning (L-MKL) algorithm to tackle these issues. L-MKL integrates localized classifier ensemble learning and multiple kernel learning in a unified framework to leverage the strengths of both. The basis of L-MKL is to build multiple kernel classifiers on diverse features at subspace localities of heterogeneous representations. L-MKL integrates the discriminability of complementary features locally and enables localized MKL classifiers to deliver better performance in their own regions of expertise. Specifically, L-MKL develops a locality gating model to partition the input space of heterogeneous representations into a set of localities of simpler data structure. Each locality then learns its localized optimal combination of Mercer kernels of heterogeneous features. Finally, the gating model coordinates the localized multiple kernel classifiers globally to perform action recognition. Experiments on two datasets show that the proposed approach delivers promising performance.

Journal ArticleDOI
TL;DR: This paper introduces a novel multiple kernel learning algorithm based on active constraint methods that achieves state-of-the-art efficiency, and proposes some variants of this algorithm that can produce approximate solutions more efficiently.

Journal ArticleDOI
TL;DR: This paper addresses the issue of multiple kernel learning for LS-SVM by formulating it as semidefinite programming (SDP) and shows that the regularization parameter can be optimized in a unified framework with the kernel, which leads to an automatic process for model selection.

Journal ArticleDOI
TL;DR: This paper proposes a multiple-kernel SVM based data mining system in which multiple tasks, including feature selection, data fusion, class prediction, decision rule extraction, association rule extraction and subclass discovery, are incorporated in an integrated framework.
Abstract: Gene expression profiling using the DNA microarray technique has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Recently, many computational methods have been used to discover marker genes, make class predictions and perform class discovery based on gene expression data of cancer tissue. However, those techniques fall short in some critical areas. These include (a) interpretation of the solution and extracted knowledge; (b) integrating data from various sources and incorporating prior knowledge into the system; and (c) giving a global understanding of complex biological systems through a complete knowledge discovery framework. This paper proposes a multiple-kernel SVM based data mining system. Multiple tasks, including feature selection, data fusion, class prediction, decision rule extraction, association rule extraction and subclass discovery, are incorporated in an integrated framework. The ALL-AML Leukemia dataset is used to demonstrate the performance of this system.

Journal ArticleDOI
TL;DR: Experimental results show that the model outperforms current state-of-the-art contextual frameworks and reveals individual contributions for each contextual interaction level as well as appearance features, indicating their relative importance for object localization.
Abstract: Recently, many object localization models have shown that incorporating contextual cues can greatly improve accuracy over using appearance features alone. Many of these models have therefore explored different types of contextual sources, but considering only one level of contextual interaction at a time. Thus, what context could truly contribute to object localization, through integrating cues from all levels simultaneously, remains an open question. Moreover, the relative importance of the different contextual levels and appearance features across different object classes remains to be explored. Here we introduce a novel framework for multiple-class object localization that incorporates different levels of contextual interactions. We study contextual interactions at the pixel, region and object level based upon three different sources of context: semantic, boundary support, and contextual neighborhoods. Our framework learns a single similarity metric from multiple kernels, combining pixel and region interactions with appearance features, and then applies a conditional random field to incorporate object-level interactions. To effectively integrate different types of feature descriptions, we extend the large margin nearest neighbor method to a novel algorithm that supports multiple kernels. We perform experiments on three challenging image databases: Graz-02, MSRC and PASCAL VOC 2007. Experimental results show that our model outperforms current state-of-the-art contextual frameworks and reveals the individual contributions of each contextual interaction level as well as of the appearance features, indicating their relative importance for object localization.