scispace - formally typeset

Showing papers on "Multiple kernel learning" published in 2012


Book Chapter
07 Oct 2012
TL;DR: This paper shows how to analyze the influences of object characteristics on detection performance and the frequency and impact of different types of false positives, and shows that sensitivity to size, localization error, and confusion with similar objects are the most impactful forms of error.
Abstract: This paper shows how to analyze the influences of object characteristics on detection performance and the frequency and impact of different types of false positives. In particular, we examine effects of occlusion, size, aspect ratio, visibility of parts, viewpoint, localization error, and confusion with semantically similar objects, other labeled objects, and background. We analyze two classes of detectors: the Vedaldi et al. multiple kernel learning detector and different versions of the Felzenszwalb et al. detector. Our study shows that sensitivity to size, localization error, and confusion with similar objects are the most impactful forms of error. Our analysis also reveals that many different kinds of improvement are necessary to achieve large gains, making more detailed analysis essential for the progress of recognition research. By making our software and annotations available, we make it effortless for future researchers to perform similar analysis.

573 citations
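The error analysis above hinges on matching detections to ground truth by box overlap. A minimal sketch follows, assuming intersection-over-union (IoU) as the overlap measure and illustrative thresholds of 0.5 for a correct detection and 0.1 for a localization error; the function names are hypothetical, and the paper's full taxonomy (duplicates, confusion with similar or other objects) is not reproduced.

```python
from typing import Tuple

# Box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1; pixel-inclusive conventions are ignored.
Box = Tuple[float, float, float, float]

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def categorize(det: Box, gt: Box, hit: float = 0.5, loc: float = 0.1) -> str:
    """Hypothetical coarse label for one same-class detection against one ground-truth box."""
    overlap = iou(det, gt)
    if overlap >= hit:
        return "correct"
    if overlap >= loc:
        return "localization error"  # right class, poorly placed box
    return "background"              # no meaningful overlap with the object
```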


Journal Article
TL;DR: Comprehensive experiments on three domain adaptation data sets demonstrate that DTMKL-based methods outperform existing cross-domain learning and multiple kernel learning methods.
Abstract: Cross-domain learning methods have shown promising results by leveraging labeled patterns from the auxiliary domain to learn a robust classifier for the target domain which has only a limited number of labeled samples. To cope with the considerable change between feature distributions of different domains, we propose a new cross-domain kernel learning framework into which many existing kernel methods can be readily incorporated. Our framework, referred to as Domain Transfer Multiple Kernel Learning (DTMKL), simultaneously learns a kernel function and a robust classifier by minimizing both the structural risk functional and the distribution mismatch between the labeled and unlabeled samples from the auxiliary and target domains. Under the DTMKL framework, we also propose two novel methods by using SVM and prelearned classifiers, respectively. Comprehensive experiments on three domain adaptation data sets (i.e., TRECVID, 20 Newsgroups, and email spam data sets) demonstrate that DTMKL-based methods outperform existing cross-domain learning and multiple kernel learning methods.

562 citations
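DTMKL penalizes the distribution mismatch between auxiliary and target samples in the feature space of the learned kernel. As a rough sketch, assuming the mismatch is the biased empirical maximum mean discrepancy (MMD) and the learned kernel is a nonnegative weighted sum of base kernels, the squared MMD turns out to be linear in the kernel weights, which is one reason such a criterion combines naturally with kernel-weight learning. The helper names below are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]

def rbf(gamma: float) -> Kernel:
    """Gaussian (RBF) base kernel exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k

def combined(kernels: List[Kernel], weights: List[float]) -> Kernel:
    """Hypothetical helper: nonnegative weighted sum of base kernels."""
    def k(x: Vector, y: Vector) -> float:
        return sum(w * km(x, y) for w, km in zip(weights, kernels))
    return k

def mmd2(k: Kernel, src: List[Vector], tgt: List[Vector]) -> float:
    """Biased empirical squared MMD between a source and a target sample."""
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        return sum(k(x, y) for x in a for y in b) / (len(a) * len(b))
    return mean_k(src, src) + mean_k(tgt, tgt) - 2.0 * mean_k(src, tgt)
```

Because every term is an average of kernel values, `mmd2(combined(ks, w), S, T)` equals the weighted sum of the per-kernel values `mmd2(k_m, S, T)`.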


Journal Article
TL;DR: A multiple kernel fuzzy c-means (MKFC) algorithm that, by automatically adjusting the kernel weights, is more immune to ineffective kernels and irrelevant features, which makes the choice of kernels less crucial.
Abstract: While fuzzy c-means is a popular soft-clustering method, its effectiveness is largely limited to spherical clusters. By applying kernel tricks, the kernel fuzzy c-means algorithm attempts to address this problem by mapping data with nonlinear relationships to appropriate feature spaces. Kernel combination, or selection, is crucial for effective kernel clustering. Unfortunately, for most applications, it is not easy to find the right combination. We propose a multiple kernel fuzzy c-means (MKFC) algorithm that extends the fuzzy c-means algorithm with a multiple kernel learning setting. By incorporating multiple kernels and automatically adjusting the kernel weights, MKFC is more immune to ineffective kernels and irrelevant features. This makes the choice of kernels less crucial. In addition, we show multiple kernel k-means to be a special case of MKFC. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed MKFC algorithm.

348 citations


Proceedings Article
10 Dec 2012
TL;DR: This work exploits multiple representations (views) of the same set of instances within a clustering framework: the views are expressed as given kernel matrices, and a weighted combination of the kernels is learned in parallel with the partitioning.
Abstract: Exploiting multiple representations, or views, of the same set of instances within a clustering framework is a popular practice for boosting clustering accuracy. However, some of the available sources may be misleading (due to noise, measurement errors, etc.) in revealing the true structure of the data; thus, their inclusion in the clustering process may have a negative influence. This aspect seems to be overlooked in the multi-view literature, where all representations are considered equally. In this work, views are expressed in terms of given kernel matrices, and a weighted combination of the kernels is learned in parallel with the partitioning. The weights assigned to kernels are indicative of the quality of the corresponding views' information. Additionally, the combination scheme incorporates a parameter that controls the admissible sparsity of the weights to avoid extremes and tailor them to the data. Two efficient iterative algorithms are proposed that alternate between updating the view weights and recomputing the clusters to optimize the intra-cluster variance from different perspectives. The conducted experiments reveal the effectiveness of our methodology compared to other multi-view methods.

233 citations
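The alternating scheme above reassigns points using distances computed in the feature space of the combined kernel, where the squared distance from point $i$ to cluster $c$ is $K_{ii} - \frac{2}{|c|}\sum_{j\in c}K_{ij} + \frac{1}{|c|^2}\sum_{j,l\in c}K_{jl}$. The sketch below shows only the partitioning half, assuming the view weights are already fixed (the paper also updates them, with a sparsity-controlling exponent omitted here); `combine` and `kernel_kmeans` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]

def combine(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Hypothetical helper: weighted sum of precomputed per-view kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

def kernel_kmeans(K: Matrix, labels: List[int], k: int, iters: int = 20) -> List[int]:
    """Kernel k-means on a fixed (combined) kernel matrix, starting from the given labels."""
    n = len(K)
    labels = list(labels)
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term (1/|c|^2) * sum_{j,l in c} K[j][l], shared by all points
        within = []
        for members in clusters:
            m = len(members)
            within.append(sum(K[j][l] for j in members for l in members) / (m * m) if m else 0.0)
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in members) / len(members)
                d = K[i][i] - 2.0 * cross + within[c]  # squared feature-space distance
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return labels
```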


Journal Article
TL;DR: Simulated and real-life data fusion applications are experimentally studied, and the results validate that the proposed algorithm has comparable performance and, moreover, is more efficient on large-scale data sets.
Abstract: This paper presents a novel optimized kernel k-means algorithm (OKKC) to combine multiple data sources for clustering analysis. The algorithm uses an alternating minimization framework to optimize the cluster membership and kernel coefficients as a nonconvex problem. In the proposed algorithm, the problem of optimizing the cluster membership and the problem of optimizing the kernel coefficients are both based on the same Rayleigh quotient objective; therefore, the proposed algorithm converges locally. OKKC has a simpler procedure and lower complexity than other algorithms proposed in the literature. Simulated and real-life data fusion applications are experimentally studied, and the results validate that the proposed algorithm has comparable performance and, moreover, is more efficient on large-scale data sets. (The Matlab implementation of the OKKC algorithm is downloadable from http://homes.esat.kuleuven.be/~sistawww/bio/syu/okkc.html.)

210 citations


Posted Content
TL;DR: In this article, a generalized Fisher score is proposed to select features jointly; it maximizes the lower bound of the traditional Fisher score and is solved as a quadratically constrained linear programming (QCLP) problem.
Abstract: Fisher score is one of the most widely used supervised feature selection methods. However, it selects each feature independently according to its score under the Fisher criterion, which leads to a suboptimal subset of features. In this paper, we present a generalized Fisher score to jointly select features. It aims at finding a subset of features that maximizes the lower bound of the traditional Fisher score. The resulting feature selection problem is a mixed integer program, which can be reformulated as a quadratically constrained linear program (QCLP). It is solved by a cutting plane algorithm, in each iteration of which a multiple kernel learning problem is solved alternately by multivariate ridge regression and projected gradient descent. Experiments on benchmark data sets indicate that the proposed method outperforms Fisher score as well as many other state-of-the-art feature selection methods.

208 citations
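For reference, the classic Fisher score that this paper generalizes rates each feature $j$ on its own as the ratio of between-class to within-class scatter, $F_j = \sum_k n_k(\mu_{k,j}-\mu_j)^2 / \sum_k n_k\sigma_{k,j}^2$, where $n_k$, $\mu_{k,j}$ and $\sigma_{k,j}^2$ are the size, mean and variance of class $k$. The sketch below computes only this per-feature baseline, not the joint QCLP formulation; it makes the independence limitation concrete, since each score ignores every other feature.

```python
from collections import defaultdict
from typing import List, Sequence

def fisher_scores(X: List[Sequence[float]], y: List[int]) -> List[float]:
    """Classic per-feature Fisher score: between-class scatter / within-class scatter."""
    n, d = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(d):
        mu = sum(row[j] for row in X) / n  # overall mean of feature j
        between = within = 0.0
        for rows in groups.values():
            nk = len(rows)
            mu_k = sum(r[j] for r in rows) / nk
            var_k = sum((r[j] - mu_k) ** 2 for r in rows) / nk
            between += nk * (mu_k - mu) ** 2
            within += nk * var_k
        scores.append(between / within if within > 0 else float("inf"))
    return scores
```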


Proceedings Article
16 Jun 2012
TL;DR: This work rigorously analyzes and combines a large set of low-level features that capture appearance, color, motion, audio and audio-visual co-occurrence patterns in videos, and exploits multimodal information by analyzing available spoken and videotext content using state-of-the-art automatic speech recognition (ASR) and videotext recognition systems.
Abstract: Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance, color, motion, audio and audio-visual co-occurrence patterns in videos. We also evaluate the utility of high-level (i.e., semantic) visual information obtained from detecting scene, object, and action concepts. Further, we exploit multimodal information by analyzing available spoken and videotext content using state-of-the-art automatic speech recognition (ASR) and videotext recognition systems. We combine these diverse features using a two-step strategy employing multiple kernel learning (MKL) and late score level fusion methods. Based on the TRECVID MED 2011 evaluations for detecting 10 events in a large benchmark set of ∼45000 videos, our system showed the best performance among the 19 international teams.

200 citations


Journal Article
TL;DR: This paper addresses the MKL for classification in hyperspectral images by extracting the most variation from the space spanned by multiple kernels and proposes a representative MKL (RMKL) algorithm that greatly reduces the computational load for searching optimal combination of basis kernels.
Abstract: Recently, multiple kernel learning (MKL) methods have been developed to improve the flexibility of kernel-based learning machines. MKL methods generally focus on determining which key kernels to preserve and their significance in the optimal kernel combination. Unfortunately, the computational demand of finding the optimal combination becomes prohibitive when the numbers of training samples and kernels increase rapidly, particularly for hyperspectral remote sensing data. In this paper, we address MKL for classification of hyperspectral images by extracting the most variation from the space spanned by multiple kernels, and propose a representative MKL (RMKL) algorithm. The core idea of the algorithm is to determine the kernels to be preserved and their weights according to statistical significance instead of a time-consuming search for the optimal kernel combination. The main merits of RMKL are that it greatly reduces the computational load of searching for the optimal combination of basis kernels and, unlike most MKL algorithms, imposes no strict limitation on the selection of basis kernels; meanwhile, RMKL retains the attractive properties of MKL in terms of both classification accuracy and interpretability. Experiments are conducted on different real hyperspectral data sets, and the results show that the RMKL algorithm provides the best performance to date among several state-of-the-art algorithms while demonstrating satisfactory computational efficiency.

186 citations


Journal Article
TL;DR: The goal is to establish oracle inequalities for the excess risk of the resulting prediction rule showing that the method is adaptive both to the unknown design distribution and to the sparsity of the problem.
Abstract: The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical $L_2$ norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters. The main focus is on the case when the total number of kernels is large, but only a relatively small number of them is needed to represent the target function, so that the problem is sparse. The goal is to establish oracle inequalities for the excess risk of the resulting prediction rule showing that the method is adaptive both to the unknown design distribution and to the sparsity of the problem.

147 citations


Journal Article
TL;DR: A framework with ensemble techniques is presented for customer churn prediction directly using longitudinal behavioral data and a novel approach called the hierarchical multiple kernel support vector machine (H-MK-SVM) is formulated.

129 citations


Proceedings Article
19 Oct 2012
TL;DR: This paper constructs a classification framework that is able to incorporate both static and dynamic views into a unified framework in the hopes that, while a malicious executable can disguise itself in some views, disguising itself in every view while maintaining malicious intent will prove to be substantially more difficult.
Abstract: Malware classification systems have typically used some machine learning algorithm in conjunction with either static or dynamic features collected from the binary. Recently, more advanced malware has introduced mechanisms to avoid detection in these views by using obfuscation techniques to avoid static detection and execution-stalling techniques to avoid dynamic detection. In this paper we construct a classification framework that is able to incorporate both static and dynamic views into a unified framework in the hopes that, while a malicious executable can disguise itself in some views, disguising itself in every view while maintaining malicious intent will prove to be substantially more difficult. Our method uses kernels to place a similarity metric on each distinct view and then employs multiple kernel learning to find a weighted combination of the data sources which yields the best classification accuracy in a support vector machine classifier. Our approach opens up new avenues of malware research which will allow the research community to elegantly look at multiple facets of malware simultaneously, and which can easily be extended to integrate any new data sources that may become popular in the future.

Proceedings Article
12 Aug 2012
TL;DR: A Spectral Projected Gradient descent optimizer is developed which takes into account second order information in selecting step sizes, employs a non-monotone step size selection criterion requiring fewer function evaluations, is robust to gradient noise, and can take quick steps when far away from the optimum.
Abstract: Multiple Kernel Learning (MKL) aims to learn the kernel in an SVM from training data. Many MKL formulations have been proposed and some have proved effective in certain applications. Nevertheless, as MKL is a nascent field, many more formulations need to be developed to generalize across domains and meet the challenges of real-world applications. However, each MKL formulation typically necessitates the development of a specialized optimization algorithm. The lack of an efficient, general-purpose optimizer capable of handling a wide range of formulations presents a significant challenge to those looking to take MKL out of the lab and into the real world. This problem was somewhat alleviated by the development of the Generalized Multiple Kernel Learning (GMKL) formulation, which admits fairly general kernel parameterizations and regularizers subject to mild constraints. However, the projected gradient descent GMKL optimizer is inefficient, as the computation of the step size and of a reasonably accurate objective function value or gradient direction are all expensive. We overcome these limitations by developing a Spectral Projected Gradient (SPG) descent optimizer which: a) takes into account second-order information in selecting step sizes; b) employs a non-monotone step size selection criterion requiring fewer function evaluations; c) is robust to gradient noise; and d) can take quick steps when far away from the optimum. We show that our proposed SPG-GMKL optimizer can be an order of magnitude faster than projected gradient descent on even small and medium sized datasets. In some cases, SPG-GMKL can even outperform state-of-the-art specialized optimization algorithms developed for a single MKL formulation. Furthermore, we demonstrate that SPG-GMKL can scale well beyond gradient descent to large problems involving a million kernels or half a million data points. Our code and implementation are publicly available.
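Two ingredients named above, a spectral (Barzilai-Borwein) step size that encodes curvature information and a non-monotone acceptance test, can be sketched on a toy problem. This is a minimal illustration assuming a smooth objective, projection onto the nonnegative orthant as a stand-in for kernel-weight constraints, and simple step halving in place of a more careful line search; in SPG-GMKL each objective and gradient evaluation would additionally require solving an SVM, which is not modeled here. `spg` is a hypothetical name.

```python
from typing import Callable, List

Vec = List[float]

def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))

def spg(f: Callable[[Vec], float], grad: Callable[[Vec], Vec],
        project: Callable[[Vec], Vec], x0: Vec, iters: int = 200,
        memory: int = 10, gamma: float = 1e-4, tol: float = 1e-8,
        alpha_min: float = 1e-10, alpha_max: float = 1e10) -> Vec:
    """Hypothetical sketch: spectral projected gradient with a non-monotone Armijo test."""
    x = project(x0)
    g = grad(x)
    alpha = 1.0              # initial spectral step
    history = [f(x)]         # recent objective values for the non-monotone test
    for _ in range(iters):
        trial = project([xi - alpha * gi for xi, gi in zip(x, g)])
        d = [ti - xi for ti, xi in zip(trial, x)]  # projected search direction
        if max(abs(di) for di in d) < tol:
            break                                  # (nearly) stationary point
        f_ref = max(history[-memory:])  # compare against the worst recent value
        slope = dot(g, d)               # negative whenever d is nonzero
        lam = 1.0
        for _step in range(60):         # halve until sufficient decrease
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= f_ref + gamma * lam * slope:
                break
            lam *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        # Barzilai-Borwein step s's / s'y, safeguarded to [alpha_min, alpha_max]
        alpha = alpha_max if sy <= 0 else min(alpha_max, max(alpha_min, dot(s, s) / sy))
        x, g = x_new, g_new
        history.append(f_new)
    return x
```

The non-monotone reference `f_ref` lets an iterate temporarily increase the objective, which is what allows large spectral steps to be accepted more often.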

Journal Article
TL;DR: The proposed GL-MKL determines the optimal base kernels, including the associated weights and kernel parameters, and results in improved recognition performance, and can be extended to address heterogeneous variable selection problems.
Abstract: We propose a novel multiple kernel learning (MKL) algorithm with a group lasso regularizer, called group lasso regularized MKL (GL-MKL), for heterogeneous feature fusion and variable selection. For problems of feature fusion, assigning a group of base kernels to each feature type in an MKL framework provides a robust way of fitting data extracted from different feature domains. By adding a mixed norm constraint (i.e., group lasso) as the regularizer, we can enforce sparsity at the group/feature level and automatically learn a compact feature set for recognition purposes. More precisely, our GL-MKL determines the optimal base kernels, including the associated weights and kernel parameters, and results in improved recognition performance. In addition, our GL-MKL can be extended to address heterogeneous variable selection problems. For such problems, we aim to select a compact set of variables (i.e., feature attributes) for comparable or improved performance. Our proposed method does not need to exhaustively search the entire variable space as prior sequential-based variable selection methods did, nor does it require any prior knowledge of the optimal size of the variable subset. To verify the effectiveness and robustness of our GL-MKL, we conduct experiments on video and image datasets for heterogeneous feature fusion, and perform variable selection on various UCI datasets.
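The group-level sparsity described above comes from a mixed norm that penalizes the Euclidean norm of each group's coefficients. A minimal sketch of the proximal operator of this penalty (block soft-thresholding) shows why whole groups, and hence whole feature types, are switched off at once; it illustrates the regularizer only and is not the authors' GL-MKL solver. `group_soft_threshold` is a hypothetical name.

```python
import math
from typing import List

def group_soft_threshold(groups: List[List[float]], lam: float) -> List[List[float]]:
    """Prox of lam * sum_g ||w_g||_2: shrink each group's norm by lam, zeroing small groups."""
    out = []
    for w in groups:
        norm = math.sqrt(sum(v * v for v in w))
        # Groups with norm <= lam are set exactly to zero; others are scaled toward zero.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in w])
    return out
```

With `lam=1.0`, a group `[3.0, 4.0]` (norm 5) is shrunk to `[2.4, 3.2]`, while a group `[0.3, 0.4]` (norm 0.5) is removed entirely.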

Journal Article
TL;DR: This paper proposes a novel action representation method which differs significantly from the existing interest point based representation in that only the global distribution information of interest points is exploited, and holistic features are extracted from clouds of interest points accumulated over multiple temporal scales.

Proceedings Article
09 Jul 2012
TL;DR: A classification scheme is proposed that combines HOG and Haar descriptors using Generalized Multiple Kernel Learning (GMKL), which learns the trade-off between the two descriptors by constructing an optimal kernel from many base kernels; experimental results show that HOG+Haar with GMKL outperforms the other three classification schemes.
Abstract: Vehicle detection in wide area motion imagery (WAMI) is an important problem in computer science which, if solved, supports urban traffic management, emergency responder routing, and accident discovery. Due to the large amount of camera motion, the small number of pixels on target objects, and the low frame rate of WAMI data, vehicle detection is much more challenging than the corresponding task in traditional video imagery. Since an object in wide area imagery covers only a few pixels, the available shape, texture, and appearance information is limited, which constrains vehicle detection and classification performance. Histogram of Oriented Gradients (HOG) and Haar descriptors have been used successfully in human and face detection using only the image intensity, and the two descriptors have different advantages. In this paper, we propose a classification scheme which combines HOG and Haar descriptors by using Generalized Multiple Kernel Learning (GMKL), which can learn the trade-off between HOG and Haar descriptors by constructing an optimal kernel from many base kernels. Due to the large number of Haar features, we first use a cascade of boosting classifiers, a variant of Gentle AdaBoost capable of feature selection, to select a small number of features from a huge feature set. Then, we combine the HOG descriptors and the selected Haar features and use GMKL to train the final classifier. In our experiments, we evaluate the performance of HOG+Haar with GMKL, HOG with GMKL, Haar with GMKL, and the cascaded boosting classifier on the Columbus Large Image Format (CLIF) dataset. Experimental results show that the fusion of HOG+Haar with GMKL outperforms the other three classification schemes.

Posted Content
Mehmet Gönen
TL;DR: This paper proposes a fully conjugate Bayesian formulation and derives a deterministic variational approximation, which allows combining hundreds or thousands of kernels very efficiently, with inference requiring less than a minute.
Abstract: Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.

Journal Article
TL;DR: This paper presents a data-dependent generalization bound for a large class of regularized algorithms that implement structured sparsity constraints; the bound applies to standard squared-norm regularization, the Lasso, the group Lasso, and other regularization schemes.
Abstract: We present a data-dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels.

Book Chapter
01 Oct 2012
TL;DR: This paper shows that in a typical surgical training setup, video data can be equally discriminative and proposes and evaluates three approaches to surgical gesture classification from video.
Abstract: Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first one, we model each video clip from each surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words and use a bag-of-features (BoF) approach to classify new video clips. In the third approach, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform as well as the state-of-the-art approaches based on kinematic data.

Journal Article
TL;DR: This letter proposes to use spatiotemporal monogenic binary patterns to describe both appearance and motion information of dynamic sequences; the proposed methods perform better than state-of-the-art methods and are robust to illumination variations.
Abstract: Feature representation is an important research topic in facial expression recognition from video sequences. In this letter, we propose to use spatiotemporal monogenic binary patterns to describe both appearance and motion information of the dynamic sequences. First, we use monogenic signal analysis to extract the magnitude, the real picture, and the imaginary picture of the orientation of each frame, since the magnitude provides much appearance information and the orientation provides complementary information. Second, the phase-quadrant encoding method and the local bit exclusive operator are used to encode the real and imaginary pictures from the orientation in three orthogonal planes, and the local binary pattern operator is used to capture the texture and motion information from the magnitude through three orthogonal planes. Finally, both concatenation and multiple kernel learning are exploited, separately, to fuse the features. The experimental results on the Extended Cohn-Kanade and Oulu-CASIA facial expression databases demonstrate that the proposed methods perform better than state-of-the-art methods and are robust to illumination variations.

Book Chapter
07 Oct 2012
TL;DR: It is shown that a scalable optimization process in the Fourier domain can be used to identify the different frequency bands that are useful for prediction on training data and recover efficient and scalable linear reformulations for both single and multiple kernel learning.
Abstract: Approximations based on random Fourier embeddings have recently emerged as an efficient and formally consistent methodology to design large-scale kernel machines [23]. By expressing the kernel as a Fourier expansion, features are generated based on a finite set of random basis projections, sampled from the Fourier transform of the kernel, with inner products that are Monte Carlo approximations of the original non-linear model. Based on the observation that different kernel-induced Fourier sampling distributions correspond to different kernel parameters, we show that a scalable optimization process in the Fourier domain can be used to identify the different frequency bands that are useful for prediction on training data. This approach allows us to design a family of linear prediction models where we can learn the hyper-parameters of the kernel together with the weights of the feature vectors jointly. Under this methodology, we recover efficient and scalable linear reformulations for both single and multiple kernel learning. Experiments show that our linear models produce fast and accurate predictors for complex datasets such as the Visual Object Challenge 2011 and ImageNet ILSVRC 2011.
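The random Fourier construction referred to above can be sketched for the Gaussian kernel $k(x,y)=\exp(-\lVert x-y\rVert^2/2\sigma^2)$, whose Fourier transform is a Gaussian with covariance $\sigma^{-2}I$. This sketch samples the frequencies once from that fixed distribution (the paper instead optimizes over the sampling distribution); note that the bandwidth $\sigma$ enters only through the scale of the sampled frequencies, which is the kind of correspondence between sampling distributions and kernel parameters the abstract describes. `make_rff` is a hypothetical helper.

```python
import math
import random
from typing import Callable, List, Sequence

def make_rff(dim: int, num_features: int, sigma: float,
             seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Hypothetical helper: random Fourier feature map for a Gaussian kernel of bandwidth sigma."""
    rng = random.Random(seed)
    # Frequencies drawn from N(0, sigma^-2 I), the Fourier transform of the Gaussian kernel
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    # Random phases, uniform on [0, 2*pi]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x: Sequence[float]) -> List[float]:
        # Inner products of these features are Monte Carlo estimates of k(x, y)
        return [scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)
                for omega, b in zip(omegas, phases)]
    return features

def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact kernel exp(-||x - y||^2 / (2 sigma^2)) for comparison."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))
```

A linear model trained on `features(x)` then stands in for the nonlinear kernel machine, with cost linear in the number of samples.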

Book Chapter
07 Oct 2012
TL;DR: This work proposes a new learning algorithm, called Generalized Adaptive lp-norm Multiple Kernel Learning (GA-MKL), to learn an adapted robust classifier based on multiple base kernels constructed from image features and multiple sets of pre-learned classifiers of all the classes.
Abstract: We introduce a new framework for image classification that extends beyond the window sampling of fixed spatial pyramids to include a comprehensive set of windows densely sampled over location, size and aspect ratio. To effectively deal with this large set of windows, we derive a concise high-level image feature using a two-level extraction method. At the first level, window-based features are computed from local descriptors (e.g., SIFT, spatial HOG, LBP) in a process similar to standard feature extractors. Then at the second level, the new image feature is determined from the window-based features in a manner analogous to the first level. This higher level of abstraction offers both efficient handling of dense samples and reduced sensitivity to misalignment. More importantly, our simple yet effective framework can readily accommodate a large number of existing pooling/coding methods, allowing them to extract features beyond the spatial pyramid representation. To effectively fuse the second level feature with a standard first level image feature for classification, we additionally propose a new learning algorithm, called Generalized Adaptive lp-norm Multiple Kernel Learning (GA-MKL), to learn an adapted robust classifier based on multiple base kernels constructed from image features and multiple sets of pre-learned classifiers of all the classes. Extensive evaluation on the object recognition (Caltech256) and scene recognition (15Scenes) benchmark datasets demonstrates that the proposed method outperforms state-of-the-art image classification algorithms under a broad range of settings.

Journal Article
TL;DR: The experimental results indicate that multiple kernel SVM is a kind of highly competitive data-driven modeling method for the blast furnace system and can provide reliable indication for blast furnace operators to take control actions.
Abstract: This paper constructs the framework of the reproducing kernel Hilbert space for multiple kernel learning, which provides clear insight into why multiple kernel support vector machines (SVM) outperform single kernel SVM. These results can serve as a fundamental guide to account for the superiority of multiple kernel over single kernel learning. Subsequently, the constructed multiple kernel learning algorithms are applied to model a nonlinear blast furnace system based only on its input-output signals. The experimental results not only confirm the superiority of multiple kernel learning algorithms, but also indicate that multiple kernel SVM is a highly competitive data-driven modeling method for the blast furnace system and can provide reliable indications for blast furnace operators to take control actions.

Journal Article
01 Jul 2012
TL;DR: A novel, effective method is proposed to classify wrist pulse blood flow signals by using the multiple kernel learning (MKL) algorithm to combine multiple types of features, further enhancing classification performance.
Abstract: Wrist pulse signal is of great importance in the analysis of the health status and pathologic changes of a person. A number of feature extraction methods have been proposed to extract linear and nonlinear, and time and frequency features of wrist pulse signal. These features are heterogeneous in nature and are likely to contain complementary information, which highlights the need for the integration of heterogeneous features for pulse classification and diagnosis. In this paper, we propose a novel effective method to classify the wrist pulse blood flow signals by using the multiple kernel learning (MKL) algorithm to combine multiple types of features. In the proposed method, seven types of features are first extracted from the wrist pulse blood flow signals using the state-of-the-art pulse feature extraction methods, and are then fed to an efficient MKL method, SimpleMKL, to combine heterogeneous features for more effective classification. Experimental results show that the proposed method is promising in integrating multiple types of pulse features to further enhance the classification performance.

Proceedings Article
01 Nov 2012
TL;DR: This paper proposes a fully automatic approach for person-independent 3D facial expression recognition that outperforms most of the state-of-the-art ones by using the SimpleMKL algorithm with the chi-square kernel.
Abstract: In this paper, we propose a fully automatic approach for person-independent 3D facial expression recognition. In order to extract discriminative expression features, each aligned 3D facial surface is compactly represented as multiple global histograms of local normal patterns from multiple normal components and multiple binary encoding scales, namely Multi-Scale Local Normal Patterns (MS-LNPs). 3D facial expression recognition is then carried out using multiple kernel learning (MKL) to efficiently embed and combine these histogram-based features. Using the SimpleMKL algorithm with the chi-square kernel, we achieved an average recognition rate of 80.14% under a fair experimental setup. To the best of our knowledge, our method outperforms most state-of-the-art methods.
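For histogram features such as MS-LNPs, a chi-square kernel is a common choice. One widely used exponential form is sketched below, assuming $k(h,g)=\exp(-\gamma\sum_i (h_i-g_i)^2/(h_i+g_i))$ with bins that are empty in both histograms contributing zero; the exact normalization and $\gamma$ used in the paper are not stated in the abstract, so treat both as assumptions. A Gram matrix built this way per feature channel is what an MKL solver such as SimpleMKL would then combine.

```python
import math
from typing import Sequence

def chi2_distance(h: Sequence[float], g: Sequence[float]) -> float:
    """Chi-square distance sum_i (h_i - g_i)^2 / (h_i + g_i) between two histograms."""
    total = 0.0
    for a, b in zip(h, g):
        s = a + b
        if s > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / s
    return total

def chi2_kernel(h: Sequence[float], g: Sequence[float], gamma: float = 1.0) -> float:
    """Exponential chi-square kernel; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * chi2_distance(h, g))
```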

Journal ArticleDOI
TL;DR: A multiple kernel version of the Transductive SVM (a cluster-assumption-based approach) is proposed and solved using DC (Difference of Convex functions) programming.
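DC programming minimizes f(x) = g(x) - h(x), with g and h convex, by repeatedly replacing h with its tangent at the current iterate and minimizing the resulting convex surrogate; no step can increase f. The toy sketch below applies that iteration to the one-dimensional function f(x) = x^4 - 2x^2 (g(x) = x^4, h(x) = 2x^2). This function was chosen only because its surrogate has a closed-form minimizer. It is not the multiple kernel TSVM objective from the paper, and the function names are hypothetical.

```python
import math
from typing import List


def f(x: float) -> float:
    """DC objective f = g - h with g(x) = x**4 and h(x) = 2 * x**2 (both convex)."""
    return x ** 4 - 2.0 * x ** 2


def dca_step(x_k: float) -> float:
    """Minimize the convex surrogate g(x) - h'(x_k) * x = x**4 - 4 * x_k * x.

    Setting the derivative 4 * x**3 - 4 * x_k to zero gives x = cbrt(x_k).
    """
    return math.copysign(abs(x_k) ** (1.0 / 3.0), x_k)


def dca(x0: float, max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Run the DC algorithm from x0 and return the sequence of iterates."""
    xs = [x0]
    for _ in range(max_iter):
        x_next = dca_step(xs[-1])
        xs.append(x_next)
        if abs(x_next - xs[-2]) < tol:
            break
    return xs


if __name__ == "__main__":
    path = dca(0.5)
    print(len(path), path[-1], f(path[-1]))  # ends near the minimizer x = 1, where f = -1
```

Starting at x0 = 0 the iteration stays put, because 0 is a stationary point of f. DC methods guarantee descent to a stationary point, not a global minimum, which is also why the starting point matters for non-convex TSVM objectives.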

Journal Article
TL;DR: This work presents an MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate, and introduces a p-norm formulation of MKL whose parameter controls the sparsity of the solution while leading to an easier optimization problem.
Abstract: In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present an MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By adopting a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims.
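To make "solving MKL in the primal with a p-norm" concrete, the sketch below runs plain stochastic subgradient descent on lam/2 * ||w||_{2,p}^2 + mean hinge loss. Here w = (w_1, ..., w_M) has one block per kernel, and ||w||_{2,p} is the p-norm of the block norms. It assumes explicit finite-dimensional feature maps for each kernel and 1 < p <= 2. It is not the algorithm from the paper, whose update rule and guarantees differ. It only illustrates the objective, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Block = List[float]


def group_norm(w_m: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in w_m))


def mixed_norm(w: Sequence[Sequence[float]], p: float) -> float:
    """||w||_{2,p}: the p-norm of the per-kernel Euclidean norms."""
    return sum(group_norm(w_m) ** p for w_m in w) ** (1.0 / p)


def regularizer_grad(w: Sequence[Sequence[float]], p: float) -> List[Block]:
    """Gradient of 0.5 * ||w||_{2,p}**2 with respect to each block w_m (1 < p <= 2)."""
    total = mixed_norm(w, p)
    grads = []
    for w_m in w:
        nm = group_norm(w_m)
        if nm == 0.0:
            grads.append([0.0] * len(w_m))  # the gradient vanishes at w_m = 0 when p > 1
        else:
            scale = (total / nm) ** (2.0 - p)
            grads.append([scale * v for v in w_m])
    return grads


def score(w, x) -> float:
    return sum(sum(a * b for a, b in zip(w_m, x_m)) for w_m, x_m in zip(w, x))


def sgd_pnorm_mkl(X, y, p=1.5, lam=0.01, epochs=20, eta0=0.5, seed=0):
    """Stochastic subgradient descent on
    lam/2 * ||w||_{2,p}**2 + mean_i max(0, 1 - y_i * <w, x_i>),
    where X[i][m] is the explicit feature vector of sample i for kernel m."""
    rng = random.Random(seed)
    w = [[0.0] * len(block) for block in X[0]]
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = eta0 / math.sqrt(t)  # decaying step size
            violated = y[i] * score(w, X[i]) < 1.0
            reg = regularizer_grad(w, p)
            for m, w_m in enumerate(w):
                for k in range(len(w_m)):
                    g = lam * reg[m][k]
                    if violated:
                        g -= y[i] * X[i][m][k]  # hinge-loss subgradient
                    w_m[k] -= eta * g
    return w


def predict(w, x) -> int:
    return 1 if score(w, x) >= 0.0 else -1
```

Each step touches one example, so the cost per epoch is linear in the number of training examples. Lowering p toward 1 makes the learned block norms (and hence the implied kernel weights) sparser.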

Proceedings Article
26 Jun 2012
TL;DR: A fully conjugate Bayesian formulation with a deterministic variational approximation is proposed; it can combine hundreds or thousands of kernels very efficiently and can be extended to multiclass learning and semi-supervised learning.
Abstract: Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.

Posted Content
TL;DR: This paper shows that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel.
Abstract: With the advent of kernel methods, automating the task of specifying a suitable kernel has become increasingly important. In this context, the Multiple Kernel Learning (MKL) problem of finding a combination of pre-specified base kernels that is suitable for the task at hand has received significant attention from researchers. In this paper we show that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel. Framing MKL in this way has the distinct advantage that it makes it easy to leverage the extensive research in binary classification to develop better performing and more scalable MKL algorithms that are conceptually simpler, and, arguably, more accessible to practitioners. Experiments on nine data sets from different domains show that, despite its simplicity, the proposed technique compares favorably with current leading MKL approaches.
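One way to read "MKL as binary classification" is the following. Each training pair (x_i, x_j) becomes an example whose features are the base-kernel values K_m(x_i, x_j) and whose label is y_i * y_j. A linear classifier on those features, with weights constrained to mu_m >= 0, then yields kernel weights, and the non-negativity keeps sum_m mu_m K_m positive semidefinite. The sketch below assumes projected stochastic subgradient descent on a regularized hinge loss with an unregularized bias. The paper's actual loss and solver may differ, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf_gram(values: Sequence[float], width: float = 1.0) -> Matrix:
    """Gaussian base kernel on 1-D inputs, used only for the toy example."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * width ** 2)) for b in values] for a in values]


def pair_dataset(kernels: Sequence[Matrix],
                 y: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """One example per pair (i, j): features K_m(x_i, x_j), label y_i * y_j."""
    Z, t = [], []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            Z.append([K[i][j] for K in kernels])
            t.append(y[i] * y[j])
    return Z, t


def learn_kernel_weights(Z, t, lam=0.01, epochs=50, eta0=0.1, seed=0):
    """Linear classifier on pair features with weights projected onto mu >= 0."""
    rng = random.Random(seed)
    mu = [0.0] * len(Z[0])
    b = 0.0
    step = 0
    for _ in range(epochs):
        order = list(range(len(Z)))
        rng.shuffle(order)
        for k in order:
            step += 1
            eta = eta0 / math.sqrt(step)
            violated = t[k] * (sum(m * z for m, z in zip(mu, Z[k])) + b) < 1.0
            for m in range(len(mu)):
                grad = lam * mu[m] - (t[k] * Z[k][m] if violated else 0.0)
                mu[m] = max(0.0, mu[m] - eta * grad)  # projection keeps the kernel PSD
            if violated:
                b += eta * t[k]  # unregularized bias (threshold)
    return mu, b


def combine_kernels(kernels: Sequence[Matrix], mu: Sequence[float]) -> Matrix:
    """Learned kernel sum_m mu_m K_m, to be used by any standard SVM afterwards."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(mu, kernels)) for j in range(n)]
            for i in range(n)]
```

A kernel whose values are high for same-label pairs and low for different-label pairs receives a large weight, while a kernel that does not separate the two kinds of pair is pushed toward zero. Any off-the-shelf linear classifier that supports non-negativity constraints could replace the hand-written loop.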

Journal Article
TL;DR: A non-sparse version of MK-FDA is proposed, which imposes a general lp norm regularisation on the kernel weights, and it is demonstrated that lp MK-FDA improves upon sparse MK-FDA in many practical situations and, on image categorisation problems, tends to outperform its SVM counterpart.
Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general lp norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of lp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that lp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, lp MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines.
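How an lp constraint on the kernel weights controls sparsity can be seen from a closed-form weight update used in the non-sparse MKL literature. Given the per-kernel norms ||w_m|| from the current classifier, it sets theta_m = ||w_m||^(2/(p+1)) / (sum_k ||w_k||^(2p/(p+1)))^(1/p), which satisfies ||theta||_p = 1. The sketch below implements only that step. It is an assumption for illustration; the MK-FDA paper instead solves a semi-infinite program with a wrapper algorithm. The function name is hypothetical.

```python
from typing import List, Sequence


def lp_kernel_weights(block_norms: Sequence[float], p: float) -> List[float]:
    """theta_m = ||w_m||^(2/(p+1)) / (sum_k ||w_k||^(2p/(p+1)))^(1/p), so ||theta||_p = 1."""
    if p < 1.0:
        raise ValueError("p must be >= 1")
    denom = sum(nm ** (2.0 * p / (p + 1.0)) for nm in block_norms) ** (1.0 / p)
    if denom == 0.0:
        raise ValueError("at least one block norm must be positive")
    return [nm ** (2.0 / (p + 1.0)) / denom for nm in block_norms]


if __name__ == "__main__":
    for p in (1.0, 2.0, 100.0):
        print(p, lp_kernel_weights([1.0, 3.0], p))
```

With p = 1 the weights are proportional to the block norms, so weak kernels are strongly down-weighted. As p grows the exponent 2/(p+1) shrinks, and the weights approach a uniform combination. This is the sparse-versus-non-sparse trade-off the abstract refers to.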

Journal ArticleDOI
Jingjing Yang, Yonghong Tian, Ling-Yu Duan, Tiejun Huang, Wen Gao
TL;DR: A group-sensitive multiple kernel learning (GS-MKL) method is proposed for object recognition to accommodate intraclass diversity and interclass correlation; on four challenging data sets it achieves encouraging performance comparable to the state of the art and outperforms several existing MKL methods.
Abstract: In this paper, a group-sensitive multiple kernel learning (GS-MKL) method is proposed for object recognition to accommodate the intraclass diversity and the interclass correlation. By introducing the “group” between the object category and individual images as an intermediate representation, GS-MKL attempts to learn group-sensitive multikernel combinations together with the associated classifier. For each object category, the image corpus from the same category is partitioned into groups. Images with similar appearance are partitioned into the same group, which corresponds to the subcategory of the object category. Accordingly, intraclass diversity can be represented by the set of groups from the same category but with diverse appearances; interclass correlation can be represented by the correlation between groups from different categories. GS-MKL provides a tractable solution to adapt multikernel combination to local data distribution and to seek a tradeoff between capturing the diversity and keeping the invariance for each object category. Different from the simple hybrid grouping strategy that solves sample grouping and GS-MKL training independently, two sample grouping strategies are proposed to integrate sample grouping and GS-MKL training. The first one is a looping hybrid grouping method, where a global kernel clustering method and GS-MKL interact with each other by sharing group-sensitive multikernel combination. The second one is a dynamic divisive grouping method, where a hierarchical kernel-based grouping process interacts with GS-MKL. Experimental results show that performance of GS-MKL does not significantly vary with different grouping strategies, but the looping hybrid grouping method produces slightly better results. On four challenging data sets, our proposed method has achieved encouraging performance comparable to the state-of-the-art and outperformed several existing MKL methods.
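A group-sensitive kernel combination can be sketched as follows. The sketch assumes the combined kernel takes the form k(x_i, x_j) = sum_m beta_m^{g(i)} * beta_m^{g(j)} * k_m(x_i, x_j), where g(i) is the group of sample i and beta^{g} is that group's kernel-weight vector. This parameterization is an assumption for illustration, and the paper's exact form may differ. It stays positive semidefinite because each term rescales a base kernel by the same per-sample factor on both sides. The group assignments and weights are taken as given here rather than learned, and the names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def group_sensitive_kernel(kernels: Sequence[Matrix], groups: Sequence[int],
                           group_weights: Dict[int, Sequence[float]]) -> Matrix:
    """Combine base kernels with weights that depend on the groups of both samples."""
    n = len(groups)
    G = [[0.0] * n for _ in range(n)]
    for m, K in enumerate(kernels):
        for i in range(n):
            beta_i = group_weights[groups[i]][m]  # weight of kernel m for sample i's group
            for j in range(n):
                beta_j = group_weights[groups[j]][m]
                G[i][j] += beta_i * beta_j * K[i][j]
    return G


if __name__ == "__main__":
    K1 = [[1.0, 0.5], [0.5, 1.0]]
    K2 = [[1.0, 0.2], [0.2, 1.0]]
    # sample 0 is in group 0 (relies on K1 only), sample 1 in group 1 (mixes both)
    print(group_sensitive_kernel([K1, K2], [0, 1], {0: [1.0, 0.0], 1: [0.5, 0.5]}))
```

When every sample belongs to a single group, the construction reduces to an ordinary MKL kernel with weights beta_m squared. Splitting a category into groups lets visually different subcategories use different kernel mixtures.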