scispace - formally typeset

Showing papers on "Multiple kernel learning published in 2018"


Journal Article
TL;DR: The proposed MKL with ANFIS based deep learning method follows a two-fold approach and achieved high sensitivity, high specificity and low Mean Square Error on the KEGG Metabolic Reaction Network dataset.
Abstract: A Multiple Kernel Learning with Adaptive Neuro-Fuzzy Inference System (MKL with ANFIS) based deep learning method is proposed in this paper for heart disease diagnosis. The proposed method follows a two-fold approach. The MKL method is used to divide parameters between heart disease patients and normal individuals. The result obtained from the MKL method is then given to the ANFIS classifier, which classifies individuals as heart disease patients or healthy. Sensitivity, specificity and Mean Square Error (MSE) are calculated to evaluate the proposed MKL with ANFIS method. The proposed MKL with ANFIS is also compared with various existing deep learning methods such as Least Square with Support Vector Machine (LS with SVM), General Discriminant Analysis and Least Square Support Vector Machine (GDA with LS-SVM), Principal Component Analysis with Adaptive Neuro-Fuzzy Inference System (PCA with ANFIS) and Latent Dirichlet Allocation with Adaptive Neuro-Fuzzy Inference System (LDA with ANFIS). The proposed MKL with ANFIS method produced high sensitivity (98%), high specificity (99%) and low Mean Square Error (0.01) on the KEGG Metabolic Reaction Network dataset.

195 citations


Journal Article
TL;DR: A multiscale deep feature learning method for high-resolution satellite image scene classification that warps the original satellite image into multiple different scales and develops a multiple kernel learning method to automatically learn the optimal combination of the resulting features.
Abstract: In this paper, we propose a multiscale deep feature learning method for high-resolution satellite image scene classification. Specifically, we first warp the original satellite image into multiple different scales. The images at each scale are used to train a deep convolutional neural network (DCNN). However, training multiple DCNNs simultaneously is time-consuming. To address this issue, we explore DCNN with spatial pyramid pooling (SPP-net). Because the different SPP-nets have the same number of parameters and share identical initial values, fine-tuning only the parameters in the fully connected layers is sufficient to ensure the effectiveness of each network, which greatly accelerates the training process. Then, the multiscale satellite images are fed into their corresponding SPP-nets to extract multiscale deep features. Finally, a multiple kernel learning method is developed to automatically learn the optimal combination of these features. Experiments on two challenging data sets show that the proposed method achieves favorable performance compared with other state-of-the-art methods.

154 citations
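The final step of this pipeline combines per-scale deep features through a weighted kernel combination. A simple way to picture that step is to score each base kernel by its centered alignment with the ideal label kernel yy^T and normalize the scores into weights. The NumPy sketch below illustrates this alignment heuristic under stated assumptions. The two random feature sets are hypothetical stand-ins for SPP-net features at two scales. Alignment-based weighting is a swapped-in technique and is not the MKL method developed in the paper.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Centered kernel alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F)."""
    A, B = center(K1), center(K2)
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))

def alignment_weights(kernels, y):
    """Clip negative alignments to zero and normalize the rest to sum to 1."""
    target = np.outer(y, y).astype(float)
    scores = np.array([max(centered_alignment(K, target), 0.0) for K in kernels])
    return scores / scores.sum()

rng = np.random.default_rng(0)
n = 60
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
# Hypothetical per-scale features: scale 0 carries the label signal, scale 1 is noise.
scale0 = rng.normal(size=(n, 5)) + 2.0 * y[:, None]
scale1 = rng.normal(size=(n, 5))
kernels = [rbf_gram(F, gamma=0.1) for F in (scale0, scale1)]
w = alignment_weights(kernels, y)
K_combined = sum(wi * Ki for wi, Ki in zip(w, kernels))  # weighted multi-scale kernel
print(w)
```

The informative scale should receive most of the weight. A learned MKL solver would instead optimize the weights jointly with the classifier.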


Journal Article
TL;DR: A multiple kernel framework that integrates multiple datasets of various types into a single exploratory analysis. It retrieves previous findings in a single kernel PCA and provides a new image of the sample structures when a larger number of datasets are included in the analysis.
Abstract: Motivation: Recent high-throughput sequencing advances have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has yielded important insights in a wide range of applications. However, integrating various sources of information remains a challenge for systems biology. The datasets produced are often of heterogeneous types, which calls for generic methods that take their different specificities into account. Results: We propose a multiple kernel framework that allows multiple datasets of various types to be integrated into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA, as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas are analysed using kernel Self-Organizing Maps with both single and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method in improving the representation of the studied biological system. Availability and implementation: The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics web site http://mixomics.org/mixkernel/.
Contact: jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary information: Supplementary data are available at Bioinformatics online.

77 citations
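The consensus meta-kernel idea from this entry can be sketched with a STATIS-style weighting. Each dataset gets its own kernel, and the kernels are compared through their normalized Frobenius inner products. The leading eigenvector of that similarity matrix then supplies the combination weights, and kernel PCA is run on the resulting meta-kernel. This is a simplified Python illustration built on assumptions (cosine-normalized linear kernels and three synthetic tables). It is not the mixKernel API, so check the package documentation for the methods mixKernel actually provides.

```python
import numpy as np

def cosine_normalize(K):
    """Rescale a kernel so that K(x, x) = 1 for every sample."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def statis_weights(kernels):
    """Consensus weights from the leading eigenvector of the matrix of
    cosine similarities <K_i, K_j>_F / (||K_i||_F ||K_j||_F) between kernels."""
    m = len(kernels)
    C = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            C[i, j] = np.sum(kernels[i] * kernels[j]) / (
                np.linalg.norm(kernels[i]) * np.linalg.norm(kernels[j]))
    _, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    # Leading eigenvector; C is entrywise nonnegative, so its entries share one sign.
    v = np.abs(eigvecs[:, -1])
    return v / v.sum()

def kernel_pca(K, n_components=2):
    """Project samples onto the leading components of the centered kernel."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(H @ K @ H)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

rng = np.random.default_rng(1)
n = 40
X1 = rng.normal(size=(n, 10))                   # hypothetical omics table 1
X2 = X1[:, :5] + 0.1 * rng.normal(size=(n, 5))  # table 2 shares structure with table 1
X3 = rng.normal(size=(n, 8))                    # table 3 is unrelated
kernels = [cosine_normalize(X @ X.T) for X in (X1, X2, X3)]
w = statis_weights(kernels)
meta_kernel = sum(wi * Ki for wi, Ki in zip(w, kernels))
scores = kernel_pca(meta_kernel, n_components=2)
```

The unrelated table agrees least with the others, so it receives the smallest weight in the consensus.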


Proceedings Article
29 Apr 2018
TL;DR: In this article, the authors propose to automatically learn similarity information from data while imposing the constraint that the similarity matrix has exactly c connected components if there are c clusters, and to transform the candidate solution into a new one that better approximates the discrete one.
Abstract: Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction, continuous label learning, and discretization of the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for subsequent clustering. It is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution, since k-means is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, these three subtasks are integrated into a unified framework, with each subtask iteratively boosted by the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle the practical problem of selecting the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning. Extensive experiments demonstrate the superiority of our proposed method over existing clustering approaches.

73 citations
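The connected-components constraint in this entry rests on a standard spectral graph fact. For a nonnegative symmetric similarity matrix S with Laplacian L = D - S, the multiplicity of the eigenvalue 0 of L equals the number of connected components. For that reason, "exactly c components" is commonly written as rank(L) = n - c. The NumPy check below only illustrates this property on toy block-diagonal matrices. It does not implement the paper's joint optimization.

```python
import numpy as np

def num_connected_components(S, tol=1e-8):
    """Number of connected components of a similarity graph, counted as the
    multiplicity of the zero eigenvalue of the unnormalized Laplacian L = D - S."""
    S = (S + S.T) / 2.0                # make sure the similarity matrix is symmetric
    L = np.diag(S.sum(axis=1)) - S     # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)    # real eigenvalues, ascending
    return int(np.sum(eigvals < tol))

def block_similarity(sizes):
    """Toy similarity matrix: fully connected blocks, no edges between blocks."""
    n = sum(sizes)
    S = np.zeros((n, n))
    start = 0
    for k in sizes:
        S[start:start + k, start:start + k] = 1.0
        start += k
    return S

S = block_similarity([4, 3, 5])            # c = 3 clusters
print(num_connected_components(S))         # 3
S_linked = S.copy()
S_linked[0, 4] = S_linked[4, 0] = 0.5      # one edge joins the first two blocks
print(num_connected_components(S_linked))  # 2
```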


Proceedings Article
01 Jun 2018
TL;DR: In this paper, the authors proposed a novel multiple kernel learning (MKL) framework based on two intuitive assumptions: (i) each kernel is a perturbation of the consensus kernel; and (ii) a kernel that is close to the consensus kernel should be assigned a large weight.
Abstract: Multiple kernel learning (MKL) methods are generally believed to perform better than single kernel methods. However, some empirical studies show that this is not always true: combining multiple kernels may even yield worse performance than using a single kernel. There are two possible reasons for this failure: (i) most existing MKL methods assume that the optimal kernel is a linear combination of base kernels, which may not hold; and (ii) some kernel weights are inappropriately assigned due to noise and carelessly designed algorithms. In this paper, we propose a novel MKL framework based on two intuitive assumptions: (i) each kernel is a perturbation of the consensus kernel; and (ii) a kernel that is close to the consensus kernel should be assigned a large weight. Impressively, the proposed method can automatically assign an appropriate weight to each kernel without introducing the additional parameters that existing methods require. The proposed framework is integrated into a unified framework for graph-based clustering and semi-supervised classification. We have conducted experiments on multiple benchmark datasets, and our empirical results verify the superiority of the proposed framework.

70 citations
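The two assumptions in this entry can be illustrated with a simple alternating scheme, assumed here for illustration. The consensus kernel is the weighted mean of the base kernels, and each weight is set to w_i = 1/(2 ||K - K_i||_F). A kernel close to the consensus therefore gets a large weight, and no extra hyperparameter is needed. This NumPy sketch shows only the self-weighting principle. It is not the paper's full objective, which couples the consensus kernel with graph learning.

```python
import numpy as np

def self_weighted_consensus(kernels, n_iter=100, eps=1e-12):
    """Alternate (a) consensus = weighted mean of the base kernels and
    (b) weight_i = 1 / (2 ||consensus - K_i||_F), so kernels near the consensus count more."""
    w = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(n_iter):
        consensus = sum(wi * Ki for wi, Ki in zip(w, kernels)) / w.sum()
        dist = np.array([np.linalg.norm(consensus - Ki) for Ki in kernels])
        w = 1.0 / (2.0 * dist + eps)   # eps guards against division by zero
    return consensus, w / w.sum()

def sym_noise(rng, n, scale):
    """Symmetric Gaussian perturbation of size n x n."""
    E = rng.normal(size=(n, n))
    return scale * (E + E.T) / 2.0

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
K_true = X @ X.T
# Three mildly perturbed kernels and one heavily corrupted kernel (hypothetical).
kernels = [K_true + sym_noise(rng, 30, s) for s in (0.1, 0.1, 0.1, 3.0)]
consensus, weights = self_weighted_consensus(kernels)
print(np.round(weights, 3))
```

This fixed-point update is the same as Weiszfeld's iteration for the geometric median of the kernels in Frobenius space. That is why a single outlying kernel has little pull on the consensus.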


Journal Article
TL;DR: A novel approach that extracts ROI features and interregional features based on multiple measures from MRI images to distinguish AD, MCI (including MCIc and MCInc) and healthy controls. It outperforms some state-of-the-art methods in AD classification.
Abstract: Several anatomical magnetic resonance imaging (MRI) markers for Alzheimer's disease (AD) have been identified. Cortical gray matter volume, cortical thickness, and subcortical volume have been used successfully to assist the diagnosis of Alzheimer's disease, including its early warning and developing stages, e.g., mild cognitive impairment (MCI), which comprises MCI converted to AD (MCIc) and MCI not converted to AD (MCInc). Currently, these anatomical MRI measures have mainly been used separately. Thus, the full potential of anatomical MRI scans for AD diagnosis might not yet have been exploited. Meanwhile, most current studies focus only on morphological features of regions of interest (ROIs) or on interregional features, without considering their combination. To further improve the diagnosis of AD, we propose a novel approach that extracts ROI features and interregional features based on multiple measures from MRI images to distinguish AD, MCI (including MCIc and MCInc), and healthy control (HC). First, we construct six individual networks for each subject, based on six different anatomical measures (i.e., CGMV, CT, CSA, CC, CFI, and SV) and the Automated Anatomical Labeling (AAL) atlas. Then, for each individual network, we extract all node (ROI) features and edge (interregional) features, denoted as the node feature set and the edge feature set, respectively. This yields six node feature sets and six edge feature sets from the six different anatomical measures. Next, the features within each feature set are ranked by F-score in descending order, and the top-ranked features of each feature set are fed to the MKBoost algorithm to obtain the best classification accuracy. This gives the optimal feature subset and the corresponding classifier for each node or edge feature set.
Afterwards, to investigate the classification performance with node features only, we propose a weighted multiple kernel learning (wMKL) framework. It combines these six optimal node feature subsets into a single classifier that performs AD classification. Similarly, we obtain the classification performance with edge features only. Finally, we combine all six optimal node feature subsets and six optimal edge feature subsets to further improve the classification performance. Experimental results show that the proposed method outperforms some state-of-the-art methods in AD classification, and demonstrate that different measures contain complementary information.

62 citations
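This entry describes two steps: ranking features by F-score, then combining per-measure kernels with weights. Both can be sketched with NumPy and scikit-learn, under the following assumptions. The F-score follows a common Fisher-type definition for binary labels, and three synthetic feature sets stand in for the anatomical measures. The RBF kernel and the equal kernel weights are fixed by hand, whereas wMKL would learn the weights. Finally, an SVM on the precomputed combined kernel replaces MKBoost.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def f_score(X, y):
    """Fisher-type F-score per feature for binary labels y in {0, 1}."""
    pos, neg = X[y == 1], X[y == 0]
    overall, mean_p, mean_n = X.mean(axis=0), pos.mean(axis=0), neg.mean(axis=0)
    between = (mean_p - overall) ** 2 + (mean_n - overall) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return between / (within + 1e-12)

def top_k(X, y, k):
    """Indices of the k highest-scoring features, in descending F-score order."""
    return np.argsort(f_score(X, y))[::-1][:k]

def combined_kernel(sets_a, sets_b, selected, weights, gamma=0.1):
    """Weighted sum of per-feature-set RBF kernels between sample groups a and b."""
    return sum(w * rbf_kernel(A[:, s], B[:, s], gamma=gamma)
               for A, B, s, w in zip(sets_a, sets_b, selected, weights))

rng = np.random.default_rng(3)
n, p = 80, 20
y = np.arange(n) % 2
# Three hypothetical feature sets (stand-ins for different anatomical measures);
# only the first 3 features of each set carry class information.
sets = []
for _ in range(3):
    X = rng.normal(size=(n, p))
    X[:, :3] += 2.0 * y[:, None]
    sets.append(X)

train, test = np.arange(60), np.arange(60, n)
selected = [top_k(X[train], y[train], k=5) for X in sets]
weights = [1 / 3] * 3  # fixed by hand here; wMKL would learn these weights
train_sets = [X[train] for X in sets]
test_sets = [X[test] for X in sets]
K_train = combined_kernel(train_sets, train_sets, selected, weights)
K_test = combined_kernel(test_sets, train_sets, selected, weights)  # (n_test, n_train)
clf = SVC(kernel="precomputed").fit(K_train, y[train])
pred = clf.predict(K_test)
print("test accuracy:", np.mean(pred == y[test]))
```

A wMKL solver would plug in at the `weights` line, replacing the equal weights with learned ones.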


Journal Article
TL;DR: This work uses a sparse version of Multiple Kernel Learning (MKL) to simultaneously learn the contribution of each brain region, previously defined by an atlas, to the decision function, and shows how this can lead to improved overall generalisation performance.
Abstract: Pattern recognition models have been increasingly applied to neuroimaging data over the last two decades. These applications have ranged from cognitive neuroscience to clinical problems. A common limitation of these approaches is that they do not incorporate prior knowledge about brain structure and function into the models. Prior knowledge can be embedded into pattern recognition models by imposing a grouping structure based on anatomically or functionally defined brain regions. In this work, we present a novel approach that uses group sparsity to model the whole-brain multivariate pattern as a combination of regional patterns. More specifically, we use a sparse version of Multiple Kernel Learning (MKL) to simultaneously learn the contribution of each brain region, previously defined by an atlas, to the decision function. Our application of MKL provides two beneficial features: (1) it can lead to improved overall generalisation performance when the grouping structure imposed by the atlas is consistent with the data; (2) it can identify a subset of relevant brain regions for the predictive model. To investigate the effect of the grouping in the proposed MKL approach, we compared the results of three different atlases using three different datasets. The method has been implemented in the new version of the open-source Pattern Recognition for Neuroimaging Toolbox (PRoNTo).

58 citations
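The atlas-based grouping in this entry can be sketched as one kernel per region. Assume linear kernels (a common neuroimaging choice, assumed here). The region kernels K_m = X_m X_m^T then sum to the whole-brain linear kernel when all weights equal one, while a sparse weight vector d switches irrelevant regions off. In this NumPy illustration the weights are set by hand, whereas sparse MKL would learn them. The sketch is not PRoNTo's implementation.

```python
import numpy as np

def region_kernels(X, atlas_labels):
    """One linear kernel per atlas region: K_m = X_m X_m^T, where X_m keeps
    only the voxel columns that the atlas assigns to region m."""
    kernels = {}
    for m in np.unique(atlas_labels):
        Xm = X[:, atlas_labels == m]
        kernels[int(m)] = Xm @ Xm.T
    return kernels

def combine(kernels, d):
    """Decision kernel K = sum_m d_m K_m; a zero d_m drops region m entirely."""
    return sum(d[m] * K for m, K in kernels.items())

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 12))                  # toy data: 10 subjects x 12 voxels
atlas = np.array([0] * 4 + [1] * 5 + [2] * 3)  # hypothetical 3-region atlas
kernels = region_kernels(X, atlas)
K_all = combine(kernels, {0: 1.0, 1: 1.0, 2: 1.0})
K_sparse = combine(kernels, {0: 0.7, 1: 0.0, 2: 0.3})  # region 1 treated as irrelevant
print(np.allclose(K_all, X @ X.T))  # True
```

Because each region's weight scales a whole kernel, a zero weight removes every voxel in that region at once. This is how a sparse MKL solution can also name the regions that matter.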


Journal Article
01 Jul 2018
TL;DR: pairwiseMKL is introduced, the first method for time‐ and memory‐efficient learning with multiple pairwise kernels. It provides accurate predictions using sparse solutions in terms of selected kernels, and therefore also automatically identifies the data sources relevant for the prediction problem.
Abstract: Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind. Multiple kernel learning (MKL) in particular offers promising benefits, as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, which makes the method applicable to large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore also automatically identifies the data sources relevant for the prediction problem. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco. Supplementary information: Supplementary data are available at Bioinformatics online.

55 citations
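The bottleneck in this entry is easiest to see for pairwise kernels of Kronecker-product form, K = K_d ⊗ K_t over n_d drugs and n_t targets. This form is a common choice for drug-target pairs and is assumed here. The explicit matrix has (n_d n_t)^2 entries. The standard identity (A ⊗ B) vec(X) = vec(B X A^T), with column-major vec, lets matrix-vector products be computed from the small factor kernels alone. The NumPy sketch below uses row-major ordering and illustrates this trick for a weighted mixture of pairwise kernels. It is an assumption about the kind of computation involved, not pairwiseMKL's code.

```python
import numpy as np

def kron_matvec(K_drug, K_target, a):
    """Return (K_drug ⊗ K_target) @ a without forming the Kronecker product.
    `a` holds one coefficient per (drug, target) pair in drug-major order.
    Cost is O(n_d n_t (n_d + n_t)) instead of O((n_d n_t)^2)."""
    A = a.reshape(K_drug.shape[0], K_target.shape[0])  # A[i, k] = coefficient of pair (i, k)
    return (K_drug @ A @ K_target.T).ravel()

def mixture_matvec(drug_kernels, target_kernels, weights, a):
    """Matrix-vector product with a weighted sum of pairwise Kronecker kernels."""
    return sum(w * kron_matvec(Kd, Kt, a)
               for Kd, Kt, w in zip(drug_kernels, target_kernels, weights))

def random_psd(rng, n):
    """Random positive semidefinite matrix, used here as a stand-in kernel."""
    B = rng.normal(size=(n, n))
    return B @ B.T

rng = np.random.default_rng(5)
drug_kernels = [random_psd(rng, 4) for _ in range(2)]    # 4 hypothetical drugs
target_kernels = [random_psd(rng, 3) for _ in range(2)]  # 3 hypothetical targets
weights = [0.6, 0.4]
a = rng.normal(size=4 * 3)
fast = mixture_matvec(drug_kernels, target_kernels, weights, a)
explicit = sum(w * np.kron(Kd, Kt) @ a
               for Kd, Kt, w in zip(drug_kernels, target_kernels, weights))
print(np.allclose(fast, explicit))  # True
```

Iterative solvers such as conjugate gradient need only such matrix-vector products. That is why avoiding the explicit pairwise matrix makes large problems tractable.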


Journal Article
TL;DR: This paper fully considers the internal correlation between feature space and label space while fusing kernelized information from respective spaces and constructs a robust multi-label kernelized fuzzy rough set model, called RMFRS in this paper.

43 citations


Journal Article
TL;DR: The kernelized online imbalanced learning (KOIL) algorithm is proposed, which produces a nonlinear classifier for the data by maximizing the AUC score while minimizing a functional regularizer.
Abstract: Classifying binary imbalanced streaming data is a significant task in both machine learning and data mining. Previously, online maximization of the area under the receiver operating characteristic (ROC) curve (AUC) has been proposed to seek a linear classifier. However, this approach is not well suited for handling nonlinearity and heterogeneity of the data. In this paper, we propose the kernelized online imbalanced learning (KOIL) algorithm, which produces a nonlinear classifier for the data by maximizing the AUC score while minimizing a functional regularizer. We address four major challenges that arise from our approach. First, to control the number of support vectors without sacrificing model performance, we introduce two buffers with fixed budgets. These capture the global information on the decision boundary by storing the corresponding learned support vectors. Second, to restrict the fluctuation of the learned decision function and achieve smooth updating, we confine the influence of a new support vector to its k-nearest opposite support vectors. Third, to avoid information loss, we propose an effective compensation scheme that is applied after a replacement when either buffer is full. With this compensation scheme, the performance of the learned model is comparable to that of a model learned with infinite budgets. Fourth, to determine good kernels for representing data similarity, we exploit the multiple kernel learning framework to automatically learn a set of kernels. Extensive experiments on both synthetic and real-world benchmark data sets demonstrate the efficacy of our proposed approach.

41 citations


Journal Article
TL;DR: This paper presents multiple kernel learning (MKL) in the context of remote sensing (RS) image classification problems by illustrating the main characteristics of different MKL algorithms and analyzing their properties in the RS domain.
Abstract: This paper presents multiple kernel learning (MKL) in the context of remote sensing (RS) image classification problems by illustrating the main characteristics of different MKL algorithms and analyzing their properties in the RS domain. A categorization of different MKL algorithms is first introduced, and some promising MKL algorithms for each category are presented. In particular, MKL algorithms previously presented only in the machine learning literature are introduced to RS. Then, the investigated MKL algorithms are theoretically compared in terms of their: 1) computational complexity; 2) accuracy with kernels of different quality; and 3) accuracy with different numbers of kernels. After the theoretical comparison, experimental analyses are carried out to compare different MKL algorithms in terms of: 1) model selection and 2) feature fusion problems. On the basis of the theoretical and experimental analyses, some guidelines for properly selecting MKL algorithms are derived.

Journal Article
TL;DR: A new exemplar-based multi-view domain generalization (EMVDG) framework for visual recognition that learns robust classifiers able to generalize well to an arbitrary target domain from training samples with multiple types of features. The framework includes a variant inspired by multiple kernel learning.
Abstract: In this paper, we propose a new exemplar-based multi-view domain generalization (EMVDG) framework for visual recognition. It learns robust classifiers able to generalize well to an arbitrary target domain, based on training samples with multiple types of features (i.e., multi-view features). In this framework, we aim to address two issues simultaneously. First, the distribution of the training samples (i.e., the source domain) is often considerably different from that of the testing samples (i.e., the target domain), so the performance of classifiers learnt on the source domain may drop significantly on the target domain. Moreover, the testing data are often unseen during training. Second, when the training data are associated with multi-view features, recognition performance can be further improved by exploiting the relations among multiple types of features. To address the first issue, we build our EMVDG framework upon exemplar SVMs (ESVMs), since fusing multiple SVM classifiers has been shown to enhance domain generalization ability. A set of ESVM classifiers is learnt, each trained on one positive training sample and all the negative training samples. When the source domain contains multiple latent domains, the learnt ESVM classifiers are expected to be grouped into multiple clusters. To address the second issue, we propose two approaches under the EMVDG framework, based on the consensus principle and the complementary principle, respectively. Specifically, we propose an EMVDG_CO method that adds a co-regularizer to enforce consistent cluster structures of the ESVM classifiers across different views, based on the consensus principle. Inspired by multiple kernel learning, we also propose an EMVDG_MK method that fuses the ESVM classifiers from different views based on the complementary principle.
In addition, we further extend our EMVDG framework to exemplar-based multi-view domain adaptation (EMVDA) framework when the unlabeled target domain data are available during the training procedure. The effectiveness of our EMVDG and EMVDA frameworks for visual recognition is clearly demonstrated by comprehensive experiments on three benchmark data sets.
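The exemplar-SVM building block described above trains one classifier per positive training sample, each against all negatives. It can be sketched with scikit-learn under three assumptions: linear SVMs, a class weight that up-weights the single positive, and plain averaging of decision values as the fusion rule. The co-regularized (EMVDG_CO) and MKL-inspired (EMVDG_MK) multi-view fusion from the paper is not reproduced.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svms(X_pos, X_neg, C=1.0):
    """One linear SVM per positive sample: that single positive vs. all negatives."""
    models = []
    for x in X_pos:
        X = np.vstack([x[None, :], X_neg])
        y = np.r_[1, np.zeros(len(X_neg), dtype=int)]
        # Up-weight the lone positive so that the many negatives do not swamp it.
        clf = LinearSVC(C=C, class_weight={1: float(len(X_neg)), 0: 1.0}, max_iter=10000)
        models.append(clf.fit(X, y))
    return models

def fused_scores(models, X):
    """Fuse exemplar SVMs by averaging their decision values (a simple stand-in rule)."""
    return np.mean([m.decision_function(X) for m in models], axis=0)

rng = np.random.default_rng(6)
X_pos = rng.normal(loc=2.0, size=(5, 2))    # hypothetical source-domain positives
X_neg = rng.normal(loc=-2.0, size=(40, 2))  # hypothetical negatives
models = train_exemplar_svms(X_pos, X_neg)
scores = fused_scores(models, np.array([[2.0, 2.0], [-2.0, -2.0]]))
```

Each exemplar classifier captures the neighbourhood of one positive sample. This is what lets the learnt classifiers cluster by latent domain, as the abstract describes.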

Journal Article
TL;DR: This article proposes a novel and effective approach to FER using multi-modal two-dimensional and 3D videos. The approach encodes both static and dynamic cues by a scattering convolution network, and adopts Multiple Kernel Learning to combine the features in the 2D and 3D modalities and compute similarities to predict the expression label.
Abstract: Facial Expression Recognition (FER) is one of the most important topics in computer vision and pattern recognition, and it has attracted increasing attention for its scientific challenges and application potential. In this article, we propose a novel and effective approach to FER using multi-modal two-dimensional (2D) and 3D videos, which encodes both static and dynamic cues by a scattering convolution network. First, a shape-based detection method is introduced to locate the start and end of an expression in videos; segment its onset, apex, and offset states; and sample the important frames for emotion analysis. Second, the frames in the apex of 2D videos are represented by scattering, conveying static texture details. Those of 3D videos are processed in a similar way, but to highlight static shape details, several geometric maps in terms of multiple-order differential quantities, i.e., Normal Maps and Shape Index Maps, are generated as the input of scattering instead of the original smooth facial surfaces. Third, the average of neighboring samples centred at each key texture frame or shape map in the onset is computed. The scattering features extracted from all the average samples of 2D and 3D videos are then concatenated to capture dynamic texture and shape cues, respectively. Finally, Multiple Kernel Learning is adopted to combine the features in the 2D and 3D modalities and compute similarities to predict the expression label. Thanks to the scattering descriptor, the proposed approach encodes distinct local texture and shape variations of different expressions, as several milestone operators such as SIFT and HOG do. It also captures subtle information hidden in high frequencies in both channels, which is crucial for distinguishing expressions that are easily confused.
The validation is conducted on the BU-4DFE and BP-4D databases, and the accuracies reached are very competitive, indicating the approach's competence for this task.

Journal Article
TL;DR: A tailored nonlinear matrix completion model for human motion recovery is proposed. It embeds motion data into a high-dimensional Hilbert space in which the data has the desired low rank, and then uses low-rank matrix completion to recover motions.
Abstract: Human motion capture data has been widely used in many areas, but it involves a complex capture process, and the captured data inevitably contains missing entries due to occlusions caused by the actor's body or clothing. Motion recovery aims to recover the underlying complete motion sequence from its degraded observation. It remains a challenging task due to the nonlinear structure and kinematic properties of motion data. Low-rank matrix completion-based methods have shown promising performance in short-time-missing motion recovery problems. However, low-rank matrix completion, which is designed for linear data, lacks theoretical guarantees when applied to the recovery of nonlinear motion data. To overcome this drawback, we propose a tailored nonlinear matrix completion model for human motion recovery. Within the model, we first learn a combined low-rank kernel via multiple kernel learning. By exploiting the learned kernel, we embed the motion data into a high-dimensional Hilbert space where the data has the desired low rank, and we then use low-rank matrix completion to recover motions. In addition, we add two kinematic constraints to the proposed model to preserve the kinematic properties of human motion. Extensive experimental results and comparisons with five other state-of-the-art methods demonstrate the advantage of the proposed method.

Journal Article
TL;DR: This paper develops a novel fuzzy multiple kernel learning model for classification based on the Hilbert–Schmidt independence criterion (HSIC), called HSIC-FMKL. It reports extensive experiments on real-world datasets from the UCI benchmark repository and from computational biology that validate the superiority of the proposed model in terms of prediction accuracy.
Abstract: Multiple kernel learning (MKL) is a principled approach to kernel combination and selection for a variety of learning tasks, such as classification, clustering, and dimensionality reduction. In this paper, we develop a novel fuzzy multiple kernel learning model for classification based on the Hilbert–Schmidt independence criterion (HSIC), which we call HSIC-FMKL. In this model, we first propose an HSIC Lasso-based MKL formulation. It has a clear statistical interpretation (minimally redundant kernels with maximum dependence on the output labels are found and combined), and it allows the globally optimal solution to be computed efficiently by solving a Lasso optimization problem. Since the traditional support vector machine (SVM) is sensitive to outliers and noise in the dataset, fuzzy SVM (FSVM) is used to select the prediction hypothesis once the optimal kernel has been obtained. The main advantage of FSVM is that a fuzzy membership can be associated with each data point, so that different data points can have different effects on the training of the learning machine. We propose a new fuzzy membership function using a heuristic strategy based on the HSIC. The proposed HSIC-FMKL is a two-stage kernel learning approach, and the HSIC is applied in both stages. We perform extensive experiments on real-world datasets from the UCI benchmark repository and from computational biology, which validate the superiority of the proposed model in terms of prediction accuracy.
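The HSIC Lasso stage of this entry can be sketched as a nonnegative Lasso over vectorized kernels. With centered, Frobenius-normalized base kernels K~_m and label kernel L~, the objective is to minimize ||L~ - sum_m b_m K~_m||_F^2 + lambda ||b||_1. Expanding the square gives a term that rewards kernels with high normalized HSIC to the labels and a term that penalizes pairs of mutually dependent kernels. The scikit-learn sketch below assumes a same-class label kernel and RBF base kernels on synthetic data. The fuzzy SVM stage of HSIC-FMKL is not included.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rbf_gram(X, gamma):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def centered_normalized(K):
    """Center a kernel (H K H) and scale it to unit Frobenius norm."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    return Kc / np.linalg.norm(Kc)

def hsic_lasso_kernel_weights(kernels, y, alpha=1e-6):
    """Nonnegative weights b minimizing ||L~ - sum_m b_m K~_m||_F^2 + penalty * ||b||_1.
    scikit-learn divides the squared loss by 2 * n_samples (n_samples = n**2 kernel
    entries here), which is why alpha is tiny."""
    L = centered_normalized(np.equal.outer(y, y).astype(float))  # same-class label kernel
    design = np.column_stack([centered_normalized(K).ravel() for K in kernels])
    lasso = Lasso(alpha=alpha, positive=True, fit_intercept=False, max_iter=10000)
    lasso.fit(design, L.ravel())
    return lasso.coef_

rng = np.random.default_rng(8)
n = 60
y = np.arange(n) % 2
informative = rng.normal(size=(n, 5)) + 2.0 * y[:, None]  # hypothetical relevant features
noise = rng.normal(size=(n, 5))                            # hypothetical irrelevant features
b = hsic_lasso_kernel_weights([rbf_gram(informative, 0.1), rbf_gram(noise, 0.1)], y)
print(np.round(b, 3))
```

The `positive=True` constraint keeps the weights nonnegative, so the combined kernel stays positive semidefinite.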

Journal Article
TL;DR: A deep automated skeletal bone age assessment model is developed, based on convolutional neural networks and support vector regression with a multiple kernel learning (MKL) algorithm to process heterogeneous features.
Abstract: Skeletal bone age assessment is a widely used standard procedure in both disease detection and growth prediction for children in endocrinology. Conventional manual assessment methods mainly rely on personal experience in observing X-ray images of the left hand and wrist to calculate bone age. They show intrinsic limitations ranging from low efficiency to unstable accuracy. To address these problems, some automated methods based on image processing or machine learning have been proposed, but their assessment accuracy is not yet satisfactory. Motivated by the remarkable success of deep learning (DL) techniques in image classification and speech recognition, in this paper we develop a deep automated skeletal bone age assessment model. It is based on convolutional neural networks (CNNs) and support vector regression (SVR), and uses a multiple kernel learning (MKL) algorithm to process heterogeneous features. This deep framework exploits not only the X-ray images of the hand and wrist but also other heterogeneous information such as race and gender. The experimental results show higher bone age assessment accuracy than the state of the art on two different data sets, indicating that the fused heterogeneous features provide a better description of the degree of bone maturation.

Journal Article
TL;DR: A new multiple kernel support vector regression (MKL-SVR) algorithm is proposed for estimating traffic speed from single loop detectors, which improves the accuracy and robustness of the speed estimation.
Abstract: Industrial loop detectors (ILDs) are the most common traffic detectors. In Shanghai, most ILDs are installed as single loops, which can detect various parameters, such as flow and saturation. However, they cannot detect speed directly, which is one of the key inputs of intelligent transportation systems (ITS) for identifying the traffic state. This paper is therefore dedicated to estimating speed accurately. It proposes a new algorithm, multiple kernel support vector regression (MKL-SVR), for this purpose, which improves the accuracy and robustness of the speed estimation. Extensive experiments have been performed to evaluate the performance of MKL-SVR against polynomial fitting, BP neural networks and SVR. All results indicate that MKL-SVR performs best and is the most robust.
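The paper learns the kernel combination inside MKL-SVR. The simpler sketch below only illustrates regressing speed with a combined kernel, under these assumptions: two base kernels (RBF and polynomial) with a single mixing weight beta chosen on a validation split instead of being learned. The data are synthetic, with two inputs as hypothetical stand-ins for loop-detector flow and saturation. They are not the Shanghai ILD data.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVR

def mixed_kernel(A, B, beta, gamma=0.5, degree=2):
    """Convex combination beta * RBF + (1 - beta) * polynomial between rows of A and B."""
    return (beta * rbf_kernel(A, B, gamma=gamma)
            + (1.0 - beta) * polynomial_kernel(A, B, degree=degree))

rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, size=(120, 2))  # hypothetical normalized [flow, saturation]
speed = 80.0 - 50.0 * X[:, 1] + 5.0 * np.sin(6.0 * X[:, 0]) + rng.normal(0.0, 1.0, 120)
train, val = slice(0, 90), slice(90, 120)

val_mse = {}
for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
    model = SVR(kernel="precomputed", C=100.0, epsilon=0.5)
    model.fit(mixed_kernel(X[train], X[train], beta), speed[train])
    pred = model.predict(mixed_kernel(X[val], X[train], beta))  # (n_val, n_train) kernel
    val_mse[beta] = float(np.mean((pred - speed[val]) ** 2))
best_beta = min(val_mse, key=val_mse.get)
print(best_beta, round(val_mse[best_beta], 2))
```

A convex combination of positive semidefinite kernels is itself positive semidefinite, so any beta in [0, 1] yields a valid SVR kernel.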

Journal Article
TL;DR: The potential of utilizing affect or emotion recognition research in AEH models is explored. A conceptual Emotion-based E-learning Model (EEM) incorporating the proposed emotion recognition framework is proposed for future work.
Abstract: Adaptive Educational Hypermedia (AEH) e-learning models aim to personalize educational content and learning resources based on the needs of an individual learner. The Adaptive Hypermedia Architecture (AHA) is a specific implementation of the AEH model that exploits the cognitive characteristics of learner feedback to adapt resources accordingly. However, besides cognitive feedback, the learning realm generally includes the affective and emotional feedback of the learner, which is often neglected in the design of e-learning models. This article explores the potential of utilizing affect or emotion recognition research in AEH models. The proposed emotion recognition framework is referred to as Multiple Kernel Learning Decision Tree Weighted Kernel Alignment (MKLDT-WFA). MKLDT-WFA has two merits over classical MKL. First, the WFA component preserves only the relevant kernel weights, reducing redundancy and improving discrimination between emotion classes. Second, training via the decision tree reduces the misclassification issues associated with SimpleMKL. The proposed work has been evaluated on different emotion datasets, and the results confirm its good performance. Finally, a conceptual Emotion-based E-learning Model (EEM) incorporating the proposed emotion recognition framework is proposed for future work.

Journal ArticleDOI
TL;DR: This study addressed the problem of separating early‐ and late‐stage cancers from each other using their gene expression profiles and proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets to obtain satisfactory/improved predictive performance and identify biological mechanisms that might have an effect in cancer progression.
Abstract: Motivation Identifying molecular mechanisms that drive cancers from early to late stages is highly important to develop new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms would achieve satisfactory predictive performance, their knowledge extraction capability would be quite restricted due to the highly correlated nature of genomic data. That is why we need algorithms that can also extract relevant information about these biological mechanisms using our prior knowledge about pathways/gene sets. Results In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect in cancer progression. We extensively compared our proposed MKL on gene sets algorithm against two standard machine learning algorithms, namely, random forests and support vector machines, on 20 diseases from the Cancer Genome Atlas cohorts for two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful and disease-specific information that gives clues about the progression mechanism. Availability and implementation Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/mehmetgonen/gsbc together with the scripts that replicate the reported experiments.
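The "MKL on gene sets" idea can be sketched as one kernel per pathway, each computed only on that pathway's genes, followed by a weighted sum. The kernel type (Gaussian), the set-size scaling, and the weights `eta` below are assumptions for illustration; the authors' R implementation learns the weights. The last helper shows why sparse weights translate into fewer gene expression features.

```python
import numpy as np

def gene_set_kernels(X, gene_sets, gamma=1.0):
    """One Gaussian kernel per gene set, computed only on that set's genes.
    X is a samples x genes expression matrix; gene_sets holds column-index lists."""
    kernels = []
    for idx in gene_sets:
        Xs = X[:, idx]
        sq = np.sum((Xs[:, None, :] - Xs[None, :, :]) ** 2, axis=-1)
        kernels.append(np.exp(-gamma * sq / len(idx)))  # scale by set size
    return kernels

def combine(kernels, eta):
    """Weighted sum sum_m eta_m * K_m of the gene-set kernels."""
    return np.tensordot(np.asarray(eta, float), np.stack(kernels), axes=1)

def genes_used(gene_sets, eta):
    """Genes that still influence the model once zero-weight sets are dropped."""
    return sorted({g for idx, w in zip(gene_sets, eta) if w > 0 for g in idx})
```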

Journal ArticleDOI
TL;DR: To enable the learner to discover and benefit from the underlying local coherence and diversity of the samples, the clustering procedure is incorporated into the canonical support vector machine-based LMKL framework, and the paper shows how the cluster structure is gradually revealed and how the matrix-regularized kernel weights are obtained.
Abstract: Localized multiple kernel learning (LMKL) is an attractive strategy for combining multiple heterogeneous features with regard to their discriminative power for each individual sample. However, learning numerous local solutions may not scale well even for a moderately sized training set, and the independently learned local models may suffer from overfitting. Hence, in existing local methods, the distributed samples are typically assumed to share the same weights, and various unsupervised clustering methods are applied as preprocessing. In this paper, to enable the learner to discover and benefit from the underlying local coherence and diversity of the samples, we incorporate the clustering procedure into the canonical support vector machine-based LMKL framework. Then, to explore the relatedness among different samples, which has been ignored in a vector $\ell_p$-norm analysis, we organize the cluster-specific kernel weights into a matrix and introduce a matrix-based extension of the $\ell_p$-norm for constraint enforcement. By casting the joint optimization problem as a problem of alternating optimization, we show how the cluster structure is gradually revealed and how the matrix-regularized kernel weights are obtained. A theoretical analysis of such a regularizer is performed using a Rademacher complexity bound, and complementary empirical experiments on real-world data sets demonstrate the effectiveness of our technique.
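The abstract does not spell out how the cluster-specific weight matrix enters the kernel. The sketch below assumes a common localized form in which each sample's kernel weights are the row of `W` for its cluster, and the combined entry multiplies the weights of both samples. The matrix-norm regularizer and the alternating optimization that learns `W` and the clusters are not shown.

```python
import numpy as np

def localized_kernel(kernels, clusters, W):
    """Cluster-specific combination
    K[i, j] = sum_m W[c_i, m] * W[c_j, m] * K_m[i, j],
    where c_i is the cluster of sample i and W is a clusters x kernels matrix."""
    clusters = np.asarray(clusters)
    K = np.zeros(kernels[0].shape)
    for m, Km in enumerate(kernels):
        d = W[clusters, m]            # weight of kernel m for each sample's cluster
        K += np.outer(d, d) * Km      # D_m K_m D_m stays positive semidefinite
    return K
```

If every row of `W` is identical, the expression collapses to an ordinary global MKL kernel with weights `W[0, m]**2`, which is a quick sanity check on the construction.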

Proceedings Article
13 May 2018
TL;DR: It is concluded that the proposed multiple kernel learning method is the best approach to date for Arabic dialect identification.
Abstract: We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Task of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech or phonetic transcripts, we also use a kernel based on dialectal embeddings generated from audio recordings by the organizers. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Preliminary experiments indicate that KRR provides better classification results. Our approach is shallow and simple, but the empirical results obtained in the 2018 ADI Closed Shared Task show that it achieves the best performance. Furthermore, our top macro-F1 score (58.92%) is significantly better than the second-best score (57.59%) in the 2018 ADI Shared Task, according to the statistical significance test performed by the organizers. We obtain even better post-competition results (a macro-F1 score of 62.28%) using the audio embeddings released by the organizers after the competition. With a very similar approach (that did not include phonetic features), we also ranked first in the ADI Closed Shared Task of the 2017 VarDial Evaluation Campaign, surpassing the second-best method by 4.62%. We therefore conclude that our multiple kernel learning method is the best approach to date for Arabic dialect identification.
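The two ingredients named in the abstract, a character p-gram kernel and Kernel Ridge Regression, can be sketched as follows. This assumes one of the simplest p-gram kernels, a presence-bits kernel that counts shared distinct p-grams summed over a hypothetical range of p; in the full approach further kernels (for example, one built on the dialectal embeddings) would be normalized and combined before solving. The regularization strength `lam` and the one-hot target encoding are also assumptions.

```python
import numpy as np

def pgrams(text, p):
    """Set of distinct character p-grams in `text`."""
    return {text[i:i + p] for i in range(len(text) - p + 1)}

def presence_kernel(docs, p_values):
    """K[i, j] = number of shared distinct p-grams, summed over p in p_values."""
    n = len(docs)
    K = np.zeros((n, n))
    for p in p_values:
        grams = [pgrams(d, p) for d in docs]
        for i in range(n):
            for j in range(n):
                K[i, j] += len(grams[i] & grams[j])
    return K

def normalize(K):
    """Cosine normalization so that every K[i, i] equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def krr_fit(K, Y, lam):
    """Kernel ridge regression dual coefficients alpha = (K + lam * I)^-1 Y."""
    return np.linalg.solve(K + lam * np.eye(len(K)), Y)

def krr_predict(K_test_train, alpha):
    """Class with the highest regression score for each test row."""
    return np.argmax(K_test_train @ alpha, axis=1)
```

Normalizing each kernel before summing keeps long transcripts from dominating the combination simply because they contain more p-grams.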

Journal ArticleDOI
TL;DR: A novel multiple-kernel method for classifying non-stationary data streams is proposed that pays special attention to space complexity: by learning multiple kernels and specifying the boundaries of classes in the feature (mapped) space of the combined kernels, it reduces the required memory.
Abstract: Due to the unprecedented speed and volume of raw data generated in most applications, data stream mining has attracted a lot of attention recently. Methods for solving these problems should address the challenges of this area, such as infinite length, concept drift, recurring concepts, and concept evolution. Moreover, due to the high-speed nature of data streams, the time and space complexity of the methods are extremely important. This paper proposes a novel method based on multiple kernels for classifying non-stationary data streams, which addresses these challenges with special attention to space complexity. By learning multiple kernels and specifying the boundaries of classes in the feature (mapped) space of the combined kernels, the required amount of memory is decreased. These kernels are updated regularly throughout the stream as the true labels of instances are received. Newly arrived instances are classified with respect to their distance to the boundaries of the previously known classes in the feature spaces. Due to the efficient memory usage, the computation time does not increase significantly throughout the stream. We evaluate the performance of the proposed method using a set of experiments conducted on both real and synthetic benchmark data sets. The experimental results show the superiority of the proposed method over the state-of-the-art methods in this area.
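Classifying by distance to class boundaries in a kernel feature space can be illustrated with a simplified stand-in for the paper's learned boundaries: a ball around each class mean in RBF feature space. The distance from a mapped point to the mean needs only kernel evaluations, and a point outside every ball is flagged as possibly belonging to a new class (concept evolution). This toy version keeps all class points in memory, so it does not reproduce the paper's memory savings; `gamma` is a hypothetical parameter.

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian kernel matrix exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KernelBall:
    """Toy class summary in RBF feature space: the mean of phi(x_i) plus a radius."""

    def __init__(self, X, gamma):
        self.X, self.gamma = X, gamma
        self.center_sq_norm = rbf(X, X, gamma).mean()          # ||c||^2 = mean_ij k(x_i, x_j)
        self.radius = np.sqrt(max(self.dist2(X).max(), 0.0))  # farthest training point

    def dist2(self, Z):
        """||phi(z) - c||^2 = k(z, z) - 2 mean_i k(z, x_i) + ||c||^2, with k(z, z) = 1 for RBF."""
        return 1.0 - 2.0 * rbf(Z, self.X, self.gamma).mean(axis=1) + self.center_sq_norm

def classify(balls, Z):
    """Nearest class center in feature space; flag points outside every ball as novel."""
    d = np.sqrt(np.maximum(np.stack([b.dist2(Z) for b in balls], axis=1), 0.0))
    radii = np.array([b.radius for b in balls])
    return d.argmin(axis=1), np.all(d > radii, axis=1)
```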

Journal ArticleDOI
01 Jul 2018
TL;DR: This work extends the state‐of‐the‐art kernel learning method by developing kernels for peak interactions to combine with kernels for peaks through multiple kernel learning (MKL), and formulates a sparse interaction model for metabolite peaks, which is computationally light and interpretable for fingerprint prediction.
Abstract: Motivation Recent success in metabolite identification from tandem mass spectra has been led by machine learning, which has two stages: mapping mass spectra to molecular fingerprint vectors and then retrieving candidate molecules from the database. In the first stage, i.e. fingerprint prediction, spectrum peaks are features, and considering their interactions would be reasonable for more accurate identification of unknown metabolites. Existing approaches to fingerprint prediction are based only on individual peaks in the spectra, without explicitly considering peak interactions. Also, the current cutting-edge method is based on kernels, which are computationally heavy and difficult to interpret. Results We propose two learning models that incorporate peak interactions for fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions to combine with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, called SIMPLE, which is computationally light and interpretable for fingerprint prediction. The formulation of SIMPLE is convex and guarantees a global optimum, for which we develop an alternating direction method of multipliers (ADMM) algorithm. Experiments using the MassBank dataset show that both models achieved prediction accuracy comparable to the current top-performing kernel method. Furthermore, SIMPLE clearly revealed individual peaks and peak interactions that contribute to enhancing the performance of fingerprint prediction. Availability and implementation The code will be available through http://mamitsukalab.org/tools/SIMPLE/.
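One way to see why a peak-interaction kernel can be cheap to combine with a peak kernel: assuming spectra are binned into fixed-length intensity vectors (a simplification; the abstract does not specify the paper's peak kernels), the inner product of all pairwise peak products equals the squared linear kernel, so the interaction features never need to be materialized. The weights `w_peak` and `w_inter` are hypothetical stand-ins for the weights MKL would learn.

```python
import itertools

def linear_kernel(x, y):
    """Peak-level kernel: plain dot product of binned intensity vectors."""
    return sum(a * b for a, b in zip(x, y))

def explicit_interactions(x):
    """All ordered pairwise products x_i * x_j, i.e. the explicit interaction features."""
    return [a * b for a, b in itertools.product(x, repeat=2)]

def interaction_kernel(x, y):
    """Same inner product as the explicit features, computed as (x . y)^2."""
    return linear_kernel(x, y) ** 2

def combined_kernel(x, y, w_peak, w_inter):
    """Fixed-weight combination of the peak and peak-interaction kernels."""
    return w_peak * linear_kernel(x, y) + w_inter * interaction_kernel(x, y)
```

Because `itertools.product` enumerates ordered pairs, the explicit features include the squares x_i^2 and count each unordered pair twice; that is exactly what makes the identity with (x . y)^2 hold.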

Journal ArticleDOI
TL;DR: This work hypothesizes the presence of inter-regional co-activations (latent parameters) that combine diffusion kernels at multiple scales to characterize how FC could arise from SC and formulated a multiple kernel learning (MKL) scheme to estimate the latent parameters from training data.
Abstract: A challenging problem in cognitive neuroscience is to relate the structural connectivity (SC) to the functional connectivity (FC) to better understand how the large-scale network dynamics underlying human cognition emerge from the relatively fixed SC architecture. Recent modeling attempts point to the possibility of a single diffusion kernel giving a good estimate of the FC. We highlight the shortcomings of the single-diffusion-kernel model (SDK) and propose a multi-scale diffusion scheme. Our multi-scale model is formulated as a reaction-diffusion system giving rise to spatio-temporal patterns on a fixed topology. We hypothesize the presence of inter-regional co-activations (latent parameters) that combine diffusion kernels at multiple scales to characterize how FC could arise from SC. We formulated a multiple kernel learning (MKL) scheme to estimate the latent parameters from training data. Our model is analytically tractable yet complex enough to capture the details of the underlying biological phenomena. The parameters learned by the MKL model lead to highly accurate predictions of subject-specific FCs from test datasets at a rate of 71%, surpassing the performance of the existing linear and non-linear models. We provide an example of how these latent parameters could be used to characterize age-specific reorganization in the brain structure and function.
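The multi-scale idea can be sketched with heat (diffusion) kernels exp(-beta * L) on the graph Laplacian of the SC matrix, fitting one weight per scale by least squares. This is a simplification under stated assumptions: the paper's latent parameters are inter-regional co-activations estimated with MKL, whereas here each scale gets a single scalar weight, and the scales `betas` are hypothetical.

```python
import numpy as np

def laplacian(SC):
    """Combinatorial graph Laplacian L = D - A of a symmetric connectivity matrix."""
    return np.diag(SC.sum(axis=1)) - SC

def diffusion_kernel(L, beta):
    """Heat kernel exp(-beta * L), computed from the eigendecomposition of symmetric L."""
    vals, vecs = np.linalg.eigh(L)
    return (vecs * np.exp(-beta * vals)) @ vecs.T

def fit_scale_weights(SC, FC, betas):
    """Least-squares weights pi with FC ~ sum_k pi_k * exp(-beta_k * L), fit on the upper triangle."""
    L = laplacian(SC)
    iu = np.triu_indices_from(FC)
    A = np.stack([diffusion_kernel(L, b)[iu] for b in betas], axis=1)
    pi, *_ = np.linalg.lstsq(A, FC[iu], rcond=None)
    return pi
```

Small values of beta keep the kernel close to the identity (local coupling), while large values spread activity across the whole graph, which is why mixing several scales can represent FC patterns that a single diffusion kernel cannot.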

Journal ArticleDOI
TL;DR: Experimental results on several hyperspectral image datasets demonstrate that the proposed multiple kernel learning-based low rank representation at superpixel level (Sp_MKL_LRR) outperforms several state-of-the-art classifiers tested in terms of overall accuracy, average accuracy, and kappa statistic.
Abstract: High-dimensional image classification is a fundamental technique for information retrieval from hyperspectral remote sensing data. However, data quality is readily affected by the atmosphere and noise in the imaging process, which makes it difficult to achieve good classification performance. In this paper, multiple kernel learning-based low rank representation at superpixel level (Sp_MKL_LRR) is proposed to improve the classification accuracy for hyperspectral images. Superpixels are first generated from the hyperspectral image to reduce the effect of noise and form homogeneous regions. An optimal superpixel kernel parameter is then selected by the kernel matrix using a multiple kernel learning framework. Finally, a kernel low rank representation is applied to classify the hyperspectral image. The proposed method offers two advantages: (1) the global correlation constraint is exploited by the low rank representation, while the local neighborhood information is extracted as the superpixel kernel adaptively learns the high-dimensional manifold features of the samples in each class; and (2) it can meet the challenges of multiscale feature learning and adaptive parameter determination in conventional kernel methods. Experimental results on several hyperspectral image datasets demonstrate that the proposed method outperforms several state-of-the-art classifiers tested in terms of overall accuracy, average accuracy, and kappa statistic.
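The three evaluation metrics named in the abstract are standard and can all be read off a confusion matrix: overall accuracy (OA) is the fraction of correctly labeled pixels, average accuracy (AA) is the mean of the per-class accuracies, and Cohen's kappa corrects OA for chance agreement. A minimal sketch:

```python
def classification_scores(cm):
    """Return (OA, AA, kappa) for a confusion matrix where
    cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(k)) / n
    aa = sum(cm[i][i] / sum(cm[i]) for i in range(k)) / k   # mean per-class accuracy
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    pe = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

oa, aa, kappa = classification_scores([[30, 0], [10, 60]])
```

In this example AA (about 0.929) exceeds OA (0.9) because the smaller class is classified perfectly and AA weights every class equally, which is why hyperspectral papers usually report both.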

Journal ArticleDOI
TL;DR: A novel multiple kernel learning (MKL) model that embodies the characteristics of ensemble learning, kernel learning, and representation learning is proposed to forecast near-future air quality (AQ).
Abstract: Air quality prediction is an important research issue due to the increasing impact of air pollution on the urban environment. However, existing methods often fail to forecast high-polluting air conditions, which are precisely what should be highlighted. In this paper, a novel multiple kernel learning (MKL) model that embodies the characteristics of ensemble learning, kernel learning, and representation learning is proposed to forecast near-future air quality (AQ). The centered alignment approach is used for learning kernels, and a boosting approach is used to determine the proper number of kernels. To demonstrate the performance of the proposed MKL model, it is compared with the classical autoregressive integrated moving average (ARIMA) model; widely used machine learning models such as random forest (RF) and support vector machine (SVM); and popular neural network models such as the multilayer perceptron (MLP) and the long short-term memory (LSTM) neural network. Datasets acquired from the coastal city of Hong Kong and the inland city of Beijing are used to train and validate all the models. Experiments show that the MKL model outperforms the other models. Moreover, the MKL model has better forecasting ability for the high-health-risk AQ category.
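Centered kernel alignment measures how well a base kernel matches the target after both are centered in feature space. The sketch below assumes the simple variant in which each kernel's weight is proportional to its (non-negative) centered alignment with y y^T, where y is the regression target; the paper's exact learning rule and its boosting step for choosing the number of kernels are not reproduced here.

```python
import numpy as np

def center(K):
    """Center a Gram matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """<K1c, K2c>_F / (||K1c||_F * ||K2c||_F) for centered Gram matrices."""
    K1c, K2c = center(K1), center(K2)
    return float(np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c)))

def alignment_weights(kernels, y):
    """Weights proportional to each kernel's centered alignment with y y^T (negatives clipped)."""
    y = np.asarray(y, float)
    target = np.outer(y, y)
    a = np.array([max(centered_alignment(K, target), 0.0) for K in kernels])
    return a / a.sum()
```

Centering matters because uncentered alignment can be dominated by a large constant component shared by all kernels, which makes poorly informative kernels look well aligned.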

Journal ArticleDOI
TL;DR: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data, which can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous.
Abstract: Uncovering genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it can efficiently handle both a large number of candidate genes and an arbitrary number of data sources. As a direct consequence of its scalability, Scuba also integrates a new, efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .

Journal ArticleDOI
TL;DR: Experimental results on three real HSIs confirm that the proposed classifiers outperform the other state-of-the-art representation-based classifiers.
Abstract: To adequately represent the nonlinearities in the high-dimensional feature space for hyperspectral images (HSIs), we propose a multiple kernel collaborative representation-based classifier (CRC) in this paper. Extended morphological profiles are first extracted from the original HSIs, because they can efficiently capture the spatial and spectral information. In the proposed method, a novel multiple kernel learning (MKL) model is embedded into CRC. Multiple kernel patterns (e.g., Naive, Multimetric, and Multiscale) are adopted to form the optimal set of basic kernels, which helps capture useful information from different pixel distributions, kernel metric spaces, and kernel scales. To learn an optimal linear combination of the predefined basic kernels, we add an extra training stage to the typical CRC in which kernel weights are jointly learned with the representation coefficients from the training samples by minimizing the representation error. Moreover, by considering the different contributions of dictionary atoms, the adaptive representation strategy is applied to the MKL framework via a dissimilarity-weighted regularizer to obtain a more robust representation of test pixels in the fused kernel space. Experimental results on three real HSIs confirm that the proposed classifiers outperform the other state-of-the-art representation-based classifiers.
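Kernel collaborative representation codes a test pixel over all training pixels at once and assigns the class whose atoms reconstruct it best in feature space. A minimal sketch, assuming a fixed-weight multiscale RBF kernel and a plain ridge penalty with hypothetical `lam`; the paper instead learns the kernel weights jointly with the coefficients and uses a dissimilarity-weighted regularizer.

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian kernel matrix exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def multi_kernel(X, Y, gammas, weights):
    """Fixed-weight sum of RBF kernels at several scales."""
    return sum(w * rbf(X, Y, g) for w, g in zip(weights, gammas))

def kcrc_predict(K_train, K_cross, k_self, labels, lam):
    """K_train: n x n training Gram; K_cross: n x m train-test kernel;
    k_self: length-m values k(x, x) for the test points."""
    labels = np.asarray(labels)
    # Collaborative code over all training atoms: alpha = (K + lam I)^-1 k_x
    alpha = np.linalg.solve(K_train + lam * np.eye(len(K_train)), K_cross)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        idx = labels == c
        a_c = alpha[idx]
        # ||phi(x) - Phi_c a_c||^2 = k(x, x) - 2 a_c^T k_c(x) + a_c^T K_cc a_c
        quad = np.einsum("im,ij,jm->m", a_c, K_train[np.ix_(idx, idx)], a_c)
        cross = np.sum(a_c * K_cross[idx], axis=0)
        residuals.append(k_self - 2.0 * cross + quad)
    return classes[np.argmin(np.stack(residuals), axis=0)]
```

The closed-form ridge solution is what makes CRC cheap: one linear solve serves every test pixel, and the per-class residuals reuse the same coefficients.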

Journal ArticleDOI
TL;DR: Pathway Induced Multiple Kernel Learning (PIMKL) exploits prior knowledge in the form of a molecular interaction network and annotated gene sets by optimizing a mixture of pathway-induced kernels using a multiple kernel learning (MKL) algorithm.
Abstract: Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.

Posted Content
TL;DR: A Localized Multiple Kernel Anomaly Detection (LMKAD) approach is proposed for one-class classification, where the weight for each kernel is assigned locally.
Abstract: Multi-kernel learning has been well explored in the recent past and has exhibited promising outcomes for multi-class classification and regression tasks. In this paper, we present a multiple kernel learning approach for the One-class Classification (OCC) task and employ it for anomaly detection. Recently, the basic multi-kernel approach has been proposed to solve the OCC problem; it is simply a convex combination of different kernels with equal weights. This paper proposes a Localized Multiple Kernel learning approach for Anomaly Detection (LMKAD) using OCC, where the weight for each kernel is assigned locally. The proposed LMKAD approach adapts the weight for each kernel using a gating function. The parameters of the gating function and the one-class classifier are optimized simultaneously through a two-step optimization process. We present empirical results on the performance of LMKAD on 25 benchmark datasets from various disciplines. This performance is evaluated against the existing Multi Kernel Anomaly Detection (MKAD) algorithm and four other existing kernel-based one-class classifiers to demonstrate the credibility of our approach. Our algorithm achieves significantly better G-mean scores while using fewer support vectors than MKAD. A Friedman test is also performed to verify the statistical significance of the results claimed in this paper.
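LMKAD assigns kernel weights locally through a gating function whose parameters are optimized together with the one-class classifier. A minimal sketch of the resulting localized Gram matrix, assuming a softmax gating (the form used in the original localized MKL formulation); `V` and `v0` are hypothetical placeholders for the gating parameters that the two-step optimization would learn, and the one-class solver itself is omitted.

```python
import numpy as np

def gating(X, V, v0):
    """Softmax gating eta[i, m] = exp(V[m] . x_i + v0[m]) / sum_k exp(V[k] . x_i + v0[k])."""
    z = X @ V.T + v0
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def localized_gram(X, kernels, V, v0):
    """K[i, j] = sum_m eta_m(x_i) * K_m[i, j] * eta_m(x_j)."""
    eta = gating(X, V, v0)
    return sum(np.outer(eta[:, m], eta[:, m]) * Km for m, Km in enumerate(kernels))
```

With all gating parameters at zero every kernel gets weight 1/M at every sample, so the construction reduces to an equal-weight combination scaled by 1/M^2, which mirrors the equal-weight MKAD baseline the abstract compares against.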