
Showing papers on "Multiple kernel learning published in 2015"


Proceedings ArticleDOI
01 Jan 2015
TL;DR: A novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network, is presented and a parallelizable decision-level data fusion method is presented, which is much faster, though slightly less accurate.
Abstract: We present a novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network. We use the extracted features in multimodal sentiment analysis of short video clips representing one sentence each. We use the combined feature vectors of textual, visual, and audio modalities to train a classifier based on multiple kernel learning, which is known to perform well on heterogeneous data. We obtain a 14% performance improvement over the state of the art and present a parallelizable decision-level data fusion method, which is much faster, though slightly less accurate.
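The parallelizable decision-level fusion mentioned above can be sketched as follows; this is a minimal illustration (weighted averaging of per-modality class probabilities), not the authors' implementation, and the modality names are hypothetical:

```python
import numpy as np

def decision_level_fusion(prob_list, weights=None):
    """Fuse per-modality class-probability matrices by (weighted) averaging.

    prob_list: list of (n_samples, n_classes) arrays, one per modality.
    Returns the fused class predictions.
    """
    probs = np.stack(prob_list)                    # (n_modalities, n, c)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    fused = np.tensordot(weights, probs, axes=1)   # weighted average -> (n, c)
    return fused.argmax(axis=1)

# Toy example: text modality is confident about class 0, audio about class 1.
text_p  = np.array([[0.9, 0.1], [0.4, 0.6]])
audio_p = np.array([[0.6, 0.4], [0.2, 0.8]])
print(decision_level_fusion([text_p, audio_p]))    # -> [0 1]
```

Because each modality's classifier can be trained and evaluated independently, this style of fusion parallelizes trivially, which is the speed advantage the abstract alludes to.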

449 citations


Journal ArticleDOI
TL;DR: An important characteristic of the presented approach is that it does not require any regularization parameters to control the weights of considered features so that different types of features can be efficiently exploited and integrated in a collaborative and flexible way.
Abstract: Hyperspectral image classification has been an active topic of research in recent years. In the past, many different types of features have been extracted (using both linear and nonlinear strategies) for classification problems. On the one hand, some approaches have exploited the original spectral information or other features linearly derived from such information in order to have classes which are linearly separable. On the other hand, other techniques have exploited features obtained through nonlinear transformations intended to reduce data dimensionality, to better model the inherent nonlinearity of the original data (e.g., kernels) or to adequately exploit the spatial information contained in the scene (e.g., using morphological analysis). Special attention has been given to techniques able to exploit a single kind of feature, such as composite kernel learning or multiple kernel learning, developed in order to deal with multiple kernels. However, few approaches have been designed to integrate multiple types of features extracted from both linear and nonlinear transformations. In this paper, we develop a new framework for the classification of hyperspectral scenes that pursues the combination of multiple features. The ultimate goal of the proposed framework is to be able to cope with linear and nonlinear class boundaries present in the data, thus following the two main mixing models considered for hyperspectral data interpretation. An important characteristic of the presented approach is that it does not require any regularization parameters to control the weights of considered features so that different types of features can be efficiently exploited and integrated in a collaborative and flexible way.
Our experimental results, conducted using a variety of input features and hyperspectral scenes, indicate that the proposed framework for multiple feature learning provides state-of-the-art classification results without significantly increasing computational complexity.

299 citations


Journal ArticleDOI
TL;DR: A general learning framework, termed multiple kernel extreme learning machines (MK-ELM), is proposed to address the lack of a general ELM framework for integrating multiple heterogeneous data sources for classification; it can achieve comparable or even better classification performance than state-of-the-art MKL algorithms while incurring much less computational cost.

160 citations


Journal ArticleDOI
TL;DR: The proposed method is compared with other baselines and three state-of-the-art MKL methods, showing that the approach is often superior; it is also shown empirically that the advantage of the proposed method becomes even clearer when noise features are added.

159 citations


Journal ArticleDOI
TL;DR: Current multiple-kernel-learning approaches for dimensionality reduction are applied and extended, and it is shown that one can even use several kernels per data type, thereby relieving the user of having to choose the best kernel functions and kernel parameters for each data type beforehand.
Abstract: Motivation: Despite ongoing cancer research, available therapies are still limited in quantity and effectiveness, and making treatment decisions for individual patients remains a hard problem. Established subtypes, which help guide these decisions, are mainly based on individual data types. However, the analysis of multidimensional patient data involving the measurements of various molecular features could reveal intrinsic characteristics of the tumor. Large-scale projects accumulate this kind of data for various cancer types, but we still lack the computational methods to reliably integrate this information in a meaningful manner. Therefore, we apply and extend current multiple-kernel-learning approaches for dimensionality reduction. On the one hand, we add a regularization term to avoid overfitting during the optimization procedure, and on the other hand, we show that one can even use several kernels per data type and thereby relieve the user of having to choose the best kernel functions and kernel parameters for each data type beforehand. Results: We have identified biologically meaningful subgroups for five different cancer types. Survival analysis has revealed significant differences between the survival times of the identified subtypes, with P-values comparable to or even better than those of state-of-the-art methods. Moreover, our resulting subtypes reflect combined patterns from the different data sources, and we demonstrate that input kernel matrices carrying little information have less impact on the integrated kernel matrix. Our subtypes show different responses to specific therapies, which could eventually assist in treatment decision making. Availability and implementation: An executable is available upon request. Contact: ed.gpm.fni-ipm@aron or ed.gpm.fni-ipm@refiefpn

148 citations


Journal ArticleDOI
TL;DR: It is observed that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.
Abstract: We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.
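The MKL step above, combining a text kernel and a returns kernel into one predictive kernel, amounts to a convex combination of base kernel matrices. A minimal sketch with illustrative feature matrices (the analytic center cutting plane method that actually learns the weights is omitted):

```python
import numpy as np

def combine_kernels(kernels, weights):
    """MKL combined kernel K = sum_m d_m K_m with d_m >= 0, sum_m d_m = 1.

    A convex combination of PSD base kernels is itself PSD, so the result
    can be plugged into any standard SVM solver.
    """
    d = np.asarray(weights, dtype=float)
    d = d / d.sum()
    return sum(dm * Km for dm, Km in zip(d, kernels))

# Hypothetical example: one linear kernel from text features, one from returns.
text_feats   = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
return_feats = np.array([[0.2], [0.1], [-0.3]])
K_text    = text_feats @ text_feats.T
K_returns = return_feats @ return_feats.T
K = combine_kernels([K_text, K_returns], [0.7, 0.3])
```

In the paper's setting, the weights themselves are optimized jointly with the SVM; here they are fixed purely for illustration.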

135 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors explored and exploited preference image pairs (PIPs), such as "the quality of image $I_a$ is better than that of image $I_b$", for training a robust blind image quality assessment (BIQA) model.
Abstract: Blind image quality assessment (BIQA) aims to predict perceptual image quality scores without access to reference images. State-of-the-art BIQA methods typically require subjects to score a large number of images to train a robust model. However, subjective quality scores are imprecise, biased, and inconsistent, and it is challenging to obtain a large-scale database, or to extend existing databases, because of the inconvenience of collecting images, training the subjects, conducting subjective experiments, and realigning human quality evaluations. To combat these limitations, this paper explores and exploits preference image pairs (PIPs) such as "the quality of image $I_a$ is better than that of image $I_b$" for training a robust BIQA model. The preference label, representing the relative quality of two images, is generally precise and consistent, and is not sensitive to image content, distortion type, or subject identity; such PIPs can be generated at a very low cost. The proposed BIQA method is one of learning to rank. We first formulate the problem of learning the mapping from the image features to the preference label as one of classification. In particular, we investigate the utilization of a multiple kernel learning algorithm based on group lasso to provide a solution. A simple but effective strategy to estimate perceptual image quality scores is then presented. Experiments show that the proposed BIQA method is highly effective and achieves a performance comparable with that of state-of-the-art BIQA algorithms. Moreover, the proposed method can be easily extended to new distortion categories.
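The reduction from preference pairs to binary classification described above can be sketched as follows; feature extraction and the group-lasso MKL solver are out of scope here, and the helper name is our own:

```python
import numpy as np

def make_pairwise_dataset(features, prefs):
    """Turn preference pairs (a preferred over b) into a binary classification set.

    features: (n_images, d) feature matrix.
    prefs: list of (a, b) index pairs meaning quality(a) > quality(b).
    Each pair yields two training examples: f[a]-f[b] -> +1 and f[b]-f[a] -> -1.
    """
    X, y = [], []
    for a, b in prefs:
        X.append(features[a] - features[b]); y.append(+1)
        X.append(features[b] - features[a]); y.append(-1)
    return np.array(X), np.array(y)
```

Any binary classifier trained on (X, y) then induces a ranking function over images, which is the "learning to rank" formulation the abstract describes.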

128 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed method performs well on the classification of insect species and outperforms state-of-the-art methods for generic insect categorization.

107 citations


Journal ArticleDOI
TL;DR: This work proposes a Multiple Kernel Learning (MKL) framework for sketch recognition, fusing several features common to sketches, and investigates the use of attributes as a high-level feature for sketches and shows how this complements low-level features for improving recognition performance under the MKL framework.

95 citations


Journal ArticleDOI
TL;DR: The novel Multiple Adaptive Reduced Kernel Extreme Learning Machine (MARK-ELM) is introduced, which combines Multiple Kernel Boosting and Multiclass KELM for network intrusion detection to improve detection efficacy on data that contains instances of multiple classes of attacks.
Abstract: Apply Multiple Kernel Boosting and Multiclass KELM to Network Intrusion Detection. Tested approach on several machine learning datasets and the KDD Cup 99 dataset. Utilized Fractional Polynomial Kernels for the Network ID problem for the first time. Requires no feature selection, minimal pre-processing and works on imbalanced data. Achieves superior detection rates and lower false alarm rates than other approaches. Detection of cyber-based attacks on computer networks continues to be a relevant and challenging area of research. Daily reports of incidents appear in public media including major exfiltrations of data for the purposes of stealing identities, credit card numbers, and intellectual property as well as to take control of network resources. Methods used by attackers constantly change in order to defeat techniques employed by information technology (IT) teams intended to discover or block intrusions. "Zero Day" attacks whose "signatures" are not yet in IT databases are continually being uncovered. Machine learning approaches have been widely used to increase the effectiveness of intrusion detection platforms. While some machine learning techniques are effective at detecting certain types of attacks, there are no known methods that can be applied universally and achieve consistent results for multiple attack types. The focus of our research is the development of a framework that combines the outputs of multiple learners in order to improve the efficacy of network intrusion detection on data that contains instances of multiple classes of attacks. We have chosen the Extreme Learning Machine (ELM) as the core learning algorithm due to recent research that suggests that ELMs are straightforward to implement, computationally efficient and have excellent learning performance characteristics on par with the Support Vector Machine (SVM), one of the most widely used and best performing machine learning platforms (Liu, Gao, & Li, 2012).
We introduce the novel Multiple Adaptive Reduced Kernel Extreme Learning Machine (MARK-ELM) which combines Multiple Kernel Boosting (Xia & Hoi, 2013) with the Multiple Classification Reduced Kernel ELM (Deng, Zheng, & Zhang, 2013). We tested this approach on several machine learning datasets as well as the KDD Cup 99 (Hettich & Bay, 1999) intrusion detection dataset. Our results indicate that MARK-ELM works well for the majority of University of California, Irvine (UCI) Machine Learning Repository small datasets and is scalable for larger datasets. For UCI datasets we achieved performance similar to the MKBoost Support Vector Machine (SVM) approach. In our experiments we demonstrate that MARK-ELM achieves superior detection rates and much lower false alarm rates than other approaches on intrusion detection data.
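The core KELM step underlying MARK-ELM has a closed-form solution. A minimal single-kernel sketch (toy data, not the MARK-ELM ensemble; `C` is the ridge regularization parameter):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-sample matrices A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_elm_train(K, T, C=10.0):
    """Closed-form kernel ELM output weights (Huang et al.):
    beta = (I/C + K)^{-1} T, where T holds one-hot class targets."""
    n = K.shape[0]
    return np.linalg.solve(np.eye(n) / C + K, T)

# Toy two-cluster problem (illustrative data, not from the paper).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
beta = kernel_elm_train(rbf_kernel(X, X), T)
pred = (rbf_kernel(X, X) @ beta).argmax(axis=1)   # -> [0 0 1 1]
```

Because training reduces to one linear solve, the computational-efficiency claim for ELMs in the abstract follows directly; MARK-ELM boosts many such learners over multiple kernels.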

89 citations


Journal ArticleDOI
TL;DR: The basic idea of kernel alignment and its theoretical properties, as well as the extensions and improvements for specific learning problems, are introduced and the typical applications, including kernel parameter tuning, multiple kernel learning, spectral kernel learning and feature selection and extraction are reviewed.
Abstract: The success of kernel methods is very much dependent on the choice of kernel. Kernel design and learning a kernel from the data require evaluation measures to assess the quality of the kernel. In recent years, the notion of kernel alignment, which measures the degree of agreement between a kernel and a learning task, is widely used for kernel selection due to its effectiveness and low computational complexity. In this paper, we present an overview of the research progress of kernel alignment and its applications. We introduce the basic idea of kernel alignment and its theoretical properties, as well as the extensions and improvements for specific learning problems. The typical applications, including kernel parameter tuning, multiple kernel learning, spectral kernel learning and feature selection and extraction, are reviewed in the context of classification framework. The relationship between kernel alignment and other evaluation measures is also explored. Finally, concluding remarks and future directions are presented.
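The basic alignment quantities reviewed above are straightforward to compute. A minimal sketch of empirical kernel alignment and its centered variant against the ideal target kernel yy^T:

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Empirical alignment A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F)."""
    num = np.sum(K1 * K2)                        # Frobenius inner product
    return num / (np.linalg.norm(K1) * np.linalg.norm(K2))

def centered_alignment(K, y):
    """Alignment between a centered kernel and the target kernel y y^T."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kc = H @ K @ H
    return kernel_alignment(Kc, np.outer(y, y))
```

Alignment of a kernel with itself is 1 by construction, and higher alignment with yy^T indicates a kernel better matched to the labels, which is why it serves as a cheap kernel-selection criterion.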

Journal ArticleDOI
TL;DR: This study presented a machine learning approach, named PredcircRNA, focused on distinguishing circularRNA from other lncRNAs using multiple kernel learning, and showed that the proposed method can classify circularRNA from other types of lncRNAs with an accuracy of 0.778.
Abstract: Recently circular RNA (circularRNA) has been discovered as an increasingly important type of long non-coding RNA (lncRNA), playing an important role in gene regulation, such as functioning as miRNA sponges. So it is very promising to identify circularRNA transcripts from de novo assembled transcripts obtained by high-throughput sequencing, such as RNA-seq data. In this study, we present a machine learning approach, named PredcircRNA, focused on distinguishing circularRNA from other lncRNAs using multiple kernel learning. First, we extracted different sources of discriminative features, including graph features, conservation information and sequence compositions, ALU and tandem repeats, SNP densities and open reading frames (ORFs) from transcripts. Second, to better integrate features from different sources, we proposed a computational approach based on a multiple kernel learning framework to fuse those heterogeneous features. Our preliminary 5-fold cross-validation result showed that our proposed method can classify circularRNA from other types of lncRNAs with an accuracy of 0.778, sensitivity of 0.781, specificity of 0.770, precision of 0.784 and MCC of 0.554 on our constructed gold-standard dataset. Our feature importance analysis based on Random Forest illustrated some discriminative features, such as conservation features and a GTAG sequence motif. Our PredcircRNA tool is available for download at https://github.com/xypan1232/PredcircRNA.

Journal ArticleDOI
01 May 2015
TL;DR: A novel framework for person-independent expression recognition by combining multiple types of facial features via multiple kernel learning (MKL) in multiclass support vector machines (SVM) that outperforms the state-of-the-art methods and the SimpleMKL-based multiclass-SVM for facial expression recognition.
Abstract: Automatic recognition of facial expressions is an interesting and challenging research topic in the field of pattern recognition due to applications such as human-machine interface design and developmental psychology. Designing classifiers for facial expression recognition with high reliability is a vital step in this research. This paper presents a novel framework for person-independent expression recognition by combining multiple types of facial features via multiple kernel learning (MKL) in multiclass support vector machines (SVM). Existing MKL-based approaches jointly learn the same kernel weights with $l_1$-norm constraint for all binary classifiers, whereas our framework learns one kernel weight vector per binary classifier in the multiclass-SVM with $l_p$-norm constraints ($p \ge 1$), which considers both sparse and non-sparse kernel combinations within MKL. We studied the effect of the $l_p$-norm MKL algorithm for learning the kernel weights and empirically evaluated the recognition results of six basic facial expressions and neutral faces with respect to the value of $p$. In our experiments, we combined two popular facial feature representations, histogram of oriented gradient and local binary pattern histogram, with two kernel functions, the heavy-tailed radial basis function and the polynomial function. Our experimental results on the CK+, MMI and GEMEP-FERA face databases as well as our theoretical justification show that this framework outperforms the state-of-the-art methods and the SimpleMKL-based multiclass-SVM for facial expression recognition.
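The lp-norm constraint on the kernel weights has a well-known closed-form update in wrapper-style MKL solvers (Kloft et al.); a sketch, assuming the per-kernel block norms ||w_m|| are supplied by the inner SVM solver:

```python
import numpy as np

def lp_norm_kernel_weights(block_norms, p=2.0):
    """Closed-form l_p-norm MKL weight update (Kloft et al. style).

    block_norms: ||w_m|| for each kernel block; p >= 1 controls sparsity
    (p = 1 yields sparse weights, larger p spreads weight over kernels).
    Returns weights d with ||d||_p = 1.
    """
    t = np.asarray(block_norms, dtype=float)
    d = t ** (2.0 / (p + 1.0))
    return d / np.linalg.norm(d, ord=p)
```

Alternating this update with an SVM solve on the combined kernel is the standard wrapper scheme; kernels whose blocks carry more of the decision function receive larger weights.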

Journal ArticleDOI
Yanfeng Gu1, Qingwang Wang1, Hong Wang2, Di You3, Ye Zhang1 
TL;DR: The proposed algorithms, especially the KNMF-based MKL, achieve outstanding performance for hyperspectral image classification with few labeled samples when compared with several state-of-the-art algorithms.
Abstract: In this paper, a novel multiple kernel learning (MKL) algorithm is proposed for the classification of hyperspectral images. The proposed MKL algorithm adopts a two-step strategy to learn a multiple kernel machine. In the first step, unsupervised learning is carried out to learn a combined kernel from the predefined base kernels. In our algorithms, low-rank nonnegative matrix factorization (NMF) is used to carry out the unsupervised learning and learn an optimal combined kernel. Furthermore, the kernel NMF (KNMF) is introduced to substitute NMF for enhancing the ability of the unsupervised learning with the predefined base kernels. In the second step, the optimal kernel is embedded into the standard optimization routine of support vector machine (SVM). In addition, we address a major challenge in hyperspectral data classification, i.e., using very few labeled samples in a high-dimensional space. Experiments are conducted on three real hyperspectral datasets, and the experimental results show that the proposed algorithms, especially the KNMF-based MKL, achieve outstanding performance for hyperspectral image classification with few labeled samples when compared with several state-of-the-art algorithms.
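The first (unsupervised) step, learning a combined kernel via low-rank NMF, can be sketched with a symmetric NMF on the averaged base kernels; this is an illustrative simplification of the paper's method, assuming entrywise-nonnegative base kernels:

```python
import numpy as np

def symnmf_combined_kernel(base_kernels, rank=2, iters=300, seed=0):
    """Unsupervised sketch: average the (entrywise nonnegative) base kernels,
    then approximate the result with a low-rank symmetric NMF K ~ W W^T via
    damped multiplicative updates. The factorized kernel W W^T would then be
    handed to a standard SVM in the second step."""
    K = sum(base_kernels) / len(base_kernels)
    rng = np.random.default_rng(seed)
    W = rng.random((K.shape[0], rank))
    for _ in range(iters):
        W *= 0.5 + 0.5 * (K @ W) / np.maximum(W @ (W.T @ W), 1e-12)
    return W @ W.T

# Illustrative base kernels built from a tiny nonnegative feature matrix.
F = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
K_lin = F @ F.T
K_rbf = np.exp(-((F[:, None, :] - F[None, :, :]) ** 2).sum(-1))
K_hat = symnmf_combined_kernel([K_lin, K_rbf], rank=2)
```

The damped update keeps W nonnegative, so the learned kernel stays symmetric and entrywise nonnegative while being restricted to low rank.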

Journal ArticleDOI
TL;DR: This study demonstrates that the evaluation criteria used to examine the effectiveness of a financial market price forecasting method should be the profit and profit-risk ratio, rather than errors in prediction.
Abstract: Our proposed prediction and learning method is a hybrid referred to as MKL-GA, which combines multiple kernel learning (MKL) for regression (MKR) and a genetic algorithm (GA) to construct the trading rules. In this study, we demonstrate that the evaluation criteria used to examine the effectiveness of a financial market price forecasting method should be the profit and profit-risk ratio, rather than errors in prediction. Thus, it is necessary to use a price prediction method and a trading rules learning method. We tested the proposed method on the foreign exchange market for the USD/JPY currency pair, where the features used for prediction were extracted from the trading history of the three main currency pairs with three different short-term horizons. MKR is essential for utilizing the information contained in many of the features derived from different information sources and for various representations of the same information source. The GA is essential for generating trading rules, which are described using a mixture of discrete structures and continuous parameters. First, the MKR predicts the change in the exchange rate based on technical indicators such as the moving average convergence and divergence of the three currency pairs. Next, the GA generates a trading rule by combining the results of the MKR with several commonly used overbought/oversold technical indicators. The experimental results show that the proposed hybrid method outperforms other baseline methods in terms of the returns and return-risk ratio. In addition, the kernel weights employed for different currency pairs and the different time horizons used in the MKR step, as well as the trading strategy generated in the GA step, should be beneficial during actual trading.

Journal ArticleDOI
TL;DR: Extensive experiments on a variety of real-world datasets show that, compared with a number of well-known related techniques, the proposed approach results in accurate and fast classification.

Proceedings ArticleDOI
21 Jul 2015
TL;DR: The capability of the proposed feature representation method in outperforming the state-of-the-art monogenic signal approach to solving the micro-expression recognition problem is demonstrated.
Abstract: A monogenic signal is a two-dimensional analytical signal that provides the local information of magnitude, phase, and orientation. While it has been applied on the field of face and expression recognition [1], [2], [3], there are no known usages for subtle facial micro-expressions. In this paper, we propose a feature representation method which succinctly captures these three low-level components at multiple scales. Riesz wavelet transform is employed to obtain multi-scale monogenic wavelets, which are formulated by quaternion representation. Instead of summing up the multi-scale monogenic representations, we consider all monogenic representations across multiple scales as individual features. For classification, two schemes were applied to integrate these multiple feature representations: a fusion-based method which combines the features efficiently and discriminatively using the ultra-fast, optimized Multiple Kernel Learning (UFO-MKL) algorithm; and a concatenation-based method where the features are combined into a single feature vector and classified by a linear SVM. Experiments carried out on a recent spontaneous micro-expression database demonstrated the capability of the proposed method in outperforming the state-of-the-art monogenic signal approach to solving the micro-expression recognition problem.


Proceedings ArticleDOI
10 Dec 2015
TL;DR: This paper presents a method for any-scenario multi-class weather classification based on multiple weather features and multiple kernel learning, and shows that the proposed method can efficiently recognize weather on the MWI dataset.
Abstract: Multi-class weather classification from single images is a fundamental operation in many outdoor computer vision applications. However, it remains difficult, and only limited work has addressed the difficulty. Moreover, existing methods are based on fixed scenes. In this paper we present a method for any-scenario multi-class weather classification based on multiple weather features and multiple kernel learning. Our approach extracts multiple weather features and processes them appropriately. By combining these features into high-dimensional vectors, we utilize multiple kernel learning to learn an adaptive classifier. We collect an outdoor image set that contains 20K images, called the MWI (Multi-class Weather Image) set. Experimental results show that the proposed method can efficiently recognize weather on the MWI dataset.

Proceedings Article
Peng Zhou1, Liang Du1, Lei Shi1, Hanmo Wang1, Yi-Dong Shen1 
25 Jul 2015
TL;DR: This paper proposes a novel method for learning a robust yet low-rank kernel for clustering tasks, observing that the noises of each kernel have specific structures, so it can make full use of them to clean multiple input kernels and then aggregate them into a robust, low- rank consensus kernel.
Abstract: Kernel-based methods, such as kernel k-means and kernel PCA, have been widely used in machine learning tasks. The performance of these methods critically depends on the selection of kernel functions; however, the challenge is that we usually do not know in advance which kernels are suitable for the given data and task. This has led to research on multiple kernel learning, i.e., learning a consensus kernel from multiple candidate kernels. Existing multiple kernel learning methods have difficulty in dealing with noise. In this paper, we propose a novel method for learning a robust yet low-rank kernel for clustering tasks. We observe that the noises of each kernel have specific structures, so we can make full use of them to clean multiple input kernels and then aggregate them into a robust, low-rank consensus kernel. The underlying optimization problem is hard to solve, and we show that it can be solved via alternating minimization, whose convergence is theoretically guaranteed. Experimental results on several benchmark data sets further demonstrate the effectiveness of our method.

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper redefines multiple kernels using a deep architecture, where a global kernel is learned as a multi-layered linear combination of activation functions, each of which involves a combination of several elementary or intermediate functions on multiple features.
Abstract: It is commonly agreed that the success of support vector machines (SVMs) is highly dependent on the choice of particular similarity functions referred to as kernels. The latter are usually handcrafted or designed using appropriate optimization schemes. Multiple kernel learning (MKL) is one possible scheme that designs kernels as sparse or convex linear combinations of existing elementary functions. However, this results in shallow kernels, which are powerless to capture the right similarity between data, especially when the content of these data is highly semantic. In this paper, we redefine multiple kernels using a deep architecture. In this new formulation, a global kernel is learned as a multi-layered linear combination of activation functions, each of which involves a combination of several elementary or intermediate functions on multiple features. We propose three different settings to learn the weights of these kernel combinations: supervised, unsupervised and semi-supervised. When plugged into SVMs, the resulting deep multiple kernels show a gain, compared to shallow kernels, on the challenging task of image annotation using the ImageCLEF benchmark.
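The multi-layered kernel combination can be sketched as follows; the weights here are fixed for illustration rather than learned, and the entrywise exponential stands in for the activation function (it preserves positive semi-definiteness for nonnegative combination weights):

```python
import numpy as np

def deep_multiple_kernel(base_kernels, W1, w2):
    """Two-layer deep-MKL sketch. Layer 1 builds intermediate kernels as an
    entrywise-exponential activation of nonnegative linear combinations of
    the base kernels; layer 2 combines the intermediate kernels linearly."""
    hidden = [np.exp(sum(w * K for w, K in zip(row, base_kernels)))
              for row in W1]
    return sum(u * H for u, H in zip(w2, hidden))

# Two toy base kernels; W1 has one row per intermediate kernel.
K1 = np.array([[1.0, 0.5], [0.5, 1.0]])
K2 = np.eye(2)
K_deep = deep_multiple_kernel([K1, K2], W1=[[0.5, 0.5], [1.0, 0.0]], w2=[0.6, 0.4])
```

In the paper, the weights of every layer are optimized (supervised, unsupervised, or semi-supervised); the nesting of activations over kernel combinations is what makes the resulting kernel "deep" rather than a single shallow sum.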

Journal ArticleDOI
TL;DR: In this article, a generalized adaptive $\ell_p$-norm multiple kernel learning (GA-MKL) was proposed to learn a robust classifier based on multiple base kernels constructed from the new image features and multiple sets of prelearned classifiers from other classes.
Abstract: We present a framework for image classification that extends beyond the window sampling of fixed spatial pyramids and is supported by a new learning algorithm. Based on the observation that fixed spatial pyramids sample a rather limited subset of the possible image windows, we propose a method that accounts for a comprehensive set of windows densely sampled over location, size, and aspect ratio. A concise high-level image feature is derived to effectively deal with this large set of windows, and this higher level of abstraction offers both efficient handling of the dense samples and reduced sensitivity to misalignment. In addition to dense window sampling, we introduce generalized adaptive $\ell_p$-norm multiple kernel learning (GA-MKL) to learn a robust classifier based on multiple base kernels constructed from the new image features and multiple sets of prelearned classifiers from other classes. With GA-MKL, multiple levels of image features are effectively fused, and information is shared among different classifiers. Extensive evaluation on benchmark datasets for object recognition (Caltech256 and Caltech101) and scene recognition (15Scenes) demonstrate that the proposed method outperforms the state-of-the-art under a broad range of settings.

Journal ArticleDOI
TL;DR: The proposed ProMK iteratively alternates between learning optimal kernel weights and reducing the empirical loss of the multi-label classifier for each of the labels simultaneously, and performs better than previously proposed protein function prediction approaches that integrate multiple data sources and multi-label multiple kernel learning methods.
Abstract: High-throughput experimental techniques provide a wide variety of heterogeneous proteomic data sources. To exploit the information spread across multiple sources for protein function prediction, these data sources are transformed into kernels and then integrated into a composite kernel. Several methods first optimize the weights on these kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these approaches result in an optimal composite kernel, but not necessarily in an optimal classifier. On the other hand, some approaches optimize the loss of binary classifiers and learn weights for the different kernels iteratively. For multi-class or multi-label data, these methods have to solve the problem of optimizing weights on these kernels for each of the labels, which is computationally expensive and ignores the correlation among labels. In this paper, we propose a method called Predicting Protein Function using Multiple Kernels (ProMK). ProMK iteratively alternates between learning optimal kernel weights and reducing the empirical loss of the multi-label classifier for each of the labels simultaneously. ProMK can integrate kernels selectively and downgrade the weights on noisy kernels. We investigate the performance of ProMK on several publicly available protein function prediction benchmarks and synthetic datasets. We show that the proposed approach performs better than previously proposed protein function prediction approaches that integrate multiple data sources and multi-label multiple kernel learning methods. The code of our proposed method is available at https://sites.google.com/site/guoxian85/promk.

Journal ArticleDOI
08 Jan 2015
TL;DR: An evaluation of a number of feature selection techniques for classification in a biomedical image texture dataset (2-DE gel images) finds that the best technique is SVM-RFE, with an AUROC score of (95.88 ± 0.39 %), but this method is not significantly better than RFE-TREE, RFE-RF and grouped MKL, whilst MKL uses a lower number of features, increasing the interpretability of the results.
Abstract: The interpretation of the results in a classification problem can be enhanced, especially in image texture analysis problems, by feature selection techniques, revealing which features contribute most to the classification performance. This paper presents an evaluation of a number of feature selection techniques for classification in a biomedical image texture dataset (2-DE gel images), with the aim of studying their performance and the stability in the selection of the features. We analyse three different techniques: subgroup-based multiple kernel learning (MKL), which can perform feature selection by down-weighting or eliminating subsets of features that share similar characteristics, and two conventional feature selection techniques: recursive feature elimination (RFE), with different classifiers (naive Bayes, support vector machines, bagged trees, random forest and linear discriminant analysis), and a genetic algorithm-based approach with an SVM as decision function. The different classifiers were compared using a ten times tenfold cross-validation model, and the best technique found is SVM-RFE, with an AUROC score of (95.88 ± 0.39 %). However, this method is not significantly better than RFE-TREE, RFE-RF and grouped MKL, whilst MKL uses a lower number of features, increasing the interpretability of the results. MKL always selects the same features, related to wavelet-based textures, while RFE methods focus especially on co-occurrence matrix-based features, but with high instability in the number of features selected.
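The RFE loop evaluated above can be sketched in a few lines; this minimal version uses a least-squares fit as a stand-in for the linear SVM of SVM-RFE, and the data, function names, and feature counts are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def rfe(X, y, n_keep, fit=None):
    """Recursive feature elimination: repeatedly fit a linear model
    and drop the feature with the smallest absolute weight.

    `fit` returns a weight vector; by default a least-squares fit
    stands in for the linear SVM used in SVM-RFE.
    """
    if fit is None:
        fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = fit(X[:, remaining], y)
        worst = int(np.argmin(np.abs(w)))
        remaining.pop(worst)              # eliminate the weakest feature
    return remaining

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 2] + 0.5 * rng.normal(size=100)   # only feature 2 matters
kept = rfe(X, y, n_keep=1)
```

The stability issue the paper measures shows up here directly: rerunning the loop on resampled data can change which features survive, which is why the authors report selection stability alongside AUROC.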

Journal ArticleDOI
Jun Shi, Xiao Liu, Yan Li, Qi Zhang, Yingjie Li, Shihui Ying
TL;DR: The two-stage multi-view learning based sleep staging framework outperforms all other classification methods compared in this work, while JCR is superior to JSR.

Journal ArticleDOI
TL;DR: This study shows that multivariate machine learning approaches integrating multi-modal and multisource imaging data can classify FEP patients with high accuracy, and specific grey matter structures and white matter bundles reach high classification reliability when using different imaging modalities and indices.
Abstract: Currently, most of the classification studies of psychosis focused on chronic patients and employed single machine learning approaches. To overcome these limitations, we here compare, to the best of our knowledge for the first time, different classification methods of first-episode psychosis (FEP) using multi-modal imaging data covering several cortical and subcortical structures and white matter fiber bundles. 23 FEP patients and 23 age-, gender-, and race-matched healthy participants were included in the study. An innovative multivariate approach based on multiple kernel learning (MKL) methods was implemented on structural MRI and diffusion tensor imaging. MKL provides the best classification performances in comparison with the more widely used support vector machine, enabling the definition of a reliable automatic decisional system based on the integration of multi-modal imaging information. Our results show a discrimination accuracy greater than 90 % between healthy subjects and patients with FEP. Regions with an accuracy greater than 70 % on different imaging sources and measures were middle and superior frontal gyrus, parahippocampal gyrus, uncinate fascicles, and cingulum. This study shows that multivariate machine learning approaches integrating multi-modal and multisource imaging data can classify FEP patients with high accuracy. Interestingly, specific grey matter structures and white matter bundles reach high classification reliability when using different imaging modalities and indices, potentially outlining a prefronto-limbic network impaired in FEP with particular regard to the right hemisphere.

Proceedings ArticleDOI
30 Nov 2015
TL;DR: This work proposes an ℓp-normed genetic algorithm MKL (GAMKLp), which uses a genetic algorithm to learn the weights of a set of pre-computed kernel matrices for use with MKL classification, and proves that this approach is equivalent to a previously proposed fuzzy integral aggregation of multiple kernels called fuzzy integral: genetic algorithm (FIGA).
Abstract: Kernel methods for classification is a well-studied area in which data are implicitly mapped from a lower-dimensional space to a higher-dimensional space to improve classification accuracy. However, for most kernel methods, one must still choose a kernel to use for the problem. Since there is, in general, no way of knowing which kernel is the best, multiple kernel learning (MKL) is a technique used to learn the aggregation of a set of valid kernels into a single (ideally) superior kernel. The aggregation can be done using weighted sums of the pre-computed kernels, but determining the summation weights is not a trivial task. A popular and successful approach to this problem is MKL-group lasso (MKLGL), where the weights and classification surface are simultaneously solved by iteratively optimizing a min-max optimization until convergence. In this work, we propose an ℓp-normed genetic algorithm MKL (GAMKLp), which uses a genetic algorithm to learn the weights of a set of pre-computed kernel matrices for use with MKL classification. We prove that this approach is equivalent to a previously proposed fuzzy integral aggregation of multiple kernels called fuzzy integral: genetic algorithm (FIGA). A second algorithm, which we call decision-level fuzzy integral MKL (DeFIMKL), is also proposed, where a fuzzy measure with respect to the fuzzy Choquet integral is learned via quadratic programming, and the decision value, viz. the class label, is computed using the fuzzy Choquet integral aggregation. Experiments on several benchmark data sets show that our proposed algorithms can outperform MKLGL when applied to support vector machine (SVM)-based classification.
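The decision-level aggregation in DeFIMKL rests on the discrete fuzzy Choquet integral; a minimal sketch of that integral follows (the toy decision values and the hand-picked additive measure are assumptions for illustration; the paper learns the measure via quadratic programming):

```python
import numpy as np

def choquet(values, measure):
    """Discrete Choquet integral of `values` w.r.t. fuzzy measure `measure`.

    `measure` maps frozensets of source indices to [0, 1], with
    measure(empty set) = 0 and measure(all sources) = 1. In DeFIMKL the
    sources would be per-kernel SVM decision values.
    """
    idx = np.argsort(values)[::-1]          # visit sources in descending order
    total, prev = 0.0, 0.0
    coalition = set()
    for i in idx:
        coalition.add(int(i))
        g = measure[frozenset(coalition)]
        total += values[i] * (g - prev)     # weight by the measure increment
        prev = g
    return total

# three sources; an additive measure reduces the integral to a weighted mean
g = {frozenset(): 0.0,
     frozenset({0}): 0.5, frozenset({1}): 0.3, frozenset({2}): 0.2,
     frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.7, frozenset({1, 2}): 0.5,
     frozenset({0, 1, 2}): 1.0}
out = choquet(np.array([0.9, 0.2, 0.6]), g)
```

With an additive measure, as here, the Choquet integral reduces to a weighted mean of the sources; a non-additive measure lets it model interactions between kernels, which is the expressive power DeFIMKL exploits.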

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes a Multiple Kernel Learning (MKL) approach to learn different weights for different groups of features grouped by complexity, defines a notion of kernel complexity, namely Kernel Spectral Complexity, and shows how this complexity relates to the well-known Empirical Rademacher Complexity for a natural class of functions which includes SVMs.
Abstract: Kernels for structures, including graphs, generally suffer from the diagonally dominant Gram matrix issue, the effect by which the number of sub-structures, or features, shared between two instances is very small with respect to the number shared by an instance with itself. A parametric rule is typically used to reduce the weights of the largest (more complex) sub-structures. The particular rule which is adopted is in fact a strong external bias that may strongly affect the resulting predictive performance. Thus, in principle, the applied rule should be validated in addition to the other hyper-parameters of the kernel. Nevertheless, for the majority of graph kernels proposed in the literature, the parameters of the weighting rule are fixed a priori. The contribution of this paper is two-fold. Firstly, we propose a Multiple Kernel Learning (MKL) approach to learn different weights for different groups of features, grouped by complexity. Secondly, we define a notion of kernel complexity, namely Kernel Spectral Complexity, and we show how this complexity relates to the well-known Empirical Rademacher Complexity for a natural class of functions which includes SVMs. The proposed approach is applied to a recently defined graph kernel and evaluated on several real-world datasets. The obtained results show that our approach outperforms the original kernel on all the considered tasks.

Journal ArticleDOI
TL;DR: This work proposes a novel support vector machine MT-MKL framework that considers an implicitly defined set of conic combinations of task objectives and demonstrates that the framework is capable of achieving a better classification performance, when compared with other similar MTL approaches.
Abstract: A traditional and intuitively appealing Multitask Multiple Kernel Learning (MT-MKL) method is to optimize the sum (thus, the average) of objective functions with (partially) shared kernel function, which allows information sharing among the tasks. We point out that the obtained solution corresponds to a single point on the Pareto Front (PF) of a multiobjective optimization problem, which considers the concurrent optimization of all task objectives involved in the Multitask Learning (MTL) problem. Motivated by this last observation and arguing that the former approach is heuristic, we propose a novel support vector machine MT-MKL framework that considers an implicitly defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of objective functions as a special case. Using the algorithms we derived, we demonstrate through a series of experimental results that the framework is capable of achieving a better classification performance, when compared with other similar MTL approaches.
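The contrast the paper draws can be stated concretely: the traditional formulation minimizes the average of the task objectives, which is just one particular conic combination. A minimal sketch, with toy objective values and illustrative names (the paper's framework defines the admissible weight set implicitly rather than fixing it):

```python
import numpy as np

def conic_objective(task_losses, lam):
    """Conic combination of per-task objectives: sum_t lam_t * J_t.

    All lam_t must be non-negative; with uniform lam this reduces to
    the traditional average-of-objectives MT-MKL formulation.
    """
    lam = np.asarray(lam, dtype=float)
    assert np.all(lam >= 0.0), "weights must lie in the non-negative cone"
    return float(np.dot(lam, task_losses))

losses = np.array([0.4, 0.1, 0.3])               # toy per-task objective values
avg = conic_objective(losses, [1/3, 1/3, 1/3])   # the averaging special case
hard = conic_objective(losses, [0.7, 0.1, 0.2])  # emphasise the hardest task
```

Sweeping `lam` over the cone traces out solutions along a path on the Pareto Front, which is how the framework subsumes the averaged objective as a single point.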

Journal ArticleDOI
TL;DR: Two feature selection methods to deal with heterogeneous data that include continuous and categorical variables are introduced and shown to offer state-of-the-art performances on a variety of high-dimensional classification tasks.