Showing papers on "Multiple kernel learning" published in 2016


Journal ArticleDOI
TL;DR: This paper reviews traditional as well as state-of-the-art ensemble methods and thus can serve as an extensive summary for practitioners and beginners.
Abstract: Ensemble methods use multiple models to get better performance. Ensemble methods have been used in multiple research fields such as computational intelligence, statistics and machine learning. This paper reviews traditional as well as state-of-the-art ensemble methods and thus can serve as an extensive summary for practitioners and beginners. The ensemble methods are categorized into conventional ensemble methods such as bagging, boosting and random forest, decomposition methods, negative correlation learning methods, multi-objective optimization based ensemble methods, fuzzy ensemble methods, multiple kernel learning ensemble methods and deep learning based ensemble methods. Variations, improvements and typical applications are discussed. Finally this paper gives some recommendations for future research directions.

455 citations


Journal ArticleDOI
TL;DR: A novel multiple kernel learning (MKL) framework to incorporate both spectral and spatial features for hyperspectral image classification, which is called multiple-structure-element nonlinear MKL (MultiSE-NMKL).
Abstract: In this paper, we propose a novel multiple kernel learning (MKL) framework to incorporate both spectral and spatial features for hyperspectral image classification, which is called multiple-structure-element nonlinear MKL (MultiSE-NMKL). In the proposed framework, multiple structure elements (MultiSEs) are employed to generate extended morphological profiles (EMPs) to present spatial–spectral information. In order to better mine interscale and interstructure similarity among EMPs, a nonlinear MKL (NMKL) is introduced to learn an optimal combined kernel from the predefined linear base kernels. We integrate this NMKL with support vector machines (SVMs) and reduce the min–max problem to a simple minimization problem. The optimal weight for each kernel matrix is then solved by a projection-based gradient descent algorithm. The advantages of using nonlinear combination of base kernels and multiSE-based EMP are that similarity information generated from the nonlinear interaction of different kernels is fully exploited, and the discriminability of the classes of interest is deeply enhanced. Experiments are conducted on three real hyperspectral data sets. The experimental results show that the proposed method achieves better performance for hyperspectral image classification, compared with several state-of-the-art algorithms. The MultiSE EMPs can provide much higher classification accuracy than using a single-SE EMP.

215 citations


Journal ArticleDOI
TL;DR: KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks, allows the integration of multiple heterogeneous information sources for the identification of new interactions, and can also work with networks of arbitrary size.
Abstract: Drug-target networks have received a lot of attention in recent years, given their relevance for pharmaceutical innovation and drug lead discovery. Different in silico approaches have been proposed for the identification of new drug-target interactions, many of which are based on kernel methods. Despite technical advances in recent years, these methods are not able to cope with large drug-target interaction spaces and to integrate multiple sources of biological information. We propose KronRLS-MKL, which models the drug-target interaction problem as a link prediction task on bipartite networks. This method allows the integration of multiple heterogeneous information sources for the identification of new interactions, and can also work with networks of arbitrary size. Moreover, it automatically selects the more relevant kernels by returning weights indicating their importance in the drug-target prediction at hand. Empirical analysis on four data sets using twenty distinct kernels indicates that our method has predictive performance higher than or comparable to that of 18 competing methods in all prediction tasks. Moreover, the predicted weights reflect the predictive quality of each kernel on exhaustive pairwise experiments, which indicates the success of the method in automatically revealing relevant biological sources. Our analysis shows that the proposed data integration strategy is able to improve the quality of the predicted interactions, and can speed up the identification of new drug-target interactions as well as identify relevant information for the task. The source code and data sets are available at www.cin.ufpe.br/~acan/kronrlsmkl/ .

163 citations


Journal ArticleDOI
Mingsheng Long1, Jianmin Wang1, Yue Cao1, Jiaguang Sun1, Philip S. Yu1 
TL;DR: A unified deep adaptation framework for jointly learning transferable representation and classifier to enable scalable domain adaptation, by taking the advantages of both deep learning and optimal two-sample matching is proposed.
Abstract: Domain adaptation generalizes a learning model across source domain and target domain that are sampled from different distributions. It is widely applied to cross-domain data mining for reusing labeled information and mitigating labeling consumption. Recent studies reveal that deep neural networks can learn abstract feature representation, which can reduce, but not remove, the cross-domain discrepancy. To enhance the invariance of deep representation and make it more transferable across domains, we propose a unified deep adaptation framework for jointly learning transferable representation and classifier to enable scalable domain adaptation, by taking the advantages of both deep learning and optimal two-sample matching. The framework constitutes two inter-dependent paradigms, unsupervised pre-training for effective training of deep models using deep denoising autoencoders, and supervised fine-tuning for effective exploitation of discriminative information using deep neural networks, both learned by embedding the deep representations to reproducing kernel Hilbert spaces (RKHSs) and optimally matching different domain distributions. To enable scalable learning, we develop a linear-time algorithm using unbiased estimate that scales linearly to large samples. Extensive empirical results show that the proposed framework significantly outperforms state of the art methods on diverse adaptation tasks: sentiment polarity prediction, email spam filtering, newsgroup content categorization, and visual object recognition.

161 citations
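
The linear-time unbiased estimate mentioned in the abstract above can be illustrated with a small sketch. It assumes the two-sample statistic is the maximum mean discrepancy (MMD) with a Gaussian kernel, which the abstract does not name explicitly; the function names and the bandwidth parameter are hypothetical, and this is not the authors' implementation.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd2(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of squared MMD that touches each sample once.

    Consecutive samples are grouped into pairs, so the cost grows linearly
    with the sample size instead of quadratically.
    """
    n_pairs = min(len(xs), len(ys)) // 2
    if n_pairs == 0:
        raise ValueError("need at least two samples from each domain")
    total = 0.0
    for i in range(n_pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        # Within-domain similarity minus cross-domain similarity.
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / n_pairs


source = [[0.0], [0.1], [0.0], [0.1]]
target = [[5.0], [5.1], [5.0], [5.1]]
print(linear_time_mmd2(source, source))  # identical samples
print(linear_time_mmd2(source, target))  # well-separated samples
```

Because the estimate is unbiased, it can come out slightly negative when the two domains are close; a training loop that minimizes it would have to tolerate that.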


Proceedings Article
12 Feb 2016
TL;DR: This paper proposes an MKKM clustering with a novel, effective matrix-induced regularization to reduce such redundancy and enhance the diversity of the selected kernels and shows that maximizing the kernel alignment for clustering can be viewed as a special case of this approach.
Abstract: Multiple kernel k-means (MKKM) clustering aims to optimally combine a group of pre-specified kernels to improve clustering performance. However, we observe that existing MKKM algorithms do not sufficiently consider the correlation among these kernels. This could result in selecting mutually redundant kernels and affect the diversity of information sources utilized for clustering, which finally hurts the clustering performance. To address this issue, this paper proposes an MKKM clustering with a novel, effective matrix-induced regularization to reduce such redundancy and enhance the diversity of the selected kernels. We theoretically justify this matrix-induced regularization by revealing its connection with the commonly used kernel alignment criterion. Furthermore, this justification shows that maximizing the kernel alignment for clustering can be viewed as a special case of our approach and indicates the extendability of the proposed matrix-induced regularization for designing better clustering algorithms. As experimentally demonstrated on five challenging MKL benchmark data sets, our algorithm significantly improves existing MKKM and consistently outperforms the state-of-the-art ones in the literature, verifying the effectiveness and advantages of incorporating the proposed matrix-induced regularization.

133 citations


Journal ArticleDOI
TL;DR: The proposed discriminative multiple kernel learning method for spectral image classification can achieve a substantial improvement in classification performance without strict limitation for selection of basic kernels and reduces the computational burden by requiring fewer support vectors.
Abstract: In this paper, we propose a discriminative multiple kernel learning (DMKL) method for spectral image classification. The core idea of the proposed method is to learn an optimal combined kernel from predefined basic kernels by maximizing separability in reproducing kernel Hilbert space. DMKL achieves the maximum separability via finding an optimal projective direction according to statistical significance, which leads to the minimum within-class scatter and maximum between-class scatter instead of a time-consuming search for the optimal kernel combination. Fisher criterion (FC) and maximum margin criterion (MMC) are used to find the optimal projective direction, thus leading to two variants of the proposed method, DMKL-FC and DMKL-MMC, respectively. After learning the projective direction, all basic kernels are projected to generate a discriminative combined kernel. Three merits are realized by DMKL. First, DMKL can achieve a substantial improvement in classification performance without strict limitation for selection of basic kernels. Second, the discriminating scales of a Gaussian kernel, the useful bands for classification, and the competitive sizes of spatial filters can be selected by ranking the corresponding weights, where the large weights correspond to the most relevant. Third, DMKL reduces the computational burden by requiring fewer support vectors. Experiments are conducted on two hyperspectral data sets and one multispectral data set. The corresponding experimental results demonstrate that the proposed algorithms can achieve the best performance with satisfactory computational efficiency for spectral image classification, compared with several state-of-the-art algorithms.

111 citations


Journal ArticleDOI
01 Dec 2016
TL;DR: A multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction is proposed, and a new sample weight vector updating strategy is designed to reduce the cost of risk caused by misclassifying defective modules as non-defective ones.
Abstract: Software defect prediction aims to predict the defect proneness of new software modules with the historical defect data so as to improve the quality of a software system. Software historical defect data has a complicated structure and a marked characteristic of class-imbalance; how to fully analyze and utilize the existing historical defect data and build more precise and effective classifiers has attracted considerable interest from researchers in both academia and industry. Multiple kernel learning and ensemble learning are effective techniques in the field of machine learning. Multiple kernel learning can map the historical defect data to a higher-dimensional feature space in which they are better expressed, and ensemble learning can use a series of weak classifiers to reduce the bias generated by the majority class and obtain better predictive performance. In this paper, we propose to use multiple kernel learning to predict software defects. By using the characteristics of the metrics mined from open source software, we obtain a multiple kernel classifier through an ensemble learning method, which has the advantages of both multiple kernel learning and ensemble learning. We thus propose a multiple kernel ensemble learning (MKEL) approach for software defect classification and prediction. Considering the cost of risk in software defect prediction, we design a new sample weight vector updating strategy to reduce the cost of risk caused by misclassifying defective modules as non-defective ones. We employ the widely used NASA MDP datasets as test data to evaluate the performance of all compared methods; experimental results show that MKEL outperforms several representative state-of-the-art defect prediction methods.

110 citations


Proceedings Article
09 Jul 2016
TL;DR: A novel MKC algorithm with a "local" kernel alignment, which only requires that the similarity of a sample to its k-nearest neighbours be aligned with the ideal similarity matrix, which helps the clustering algorithm to focus on closer sample pairs that shall stay together and avoids involving unreliable similarity evaluation for farther sample pairs.
Abstract: Kernel alignment has recently been employed for multiple kernel clustering (MKC). However, we find that most existing works implement this alignment in a global manner, which: i) indiscriminately forces all sample pairs to be equally aligned with the same ideal similarity; and ii) is inconsistent with a well-established concept that the similarity evaluated for two farther samples in a high dimensional space is less reliable. To address these issues, this paper proposes a novel MKC algorithm with a "local" kernel alignment, which only requires that the similarity of a sample to its k-nearest neighbours be aligned with the ideal similarity matrix. Such an alignment helps the clustering algorithm to focus on closer sample pairs that shall stay together and avoids involving unreliable similarity evaluation for farther sample pairs. We derive a new optimization problem to implement this idea, and design a two-step algorithm to efficiently solve it. As experimentally demonstrated on six challenging multiple kernel learning benchmark data sets, our algorithm significantly outperforms the state-of-the-art comparable methods in the recent literature, verifying the effectiveness and superiority of maximizing local kernel alignment.

70 citations
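
The contrast between global and "local" kernel alignment in the abstract above can be sketched as follows. The global score is the standard normalized Frobenius inner product between a kernel matrix and an ideal similarity matrix. The local variant shown here, which averages that score over each sample's k most similar neighbours, is an assumption made for illustration and is not the paper's optimization problem. All function names are hypothetical, and the `labels` argument stands in for a cluster assignment.

```python
import math


def frobenius_inner(a, b):
    """Sum of element-wise products of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_alignment(k1, k2):
    """Cosine-style similarity between two kernel matrices."""
    norm = math.sqrt(frobenius_inner(k1, k1) * frobenius_inner(k2, k2))
    return frobenius_inner(k1, k2) / norm


def ideal_kernel(labels):
    """Ideal similarity: 1 for pairs in the same cluster, 0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def local_alignment(kernel, labels, k):
    """Average alignment restricted to each sample's k most similar samples."""
    ideal = ideal_kernel(labels)
    n = len(labels)
    scores = []
    for i in range(n):
        # Indices of the k samples most similar to sample i under this kernel.
        nbrs = sorted(range(n), key=lambda j: kernel[i][j], reverse=True)[:k]
        sub_kernel = [[kernel[a][b] for b in nbrs] for a in nbrs]
        sub_ideal = [[ideal[a][b] for b in nbrs] for a in nbrs]
        scores.append(kernel_alignment(sub_kernel, sub_ideal))
    return sum(scores) / n


labels = [0, 0, 1, 1]
uninformative = [[1.0] * 4 for _ in range(4)]  # every pair looks equally similar
print(kernel_alignment(uninformative, ideal_kernel(labels)))
print(local_alignment(ideal_kernel(labels), labels, k=2))
```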


Journal ArticleDOI
TL;DR: This paper proposes an effective approach called co-labeling to solve the multi-view weakly labeled learning problem, which aims to learn an optimal classifier from a set of pseudo-label vectors generated by using the classifiers trained from other views.
Abstract: It is often expensive and time consuming to collect labeled training samples in many real-world applications. To reduce human effort on annotating training samples, many machine learning techniques (e.g., semi-supervised learning (SSL), multi-instance learning (MIL), etc.) have been studied to exploit weakly labeled training samples. Meanwhile, when the training data is represented with multiple types of features, many multi-view learning methods have shown that classifiers trained on different views can help each other to better utilize the unlabeled training samples for the SSL task. In this paper, we study a new learning problem called multi-view weakly labeled learning, in which we aim to develop a unified approach to learn robust classifiers by effectively utilizing different types of weakly labeled multi-view data from a broad range of tasks including SSL, MIL and relative outlier detection (ROD). We propose an effective approach called co-labeling to solve the multi-view weakly labeled learning problem. Specifically, we model the learning problem on each view as a weakly labeled learning problem, which aims to learn an optimal classifier from a set of pseudo-label vectors generated by using the classifiers trained from other views. Unlike traditional co-training approaches using a single pseudo-label vector for training each classifier, our co-labeling approach explores different strategies to utilize the predictions from different views, biases and iterations for generating the pseudo-label vectors, making our approach more robust for real-world applications. Moreover, to further improve the weakly labeled learning on each view, we also exploit the inherent group structure in the pseudo-label vectors generated from different strategies, which leads to a new multi-layer multiple kernel learning problem. 
Promising results for text-based image retrieval on the NUS-WIDE dataset as well as news classification and text categorization on several real-world multi-view datasets clearly demonstrate that our proposed co-labeling approach achieves state-of-the-art performance for various multi-view weakly labeled learning problems including multi-view SSL, multi-view MIL and multi-view ROD.

69 citations


Journal ArticleDOI
01 May 2016
TL;DR: Study of how the concurrent, and appropriately weighted, usage of news articles, having different degrees of relevance to the target stock, can improve the performance of financial forecasting and support the decision-making process of investors and traders.
Abstract: The market state changes when a new piece of information arrives. It affects decisions made by investors and is considered to be an important data source that can be used for financial forecasting. Recently information derived from news articles has become a part of financial predictive systems. The usage of news articles and their forecasting potential have been extensively researched. However, so far no attempts have been made to utilise different categories of news articles simultaneously. This paper studies how the concurrent, and appropriately weighted, usage of news articles, having different degrees of relevance to the target stock, can improve the performance of financial forecasting and support the decision-making process of investors and traders. Stock price movements are predicted using the multiple kernel learning technique which integrates information extracted from multiple news categories while separate kernels are utilised to analyse each category. News articles are partitioned according to their relevance to the target stock, its sub-industry, industry, group industry and sector. The experiments are run on stocks from the Health Care sector and show that increasing the number of relevant news categories used as data sources for financial forecasting improves the performance of the predictive system in comparison with approaches based on a lower number of categories. We use financial news articles from multiple categories to predict price movements. The multiple kernel learning approach is proposed for integrating information. Articles are assigned to news categories based on the relevance to the target stock. Simultaneous usage of several news categories improves the forecasting performance. Increasing the number of relevant news categories improves the performance.

62 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed CS-SMKL achieves better performances for hyperspectral image classification compared with several state-of-the-art algorithms, and the results confirm the capability of the method in selecting the useful features.
Abstract: In recent years, many studies on hyperspectral image classification have shown that using multiple features can effectively improve the classification accuracy. As a very powerful means of learning, multiple kernel learning (MKL) can conveniently be embedded in a variety of characteristics. This paper proposes a class-specific sparse MKL (CS-SMKL) framework to improve the capability of hyperspectral image classification. In terms of the features, extended multiattribute profiles are adopted because they can effectively represent the spatial and spectral information of hyperspectral images. CS-SMKL classifies the hyperspectral images, simultaneously learns class-specific significant features, and selects class-specific weights. Using an $L_{1}$-norm constraint (i.e., group lasso) as the regularizer, we can enforce the sparsity at the group/feature level and automatically learn a compact feature set for the classification of any two classes. More precisely, our CS-SMKL determines the associated weights of optimal base kernels for any two classes and results in improved classification performances. The advantage of the proposed method is that only the features useful for the classification of any two classes can be retained, which leads to greatly enhanced discriminability. Experiments are conducted on three hyperspectral data sets. The experimental results show that the proposed method achieves better performances for hyperspectral image classification compared with several state-of-the-art algorithms, and the results confirm the capability of the method in selecting the useful features.

Journal ArticleDOI
TL;DR: Six different fusion models inspired by the early fusion schemes, late fusion schemes, and intermediate fusion schemes are presented, which obtained significant improvements relative to the usual fusion schemes and to state-of-the-art methods.

Journal ArticleDOI
TL;DR: A full view of the string kernels approach is given and insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences are offered.
Abstract: The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using multiple kernel learning. Despite the fact that the approach based on string kernels performs so well, several questions about this method remain unanswered. First, it is not clear why such a simple approach can compete with far more complex approaches that take words, lemmas, syntactic information, or even semantics into account. Second, although the approach is designed to be language independent, all experiments to date have been on English. This work is an extensive study that aims to systematically present the string kernel approach and to clarify the open questions mentioned above. A broad set of native language identification experiments were conducted to compare the string kernels approach with other state-of-the-art methods. The empirical results obtained in all of the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in NLI, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the results obtained on both the Arabic and the Norwegian corpora demonstrate that the proposed approach is language independent. In the Arabic native language identification task, string kernels show an increase of more than 17% over the best accuracy reported so far. The results of string kernels on Norwegian native language identification are also significantly better than the state-of-the-art approach. In addition, in a cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%.
To gain additional insights about the string kernels approach, the features selected by the classifier as being more discriminating are analyzed in this work. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths. The features captured by the model typically include stems, function words, and word prefixes and suffixes, which have the potential to generalize over purely word-based features. By analyzing the discriminating features, this article offers insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences. The goal of the current study is to give a full view of the string kernels approach and shed some light on why this approach works so well.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: A kernel-based nonlinear connectivity model based on which it obtains topology revealing PCs is proposed, and a data-driven approach is advocated to learn the combination of multiple kernel functions that optimizes the data fit.
Abstract: Partial correlations (PCs) of functional magnetic resonance imaging (fMRI) time series play a principal role in revealing connectivity of brain networks. To explore nonlinear behavior of the blood-oxygen-level dependent signal, the present work postulates a kernel-based nonlinear connectivity model based on which it obtains topology revealing PCs. Instead of relying on a single predefined kernel, a data-driven approach is advocated to learn the combination of multiple kernel functions that optimizes the data fit. Synthetically generated data based on both a dynamic causal and a linear model are used to validate the proposed approach in resting-state fMRI scenarios, highlighting the gains in edge detection performance when compared with the popular linear PC method. Tests on real fMRI data demonstrate that connectivity patterns revealed by linear and nonlinear models are different.

Journal ArticleDOI
TL;DR: A new regression method for continuous estimation of the intensity of facial behavior interpretation, called Doubly Sparse Relevance Vector Machine (DSRVM), which enforces double sparsity by jointly selecting the most relevant training examples and the most important kernels relevant for interpretation of observed facial expressions.
Abstract: Certain inner feelings and physiological states like pain are subjective states that cannot be directly measured, but can be estimated from spontaneous facial expressions. Since they are typically characterized by subtle movements of facial parts, analysis of the facial details is required. To this end, we formulate a new regression method for continuous estimation of the intensity of facial behavior interpretation, called Doubly Sparse Relevance Vector Machine (DSRVM). DSRVM enforces double sparsity by jointly selecting the most relevant training examples (a.k.a. relevance vectors) and the most important kernels associated with facial parts relevant for interpretation of observed facial expressions. This advances prior work on multi-kernel learning, where sparsity of relevant kernels is typically ignored. Empirical evaluation on challenging Shoulder Pain videos, and the benchmark DISFA and SEMAINE datasets demonstrate that DSRVM outperforms competing approaches with a multi-fold reduction of running times in training and testing.

Journal ArticleDOI
TL;DR: This paper proposes a novel method for scene-free multi-class weather classification from single images based on multiple category-specific dictionary learning and multiple kernel learning and learns dictionaries based on these features.

Proceedings Article
01 Jun 2016
TL;DR: It is argued that enhanced expressiveness is important when the networks are small due to run-time constraints (such as those imposed by mobile applications) and in large-scale settings, the additional capacity of SimNets can be controlled with proper regularization, yielding accuracies comparable to state of the art ConvNets.
Abstract: We present a deep layered architecture that generalizes convolutional neural networks (ConvNets). The architecture, called SimNets, is driven by two operators: (i) a similarity function that generalizes inner-product, and (ii) a log-mean-exp function called MEX that generalizes maximum and average. The two operators applied in succession give rise to a standard neuron but in "feature space". The feature spaces realized by SimNets depend on the choice of the similarity operator. The simplest setting, which corresponds to a convolution, realizes the feature space of the Exponential kernel, while other settings realize feature spaces of more powerful kernels (Generalized Gaussian, which includes as special cases RBF and Laplacian), or even dynamically learned feature spaces (Generalized Multiple Kernel Learning). As a result, the SimNet contains a higher abstraction level compared to a traditional ConvNet. We argue that enhanced expressiveness is important when the networks are small due to run-time constraints (such as those imposed by mobile applications). Empirical evaluation validates the superior expressiveness of SimNets, showing a significant gain in accuracy over ConvNets when computational resources at run-time are limited. We also show that in large-scale settings, where computational complexity is less of a concern, the additional capacity of SimNets can be controlled with proper regularization, yielding accuracies comparable to state of the art ConvNets.
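
The claim that a log-mean-exp operator generalizes both maximum and average can be checked numerically with a short sketch. It assumes MEX takes the form (1/β) · log(mean(exp(β · x))) with a scalar parameter β; the abstract does not spell out the parameterization, so the `beta` argument and the function name are assumptions.

```python
import math


def mex(values, beta):
    """Log-mean-exp of `values` with sharpness parameter `beta`.

    Large positive beta approaches max(values), large negative beta
    approaches min(values), and beta near zero approaches the mean.
    """
    if beta == 0:
        return sum(values) / len(values)  # limiting case
    scaled = [beta * v for v in values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / beta


xs = [1.0, 2.0, 3.0]
print(mex(xs, 100.0))   # close to max(xs)
print(mex(xs, 1e-6))    # close to the mean of xs
print(mex(xs, -100.0))  # close to min(xs)
```

A single operator with a tunable or learnable β can therefore stand in for both max pooling and average pooling.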

Journal ArticleDOI
TL;DR: An efficient multiple-feature learning-based model with adaptive weights for effectively classifying complex hyperspectral images with limited training samples and a novel decision fusion strategy that combines linear and multiple kernel features to balance the classification results of different classifiers.
Abstract: Linearly derived features have been widely used in hyperspectral image classification to find linear separability of certain classes in recent years. Moreover, nonlinearly transformed features are more effective for class discrimination in real analysis scenarios. However, few efforts have attempted to combine both linear and nonlinear features in the same framework even if they can demonstrate some complementary properties. Moreover, conventional multiple-feature learning-based approaches deal with different features equally, which is not reasonable. This paper proposes an efficient multiple-feature learning-based model with adaptive weights for effectively classifying complex hyperspectral images with limited training samples. A new diversity kernel function is proposed first to simulate the vision perception and analysis procedure of human beings. It could simultaneously evaluate the contrast differences of global features and spatial coherence. Since existing multiple-kernel feature models are always time-consuming, we then design a new adaptive weighted multiple kernel learning method. It employs kernel projection, which could lower the dimensionalities and also learn kernel weights to further discriminate the classification boundaries. For combining both linear and nonlinear features, this paper also proposes a novel decision fusion strategy. The method combines linear and multiple kernel features to balance the classification results of different classifiers. The proposed scheme is tested on several hyperspectral data sets and extended to multisource feature classification environment. The experimental results show that the proposed classification method outperforms most of the existing ones and significantly reduces the computational complexity.

Journal ArticleDOI
TL;DR: A new weighted average combination method is presented, which is shown to perform better than MKL in both accuracy and efficiency in experiments, and is integrated into the k-nearest neighbors (kNNs) framework.
Abstract: In object classification, feature combination can usually be used to combine the strength of multiple complementary features and produce better classification results than any single one. While multiple kernel learning (MKL) is a popular approach to feature combination in object classification, it does not always perform well in practical applications. On one hand, the optimization process in MKL usually involves a huge consumption of computation and memory space. On the other hand, in some cases, MKL is found to perform no better than the baseline combination methods. This observation motivates us to investigate the underlying mechanism of feature combination with average combination and weighted average combination. As a result, we empirically find that in average combination, it is better to use a sample of the most powerful features instead of all, whereas in one type of weighted average combination, the best classification accuracy comes from a nearly sparse combination. We integrate these observations into the k-nearest neighbors (kNNs) framework, based on which we further discuss some issues related to sparse solution and MKL. Finally, by making use of the kNN framework, we present a new weighted average combination method, which is shown to perform better than MKL in both accuracy and efficiency in experiments. We believe that the work in this paper is helpful in exploring the mechanism underlying feature combination.
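
The two baselines named in the abstract above, average combination and weighted average combination, can be sketched on precomputed kernel (similarity) matrices. This is a generic illustration assuming one matrix per feature type; it is not the paper's kNN-based weighting method, and the function name is hypothetical.

```python
def combine_kernels(kernels, weights=None):
    """Weighted average of equally sized kernel matrices.

    With weights=None every kernel counts equally (average combination).
    A weight vector that is mostly zeros gives a nearly sparse combination.
    """
    if weights is None:
        weights = [1.0] * len(kernels)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) / total for j in range(n)]
        for i in range(n)
    ]


k_color = [[1.0, 0.0], [0.0, 1.0]]  # toy kernel from one feature type
k_shape = [[1.0, 1.0], [1.0, 1.0]]  # toy kernel from another feature type

print(combine_kernels([k_color, k_shape]))              # plain average
print(combine_kernels([k_color, k_shape], [1.0, 0.0]))  # sparse weighting
```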

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This paper proposes a new framework that significantly reduces the complexity of deep multiple kernels, and designs its equivalent deep map network (DMN), using multi-layer explicit maps that approximate the initial DKN with a high precision.
Abstract: Deep multiple kernel learning is a powerful technique that selects and deeply combines multiple elementary kernels in order to provide the best performance on a given classification task. This technique, particularly effective, becomes intractable when handling large scale datasets; indeed, multiple nonlinear kernel combinations are time and memory demanding. In this paper, we propose a new framework that significantly reduces the complexity of deep multiple kernels. Given a deep kernel network (DKN), our method designs its equivalent deep map network (DMN), using multi-layer explicit maps that approximate the initial DKN with a high precision. When combined with support vector machines, the design of DMN preserves high classification accuracy compared to its underlying DKN while being (at least) an order of magnitude faster. Experiments conducted on the challenging ImageCLEF2013 annotation benchmark show that the proposed DMN is indeed effective and highly efficient.
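The abstract does not describe how the explicit maps are built, so the following is only a minimal sketch of the general idea of replacing a kernel evaluation with an explicit feature map. It uses the homogeneous degree-2 polynomial kernel, for which an exact finite-dimensional map is known. The function names `poly2_kernel` and `poly2_map` are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Implicit evaluation of the homogeneous degree-2 polynomial kernel (x . y)^2."""
    return dot(x, y) ** 2


def poly2_map(x):
    """Explicit feature map: all pairwise products x_i * x_j (d^2 coordinates)."""
    return [a * b for a, b in product(x, repeat=2)]


x = [1.0, 2.0, -1.0]
y = [0.5, -1.0, 3.0]

k_implicit = poly2_kernel(x, y)               # kernel trick
k_explicit = dot(poly2_map(x), poly2_map(y))  # inner product in mapped space
```

Once every sample is mapped explicitly, a linear classifier can be trained on the mapped vectors instead of working with the full Gram matrix, which is where the speed-up for large datasets comes from in this kind of approach.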

Proceedings Article
01 Dec 2016
TL;DR: This paper presents a method that uses only character p-grams as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge and has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.
Abstract: The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Unlike the common approach, we present a method that uses only character p-grams (also known as n-grams) as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge. The proposed approach combines several string kernels using multiple kernel learning. In the learning stage, we try both Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR), and we choose KDA as it gives better results in a 10-fold cross-validation carried out on the training set. Our approach is shallow and simple, but the empirical results obtained in the ADI Shared Task show that it achieves very good results. Indeed, we ranked second, with an accuracy of 50.91% and a weighted F1 score of 51.31%. We also present improved results in this paper, which we obtained after the competition ended. Simply by adding more regularization to our model to make it more suitable for test data that comes from a different distribution than the training data, we obtain an accuracy of 51.82% and a weighted F1 score of 52.18%. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.
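As a rough illustration of character p-gram string kernels, the sketch below counts shared p-grams (a spectrum-style kernel), normalizes each kernel, and averages over several values of p. The abstract does not say which string kernels are used or how they are weighted, so the kernel choice, the unweighted average, and all function names here are assumptions.

```python
from collections import Counter
from math import sqrt


def pgrams(text, p):
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p):
    """Sum over shared p-grams of the product of their counts."""
    cs, ct = pgrams(s, p), pgrams(t, p)
    return sum(n * ct[g] for g, n in cs.items())


def normalized_kernel(s, t, p):
    """Cosine-style normalization so that k(s, s) == 1 for non-trivial s."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


def combined_kernel(s, t, p_values=(2, 3, 4)):
    """Unweighted average of normalized kernels (an assumed combination rule)."""
    return sum(normalized_kernel(s, t, p) for p in p_values) / len(p_values)


same = combined_kernel("marhaba", "marhaba")
close = combined_kernel("marhaba", "marhaban")
far = combined_kernel("marhaba", "xyzxyz")
```

No tokenizer, stemmer, or tagger is involved, which is the sense in which such features are language independent.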

Journal ArticleDOI
TL;DR: A novel method, task-dependent multi-task multiple kernel learning (TD-MTMKL), jointly detects the absence and presence of multiple AUs, capturing commonalities and adapting to variations among co-occurring AUs.

Journal ArticleDOI
Jie Feng, Licheng Jiao, Tao Sun, Hongying Liu, Xiangrong Zhang
TL;DR: Experimental results on several hyperspectral images demonstrate the effectiveness of the proposed MKL method in terms of classification performance and computation efficiency.
Abstract: In hyperspectral images, band selection plays a crucial role for land-cover classification. Multiple kernel learning (MKL) is a popular feature selection method that selects the relevant features and classifies the images simultaneously. Unfortunately, the large number of spectral bands in hyperspectral images results in excessive kernels, which limits the application of MKL. To address this problem, a novel MKL method based on discriminative kernel clustering (DKC) is proposed. In the proposed method, a discriminative kernel alignment (DKA) is defined. Traditional kernel alignment (KA) measures kernel similarity independently of the current classification task. Compared with KA, DKA measures the similarity of discriminative information by introducing the comparison of intraclass and interclass similarities. It can evaluate both kernel redundancy and kernel synergy for classification. Then, DKA-based affinity-propagation clustering is devised to reduce the kernel scale and retain the kernels having high discrimination and low redundancy for classification. Additionally, an analysis of the necessity of DKC in hyperspectral band selection is provided by empirical Rademacher complexity. Experimental results on several hyperspectral images demonstrate the effectiveness of the proposed band selection method in terms of classification performance and computation efficiency.
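For reference, traditional kernel alignment between two Gram matrices is the normalized Frobenius inner product, A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F). The sketch below implements only this task-independent quantity. The discriminative variant (DKA) is not specified in the abstract and is not reproduced here, and the toy matrices are made up for illustration.

```python
from math import sqrt


def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices (lists of rows)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K1, K2):
    """Kernel alignment: cosine of the angle between two Gram matrices."""
    return frobenius(K1, K2) / sqrt(frobenius(K1, K1) * frobenius(K2, K2))


# Toy Gram matrices (hypothetical values, not from the paper).
K_a = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.0, 0.1],
       [0.1, 0.0, 1.0, 0.9],
       [0.0, 0.1, 0.9, 1.0]]
K_b = [[2.0 * v for v in row] for row in K_a]  # rescaled copy of K_a
K_c = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity

redundant = alignment(K_a, K_b)  # rescaling does not change alignment
distinct = alignment(K_a, K_c)
```

A pairwise alignment matrix of this kind can serve as the affinity input to a clustering step, so that near-duplicate kernels such as `K_a` and `K_b` fall in the same cluster.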

Journal ArticleDOI
TL;DR: In this paper, a semi-infinite linear programming (SILP) based multiple-kernel learning (MKL) method was proposed for ELM, where the kernel function can be automatically learned as a combination of multiple kernels.
Abstract: The extreme learning machine (ELM) is a new method for using single hidden layer feed-forward networks with a much simpler training method. While conventional kernel-based classifiers are based on a single kernel, in reality, it is often desirable to base classifiers on combinations of multiple kernels. In this paper, we address the problem of multiple-kernel learning (MKL) for ELM by formulating it as a semi-infinite linear program. We further extend this idea by integrating it with techniques of MKL. The kernel function in this ELM formulation no longer needs to be fixed, but can be automatically learned as a combination of multiple kernels. Two formulations of multiple-kernel classifiers are proposed. The first one is based on a convex combination of the given base kernels, while the second one uses a convex combination of the so-called equivalent kernels. Empirically, the second formulation is particularly competitive. Experiments on a large number of both toy and real-world data sets (including high-magnification sampling rate image data set) show that the resultant classifier is fast and accurate and can also be easily trained by simply changing the linear program.
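The first formulation rests on a convex combination of base kernels, K = sum_m mu_m K_m with mu_m >= 0 and sum_m mu_m = 1. The sketch below shows only that combination step, with weights fixed by hand. Learning the weights through the semi-infinite linear program is not shown, and the base kernels, data points, and function names are illustrative assumptions.

```python
from math import exp


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram(kernel, points):
    """Gram matrix of a kernel function over a list of points."""
    return [[kernel(p, q) for q in points] for p in points]


def combine_kernels(grams, weights):
    """Convex combination of Gram matrices; rejects weights off the simplex."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, grams))
             for j in range(n)] for i in range(n)]


points = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
base = [gram(linear_kernel, points), gram(rbf_kernel, points)]
K = combine_kernels(base, [0.3, 0.7])  # hand-picked weights, not learned
```

Because each base Gram matrix is positive semidefinite and the weights are non-negative, the combined matrix is again a valid kernel matrix.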

Journal ArticleDOI
TL;DR: This paper proposes to optimize the network over an adaptive backpropagation MLMKL framework using the gradient ascent method instead of dual objective function, or the estimation of the leave-one-out error, and achieves high performance.
Abstract: The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance in some real-world applications. It consists of learning the optimal kernel from one layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and the success of the deep learning concept, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architecture. They were introduced in order to improve the conventional MKL methods. Such architectures tend to learn deep kernel machines by exploring the combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble with the optimization of the network for two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). In order to improve the effectiveness of MKL approaches, we introduce, in this paper, a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm. We use the gradient ascent method instead of the dual objective function or the estimation of the leave-one-out error. We test our proposed method through a large set of experiments on a variety of benchmark data sets. We have successfully optimized the system over many layers. Empirical results over an extensive set of experiments show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.

Journal ArticleDOI
TL;DR: This paper proposes a novel graph model in each single-view case to encode class-specific person-person interaction patterns, designed to preserve the complex spatial structure among skeletal joints according to their activity levels as well as the spatio-temporal joint features.
Abstract: This paper addresses the problem of recognizing human skeletal interactions using multiview data captured from depth sensors. The interactions among people are important cues for group and crowd human behavior analysis. In this paper, we focus on modeling the person–person skeletal interactions for human activity recognition. First, we propose a novel graph model in each single-view case to encode class-specific person–person interaction patterns. Particularly, we model each person–person interaction by an attributed graph, which is designed to preserve the complex spatial structure among skeletal joints according to their activity levels as well as the spatio-temporal joint features. Then, combining the graph models for each single-view case, we propose the multigraph model to characterize each multiview interaction. Finally, we apply a general multiple kernel learning method to determine the optimal kernel weights for the proposed multigraph model while the optimal classifier is jointly learned. We evaluate the proposed approach on the M$^2$I dataset, the SBU Kinect interaction dataset, and our interaction dataset. The experimental results show that our proposed approach outperforms several existing interaction recognition methods.

Book ChapterDOI
17 Oct 2016
TL;DR: A kernel-learning based method to integrate multimodal imaging and genetic data for Alzheimer's disease diagnosis is proposed, which introduces a novel structured sparsity regularizer for feature selection and fusion, which is different from conventional lasso and group lasso based methods.
Abstract: A kernel-learning based method is proposed to integrate multimodal imaging and genetic data for Alzheimer’s disease (AD) diagnosis. To facilitate structured feature learning in kernel space, we represent each feature with a kernel and then group kernels according to modalities. In view of the highly redundant features within each modality and also the complementary information across modalities, we introduce a novel structured sparsity regularizer for feature selection and fusion, which is different from conventional lasso and group lasso based methods. Specifically, we enforce a penalty on kernel weights to simultaneously select features sparsely within each modality and densely combine different modalities. We have evaluated the proposed method using magnetic resonance imaging (MRI) and positron emission tomography (PET), and single-nucleotide polymorphism (SNP) data of subjects from Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The effectiveness of our method is demonstrated by both the clearly improved prediction accuracy and the discovered brain regions and SNPs relevant to AD.
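The abstract does not give the regularizer's formula. One penalty with the described behaviour (sparse within a modality, dense across modalities) is a mixed l1,p norm with p > 1: an l1 sum inside each group, followed by an lp norm over the group totals. The sketch below assumes that form purely for illustration. The function name, the choice p = 2, and the toy weights are hypothetical and should not be read as the authors' exact penalty.

```python
def mixed_l1p_norm(weights_by_group, p=2.0):
    """l1 norm within each group, then lp norm across the group totals."""
    group_l1 = [sum(abs(w) for w in group) for group in weights_by_group.values()]
    return sum(g ** p for g in group_l1) ** (1.0 / p)


# Same total weight mass (2.0) distributed in two different ways.
one_modality = {"MRI": [1.0, 1.0], "PET": [0.0, 0.0]}
both_modalities = {"MRI": [1.0, 0.0], "PET": [1.0, 0.0]}

penalty_one = mixed_l1p_norm(one_modality)      # mass concentrated in MRI
penalty_both = mixed_l1p_norm(both_modalities)  # mass spread over MRI and PET
```

With p > 1 the penalty is lower when the same mass is spread over several modalities, while the inner l1 term still favours few active kernels inside each modality.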

Proceedings ArticleDOI
20 Mar 2016
TL;DR: An algorithm is proposed that designs kernels as a part of Laplacian SVM learning which correspond to deep multi-layered combinations of elementary kernels which capture simple - linear - as well as intricate - nonlinear - relationships between data.
Abstract: Semi-supervised learning seeks to build accurate classification machines by taking advantage of both labeled and unlabeled data. This learning scheme is useful especially when labeled data are scarce while unlabeled ones are abundant. Among the existing semi-supervised learning algorithms, Laplacian support vector machines (SVMs) are known to be particularly powerful, but their success is highly dependent on the choice of kernels. In this paper, we propose an algorithm that designs kernels as a part of Laplacian SVM learning. The proposed kernels correspond to deep multi-layered combinations of elementary kernels which capture simple (linear) as well as intricate (nonlinear) relationships between data. Our optimization process finds both the parameters of the deep kernels and the Laplacian SVMs in a unified framework, resulting in highly discriminative and accurate classifiers. When applied to the challenging ImageCLEF2013 Photo Annotation benchmark, the proposed deep kernels show significant and consistent gain compared to existing elementary kernels as well as standard multiple kernels.

01 Jan 2016
TL;DR: The problem of multiple-kernel learning (MKL) for ELM is addressed by formulating it as a semi-infinite linear program and integrating it with techniques of MKL; the resultant classifier is fast and accurate and can be easily trained by simply changing the linear program.
Abstract: The extreme learning machine (ELM) is a new method for using single hidden layer feed-forward networks with a much simpler training method. While conventional kernel-based classifiers are based on a single kernel, in reality, it is often desirable to base classifiers on combinations of multiple kernels. In this paper, we address the problem of multiple-kernel learning (MKL) for ELM by formulating it as a semi-infinite linear program. We further extend this idea by integrating it with techniques of MKL. The kernel function in this ELM formulation no longer needs to be fixed, but can be automatically learned as a combination of multiple kernels. Two formulations of multiple-kernel classifiers are proposed. The first one is based on a convex combination of the given base kernels, while the second one uses a convex combination of the so-called equivalent kernels. Empirically, the second formulation is particularly competitive. Experiments on a large number of both toy and real-world data sets (including high-magnification sampling rate image data set) show that the resultant classifier is fast and accurate and can also be easily trained by simply changing the linear program.

Journal ArticleDOI
TL;DR: Multiple Kernel Learning (MKL) can explore multiple dimensions simultaneously and performs feature selection while modeling the data, and multiple dimensions of the ECoG signal can contribute to numerical processing.