
Showing papers on "Classifier chains published in 2013"


Proceedings ArticleDOI
04 Nov 2013
TL;DR: Experiments on diverse benchmark datasets, followed by the Wilcoxon test for assessing statistical significance, indicate that the proposed genetic algorithm for optimizing the label ordering in classifier chains produces more accurate classifiers.
Abstract: First proposed in 2009, the classifier chains model (CC) has become one of the most influential algorithms for multi-label classification. It is distinguished by its simple and effective approach to exploiting label dependencies. The CC method involves training q single-label binary classifiers, where each one is solely responsible for classifying a specific label in l1, ..., lq. These q classifiers are linked in a chain, such that each binary classifier can consider the labels predicted by the previous ones as additional information at classification time. The label ordering has a strong effect on predictive accuracy; however, it is usually decided at random and/or by combining random orders via an ensemble. A disadvantage of the ensemble approach is that it is not suitable when the goal is to generate interpretable classifiers. To tackle this problem, in this work we propose a genetic algorithm for optimizing the label ordering in classifier chains. Experiments on diverse benchmark datasets, followed by the Wilcoxon test for assessing statistical significance, indicate that the proposed strategy produces more accurate classifiers.
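The chaining mechanism described in the abstract can be sketched in a few lines. This is an illustrative toy implementation, not the authors' code: the 1-nearest-neighbour base learner and all function names are hypothetical stand-ins for an arbitrary binary classifier.

```python
# Minimal sketch of the classifier chains (CC) idea: q binary classifiers,
# each seeing the original features plus the labels predicted so far.
# The base learner is a toy 1-nearest-neighbour classifier (a hypothetical
# stand-in for any binary learner).

def nn_fit(X, y):
    return list(zip(X, y))                     # memorize the training set

def nn_predict(model, x):
    # label of the closest stored example (squared Euclidean distance)
    best = min(model, key=lambda xy: sum((a - b) ** 2 for a, b in zip(xy[0], x)))
    return best[1]

def cc_train(X, Y, order):
    """Train one binary classifier per label, chained in `order`.
    During training, the TRUE previous labels are used as extra features."""
    chain = []
    for i, j in enumerate(order):
        prev = order[:i]
        X_aug = [x + tuple(y[k] for k in prev) for x, y in zip(X, Y)]
        chain.append(nn_fit(X_aug, [y[j] for y in Y]))
    return chain

def cc_predict(chain, x, order):
    """At prediction time the chain must rely on its own ESTIMATED labels."""
    preds = {}
    for i, j in enumerate(order):
        x_aug = x + tuple(preds[k] for k in order[:i])
        preds[j] = nn_predict(chain[i], x_aug)
    return [preds[j] for j in sorted(preds)]

# Tiny demo: two features, two labels (label 1 equals label 0 here).
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
Y = [(0, 0), (1, 1), (0, 0), (1, 1)]
chain = cc_train(X, Y, order=[0, 1])
print(cc_predict(chain, (0.9, 0.9), [0, 1]))   # nearest example is (1, 1)
```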

98 citations


01 Jan 2013
TL;DR: In this article, the authors analyze the influence of the discrepancy between the feature spaces used in training and testing on the performance of classifier chains and propose two modifications to overcome this problem.
Abstract: Classifier chains have recently been proposed as an appealing method for tackling the multi-label classification task. In addition to several empirical studies showing its state-of-the-art performance, especially when being used in its ensemble variant, there are also some first results on theoretical properties of classifier chains. Continuing along this line, we analyze the influence of a potential pitfall of the learning process, namely the discrepancy between the feature spaces used in training and testing: While true class labels are used as supplementary attributes for training the binary models along the chain, the same models need to rely on estimations of these labels at prediction time. We elucidate under which circumstances the attribute noise thus created can affect the overall prediction performance. As a result of our findings, we propose two modifications of classifier chains that are meant to overcome this problem. Experimentally, we show that our variants are indeed able to produce better results in cases where the original chaining process is likely to fail.
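A toy simulation of the discrepancy discussed above, using two hand-written classifiers (both hypothetical, not from the paper): the second model behaves as if trained to copy the true previous label, so at prediction time it faithfully copies a wrong estimate.

```python
# Toy illustration of the train/test feature-space discrepancy in classifier
# chains: the second model was fit on TRUE first labels but is fed ESTIMATED
# ones at prediction time, so an early mistake propagates down the chain.

def first_clf(x):
    # deliberately imperfect: errs in the region 0.4 <= x < 0.5
    return 1 if x >= 0.5 else 0

def second_clf(x, prev_label):
    # learned (on true labels) to simply copy the previous label
    return prev_label

# true labelling: label 0 = label 1 = 1 iff x >= 0.4
xs = [0.1, 0.45, 0.9]
truth = [(1 if x >= 0.4 else 0,) * 2 for x in xs]
preds = []
for x in xs:
    l0 = first_clf(x)              # estimated, not true, first label
    preds.append((l0, second_clf(x, l0)))
print(preds)   # the error on label 0 at x = 0.45 is copied into label 1
```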

29 citations


Proceedings ArticleDOI
26 May 2013
TL;DR: This paper presents a novel double-Monte Carlo scheme (M2CC), both for finding a good chain sequence and performing efficient inference, which remains tractable for high-dimensional data sets and obtains the best overall accuracy.
Abstract: Multi-label classification (MLC) is the supervised learning problem where an instance may be associated with multiple labels. Modeling dependencies between labels allows MLC methods to improve their performance at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies. On the one hand, the original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors down the chain. On the other hand, a recent Bayes-optimal method improves the performance, but is computationally intractable in practice. Here we present a novel double-Monte Carlo scheme (M2CC), both for finding a good chain sequence and performing efficient inference. The M2CC algorithm remains tractable for high-dimensional data sets and obtains the best overall accuracy, as shown on several real data sets with input dimension as high as 1449 and up to 103 labels.
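The order-search half of a Monte Carlo scheme of this kind can be sketched as follows; the `score` function here is a hypothetical stand-in for training a chain with a given label order and measuring its validation accuracy.

```python
import random

# Sketch of "search over chain orders" via Monte Carlo sampling: draw random
# label orders and keep the best-scoring one.

def score(order):
    # hypothetical proxy for validation accuracy: pretend the chain works
    # best when the dependent label 2 comes last in the order
    return 1.0 if order.index(2) == len(order) - 1 else 0.5

def mc_order_search(n_labels, n_samples, seed=0):
    rng = random.Random(seed)
    best_order, best_score = None, float("-inf")
    for _ in range(n_samples):
        order = list(range(n_labels))
        rng.shuffle(order)             # one Monte Carlo sample of an order
        s = score(order)
        if s > best_score:
            best_order, best_score = order, s
    return best_order, best_score

order, s = mc_order_search(n_labels=3, n_samples=50)
print(order, s)   # an order ending in label 2 is found with high probability
```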

27 citations


Journal ArticleDOI
TL;DR: A hybrid approach is proposed to learn the semantic concepts of images automatically using continuous probabilistic latent semantic analysis (PLSA) and its corresponding Expectation-Maximization (EM) algorithm.

26 citations


Book ChapterDOI
15 May 2013
TL;DR: This paper proposes selective ensemble of classifier chains (SECC), which tries to select a subset of classifier chains to compose the ensemble while keeping or improving performance, and formulates this problem as a convex optimization problem that can be efficiently solved by the stochastic gradient descent method.
Abstract: In multi-label learning, the relationship among labels is well accepted to be important, and various methods have been proposed to exploit label relationships. Amongst them, ensemble of classifier chains (ECC), which builds multiple chaining classifiers using random label orders, has drawn much attention. However, the ensembles generated by ECC are often unnecessarily large, leading to extra high computational and storage costs. To tackle this issue, in this paper we propose selective ensemble of classifier chains (SECC), which tries to select a subset of classifier chains to compose the ensemble while keeping or improving the performance. More precisely, we focus on the performance measure F1-score and formulate this problem as a convex optimization problem which can be efficiently solved by the stochastic gradient descent method. Experiments show that, compared with ECC, SECC is able to obtain much smaller ensembles while achieving better or at least comparable performance.
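The paper formulates chain selection as a convex optimization problem solved by stochastic gradient descent; as a much simpler illustrative stand-in, the following sketch greedily adds chains to the ensemble only while the validation F1-score of the majority vote improves.

```python
# Greedy forward selection of classifier chains (a simplified stand-in for
# the convex-optimization formulation in the paper). Each chain is
# represented only by its binary validation predictions for one label.

def f1(pred, true):
    tp = sum(p and t for p, t in zip(pred, true))
    fp = sum(p and not t for p, t in zip(pred, true))
    fn = sum(t and not p for p, t in zip(pred, true))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def vote(chains_preds):
    # majority vote over the selected chains' binary predictions
    return [int(sum(col) * 2 >= len(chains_preds)) for col in zip(*chains_preds)]

def select_chains(all_preds, true):
    selected, best = [], 0.0
    for p in all_preds:                 # add a chain only if F1 improves
        s = f1(vote(selected + [p]), true)
        if s > best:
            selected, best = selected + [p], s
    return selected, best

# three hypothetical chains' validation predictions for one label
true = [1, 0, 1, 1, 0]
preds = [[1, 0, 1, 1, 0],   # perfect chain
         [0, 1, 0, 0, 1],   # bad chain
         [1, 0, 1, 0, 0]]   # decent but redundant chain
subset, score = select_chains(preds, true)
print(len(subset), score)   # a single chain already reaches F1 = 1.0
```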

25 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes a novel multi-label learning algorithm called Ensemble of Sampled Classifier Chains (ESCC) to improve clinical text classification; the algorithm automatically learns to select the relevant disease information that helps classification performance when exploiting possible disease label relations.
Abstract: Clinical data describing a patient's health status can be multi-labelled. For example, a clinical record describing a patient suffering from cough and fever should be tagged with both disease labels. These co-occurring labels are often interrelated, which can be exploited to improve disease classification. In this work, we treat the categorization of free clinical text as a multi-label learning problem. However, we find that some commonly used multi-label learning methods can suffer severe side effects when exploiting complicated disease label relations, such as over-exploitation of label relations and error propagation in label prediction. Based on these findings, we propose a novel multi-label learning algorithm called Ensemble of Sampled Classifier Chains (ESCC) to improve clinical text classification. ESCC automatically learns to select the relevant disease information that helps classification performance when exploiting possible disease label relations. In our experiments, ESCC shows strong advantages over other state-of-the-art multi-label algorithms on medical text data, with significant improvements in performance. The proposed algorithm is promising for mining knowledge from a wide range of multi-label medical text data.

17 citations


Proceedings ArticleDOI
01 Jan 2013
TL;DR: This paper proposes online distributed algorithms which can learn how to construct the optimal classifier chain in order to maximize the stream mining performance (i.e. mining accuracy minus cost) based on the dynamically-changing data characteristics.
Abstract: A plethora of emerging Big Data applications require processing and analyzing streams of data to extract valuable information in real time. For this, chains of classifiers which can detect various concepts need to be constructed in real time. In this paper, we propose online distributed algorithms which can learn how to construct the optimal classifier chain in order to maximize the stream mining performance (i.e., mining accuracy minus cost) based on the dynamically changing data characteristics. The proposed solution does not require the distributed local classifiers to exchange any information when learning at runtime. Moreover, our algorithm requires only limited feedback on the mining performance to enable the learning of the optimal classifier chain. We model the problem of learning the optimal classifier chain at run-time as a multi-player multi-armed bandit problem with limited feedback. To the best of our knowledge, this paper is the first to apply bandit techniques to stream mining problems. However, existing bandit algorithms are inefficient in the considered scenario because each component classifier learns its optimal classification functions using only the aggregate overall reward, without knowing its own individual reward and without exchanging information with other classifiers. We prove that the proposed algorithms achieve logarithmic learning regret uniformly over time and hence are order optimal; therefore, the long-term time-average performance loss tends to zero. We also design learning algorithms whose regret is linear in the number of classification functions. This is much smaller than the regret obtained with existing bandit algorithms, which scales linearly in the number of classifier chains and exponentially in the number of classification functions.
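The bandit framing above can be illustrated with plain UCB1 (a standard single-player bandit algorithm, far simpler than the distributed scheme proposed in the paper): each arm stands for one candidate classifier configuration, and the only feedback is a noisy aggregate reward.

```python
import math, random

# UCB1 over candidate configurations: pull the arm with the highest upper
# confidence bound on its mean reward; pulls concentrate on the best arm,
# giving logarithmic regret over time.

def ucb1(reward_means, horizon, seed=0):
    rng = random.Random(seed)
    n_arms = len(reward_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                       # pull every arm once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        # limited feedback: a single noisy aggregate reward for the pull
        r = reward_means[arm] + rng.gauss(0, 0.05)
        counts[arm] += 1
        sums[arm] += r
    return counts

counts = ucb1([0.2, 0.5, 0.9], horizon=500)
print(counts)   # most pulls go to the 0.9-mean arm
```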

12 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: These experiments demonstrate that, even though designed to work online, EDDO delivers estimators of competitive accuracy compared to batch Bayesian structure learners and batch variants of EDDO.
Abstract: We address the problem of estimating a discrete joint density online, that is, the algorithm is only provided the current example and its current estimate. The proposed online estimator of discrete densities, EDDO (Estimation of Discrete Densities Online), uses classifier chains to model dependencies among features. Each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains and ensembles of weighted classifier chains. For all density estimators, we provide consistency proofs and propose algorithms to perform certain inference tasks. The empirical evaluation of the estimators is conducted in several experiments and on data sets of up to several million instances: We compare them to density estimates computed from Bayesian structure learners, evaluate them under the influence of noise, measure their ability to deal with concept drift, and measure the run-time performance. Our experiments demonstrate that, even though designed to work online, EDDO delivers estimators of competitive accuracy compared to batch Bayesian structure learners and batch variants of EDDO.
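The chain-rule factorization underlying an estimator of this kind can be sketched with two features, p(x1, x2) = p(x1) * p(x2 | x1); here simple online counters play the role of the per-feature classifiers (an illustrative simplification, not EDDO itself).

```python
from collections import Counter

# Online estimate of a discrete joint density via the chain rule:
# p(x1, x2) = p(x1) * p(x2 | x1). Each update sees exactly one example,
# mirroring the online setting described in the abstract.

class ChainDensity:
    def __init__(self):
        self.n = 0
        self.c1 = Counter()          # counts for feature 1
        self.c12 = Counter()         # joint counts for (feature 1, feature 2)

    def update(self, x):             # online: one example at a time
        self.n += 1
        self.c1[x[0]] += 1
        self.c12[x] += 1

    def prob(self, x):
        if self.n == 0 or self.c1[x[0]] == 0:
            return 0.0
        p1 = self.c1[x[0]] / self.n                # estimate of p(x1)
        p2_given_1 = self.c12[x] / self.c1[x[0]]   # estimate of p(x2 | x1)
        return p1 * p2_given_1

est = ChainDensity()
for x in [(0, 0), (0, 0), (0, 1), (1, 1)]:
    est.update(x)
print(est.prob((0, 0)))   # 3/4 * 2/3 = 0.5
```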

9 citations


Posted Content
TL;DR: This work proposes to use an ensemble of classifier chains combined with a histogram-of-segments representation for multi-label classification of birdsong, and shows that the proposed method usually outperforms binary relevance and is better in some cases and worse in others compared to the MIML algorithms.
Abstract: Bird sound data collected with unattended microphones for automatic surveys, or mobile devices for citizen science, typically contain multiple simultaneously vocalizing birds of different species. However, few works have considered the multi-label structure in birdsong. We propose to use an ensemble of classifier chains combined with a histogram-of-segments representation for multi-label classification of birdsong. The proposed method is compared with binary relevance and three multi-instance multi-label learning (MIML) algorithms from prior work (which focus more on structure in the sound, and less on structure in the label sets). Experiments are conducted on two real-world birdsong datasets, and show that the proposed method usually outperforms binary relevance (using the same features and base-classifier), and is better in some cases and worse in others compared to the MIML algorithms.

7 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy and achieves higher or comparable accuracy in comparison to several recent methods and baselines.
Abstract: In multi-instance multi-label (MIML) instance annotation, the goal is to learn an instance classifier while training on a MIML dataset, which consists of bags of instances paired with label sets; instance labels are not provided in the training data. The MIML formulation can be applied in many domains. For example, in an image domain, bags are images, instances are feature vectors representing segments of the images, and the label sets are lists of the objects or categories present in each image. Although many MIML algorithms have been developed for predicting the label set of a new bag, only a few have been specifically designed to predict instance labels. We propose MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy. The proposed method is scalable in all dimensions of a problem (bags, instances, classes, and feature dimension) and has no parameters that require tuning (a problem for prior methods). In experiments on two image datasets, a bioacoustics dataset, and two artificial datasets, MIML-ECC achieves higher or comparable accuracy in comparison to several recent methods and baselines.

6 citations


Book ChapterDOI
18 Sep 2013
TL;DR: A novel heuristic approach for finding appropriate label order in chain is presented and it is demonstrated that the method obtains competitive overall accuracy and is also tractable to higher-dimensional data.
Abstract: Multi-label classification, in opposition to conventional classification, assumes that each data instance may be associated with more than one label simultaneously. Multi-label learning methods take advantage of dependencies between labels, but this implies greater computational complexity of learning. The paper considers the Classifier Chain multi-label classification method, which in its original form is fast but assumes a fixed order of labels in the chain. This leads to propagation of inference errors down the chain. On the other hand, the recent Bayes-optimal method, Probabilistic Classifier Chain, overcomes this drawback but is computationally intractable. To find a trade-off solution, a novel heuristic approach for finding an appropriate label order in the chain is presented. It is demonstrated that the method obtains competitive overall accuracy and also remains tractable for higher-dimensional data.

12 Jun 2013
TL;DR: This work proposes a new method that considers the multi-label learning problem in which a portion of the label assignments is missing, extending the work on the ensemble classifier chain to learn models from training examples with incomplete label assignment.
Abstract: Many methods have been explored in the multi-label learning literature, ranging from simple problem transformation to more complex methods that capture correlations among labels. However, almost all existing works fail to address the challenge of incomplete label data. The goal of this project is to extend the work on the ensemble classifier chain to learn models from training examples with incomplete label assignment. This scenario is common in many real-world applications. For example, in image annotation, a user may provide only partial tags, or label assignments, for an image. We propose a new method that considers the multi-label learning problem in which a portion of the label assignments is missing. The project also includes an evaluation of the effect of the different parameters that accompany this approach.
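The incomplete-label setting can be made concrete with a small sketch. This masking step, where the classifier for a label is trained only on examples where that label is observed, is an illustrative simplification and not the method proposed in the abstract.

```python
# Sketch of training under incomplete label assignment: a missing label is
# encoded as None, and the per-label training set simply drops examples
# whose label for that position is unobserved.

def training_set_for_label(X, Y, j):
    """Keep only (features, label) pairs where label j is observed."""
    return [(x, y[j]) for x, y in zip(X, Y) if y[j] is not None]

X = [(0,), (1,), (2,), (3,)]
Y = [(0, 1), (1, None), (None, 0), (1, 1)]   # None marks a missing label

print(len(training_set_for_label(X, Y, 0)))   # examples with label 0 observed
print(len(training_set_for_label(X, Y, 1)))   # examples with label 1 observed
```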