Journal ArticleDOI

Classifier chains for multi-label classification

01 Dec 2011-Machine Learning (Springer US)-Vol. 85, Iss: 3, pp 333-359
TL;DR: This paper presents a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity, and illustrates the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
Abstract: The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
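
For readers who want the mechanics, a minimal sketch of the chaining idea follows, written with scikit-learn-style binary base learners. This is an illustration of the general technique rather than the authors' implementation; the class name, parameter names, and default base learner are ours.

```python
# Minimal classifier-chain sketch (illustrative, not the authors' code).
# X: (n_samples, n_features); Y: binary matrix (n_samples, n_labels).
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class SimpleClassifierChain:
    def __init__(self, base=None, order=None):
        self.base = base if base is not None else LogisticRegression(max_iter=1000)
        self.order = order  # label order along the chain; default 0..L-1

    def fit(self, X, Y):
        self.order_ = self.order or list(range(Y.shape[1]))
        self.models_ = []
        X_aug = np.asarray(X, dtype=float)
        for j in self.order_:
            self.models_.append(clone(self.base).fit(X_aug, Y[:, j]))
            # The feature space grows: the true label joins the inputs
            # of every later classifier in the chain.
            X_aug = np.hstack([X_aug, Y[:, [j]]])
        return self

    def predict(self, X):
        X_aug = np.asarray(X, dtype=float)
        Y_hat = np.zeros((X_aug.shape[0], len(self.order_)), dtype=int)
        for model, j in zip(self.models_, self.order_):
            Y_hat[:, j] = model.predict(X_aug)
            # At test time the *predicted* label is chained forward.
            X_aug = np.hstack([X_aug, Y_hat[:, [j]]])
        return Y_hat
```

scikit-learn now ships an implementation along these lines as sklearn.multioutput.ClassifierChain.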

Citations
Journal ArticleDOI
TL;DR: This paper aims to provide a timely review on this area with emphasis on state-of-the-art multi-label learning algorithms with relevant analyses and discussions.
Abstract: Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously. During the past decade, a significant amount of progress has been made toward this emerging machine learning paradigm. This paper aims to provide a timely review on this area with emphasis on state-of-the-art multi-label learning algorithms. Firstly, fundamentals on multi-label learning including formal definition and evaluation metrics are given. Secondly and primarily, eight representative multi-label learning algorithms are scrutinized under common notations with relevant analyses and discussions. Thirdly, several related learning settings are briefly summarized. As a conclusion, online resources and open research problems on multi-label learning are outlined for reference purposes.
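
As a quick illustration of the kind of evaluation metrics such surveys formalize, here are two common multi-label metrics computed on made-up predictions:

```python
# Two common multi-label metrics on toy data (arrays are made up).
import numpy as np

Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

# Hamming loss: fraction of individual label slots predicted wrongly.
hamming = np.mean(Y_true != Y_pred)                   # 1 of 6 -> 0.1667

# Subset (0/1) accuracy: fraction of examples whose whole label set matches.
subset_acc = np.mean((Y_true == Y_pred).all(axis=1))  # 1 of 2 -> 0.5

print(hamming, subset_acc)
```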

2,495 citations


Cites background from "Classifier chains for multi-label c..."

  • ...In Classifier Chains [72], [73], binary assignment is represented by 0 and 1....


  • ...sampled without replacement (|D(r)| = 0.67 · |D|) [72] or with replacement (|D(r)| = |D|) [73]....


  • ...The basic idea of this algorithm is to transform the multilabel learning problem into a chain of binary classification problems, where subsequent binary classifiers in the chain are built upon the predictions of preceding ones [72], [73]....

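The second excerpt contrasts two ways of drawing each ensemble member's training sample. A minimal sketch of both schemes (the 0.67 ratio is the one quoted above; everything else is illustrative):

```python
# The two subsampling schemes contrasted above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # |D|

# Without replacement, 67% of the training set (the [72] variant).
idx_without = rng.choice(n, size=int(0.67 * n), replace=False)

# With replacement, a bootstrap sample the size of D (the [73] variant).
idx_with = rng.choice(n, size=n, replace=True)
```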

Journal ArticleDOI
TL;DR: The overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla was investigated, revealing an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95-96% ANI.
Abstract: Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains, and has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA–DNA hybridization (DDH) technique. An ANI threshold range (95–96 %) for species demarcation had previously been suggested based on comparative investigation between DDH and ANI values, albeit with rather limited datasets. Furthermore, its generality was not tested on all lineages of prokaryotes. Here, we investigated the overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla to see whether the suggested range can be applied to all species. There was an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95–96 % ANI. We went on to determine which level of 16S rRNA gene sequence similarity corresponds to the currently accepted ANI threshold for species demarcation using over one million comparisons. A twofold cross-validation statistical test revealed that 98.65 % 16S rRNA gene sequence similarity can be used as the threshold for differentiating two species, which is consistent with previous suggestions (98.2–99.0 %) derived from comparative studies between DDH and 16S rRNA gene sequence similarity. Our findings should be useful in accelerating the use of genomic sequence data in the taxonomy of bacteria and archaea.

2,227 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...The F measure was originally introduced for measuring classification performance in information retrieval processes (van Rijsbergen, 1979) and has been used frequently in assessing the performance of binary or multilabel classifiers (Lan et al., 2012; Read et al., 2011)....

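The F measure cited here is the harmonic mean of precision and recall. A small self-contained example with hypothetical counts:

```python
# F measure as the (weighted) harmonic mean of precision and recall.
def f_measure(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(f_measure(8, 2, 4))  # precision 0.8, recall ~0.667 -> F1 ~0.727
```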

Book
01 Jan 2008
TL;DR: Novel computational approaches for deep learning of behaviors as opposed to just static patterns will be presented, based on structured nonnegative matrix factorizations of matrices that encode observation frequencies of behaviors.
Abstract: Future Directions -- Semi-supervised Multiple Classifier Systems: Background and Research Directions -- Boosting -- Boosting GMM and Its Two Applications -- Boosting Soft-Margin SVM with Feature Selection for Pedestrian Detection -- Observations on Boosting Feature Selection -- Boosting Multiple Classifiers Constructed by Hybrid Discriminant Analysis -- Combination Methods -- Decoding Rules for Error Correcting Output Code Ensembles -- A Probability Model for Combining Ranks -- EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks -- Mixture of Gaussian Processes for Combining Multiple Modalities -- Dynamic Classifier Integration Method -- Recursive ECOC for Microarray Data Classification -- Using Dempster-Shafer Theory in MCF Systems to Reject Samples -- Multiple Classifier Fusion Performance in Networked Stochastic Vector Quantisers -- On Deriving the Second-Stage Training Set for Trainable Combiners -- Using Independence Assumption to Improve Multimodal Biometric Fusion -- Design Methods -- Half-Against-Half Multi-class Support Vector Machines -- Combining Feature Subsets in Feature Selection -- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments -- Using Decision Tree Models and Diversity Measures in the Selection of Ensemble Classification Models -- Ensembles of Classifiers from Spatially Disjoint Data -- Optimising Two-Stage Recognition Systems -- Design of Multiple Classifier Systems for Time Series Data -- Ensemble Learning with Biased Classifiers: The Triskel Algorithm -- Cluster-Based Cumulative Ensembles -- Ensemble of SVMs for Incremental Learning -- Performance Analysis -- Design of a New Classifier Simulator -- Evaluation of Diversity Measures for Binary Classifier Ensembles -- Which Is the Best Multiclass SVM Method? An Empirical Study -- Over-Fitting in Ensembles of Neural Network Classifiers Within ECOC Frameworks -- Between Two Extremes: Examining Decompositions of the Ensemble Objective Function -- Data Partitioning Evaluation Measures for Classifier Ensembles -- Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation -- Ensemble Confidence Estimates Posterior Probability -- Applications -- Using Domain Knowledge in the Random Subspace Method: Application to the Classification of Biomedical Spectra -- An Abnormal ECG Beat Detection Approach for Long-Term Monitoring of Heart Patients Based on Hybrid Kernel Machine Ensemble -- Speaker Verification Using Adapted User-Dependent Multilevel Fusion -- Multi-modal Person Recognition for Vehicular Applications -- Using an Ensemble of Classifiers to Audit a Production Classifier -- Analysis and Modelling of Diversity Contribution to Ensemble-Based Texture Recognition Performance -- Combining Audio-Based and Video-Based Shot Classification Systems for News Videos Segmentation -- Designing Multiple Classifier Systems for Face Recognition -- Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data.

1,073 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...As the binary classification algorithm, we have employed logistic regression and SVM with RBF kernel provided in LIBSVM [19], for the cases of weighted Euclidean and weighted loss-based decodings respectively....


  • ...We adapted the measure NDCG [19] (Normalized Discounted Cumulative Gain) to form our metatarget....


  • ...In order to construct a more informative gene network we performed the integration by adding two more functional gene networks (FI and HumanNet) taken from the literature [18,19], thus obtaining a final integration of 10 biomolecular networks (Table 1)....

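The NDCG measure named in the second excerpt can be computed as below; this uses one common linear-gain definition, and the relevance scores are toy values:

```python
# Normalized Discounted Cumulative Gain (one common definition; toy data).
import numpy as np

def dcg(relevances):
    rel = np.asarray(relevances, dtype=float)
    return np.sum(rel / np.log2(np.arange(2, rel.size + 2)))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))  # ~0.97 for this toy ranking
```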

Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, a CNN-RNN framework is proposed to learn a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework.
Abstract: While deep convolutional neural networks (CNNs) have shown a great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification models.
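
A simplified PyTorch-style sketch of the joint image-label idea follows. The encoder, layer sizes, and fusion step here are assumptions made for illustration; the paper's actual architecture differs in detail:

```python
# Simplified CNN-RNN sketch for multi-label prediction (assumed
# architecture; not the paper's exact model).
import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    def __init__(self, n_labels, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.cnn = nn.Sequential(              # stand-in image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.label_embed = nn.Embedding(n_labels + 1, embed_dim)  # +1: START
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim + embed_dim, n_labels)

    def forward(self, images, label_seq):
        img = self.cnn(images)                        # (B, embed_dim)
        h, _ = self.rnn(self.label_embed(label_seq))  # (B, T, hidden_dim)
        # Fuse the image embedding with the RNN state at every step, so
        # label scores depend on both the image and the preceding labels.
        img_t = img.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, img_t], dim=-1))  # (B, T, n_labels)
```

In this formulation the labels are predicted as a sequence: at inference the RNN starts from the START token and is unrolled step by step, feeding each predicted label back in.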

933 citations

Journal ArticleDOI
TL;DR: The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC).

711 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...Each base classifier makes a multi-label prediction and then these predictions are combined by using some voting scheme (e.g., majority or probability distribution voting)....


  • ...This method proved to be particularly competitive in terms of efficiency....


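The voting scheme described in the first excerpt can be as simple as per-label majority voting over the members' binary predictions:

```python
# Per-label majority voting over ensemble predictions (toy data).
import numpy as np

# Three members' predictions for 2 samples x 3 labels.
preds = np.array([
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 0]],
    [[0, 0, 1], [1, 1, 0]],
])

votes = preds.mean(axis=0)             # fraction of members voting "on"
majority = (votes > 0.5).astype(int)   # keep a label if most members agree
print(majority)                        # [[1 0 1] [0 1 0]]
```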

References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; the algorithms cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We evaluate all algorithms under a WEKA-based [17] framework running under Java JDK 1.6....


Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We evaluate all algorithms using our open-source WEKA-based (Hall et al. 2009) software, which also provides a wrapper around the MULAN software that contains additional methods....


  • ...Improving these algorithms, including threshold selection, was a focus of the work in Kiritchenko (2005). AdaBoost-based methods have mainly been used in bioinformatics applications (where boosting and decision trees are particularly popular (Kiritchenko 2005))....


  • ...We evaluate all algorithms under a WEKA-based [17] framework running under Java JDK 1.6 with the following settings....


Journal ArticleDOI
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm together with two means of overcoming it.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
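
At the heart of ID3 is choosing the attribute with the highest information gain. A minimal sketch of that computation (the function names and toy data are ours):

```python
# Entropy and information gain, the attribute-selection core of ID3.
# 'examples' is a list of (attribute_value, class_label) pairs.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples):
    labels = [y for _, y in examples]
    by_value = {}
    for v, y in examples:
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(sub) / len(examples) * entropy(sub)
                    for sub in by_value.values())
    return entropy(labels) - remainder

# A perfectly splitting attribute recovers the full class entropy:
print(information_gain([('a', 1), ('a', 1), ('b', 0), ('b', 0)]))  # 1.0
```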

17,177 citations

Journal ArticleDOI
01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
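
scikit-learn's BaggingClassifier packages this bootstrap-and-vote recipe; a toy usage example on synthetic data:

```python
# Bagging as described above: bootstrap replicates + plurality voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X, y)
print(bag.predict(X[:5]))
```

Unstable base learners such as unpruned decision trees are exactly the case where the abstract's "vital element" of instability pays off.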

16,118 citations

Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
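
Both recommended tests are available in SciPy (the scores below are made up); the Nemenyi post-hoc test is not in SciPy itself, but third-party packages such as scikit-posthocs provide it:

```python
# Wilcoxon signed-ranks and Friedman tests via SciPy (toy scores).
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Accuracies of three classifiers on the same six datasets (made up).
a = np.array([0.81, 0.79, 0.84, 0.90, 0.77, 0.83])
b = np.array([0.78, 0.80, 0.81, 0.88, 0.75, 0.80])
c = np.array([0.70, 0.72, 0.74, 0.80, 0.69, 0.73])

print(wilcoxon(a, b))               # paired comparison of two classifiers
print(friedmanchisquare(a, b, c))   # omnibus test over all three
```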

10,306 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We apply the Nemenyi test (Demšar 2006) to indicate statistical significance....
