Journal ArticleDOI

Classifier chains for multi-label classification

01 Dec 2011-Machine Learning (Springer US)-Vol. 85, Iss: 3, pp 333-359
TL;DR: This paper presents a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity, and illustrates the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
Abstract: The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
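
For readers who want the mechanics, a minimal sketch of the chaining idea follows, written with scikit-learn-style binary base learners. This is an illustration of the general technique rather than the authors' implementation; the class name, parameter names, and default base learner are ours.

```python
# Minimal classifier-chain sketch (illustrative, not the authors' code).
# X: (n_samples, n_features); Y: binary matrix (n_samples, n_labels).
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class SimpleClassifierChain:
    def __init__(self, base=None, order=None):
        self.base = base if base is not None else LogisticRegression(max_iter=1000)
        self.order = order  # label order along the chain; default 0..L-1

    def fit(self, X, Y):
        self.order_ = self.order or list(range(Y.shape[1]))
        self.models_ = []
        X_aug = np.asarray(X, dtype=float)
        for j in self.order_:
            self.models_.append(clone(self.base).fit(X_aug, Y[:, j]))
            # The feature space grows: the true label joins the inputs
            # of every later classifier in the chain.
            X_aug = np.hstack([X_aug, Y[:, [j]]])
        return self

    def predict(self, X):
        X_aug = np.asarray(X, dtype=float)
        Y_hat = np.zeros((X_aug.shape[0], len(self.order_)), dtype=int)
        for model, j in zip(self.models_, self.order_):
            Y_hat[:, j] = model.predict(X_aug)
            # At test time the *predicted* label is chained forward.
            X_aug = np.hstack([X_aug, Y_hat[:, [j]]])
        return Y_hat
```

scikit-learn now ships an implementation along these lines as sklearn.multioutput.ClassifierChain.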

Citations
Journal ArticleDOI
TL;DR: This paper aims to provide a timely review on this area with emphasis on state-of-the-art multi-label learning algorithms with relevant analyses and discussions.
Abstract: Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously. During the past decade, a significant amount of progress has been made toward this emerging machine learning paradigm. This paper aims to provide a timely review on this area with emphasis on state-of-the-art multi-label learning algorithms. Firstly, fundamentals on multi-label learning including formal definition and evaluation metrics are given. Secondly and primarily, eight representative multi-label learning algorithms are scrutinized under common notations with relevant analyses and discussions. Thirdly, several related learning settings are briefly summarized. As a conclusion, online resources and open research problems on multi-label learning are outlined for reference purposes.
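
As a quick illustration of the kind of evaluation metrics such surveys formalize, here are two common multi-label metrics computed on made-up predictions:

```python
# Two common multi-label metrics on toy data (arrays are made up).
import numpy as np

Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

# Hamming loss: fraction of individual label slots predicted wrongly.
hamming = np.mean(Y_true != Y_pred)                   # 1 of 6 -> 0.1667

# Subset (0/1) accuracy: fraction of examples whose whole label set matches.
subset_acc = np.mean((Y_true == Y_pred).all(axis=1))  # 1 of 2 -> 0.5

print(hamming, subset_acc)
```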

2,495 citations


Cites background from "Classifier chains for multi-label c..."

  • ...In Classifier Chains [72], [73], binary assignment is represented by 0 and 1....


  • ...sampled without replacement (|D(r)| = 0.67 · |D|) [72] or with replacement (|D(r)| = |D|) [73]....


  • ...The basic idea of this algorithm is to transform the multilabel learning problem into a chain of binary classification problems, where subsequent binary classifiers in the chain are built upon the predictions of preceding ones [72], [73]....

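The second excerpt contrasts two ways of drawing each ensemble member's training sample. A minimal sketch of both schemes (the 0.67 ratio is the one quoted above; everything else is illustrative):

```python
# The two subsampling schemes contrasted above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # |D|

# Without replacement, 67% of the training set (the [72] variant).
idx_without = rng.choice(n, size=int(0.67 * n), replace=False)

# With replacement, a bootstrap sample the size of D (the [73] variant).
idx_with = rng.choice(n, size=n, replace=True)
```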

Journal ArticleDOI
TL;DR: The overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla was investigated, revealing an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95-96% ANI.
Abstract: Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains, and has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA–DNA hybridization (DDH) technique. An ANI threshold range (95–96 %) for species demarcation had previously been suggested based on comparative investigation between DDH and ANI values, albeit with rather limited datasets. Furthermore, its generality was not tested on all lineages of prokaryotes. Here, we investigated the overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla to see whether the suggested range can be applied to all species. There was an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95–96 % ANI. We went on to determine which level of 16S rRNA gene sequence similarity corresponds to the currently accepted ANI threshold for species demarcation using over one million comparisons. A twofold cross-validation statistical test revealed that 98.65 % 16S rRNA gene sequence similarity can be used as the threshold for differentiating two species, which is consistent with previous suggestions (98.2–99.0 %) derived from comparative studies between DDH and 16S rRNA gene sequence similarity. Our findings should be useful in accelerating the use of genomic sequence data in the taxonomy of bacteria and archaea.

2,227 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...The F measure was originally introduced for measuring classification performance in information retrieval processes (van Rijsbergen, 1979) and has been used frequently in assessing the performance of binary or multilabel classifiers (Lan et al., 2012; Read et al., 2011)....

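The F measure cited here is the harmonic mean of precision and recall. A small self-contained example with hypothetical counts:

```python
# F measure as the (weighted) harmonic mean of precision and recall.
def f_measure(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(f_measure(8, 2, 4))  # precision 0.8, recall ~0.667 -> F1 ~0.727
```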

Book
01 Jan 2008
TL;DR: Novel computational approaches for deep learning of behaviors as opposed to just static patterns will be presented, based on structured nonnegative matrix factorizations of matrices that encode observation frequencies of behaviors.
Abstract: Future Directions -- Semi-supervised Multiple Classifier Systems: Background and Research Directions -- Boosting -- Boosting GMM and Its Two Applications -- Boosting Soft-Margin SVM with Feature Selection for Pedestrian Detection -- Observations on Boosting Feature Selection -- Boosting Multiple Classifiers Constructed by Hybrid Discriminant Analysis -- Combination Methods -- Decoding Rules for Error Correcting Output Code Ensembles -- A Probability Model for Combining Ranks -- EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks -- Mixture of Gaussian Processes for Combining Multiple Modalities -- Dynamic Classifier Integration Method -- Recursive ECOC for Microarray Data Classification -- Using Dempster-Shafer Theory in MCF Systems to Reject Samples -- Multiple Classifier Fusion Performance in Networked Stochastic Vector Quantisers -- On Deriving the Second-Stage Training Set for Trainable Combiners -- Using Independence Assumption to Improve Multimodal Biometric Fusion -- Design Methods -- Half-Against-Half Multi-class Support Vector Machines -- Combining Feature Subsets in Feature Selection -- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments -- Using Decision Tree Models and Diversity Measures in the Selection of Ensemble Classification Models -- Ensembles of Classifiers from Spatially Disjoint Data -- Optimising Two-Stage Recognition Systems -- Design of Multiple Classifier Systems for Time Series Data -- Ensemble Learning with Biased Classifiers: The Triskel Algorithm -- Cluster-Based Cumulative Ensembles -- Ensemble of SVMs for Incremental Learning -- Performance Analysis -- Design of a New Classifier Simulator -- Evaluation of Diversity Measures for Binary Classifier Ensembles -- Which Is the Best Multiclass SVM Method? An Empirical Study -- Over-Fitting in Ensembles of Neural Network Classifiers Within ECOC Frameworks -- Between Two Extremes: Examining Decompositions of the Ensemble Objective Function -- Data Partitioning Evaluation Measures for Classifier Ensembles -- Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation -- Ensemble Confidence Estimates Posterior Probability -- Applications -- Using Domain Knowledge in the Random Subspace Method: Application to the Classification of Biomedical Spectra -- An Abnormal ECG Beat Detection Approach for Long-Term Monitoring of Heart Patients Based on Hybrid Kernel Machine Ensemble -- Speaker Verification Using Adapted User-Dependent Multilevel Fusion -- Multi-modal Person Recognition for Vehicular Applications -- Using an Ensemble of Classifiers to Audit a Production Classifier -- Analysis and Modelling of Diversity Contribution to Ensemble-Based Texture Recognition Performance -- Combining Audio-Based and Video-Based Shot Classification Systems for News Videos Segmentation -- Designing Multiple Classifier Systems for Face Recognition -- Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data.

1,073 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...As the binary classification algorithm, we have employed logistic regression and SVM with RBF kernel provided in LIBSVM [19], for the cases of weighted Euclidean and weighted loss-based decodings respectively....


  • ...We adapted the measure NDCG [19] (Normalized Discounted Cumulative Gain) to form our metatarget....


  • ...In order to construct a more informative gene network we performed the integration by adding two more functional gene networks (FI and HumanNet) taken from the literature [18,19], thus obtaining a final integration of 10 biomolecular networks (Table 1)....

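The NDCG measure named in the second excerpt can be computed as below; this uses one common linear-gain definition, and the relevance scores are toy values:

```python
# Normalized Discounted Cumulative Gain (one common definition; toy data).
import numpy as np

def dcg(relevances):
    rel = np.asarray(relevances, dtype=float)
    return np.sum(rel / np.log2(np.arange(2, rel.size + 2)))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))  # ~0.97 for this toy ranking
```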

Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, a CNN-RNN framework is proposed to learn a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework.
Abstract: While deep convolutional neural networks (CNNs) have shown a great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification models.
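
A simplified PyTorch-style sketch of the joint image-label idea follows. The encoder, layer sizes, and fusion step here are assumptions made for illustration; the paper's actual architecture differs in detail:

```python
# Simplified CNN-RNN sketch for multi-label prediction (assumed
# architecture; not the paper's exact model).
import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    def __init__(self, n_labels, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.cnn = nn.Sequential(              # stand-in image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.label_embed = nn.Embedding(n_labels + 1, embed_dim)  # +1: START
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim + embed_dim, n_labels)

    def forward(self, images, label_seq):
        img = self.cnn(images)                        # (B, embed_dim)
        h, _ = self.rnn(self.label_embed(label_seq))  # (B, T, hidden_dim)
        # Fuse the image embedding with the RNN state at every step, so
        # label scores depend on both the image and the preceding labels.
        img_t = img.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.out(torch.cat([h, img_t], dim=-1))  # (B, T, n_labels)
```

In this formulation the labels are predicted as a sequence: at inference the RNN starts from the START token and is unrolled step by step, feeding each predicted label back in.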

933 citations

Journal ArticleDOI
TL;DR: The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC).

711 citations


Cites methods from "Classifier chains for multi-label c..."

  • ...Each base classifier makes a multi-label prediction and then these predictions are combined by using some voting scheme (e.g., majority or probability distribution voting)....


  • ...This method proved to be particularly competitive in terms of efficiency....


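The voting scheme described in the first excerpt can be as simple as per-label majority voting over the members' binary predictions:

```python
# Per-label majority voting over ensemble predictions (toy data).
import numpy as np

# Three members' predictions for 2 samples x 3 labels.
preds = np.array([
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 0]],
    [[0, 0, 1], [1, 1, 0]],
])

votes = preds.mean(axis=0)             # fraction of members voting "on"
majority = (votes > 0.5).astype(int)   # keep a label if most members agree
print(majority)                        # [[1 0 1] [0 1 0]]
```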

References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; the algorithms cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We evaluate all algorithms under a WEKA-based [17] framework running under Java JDK 1.6....


Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We evaluate all algorithms using our open-source WEKA-based (Hall et al. 2009) software, which also provides a wrapper around the MULAN software that contains additional methods....


  • ...Improving these algorithms, including threshold selection, was a focus of the work in Kiritchenko (2005). AdaBoost-based methods have mainly been used in bioinformatics applications (where boosting and decision trees are particularly popular (Kiritchenko 2005))....


  • ...We evaluate all algorithms under a WEKA-based [17] framework running under Java JDK 1.6 with the following settings....


Journal ArticleDOI
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm together with two means of overcoming it.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
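
At the heart of ID3 is choosing the attribute with the highest information gain. A minimal sketch of that computation (the function names and toy data are ours):

```python
# Entropy and information gain, the attribute-selection core of ID3.
# 'examples' is a list of (attribute_value, class_label) pairs.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples):
    labels = [y for _, y in examples]
    by_value = {}
    for v, y in examples:
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(sub) / len(examples) * entropy(sub)
                    for sub in by_value.values())
    return entropy(labels) - remainder

# A perfectly splitting attribute recovers the full class entropy:
print(information_gain([('a', 1), ('a', 1), ('b', 0), ('b', 0)]))  # 1.0
```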

17,177 citations

Journal ArticleDOI
01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
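
scikit-learn's BaggingClassifier packages this bootstrap-and-vote recipe; a toy usage example on synthetic data:

```python
# Bagging as described above: bootstrap replicates + plurality voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X, y)
print(bag.predict(X[:5]))
```

Unstable base learners such as unpruned decision trees are exactly the case where the abstract's "vital element" of instability pays off.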

16,118 citations

Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
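
Both recommended tests are available in SciPy (the scores below are made up); the Nemenyi post-hoc test is not in SciPy itself, but third-party packages such as scikit-posthocs provide it:

```python
# Wilcoxon signed-ranks and Friedman tests via SciPy (toy scores).
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Accuracies of three classifiers on the same six datasets (made up).
a = np.array([0.81, 0.79, 0.84, 0.90, 0.77, 0.83])
b = np.array([0.78, 0.80, 0.81, 0.88, 0.75, 0.80])
c = np.array([0.70, 0.72, 0.74, 0.80, 0.69, 0.73])

print(wilcoxon(a, b))               # paired comparison of two classifiers
print(friedmanchisquare(a, b, c))   # omnibus test over all three
```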

10,306 citations


"Classifier chains for multi-label c..." refers methods in this paper

  • ...We apply the Nemenyi test (Demšar 2006) to indicate statistical significance....
