scispace - formally typeset

Showing papers on "Classifier chains published in 2020"


Journal ArticleDOI
TL;DR: Two extensions of ECC's basic approach are presented, where a varying number of binary models per label are built and chains of different sizes are constructed in order to improve the exploitation of majority examples with approximately the same computational budget.
Abstract: Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.

56 citations
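The ECC-with-undersampling recipe can be sketched with scikit-learn's `ClassifierChain`. A hedged simplification: the API does not expose per-label random undersampling, so the base learner's `class_weight="balanced"` option stands in for the paper's imbalance handling, and majority voting over random chain orders plays the role of the ensemble:

```python
# Sketch of an Ensemble of Classifier Chains (ECC) for imbalanced
# multi-label data. class_weight="balanced" is a stand-in for the paper's
# random undersampling, which sklearn's ClassifierChain does not expose.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

def ecc_predict(X_train, Y_train, X_test, n_chains=10, seed=0):
    """Average the 0/1 votes of several chains with random label orders."""
    votes = np.zeros((X_test.shape[0], Y_train.shape[1]))
    for i in range(n_chains):
        chain = ClassifierChain(
            LogisticRegression(class_weight="balanced", max_iter=1000),
            order="random", random_state=seed + i)
        chain.fit(X_train, Y_train)
        votes += chain.predict(X_test)
    return (votes / n_chains >= 0.5).astype(int)  # majority vote
```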


Journal ArticleDOI
TL;DR: The sensitivity of five problem transformation methods and two ensemble methods to four types of base classifiers is studied, and the statistical performance of a classifier is found to be generally consistent across the metrics for any given method.

25 citations


Proceedings ArticleDOI
01 Aug 2020
TL;DR: It is shown that translation, stemming, and stopword removal are not effective, and that dependencies between labels greatly affect the classification results.
Abstract: Hate speech and abusive words spread widely on social media. The impact of hate speech on social media is hazardous: it can lead to discrimination, social conflict, and even genocide. Hate speech also has target types, categories, and levels. This research discusses the classification of hate speech and abusive words in social media text on Twitter, in Indonesian, English, and a mixture of both, down to the types, categories, and levels. Multilabel classification of hate speech text is investigated using RFDT, BiLSTM, and BiLSTM with a pre-trained BERT model. The Classifier Chains, Label Powerset, and Binary Relevance methods are used for data transformation, and TF-IDF is used for feature extraction combined with the RFDT classification method. Several preprocessing scenarios are also carried out to find the best results, namely full preprocessing, preprocessing without stopword removal, and preprocessing without stemming and stopword removal. The problem of having Indonesian, English, and a mixture of both is handled in two ways: without translation and with translation into Indonesian. The best result, an accuracy of 76.12%, was obtained using the RFDT classification method with Classifier Chains, without translation, without stemming, and without stopword removal. This research also shows that translation, stemming, and stopword removal are not effective, and that dependencies between labels greatly affect the classification results.

15 citations
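The best-performing pipeline reported above (TF-IDF features into a Classifier Chain over a random-forest base learner) can be sketched as follows; the tiny corpus and the three-label scheme are invented for illustration:

```python
# Sketch of the reported best pipeline: TF-IDF + Classifier Chains + RFDT.
# No stemming or stopword removal, matching the paper's best scenario.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import ClassifierChain

texts = ["you are awful", "have a nice day",
         "awful hateful words", "nice words"]
# columns: [abusive, hate_speech, positive]  (hypothetical label set)
Y = np.array([[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]])

vec = TfidfVectorizer()
X = vec.fit_transform(texts).toarray()   # densify for simplicity

chain = ClassifierChain(RandomForestClassifier(random_state=0),
                        random_state=0)
chain.fit(X, Y)
pred = chain.predict(vec.transform(["awful hateful day"]).toarray())
```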


Journal ArticleDOI
TL;DR: A novel and effective algorithm named LSF-CC (Label Specific Features based Classifier Chain) is proposed for multi-label classification, which outperforms well-established approaches in terms of classification performance.
Abstract: Multi-label classification tackles problems in which each instance is associated with multiple labels. Due to the interdependence among labels, exploiting label correlations is the main means of enhancing classifier performance, and a variety of corresponding multi-label algorithms have been proposed. Among those algorithms, Classifier Chains (CC) is one of the most effective methods. It induces a binary classifier for each label, and these classifiers are linked in a chain. In the chain, the labels predicted by previous classifiers are used as additional features for the current classifier. The original CC has two shortcomings which potentially decrease classification performance: random label ordering, and noise in the original and additional features. To deal with these problems, we propose a novel and effective algorithm named LSF-CC, i.e. Label Specific Features based Classifier Chain for multi-label classification. First, a feature estimating technique is employed to produce a list of the most relevant features and labels for each label. According to these lists, we define a chain that guarantees that the most frequent labels appearing in these lists are top-ranked. Then, label specific features can be selected from the original feature space and the label space. Based on these label specific features, corresponding binary classifiers are learned for each label. Experiments on several multi-label data sets from various domains show that the proposed method outperforms well-established approaches.

12 citations
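A minimal sketch of the label-specific-features idea, assuming mutual information as the relevance score; the paper's relevance-based chain ordering is omitted, so this is a simplified reconstruction rather than the authors' exact LSF-CC:

```python
# Toy sketch of label-specific feature selection inside a classifier chain:
# for each label, keep only its top-k most relevant features (original
# features plus the labels already in the chain) before fitting the model.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def fit_lsf_chain(X, Y, k=3):
    models = []
    for j in range(Y.shape[1]):
        Xj = np.hstack([X, Y[:, :j]])      # augment with previous labels
        scores = mutual_info_classif(Xj, Y[:, j], random_state=0)
        keep = np.argsort(scores)[-k:]     # label-specific feature subset
        clf = LogisticRegression(max_iter=1000).fit(Xj[:, keep], Y[:, j])
        models.append((keep, clf))
    return models

def predict_lsf_chain(models, X):
    preds = np.zeros((X.shape[0], 0))
    for keep, clf in models:
        Xj = np.hstack([X, preds])         # same layout as in training
        p = clf.predict(Xj[:, keep]).reshape(-1, 1)
        preds = np.hstack([preds, p])
    return preds.astype(int)
```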


Journal ArticleDOI
10 Oct 2020-Entropy
TL;DR: This work proposes a partial classifier chain method with feature selection (PCC-FS) that exploits the label correlation between label and feature spaces and thus solves the two previously mentioned problems simultaneously.
Abstract: Multi-label classification (MLC) is a supervised learning problem where an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. CC suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all the labels are inserted into the chain, although some of them may carry irrelevant information that interferes with the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the label correlation between label and feature spaces and thus solves the two previously mentioned problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating the irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to execute the training and prediction activities. The experimental results on five metrics demonstrate that, in comparison to eight state-of-the-art MLC algorithms, the proposed method is a significant improvement over existing multi-label classification methods.

11 citations


Proceedings ArticleDOI
01 Dec 2020
TL;DR: This paper proposes a well-performing instance-specific algorithm configuration model which selects an (almost) optimal configuration of modules for a given problem instance and the structure of this configuration model is able to capture inter-dependencies between modules.
Abstract: In this paper, we rely on previous work proposing a modularized version of CMA-ES, which captures several alterations to the conventional CMA-ES developed in recent years. Each alteration provides significant advantages under certain problem properties, e.g., multi-modality or high conditioning. These distinct advancements are implemented as modules, which results in 4608 unique versions of CMA-ES. Previous findings illustrate the competitive advantage of enabling and disabling the aforementioned modules for different optimization problems. Yet, this modular CMA-ES lacks a method to automatically determine when the activation of specific modules is beneficial and when it is not. We propose a well-performing instance-specific algorithm configuration model which selects an (almost) optimal configuration of modules for a given problem instance. In addition, the structure of this configuration model is able to capture inter-dependencies between modules, e.g., two (or more) modules might only be advantageous in unison for some problem types, making the orchestration of modules a crucial task. This is accomplished by chaining multiple random forest classifiers together into a so-called Classifier Chain, based on a set of numerical features extracted by means of Exploratory Landscape Analysis (ELA) to describe the given problem instances.

10 citations
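The configuration model can be sketched as a Classifier Chain of random forests mapping ELA-style features to binary module activations; the feature and module semantics below are invented placeholders:

```python
# Sketch of the instance-specific configuration model: a Classifier Chain
# of random forests predicts a vector of binary CMA-ES module activations
# from numerical problem features, so inter-dependencies between modules
# can be captured. Feature/module meanings here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain

rng = np.random.RandomState(0)
ela_features = rng.rand(100, 8)  # stand-ins for ELA features (dispersion, ...)
modules = (ela_features[:, :4] > 0.5).astype(int)  # 4 binary module switches

model = ClassifierChain(
    RandomForestClassifier(n_estimators=50, random_state=0),
    random_state=0)
model.fit(ela_features[:80], modules[:80])
config = model.predict(ela_features[80:])  # one module vector per instance
```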


Journal ArticleDOI
TL;DR: An alternative estimation strategy, the minimum error chain policy, is introduced that gradually expands the input space using estimations that approximate the true characteristics of the outputs, namely out-of-bag estimations in a tree-based ensemble framework.

10 citations


Journal ArticleDOI
TL;DR: This paper studies and develops methods for regressor chains, and presents a sequential Monte Carlo scheme in the framework of a probabilistic regressor chain that is effective, flexible and useful on several types of data.

8 citations


Book ChapterDOI
03 Jun 2020
TL;DR: This work investigates the applicability of several machine learning models and classifier chains (CC) to medical unstructured text classification and shows that the CC strategy improves classification performance.
Abstract: Structuring medical text using international standards improves interoperability and the quality of predictive modelling. The medical text classification task facilitates information extraction. In this work we investigate the applicability of several machine learning models and classifier chains (CC) to medical unstructured text classification. The experimental study was performed on a corpus of 11671 manually labeled Russian medical notes. The results showed that the CC strategy improves classification performance. An ensemble of classifier chains based on a linear SVC showed the best result: 0.924 micro F-measure, 0.872 micro precision and 0.927 micro recall.

5 citations
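A sketch of the reported setup, a chain of linear SVMs evaluated with micro-averaged F-measure; the synthetic vectors below stand in for features extracted from the medical notes:

```python
# Sketch: Classifier Chain over a linear SVC, scored with micro F-measure
# as in the paper. The data is synthetic; the study used 11671 labeled
# Russian medical notes.
import numpy as np
from sklearn.metrics import f1_score
from sklearn.multioutput import ClassifierChain
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(100, 10)          # stand-in for text feature vectors
Y = (X[:, :4] > 0).astype(int)  # 4 synthetic category labels

chain = ClassifierChain(LinearSVC(), random_state=0)
chain.fit(X[:70], Y[:70])
pred = chain.predict(X[70:])
micro_f1 = f1_score(Y[70:], pred, average="micro")
```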


Proceedings ArticleDOI
30 Nov 2020
TL;DR: This research proposes a system that classifies hate speech written in the Indonesian language on Twitter and handles the noisiness of Twitter data, such as mixed languages and non-standard text, using Support Vector Machines as a classifier.
Abstract: Hate speech has become a hot issue as it spreads massively on today’s social media with specific targets, categories, and levels. In addition, hate speech can cause social conflict and even genocide. This research proposes a system that classifies hate speech written in the Indonesian language on Twitter. It also handles the noisiness of Twitter data, such as mixed languages and non-standard text. We use not only Support Vector Machines (SVM) as a classifier, but also compare it with other methods, such as the deep learning models CNN and DistilBERT. Beyond standard text preprocessing, we examine the effect of translation in handling the multilingual content. The data transformation methods used in the SVM model are Label Powerset (LP) and Classifier Chains (CC). The experiment results show that classification using SVM and CC without stemming, stopword removal, or translation provides the best accuracy of 74.88%. The best SVM hyperparameters for multilabel classification are the sigmoid kernel, a regularization parameter value of 10, and a gamma value of 0.1. Stemming, stopword removal, and translation preprocessing are less effective in this research. Moreover, CNN has a flaw in predicting labels that occur rarely in the training data.

4 citations
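The reported best hyperparameters can be plugged into a Classifier Chain directly; the data below is synthetic, standing in for TF-IDF features of tweets:

```python
# Sketch using the paper's reported best SVM hyperparameters inside a
# Classifier Chain: sigmoid kernel, C=10, gamma=0.1. The features and
# three-label scheme are invented for illustration.
import numpy as np
from sklearn.multioutput import ClassifierChain
from sklearn.svm import SVC

svm = SVC(kernel="sigmoid", C=10, gamma=0.1)
chain = ClassifierChain(svm, random_state=0)

rng = np.random.RandomState(1)
X = rng.rand(60, 12)                  # stand-in for TF-IDF vectors
Y = (X[:, :3] > 0.5).astype(int)      # 3 hypothetical hate-speech labels
chain.fit(X, Y)
labels = chain.predict(X[:5])
```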


Proceedings Article
01 Jan 2020
TL;DR: A novel method, parCC (parsimonious classifier chains), is proposed that controls the total number of features without significant deterioration in prediction quality, and is applied to predict multimorbidity using various medical diagnostic tests.
Abstract: We study the problem of learning classifier chains in multi-label classification with a special focus on feature selection. It turns out that standard classifier chains tend to select too many features when a feature selection method is embedded in the base learner, because selection is performed separately for each of the models in the chain. This can be a serious limitation in domains where the acquisition of feature values is costly or where including too many features (e.g. diagnostic tests) is associated with negative effects. We propose a novel method, parCC (parsimonious classifier chains), that controls the total number of features without significant deterioration in prediction quality. In the proposed method we jointly learn all models in the chain by combining ℓ2,1 regularization, to select features shared across the models, with ℓ1 regularization, to select relevant labels in each model. In a theoretical analysis we provide a bound on the generalization error of the algorithm using Rademacher complexity. We apply our method to predict multimorbidity (the co-occurrence of multiple diseases in one patient) using various medical diagnostic tests. Experiments carried out on a large clinical database (MIMIC III) show that parCC achieves higher accuracy than related methods when the number of features is limited. We also demonstrate the efficacy of the proposed method on a set of standard benchmark datasets.

Journal ArticleDOI
TL;DR: A data-driven approach to group users in a Non-Orthogonal Multiple Access (NOMA) MIMO setting by coupling a Classifier Chain with a Gradient Boosting Decision Tree (GBDT), namely, the LightGBM algorithm.
Abstract: In this article, we propose a data-driven approach to group users in a Non-Orthogonal Multiple Access (NOMA) MIMO setting. Specifically, we formulate user clustering as a multi-label classification problem and solve it by coupling a Classifier Chain (CC) with a Gradient Boosting Decision Tree (GBDT), namely, the LightGBM algorithm. The performance of the proposed CC-LightGBM scheme is assessed via numerical simulations. For benchmarking, we consider two classical adaptation learning schemes, Multi-Label k-Nearest Neighbours (ML-KNN) and Multi-Label Twin Support Vector Machines (ML-TSVM), as well as other naive approaches. In addition, we compare the computational complexity of the proposed scheme with those of the aforementioned benchmarks.

Journal ArticleDOI
TL;DR: The quantitative and qualitative analysis of the obtained results shows that the multi-label classification approach provides meaningful and descriptive mineral maps and outperforms the single-label RF classification for the mineral mapping task.
Abstract: A multi-label classification concept is introduced for the mineral mapping task in drill-core hyperspectral data analysis. As opposed to traditional classification methods, this approach has the advantage of considering the different mineral mixtures present in each pixel. For the multi-label classification, the well-known Classifier Chain method (CC) is implemented using the Random Forest (RF) algorithm as the base classifier. High-resolution mineralogical data obtained from a Scanning Electron Microscopy (SEM) instrument equipped with the Mineral Liberation Analysis (MLA) software are used for generating the training data set. The drill-core hyperspectral data used in this paper cover the visible-near infrared (VNIR) and short-wave infrared (SWIR) ranges of the electromagnetic spectrum. The quantitative and qualitative analysis of the obtained results shows that the multi-label classification approach provides meaningful and descriptive mineral maps and outperforms single-label RF classification for the mineral mapping task.

Proceedings ArticleDOI
06 Nov 2020
TL;DR: This article explored methods of using neural network classifiers in the classifier chain model and tried to address some problems with such an architecture, while comparing their performance on different types of data, using different metrics, with each other and with other well-performing multi-label classification methods.
Abstract: Multi-label classification is a generalization of the multi-class classification problem, where one entity can belong to more than one class from the class set. Recent works have proposed multiple methods of solving this problem involving both statistical and deep learning approaches. While methods exist for applying deep learning models to this problem, most of them require the model to have a high-dimensional output vector, and the property of inter-dependency between classes has not been explored. An ensemble of statistical models called chain classifiers can be used to address these issues. This study explores methods of using neural network classifiers in the classifier chain model and tries to address some problems with such an architecture, while comparing their performance on different types of data, using different metrics, with each other and with other well-performing multi-label classification methods.

Book ChapterDOI
19 Oct 2020
TL;DR: This work combines the concept of dynamic classifier chains (DCC) with extreme gradient boosted trees (XGBoost), an effective and scalable state-of-the-art technique, and incorporates DCC in a fast multi-label extension of XGBoost which is made publicly available.
Abstract: Classifier chains is a key technique in multi-label classification, since it allows label dependencies to be considered effectively. However, the classifiers are aligned according to a static order of the labels. In the concept of dynamic classifier chains (DCC), the label ordering is chosen for each prediction dynamically, depending on the respective instance at hand. We combine this concept with extreme gradient boosted trees (XGBoost), an effective and scalable state-of-the-art technique, and incorporate DCC in a fast multi-label extension of XGBoost which we make publicly available. As only positive labels have to be predicted and these are usually only few, the training costs can be further substantially reduced. Moreover, as experiments on eleven datasets show, the length of the chain allows for more control over the usage of previous predictions and hence over the measure one wants to optimize.
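The dynamic-ordering idea can be illustrated with a toy sketch (not the authors' XGBoost implementation): each binary model sees the instance plus the labels fixed so far, the most confident remaining label is fixed first, and the loop stops once no model predicts a confident positive:

```python
# Toy illustration of a dynamic classifier chain. Simplification: each
# model is trained on X plus the *true* values of the other labels; a full
# DCC would instead train against dynamically chosen predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_dcc(X, Y):
    models = []
    for j in range(Y.shape[1]):
        others = np.delete(Y, j, axis=1)   # all other labels as features
        models.append(LogisticRegression(max_iter=1000)
                      .fit(np.hstack([X, others]), Y[:, j]))
    return models

def predict_dcc(models, x, threshold=0.5):
    n = len(models)
    state, remaining = np.zeros(n), set(range(n))
    while remaining:
        # score every still-unfixed label given the current partial prediction
        probs = {j: models[j].predict_proba(
                     np.hstack([x, np.delete(state, j)]).reshape(1, -1))[0, 1]
                 for j in remaining}
        j_best = max(probs, key=probs.get)
        if probs[j_best] < threshold:      # no confident positive left: stop
            break
        state[j_best] = 1.0                # fix the most confident label first
        remaining.discard(j_best)
    return state.astype(int)
```

Stopping early once no confident positive remains mirrors the observation above that only the (usually few) positive labels need to be predicted.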

Proceedings ArticleDOI
21 Oct 2020
TL;DR: In this article, the authors adopt Binary Relevance (BR), Classifier Chains (CC), Random k-labelsets (RAkEL), and Multi-Label k-Nearest Neighbor (ML-KNN) for multi-label classification of non-communicable diseases.
Abstract: Non-communicable diseases (NCDs) are one of the leading causes of death in the world. Multi-NCD patients tend to undergo and suffer from multiple coexistent diseases. This research aims at classifying NCD patients who are diagnosed with other NCDs, using multi-label classification. Four disease types are used in this study, i.e. diabetes, hypertension, cardiovascular disease and stroke. Binary Relevance (BR), Classifier Chains (CC), Random k-labelsets (RAkEL) and Multi-Label k-Nearest Neighbor (ML-KNN) are adopted to transform multi-NCD records into disease labels. The experiments are conducted on physical examination datasets collected from electronic health records, and the comparative results of the techniques are demonstrated. The results showed that the RAkEL method outperformed the other methods and achieved the best accuracy of 91.07%.
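Two of the transformation methods above can be compared on a toy version of the task with scikit-learn (Binary Relevance via `MultiOutputClassifier` versus Classifier Chains); RAkEL and ML-KNN are omitted since they live outside scikit-learn, and the examination features and disease labels are synthetic:

```python
# Sketch comparing Binary Relevance (independent per-disease models) with
# Classifier Chains, which lets e.g. a hypertension prediction inform the
# cardiovascular one. Data here is synthetic, not real health records.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

rng = np.random.RandomState(0)
exams = rng.randn(120, 8)  # stand-in for physical-examination features
# columns: diabetes, hypertension, cardiovascular, stroke (hypothetical)
diseases = (exams[:, :4] + 0.5 * rng.randn(120, 4) > 0).astype(int)

br = MultiOutputClassifier(LogisticRegression(max_iter=1000))
cc = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
br.fit(exams[:90], diseases[:90])
cc.fit(exams[:90], diseases[:90])
br_acc = (br.predict(exams[90:]) == diseases[90:]).mean()
cc_acc = (cc.predict(exams[90:]) == diseases[90:]).mean()
```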

Journal Article
TL;DR: This work analyzes past available data to predict tags automatically based on the question a user enters, which enhances the user experience.
Abstract: Nowadays, data plays a major role in every aspect of our life. The past data that is available can be used for analysis and to predict the future. For websites based on learning, the old data which users post and tag can be analyzed to predict what new implementations can be done to improve the user experience. Stack Overflow, for example, is the largest learning forum, used by most developers to learn and share their programming knowledge. To post a question, users need to enter the tags related to the question manually. Here we analyze the past available data to predict the tags automatically based on the question a user enters, which enhances the user experience.

Posted Content
Simon Bohlender, Eneldo Loza Mencía, Moritz Kulessa
TL;DR: In this paper, the authors combine the concept of dynamic classifier chains (DCC) with extreme gradient boosted trees (XGBoost), an effective and scalable state-of-the-art technique.
Abstract: Classifier chains is a key technique in multi-label classification, since it allows label dependencies to be considered effectively. However, the classifiers are aligned according to a static order of the labels. In the concept of dynamic classifier chains (DCC), the label ordering is chosen for each prediction dynamically, depending on the respective instance at hand. We combine this concept with extreme gradient boosted trees (XGBoost), an effective and scalable state-of-the-art technique, and incorporate DCC in a fast multi-label extension of XGBoost which we make publicly available. As only positive labels have to be predicted and these are usually only few, the training costs can be further substantially reduced. Moreover, as experiments on eleven datasets show, the length of the chain allows for more control over the usage of previous predictions and hence over the measure one wants to optimize.