Open Access Journal Article

An up-to-date comparison of state-of-the-art classification algorithms

- 01 Oct 2017 -

- Vol. 82, Iss: 82, pp 128-150

TLDR

It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.

Abstract:

Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers. We compare the accuracy of 11 classification algorithms pairwise and groupwise. We examine separately the training, parameter-tuning, and testing time. GBDT and Random Forests yield the highest accuracy, outperforming SVM. GBDT is the fastest in testing, Naive Bayes the fastest in training.

Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top 5 alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but the second fastest in prediction efficiency. SRC shows good accuracy performance but is the slowest classifier in both training and testing.
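The abstract reports training time and testing time separately from accuracy. A minimal sketch of such a measurement harness is given below. It is not the authors' code: `NearestCentroid`, `evaluate`, and the four-point data set are hypothetical stand-ins chosen so the example runs with the standard library only, and the toy classifier is not one of the 11 algorithms studied.

```python
import time


class NearestCentroid:
    """Toy classifier (hypothetical stand-in) so the harness is runnable."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (per-feature mean) per class.
        self.centroids_ = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        def closest(row):
            # Class whose centroid has the smallest squared Euclidean distance.
            return min(
                self.centroids_,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, self.centroids_[c])),
            )
        return [closest(row) for row in X]


def evaluate(clf, X_train, y_train, X_test, y_test):
    """Return accuracy plus separately measured training and testing times."""
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)
    t1 = time.perf_counter()
    predictions = clf.predict(X_test)
    t2 = time.perf_counter()
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return {"accuracy": accuracy, "train_s": t1 - t0, "test_s": t2 - t1}


# Hypothetical toy data, for illustration only.
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = ["a", "a", "b", "b"]
X_test = [[0.1, 0.1], [1.0, 1.0]]
y_test = ["a", "b"]

result = evaluate(NearestCentroid(), X_train, y_train, X_test, y_test)
print(result)
```

Any object exposing `fit` and `predict` could be passed to `evaluate` in place of the toy classifier; parameter-tuning time, which the paper also reports, would need its own timer around the tuning loop.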

Fig. 20. Box plots for AUC (raw values).

Table 15 Training time (in seconds) efficiency results for different classification algorithms.

Fig. 19. The number of data sets on which each classifier achieves the best accuracy, grouped by the number of features.

Fig. 18. The number of data sets on which each classifier achieves the best accuracy, grouped by the number of classes.

Fig. 22. Box plots for AUC (mean ranks).

Table 6 Accuracy results for different classification algorithms on 71 data sets.
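The caption "Box plots for AUC (mean ranks)" refers to ranking the classifiers within each data set and averaging those ranks across data sets. The sketch below shows one common way to compute such mean ranks (rank 1 for the best score, tied scores sharing the average rank). The exact ranking convention used in the paper is an assumption here, and the AUC values and function names are hypothetical, not results from the paper.

```python
def rank_row(scores):
    """Rank classifiers on one data set: 1 = highest score, ties share the average rank."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of classifiers tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = average_rank
        i = j + 1
    return ranks


def mean_ranks(table):
    """Average each classifier's rank over all data sets in `table`."""
    totals = {}
    for scores in table.values():
        for name, rank in rank_row(scores).items():
            totals[name] = totals.get(name, 0.0) + rank
    return {name: total / len(table) for name, total in totals.items()}


# Hypothetical AUC values, for illustration only.
auc = {
    "dataset_1": {"GBDT": 0.95, "RF": 0.93, "SVM": 0.90},
    "dataset_2": {"GBDT": 0.80, "RF": 0.85, "SVM": 0.80},
    "dataset_3": {"GBDT": 0.99, "RF": 0.97, "SVM": 0.98},
}

ranks = mean_ranks(auc)
print(ranks)
```

The sketch assumes every classifier has a score on every data set; a lower mean rank indicates a classifier that is more consistently near the top.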

Citations

Journal Article

A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update

Fabien Lotte, +7 more

- 16 Apr 2018 -

Journal of Neural Engineering

TL;DR: A comprehensive overview of the modern classification algorithms used in EEG-based BCIs is provided, the principles of these methods and guidelines on when and how to use them are presented, and a number of challenges to further advance EEG classification in BCI are identified.

Journal Article

A comparative analysis of gradient boosting algorithms

Candice Bentéjac, +2 more

- 01 Mar 2021 -

Artificial Intelligence Review

TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed and indicates that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small.

Journal Article

Deep Learning Fault Diagnosis Method Based on Global Optimization GAN for Unbalanced Data

Funa Zhou, +5 more

- 01 Jan 2020 -

Knowledge Based Systems

TL;DR: A new generator and discriminator for a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminative fault samples, using a global optimization scheme to solve the problem of unbalanced fault samples.

Journal Article

Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

Wei Wang, +2 more

- 01 Aug 2019 -

Journal of Ambient Intelligence and Huma...

TL;DR: This work reconstructs the high-dimensional features of Android applications (apps), employs multiple CNNs to detect Android malware, and proposes a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN), which shows a powerful ability in feature extraction and malware detection.

Journal Article

MTES: An Intelligent Trust Evaluation Scheme in Sensor-Cloud-Enabled Industrial Internet of Things

Tian Wang, +4 more

- 01 Mar 2020 -

IEEE Transactions on Industrial Informat...

TL;DR: A mobile edge computing-based intelligent trust evaluation scheme is proposed to comprehensively evaluate the trustworthiness of sensor nodes using a probabilistic graphical model; it can effectively ensure the trustworthiness of sensor nodes and decrease energy consumption.

References

A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods

Hsuan-Tien Lin

TL;DR: This paper discusses non-PSD kernels from the viewpoint of separability, and shows that the sigmoid kernel matrix is conditionally positive definite (CPD) for certain parameters and is thus a valid kernel there.

Book Chapter

Robust Feature Selection Using Ensemble Feature Selection Techniques

Yvan Saeys, +2 more

TL;DR: It is shown that ensemble feature selection techniques show great promise for high-dimensional domains with small sample sizes, and provide more robust feature subsets than a single feature selection technique.

Journal Article

An experimental comparison of classification algorithms for imbalanced credit scoring data sets

Iain Brown, +1 more

- 01 Feb 2012 -

Expert Systems With Applications

TL;DR: The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets.

Journal Article

A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring

Yufei Xia, +3 more

- 15 Jul 2017 -

Expert Systems With Applications

TL;DR: A sequential ensemble credit scoring model based on a variant of the gradient boosting machine (i.e., extreme gradient boosting (XGBoost)) is proposed, and the results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search.

Yahoo! Learning to Rank Challenge Overview

Olivier Chapelle, +1 more

TL;DR: This paper provides an overview and an analysis of this challenge, along with a detailed description of the released datasets, used internally at Yahoo! for learning the web search ranking function.

Random Forests

Leo Breiman

Greedy function approximation: A gradient boosting machine.

Jerome H. Friedman

- 01 Oct 2001 -

Annals of Statistics

UCI Machine Learning Repository

A. Asuncion

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Yoav Freund, +1 more

Frequently Asked Questions (2)

Q1. What have the authors contributed in "An up-to-date comparison of state-of-the-art classification algorithms"?

The authors note that important properties such as the dependency on the number of classes and features and CPU running time are typically not examined in current benchmark reports. In this paper, the authors carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies.

Q2. What have the authors stated for future works in "An up-to-date comparison of state-of-the-art classification algorithms"?

In future work, the authors will further investigate the performance of the 11 classifiers in specific application domains and with different feature selection methods.

Related Papers (5)

Random Forests

Scikit-learn: Machine Learning in Python

Greedy function approximation: A gradient boosting machine.

UCI Machine Learning Repository

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
