Open Access Journal Article

An up-to-date comparison of state-of-the-art classification algorithms

TLDR
It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract
Highlights:
- Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers.
- We compare the accuracy of 11 classification algorithms pairwise and groupwise.
- We examine separately the training, parameter-tuning, and testing time.
- GBDT and Random Forests yield the highest accuracy, outperforming SVM.
- GBDT is the fastest in testing; Naive Bayes is the fastest in training.

Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top five alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training-time efficiency. DL is the worst performer in terms of accuracy but the second fastest in prediction efficiency. SRC shows good accuracy performance, but it is the slowest classifier in both training and testing.
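The benchmarking protocol described in the abstract (per-classifier accuracy, with training time measured separately from testing time) can be sketched with scikit-learn. This is a minimal illustration on one small public data set, not the paper's actual 71-data-set experimental setup; the chosen data set, the three classifiers, and the hyperparameters are assumptions for the sketch, assuming scikit-learn is installed:

```python
import time

from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One small UCI data set standing in for the study's 71 UCI/KEEL sets.
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Three of the paper's 11 classifiers (illustrative hyperparameters,
# not the tuned settings from the study).
classifiers = {
    "GBDT": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", gamma="scale"),
}

results = {}
for name, clf in classifiers.items():
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)         # training time, measured separately...
    train_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    y_pred = clf.predict(X_te)  # ...from prediction (testing) time
    test_s = time.perf_counter() - t0

    results[name] = (accuracy_score(y_te, y_pred), train_s, test_s)
    print(f"{name}: acc={results[name][0]:.3f} "
          f"train={train_s:.3f}s test={test_s:.4f}s")
```

On a single data set the timing differences are noisy; the paper's conclusions rest on aggregating such measurements over many data sets and statistical tests.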


Citations
Journal Article

A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update

TL;DR: A comprehensive overview of the modern classification algorithms used in EEG-based BCIs is provided; the principles of these methods and guidelines on when and how to use them are presented; and a number of challenges to further advance EEG classification in BCI are identified.
Journal Article

A comparative analysis of gradient boosting algorithms

TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests, and gradient boosting has been performed; the results indicate that CatBoost obtains the best generalization accuracy and AUC on the studied data sets, although the differences are small.
Journal Article

Deep Learning Fault Diagnosis Method Based on Global Optimization GAN for Unbalanced Data

TL;DR: A new generator and discriminator for a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminative fault samples, using a global optimization scheme to address the problem of unbalanced fault samples.
Journal Article

Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

TL;DR: This work reconstructs the high-dimensional features of Android applications (apps) and employs multiple CNNs to detect Android malware, proposing a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN) that shows strong capability in feature extraction and malware detection.
Journal Article

MTES: An Intelligent Trust Evaluation Scheme in Sensor-Cloud-Enabled Industrial Internet of Things

TL;DR: A mobile edge computing-based intelligent trust evaluation scheme is proposed to comprehensively evaluate the trustworthiness of sensor nodes using a probabilistic graphical model; the scheme can effectively ensure the trustworthiness of sensor nodes and decrease energy consumption.
References
Journal Article

Evaluating multiple classifiers for stock price direction prediction

TL;DR: The results indicate that Random Forest is the top algorithm, followed by Support Vector Machines, Kernel Factory, AdaBoost, Neural Networks, K-Nearest Neighbors, and Logistic Regression, in the domain of stock price direction prediction.
Journal Article

On the Association of Attributes in Statistics: With Illustrations from the Material of the Childhood Society, &c

TL;DR: The ordinary theory of statistical correlation, normal or otherwise, assumes material susceptible of continuous variation, or at least of variation by a considerable number of discontinuous steps; yet certain practical cases arise where either no variation is thinkable at all, or variation is not measured or possibly not measurable.
Journal Article

Statlog: comparison of classification algorithms on large real-world problems

TL;DR: A set of data set descriptors is developed to help decide which algorithms are suited to particular data sets, including data sets with extreme distributions and with many binary/categorical attributes.
Proceedings Article

Stochastic gradient boosted distributed decision trees

TL;DR: Two different distributed methods that generate exact stochastic GBDT models are presented: the first is a MapReduce implementation, and the second utilizes MPI on the Hadoop grid environment.
Proceedings Article

An Empirical Study of Learning from Imbalanced Data Using Random Forest

TL;DR: A comprehensive suite of experiments that analyze the performance of the random forest (RF) learner implemented in Weka are discussed, providing an extensive empirical evaluation of RF learners built from imbalanced data.
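The cited study evaluates Weka's random forest on imbalanced data. As a loose, hedged analogue, the effect of class imbalance on a random forest can be sketched with scikit-learn, contrasting a plain forest with one using `class_weight="balanced"`; the synthetic data set and the 95/5 imbalance ratio are assumptions, and this does not reproduce the cited study's techniques or metrics:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data with a 95/5 class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain RF vs. RF with class weights inversely proportional to class frequency.
plain = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
weighted = RandomForestClassifier(
    class_weight="balanced", random_state=0
).fit(X_tr, y_tr)

# Balanced accuracy is insensitive to the majority class dominating.
bal_scores = {}
for name, clf in [("plain", plain), ("balanced", weighted)]:
    bal_scores[name] = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: balanced accuracy = {bal_scores[name]:.3f}")
```

On a single synthetic draw either variant may come out ahead; the cited paper's point is precisely that such behavior needs systematic empirical evaluation across many imbalanced data sets.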
Frequently Asked Questions (2)
Q1. What have the authors contributed in "An up-to-date comparison of state-of-the-art classification algorithms"?

Important properties such as the dependency on the number of classes and features and CPU running time are typically not examined in existing benchmarks. In this paper, the authors carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies.

In future work, the authors will further investigate the performance of the 11 classifiers in specific application domains and with different feature selection methods.

Trending Questions (1)
What are the state of the art video classification algorithms?

The paper does not mention any specific video classification algorithms.