Open Access Journal Article (DOI)

An up-to-date comparison of state-of-the-art classification algorithms

TL;DR
It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Highlights
- An up-to-date report on the accuracy and efficiency of state-of-the-art classifiers.
- We compare the accuracy of 11 classification algorithms pairwise and groupwise.
- We examine separately the training, parameter-tuning, and testing time.
- GBDT and Random Forests yield the highest accuracy, outperforming SVM.
- GBDT is the fastest in testing; Naive Bayes is the fastest in training.

Abstract
Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features, and CPU running time, are typically not examined. In this paper, we carry out a comparative empirical study of both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. We find that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top five alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training-time efficiency. DL is the worst performer in terms of accuracy but the second fastest in prediction efficiency. SRC shows good accuracy performance but is the slowest classifier in both training and testing.
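The accuracy-versus-timing comparison described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming scikit-learn (`GradientBoostingClassifier`, `RandomForestClassifier`, `SVC`) and one synthetic data set standing in for the 71 UCI/KEEL sets; it is not the paper's actual experimental protocol.

```python
# Sketch: accuracy plus train/test wall-clock time for three of the
# paper's classifiers on a single synthetic multiclass data set.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("GBDT", GradientBoostingClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC())]:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)                      # training time
    t_train = time.perf_counter() - t0
    t0 = time.perf_counter()
    acc = clf.score(X_te, y_te)              # prediction (testing) time
    t_test = time.perf_counter() - t0
    print(f"{name}: accuracy={acc:.3f} train={t_train:.2f}s test={t_test:.4f}s")
```

A full reproduction would additionally tune hyperparameters per data set and aggregate ranks across all 71 sets, as the paper does.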


Citations
Journal Article (DOI)

Incorporating Distance-Based Top-n-gram and Random Forest To Identify Electron Transport Proteins

TL;DR: Under 10-fold cross-validation of the model constructed in this study, sensitivity, specificity, and accuracy rates surpassed 85%, 80%, and 82%, respectively, indicating that the classification model built is an effective tool for identifying electron transport proteins.
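The 10-fold cross-validated sensitivity/specificity/accuracy evaluation mentioned in this summary can be sketched as follows. This assumes scikit-learn and synthetic binary-labelled features standing in for the paper's distance-based top-n-gram protein encoding.

```python
# Sketch: sensitivity, specificity, and accuracy of a binary Random
# Forest classifier under 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
y_pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=10)

tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
sensitivity = tp / (tp + fn)                 # true-positive rate
specificity = tn / (tn + fp)                 # true-negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"sens={sensitivity:.3f} spec={specificity:.3f} acc={accuracy:.3f}")
```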
Journal Article (DOI)

Automated detection of driver fatigue based on AdaBoost classifier with EEG signals

TL;DR: By combining FE features with an AdaBoost classifier to detect EEG-based driver fatigue, this paper provides confidence for exploring the inherent physiological mechanisms and wearable applications.
Journal Article (DOI)

Thermal Imaging and Vibration-Based Multisensor Fault Detection for Rotating Machinery

TL;DR: A multisensor system is proposed that not only uses infrared thermal imaging data, but also vibration measurements for automatic condition and fault detection in rotating machinery and it is shown that by combining these two types of sensor data, several conditions/faults and combinations can be detected more accurately than when considering the sensor streams individually.
Journal Article (DOI)

Machine learning approach for risk-based inspection screening assessment

TL;DR: The result shows that the application of the machine learning approach potentially improves the quality of the conventional RBI screening assessment output by reducing output variability and increasing accuracy and precision.
Journal Article (DOI)

Two-Stage DEA in Banks: Terminological Controversies and Future Directions

TL;DR: A systematic review of the literature on the topic, focusing on the banking industry, indicates the lack of a uniform or universal terminology for two-stage DEA models in the banking industry.
References
Journal Article (DOI)

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
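The internal (out-of-bag) estimates this summary refers to can be observed directly. A minimal sketch, assuming scikit-learn's `RandomForestClassifier` with `oob_score=True` and synthetic data, varying the number of features tried per split:

```python
# Sketch: Random Forest out-of-bag (OOB) score as an internal accuracy
# monitor, observed while varying the number of features per split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=25, n_informative=8,
                           random_state=0)

for max_features in (1, 5, "sqrt", None):
    rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                max_features=max_features, random_state=0)
    rf.fit(X, y)
    # oob_score_ estimates generalization accuracy without a held-out set
    print(max_features, round(rf.oob_score_, 3))
```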
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art ImageNet classification performance was achieved by a Deep Convolutional Neural Network (DCNN) consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal Article (DOI)

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article (DOI)

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Frequently Asked Questions (2)
Q1. What have the authors contributed in "An up-to-date comparison of state-of-the-art classification algorithms"?

Moreover, important properties such as the dependency on the number of classes and features and CPU running time are typically not examined. In this paper, the authors carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies.

In future work, the authors will further investigate the performance of the 11 classifiers in specific application domains and with different feature selection methods.

Trending Questions (1)
What are the state of the art video classification algorithms?

The paper does not mention any specific video classification algorithms.