Open Access Journal Article

An up-to-date comparison of state-of-the-art classification algorithms

TLDR
It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract
Highlights:
- Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers.
- We compare the accuracy of 11 classification algorithms pairwise and groupwise.
- We examine separately the training, parameter-tuning, and testing time.
- GBDT and Random Forests yield the highest accuracy, outperforming SVM.
- GBDT is the fastest in testing; Naive Bayes is the fastest in training.

Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top five alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training-time efficiency. DL is the worst performer in terms of accuracy but the second fastest in prediction efficiency. SRC shows good accuracy performance, but it is the slowest classifier in both training and testing.
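The benchmarking protocol described in the abstract (per-classifier accuracy, with training time measured separately from testing time) can be sketched with scikit-learn. This is a minimal illustration on one small public data set, not the paper's actual 71-data-set experimental setup; the chosen data set, the three classifiers, and the hyperparameters are assumptions for the sketch, assuming scikit-learn is installed:

```python
import time

from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One small UCI data set standing in for the study's 71 UCI/KEEL sets.
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Three of the paper's 11 classifiers (illustrative hyperparameters,
# not the tuned settings from the study).
classifiers = {
    "GBDT": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", gamma="scale"),
}

results = {}
for name, clf in classifiers.items():
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)         # training time, measured separately...
    train_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    y_pred = clf.predict(X_te)  # ...from prediction (testing) time
    test_s = time.perf_counter() - t0

    results[name] = (accuracy_score(y_te, y_pred), train_s, test_s)
    print(f"{name}: acc={results[name][0]:.3f} "
          f"train={train_s:.3f}s test={test_s:.4f}s")
```

On a single data set the timing differences are noisy; the paper's conclusions rest on aggregating such measurements over many data sets and statistical tests.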


Citations
Journal Article

A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update

TL;DR: A comprehensive overview of the modern classification algorithms used in EEG-based BCIs is provided; the principles of these methods and guidelines on when and how to use them are presented; and a number of challenges to further advance EEG classification in BCI are identified.
Journal Article

A comparative analysis of gradient boosting algorithms

TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests, and gradient boosting has been performed; the results indicate that CatBoost obtains the best generalization accuracy and AUC on the studied data sets, although the differences are small.
Journal Article

Deep Learning Fault Diagnosis Method Based on Global Optimization GAN for Unbalanced Data

TL;DR: A new generator and discriminator for a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminative fault samples, using a global optimization scheme to address the problem of unbalanced fault samples.
Journal Article

Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

TL;DR: This work reconstructs the high-dimensional features of Android applications (apps) and employs multiple CNNs to detect Android malware, proposing a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN) that shows strong capability in feature extraction and malware detection.
Journal Article

MTES: An Intelligent Trust Evaluation Scheme in Sensor-Cloud-Enabled Industrial Internet of Things

TL;DR: A mobile edge computing-based intelligent trust evaluation scheme is proposed to comprehensively evaluate the trustworthiness of sensor nodes using a probabilistic graphical model; the scheme can effectively ensure the trustworthiness of sensor nodes and decrease energy consumption.
References
Journal Article

Evaluating multiple classifiers for stock price direction prediction

TL;DR: The results indicate that Random Forest is the top algorithm, followed by Support Vector Machines, Kernel Factory, AdaBoost, Neural Networks, K-Nearest Neighbors, and Logistic Regression, in the domain of stock price direction prediction.
Journal Article

On the Association of Attributes in Statistics: With Illustrations from the Material of the Childhood Society, &c

TL;DR: The ordinary theory of statistical correlation, normal or otherwise, assumes material susceptible of continuous variation, or at least of variation by a considerable number of discontinuous steps; yet certain practical cases arise where either no variation is thinkable at all, or variation is not measured or possibly not measurable.
Journal Article

Statlog: comparison of classification algorithms on large real-world problems

TL;DR: A set of data set descriptors is developed to help decide which algorithms are suited to particular data sets, including data sets with extreme distributions and with many binary/categorical attributes.
Proceedings Article

Stochastic gradient boosted distributed decision trees

TL;DR: Two different distributed methods that generate exact stochastic GBDT models are presented: the first is a MapReduce implementation, and the second utilizes MPI on the Hadoop grid environment.
Proceedings Article

An Empirical Study of Learning from Imbalanced Data Using Random Forest

TL;DR: A comprehensive suite of experiments that analyze the performance of the random forest (RF) learner implemented in Weka are discussed, providing an extensive empirical evaluation of RF learners built from imbalanced data.
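The cited study evaluates Weka's random forest on imbalanced data. As a loose, hedged analogue, the effect of class imbalance on a random forest can be sketched with scikit-learn, contrasting a plain forest with one using `class_weight="balanced"`; the synthetic data set and the 95/5 imbalance ratio are assumptions, and this does not reproduce the cited study's techniques or metrics:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data with a 95/5 class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain RF vs. RF with class weights inversely proportional to class frequency.
plain = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
weighted = RandomForestClassifier(
    class_weight="balanced", random_state=0
).fit(X_tr, y_tr)

# Balanced accuracy is insensitive to the majority class dominating.
bal_scores = {}
for name, clf in [("plain", plain), ("balanced", weighted)]:
    bal_scores[name] = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: balanced accuracy = {bal_scores[name]:.3f}")
```

On a single synthetic draw either variant may come out ahead; the cited paper's point is precisely that such behavior needs systematic empirical evaluation across many imbalanced data sets.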
Frequently Asked Questions (2)
Q1. What have the authors contributed in "An up-to-date comparison of state-of-the-art classification algorithms"?

Important properties such as the dependency on the number of classes and features and CPU running time are typically not examined in existing benchmarks. In this paper, the authors carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies.

In future work, the authors will further investigate the performance of the 11 classifiers in specific application domains and with different feature selection methods.

Trending Questions (1)
What are the state of the art video classification algorithms?

The paper does not mention any specific video classification algorithms.