Open Access Journal Article

An up-to-date comparison of state-of-the-art classification algorithms

TLDR
It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract
Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers.
We compare the accuracy of 11 classification algorithms pairwise and groupwise.
We examine separately the training, parameter-tuning, and testing time.
GBDT and Random Forests yield the highest accuracy, outperforming SVM.
GBDT is the fastest in testing, Naive Bayes the fastest in training.

Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependence on the number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study of both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available from the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top five alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training times. DL is the worst performer in terms of accuracy but the second fastest in prediction efficiency. SRC shows good accuracy but is the slowest classifier in both training and testing.
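The abstract reports accuracy, training time, and testing time as separate quantities. A minimal sketch of how such a measurement could be set up is given below. It is an illustration under stated assumptions, not the authors' benchmark code: `NearestCentroid` is a toy stand-in classifier (not one of the 11 algorithms studied), `evaluate` is a hypothetical helper name, the data are made up, and parameter-tuning time is not measured.

```python
import time
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in classifier (hypothetical; not one of the paper's 11 algorithms)."""

    def fit(self, X, y):
        # Accumulate per-class feature sums and counts, then average them.
        sums = {}
        counts = defaultdict(int)
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
            for j, value in enumerate(row):
                sums[label][j] += value
            counts[label] += 1
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        # Assign each row to the class whose centroid is closest.
        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


def evaluate(clf, X_train, y_train, X_test, y_test):
    """Return accuracy plus separately measured training and testing time (seconds)."""
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = clf.predict(X_test)
    test_time = time.perf_counter() - start

    correct = sum(p == t for p, t in zip(predictions, y_test))
    return {
        "accuracy": correct / len(y_test),
        "train_time_s": train_time,
        "test_time_s": test_time,
    }


# Made-up toy data, for illustration only (not from the UCI or KEEL data sets).
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_train = ["a", "a", "b", "b"]
X_test = [[0.1, 0.1], [1.0, 1.0]]
y_test = ["a", "b"]

result = evaluate(NearestCentroid(), X_train, y_train, X_test, y_test)
print(result)
```

Timing `fit` and `predict` in separate windows is what lets a classifier be fast in one phase and slow in the other, as the abstract describes for Naive Bayes and GBDT. Any object exposing `fit` and `predict` could be passed to `evaluate` in place of the toy classifier.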


Citations
Journal Article

A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update

TL;DR: A comprehensive overview of the modern classification algorithms used in EEG-based BCIs is provided, the principles of these methods and guidelines on when and how to use them are presented, and a number of challenges to further advance EEG classification in BCI are identified.
Journal Article

A comparative analysis of gradient boosting algorithms

TL;DR: A comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed and indicates that CatBoost obtains the best results in generalization accuracy and AUC in the studied datasets although the differences are small.
Journal Article

Deep Learning Fault Diagnosis Method Based on Global Optimization GAN for Unbalanced Data

TL;DR: A new generator and discriminator for a Generative Adversarial Network (GAN) are designed in this paper to generate more discriminative fault samples, using a global optimization scheme to address the problem of unbalanced fault samples.
Journal Article

Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

TL;DR: This work reconstructs the high-dimensional features of Android applications (apps), employs multiple CNNs to detect Android malware, and proposes a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN), which shows strong ability in feature extraction and malware detection.
Journal Article

MTES: An Intelligent Trust Evaluation Scheme in Sensor-Cloud-Enabled Industrial Internet of Things

TL;DR: A mobile edge computing-based intelligent trust evaluation scheme is proposed to comprehensively evaluate the trustworthiness of sensor nodes using a probabilistic graphical model; it can effectively ensure the trustworthiness of sensor nodes and decrease energy consumption.
References

A Study on Sigmoid Kernels for SVM and the Training of non-PSD Kernels by SMO-type Methods

TL;DR: This paper discusses non-PSD kernels from the viewpoint of separability, and shows that the sigmoid kernel matrix is conditionally positive definite (CPD) for certain parameters and is thus a valid kernel there.
Book Chapter

Robust Feature Selection Using Ensemble Feature Selection Techniques

TL;DR: It is shown that ensemble feature selection techniques show great promise for high-dimensional domains with small sample sizes, and provide more robust feature subsets than a single feature selection technique.
Journal Article

An experimental comparison of classification algorithms for imbalanced credit scoring data sets

TL;DR: The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets.
Journal Article

A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring

TL;DR: A sequential ensemble credit scoring model based on a variant of the gradient boosting machine (i.e., extreme gradient boosting, XGBoost) is proposed, which demonstrates that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search.

Yahoo! Learning to Rank Challenge Overview

Olivier Chapelle, +1 more
TL;DR: This paper provides an overview and an analysis of this challenge, along with a detailed description of the released datasets, used internally at Yahoo! for learning the web search ranking function.
Frequently Asked Questions (2)
Q1. What have the authors contributed in "An up-to-date comparison of state-of-the-art classification algorithms"?

Important properties such as the dependence on the number of classes and features and CPU running time are typically not examined. In this paper, the authors carry out a comparative empirical study of both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available from the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies.

In future work, the authors will further investigate the performance of the 11 classifiers in specific application domains and with different feature selection methods.
