Proceedings ArticleDOI

Adaptive Random Forests with Resampling for Imbalanced Data Streams

TLDR
This work presents the Adaptive Random Forest with Resampling (ARFRE), a classifier designed to deal with imbalanced datasets, and shows that the proposed method can considerably improve the performance of the minority class(es) while avoiding degrading the performance on the majority class.
Abstract
The large volume of real-time data generated by computer networks, smartphones, wearables and a wide range of sensors is only useful if it can be processed efficiently, so that individuals can make timely decisions based on it. In this context, machine learning techniques are widely used. Although they often outperform humans in such tasks, every machine learning algorithm carries a certain intrinsic bias: it assumes that the data have specific characteristics, such as a balanced distribution between classes. As many real-world applications exhibit imbalanced data, this topic has been gaining attention over time. In this work, we present the Adaptive Random Forest with Resampling (ARFRE), a classifier designed to deal with imbalanced datasets. ARFRE resamples instances based on the current class label distribution. We show, through an extensive set of experiments on seven datasets, that the proposed method can considerably improve the performance of the minority class(es) while avoiding degrading the performance on the majority class. On top of that, ARFRE is more efficient in terms of execution time than the standard ARF algorithm.
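The key mechanism described above is that ARFRE resamples instances according to the current class label distribution. As a rough illustration of that idea (a minimal sketch, not the paper's exact formulation), the Python code below follows the online-bagging convention of giving each incoming instance a Poisson-distributed training weight and scales the Poisson rate inversely to the observed frequency of the instance's class; the class name `ClassAwarePoissonResampler`, the base rate `base_lambda`, and the weight cap are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

class ClassAwarePoissonResampler:
    """Sketch of resampling driven by the observed class label distribution.

    Follows the online-bagging convention of presenting each incoming
    instance to a base learner k ~ Poisson(lambda) times; here lambda is
    scaled by the inverse of the class's running relative frequency, so
    minority-class instances are replicated more aggressively.
    """

    def __init__(self, base_lambda=1.0, max_weight=10, rng=None):
        self.base_lambda = base_lambda        # rate used for a perfectly balanced class (assumed)
        self.max_weight = max_weight          # cap keeps per-instance training cost bounded (assumed)
        self.rng = rng or np.random.default_rng()
        self.class_counts = defaultdict(int)  # running label distribution
        self.total = 0

    def weight(self, label):
        # Update the running class distribution with the newly seen label.
        self.class_counts[label] += 1
        self.total += 1
        freq = self.class_counts[label] / self.total
        # Rare classes (small freq) get a proportionally larger Poisson rate.
        lam = self.base_lambda / max(freq, 1e-3)
        return int(min(self.rng.poisson(lam), self.max_weight))

# Usage inside a stream loop: present the instance k times to the learner.
# for x, y in stream:
#     k = resampler.weight(y)
#     for _ in range(k):
#         learner.partial_fit([x], [y])
```

The effect is that minority-class instances are presented to the base learners more often, which is one plausible reading of "resampling based on the current class label distribution".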


Citations
Journal ArticleDOI

Incremental learning imbalanced data streams with concept drift: The dynamic updated ensemble algorithm

TL;DR: A chunk-based incremental ensemble algorithm called Dynamic Updated Ensemble (DUE) for learning imbalanced data streams with concept drift, which can react in a timely manner to multiple kinds of concept drift while keeping a limited number of classifiers to ensure high efficiency.
Journal ArticleDOI

A comprehensive active learning method for multiclass imbalanced data streams with concept drift

TL;DR: In this paper, an ensemble classifier, a drift detector, a label sliding window, sample sliding windows, and an initialization training sample sequence are designed to comprehensively address the problem that a given class can simultaneously be a majority with respect to some classes while being a minority with respect to others.
Journal ArticleDOI

Lessons learned from data stream classification applied to credit scoring

TL;DR: Traditional batch machine learning algorithms are compared with data stream algorithms under different validation schemes, using both the Kolmogorov–Smirnov and Population Stability Index metrics, showing the efficiency of data stream classification for the credit scoring task.
Journal ArticleDOI

Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies

TL;DR: In this paper, the authors proposed Adaptive Stacked eXtreme Gradient Boosting (ASXGB), an adaptation of XGBoost to better handle dynamic environments, and presented a comparative analysis of various offline decision-tree-based ensembles and heuristic-based data-sampling techniques.
Journal ArticleDOI

ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams

Alberto Cano et al.
20 Apr 2022
TL;DR: In this paper, a robust online self-adjusting ensemble (ROSE) classifier is proposed to detect concept drift and create a background ensemble for faster adaptation to changes in data streams.
References
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal ArticleDOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
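As a quick illustration of the bootstrap-and-vote idea behind bagging, the sketch below fits several decision trees on bootstrap samples of the training set and aggregates their predictions by majority vote. It assumes integer class labels and array-like inputs, and uses scikit-learn's DecisionTreeClassifier only as a convenient base learner; the function name and parameters are illustrative, not taken from the cited paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Bagging sketch: fit each tree on a bootstrap sample of the training
    data and combine the ensemble's predictions by majority vote."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n indices drawn with replacement
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 30)))
        votes.append(tree.fit(X_train[idx], y_train[idx]).predict(X_test))
    votes = np.asarray(votes, dtype=int)
    # Majority vote across ensemble members for each test instance.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```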
Journal ArticleDOI

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
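A central point of the imbalanced-learning literature summarized above is that plain accuracy is a misleading assessment metric when classes are skewed. The sketch below computes per-class recall and its geometric mean (G-mean), two figures commonly reported in this setting; the function names and the toy example are illustrative.

```python
import numpy as np

def per_class_recall(y_true, y_pred):
    """Recall per class; under imbalance, overall accuracy can hide a
    near-zero recall on the minority class, so per-class figures matter."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls: drops to 0 if any class is
    missed entirely, penalizing classifiers that ignore the minority."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Toy example: always predicting the majority class gives 95% accuracy
# but a G-mean of 0, because the minority class is never recognized.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(per_class_recall(y_true, y_pred))  # class 0 recall 1.0, class 1 recall 0.0
print(g_mean(y_true, y_pred))            # 0.0
```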
Journal Article

Supervised Machine Learning: A Review of Classification Techniques

TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.