Proceedings ArticleDOI

Adaptive Random Forests with Resampling for Imbalanced Data Streams

TLDR
This work presents the Adaptive Random Forest with Resampling (ARFRE), a classifier designed to deal with imbalanced datasets, and shows that the proposed method can considerably improve the performance of the minority class(es) while avoiding degrading the performance on the majority class.
Abstract
The large volume of real-time data generated by computer networks, smartphones, wearables and a wide range of sensors is only useful if it can be processed efficiently, so that individuals can make timely decisions based on it. In this context, machine learning techniques are widely used. Although they often outperform humans in such tasks, every machine learning algorithm carries a certain intrinsic bias: it assumes that the data have specific characteristics, such as a balanced distribution between classes. As many real-world applications exhibit imbalanced data, this topic has been gaining attention over time. In this work, we present the Adaptive Random Forest with Resampling (ARFRE), a classifier designed to deal with imbalanced datasets. ARFRE resamples instances based on the current class label distribution. We show, through an extensive set of experiments on seven datasets, that the proposed method can considerably improve the performance of the minority class(es) while avoiding degrading the performance on the majority class. On top of that, ARFRE is more efficient in terms of execution time than the standard ARF algorithm.
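The key mechanism described above is that ARFRE resamples instances according to the current class label distribution. As a rough illustration of that idea (a minimal sketch, not the paper's exact formulation), the Python code below follows the online-bagging convention of giving each incoming instance a Poisson-distributed training weight and scales the Poisson rate inversely to the observed frequency of the instance's class; the class name `ClassAwarePoissonResampler`, the base rate `base_lambda`, and the weight cap are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

class ClassAwarePoissonResampler:
    """Sketch of resampling driven by the observed class label distribution.

    Follows the online-bagging convention of presenting each incoming
    instance to a base learner k ~ Poisson(lambda) times; here lambda is
    scaled by the inverse of the class's running relative frequency, so
    minority-class instances are replicated more aggressively.
    """

    def __init__(self, base_lambda=1.0, max_weight=10, rng=None):
        self.base_lambda = base_lambda        # rate used for a perfectly balanced class (assumed)
        self.max_weight = max_weight          # cap keeps per-instance training cost bounded (assumed)
        self.rng = rng or np.random.default_rng()
        self.class_counts = defaultdict(int)  # running label distribution
        self.total = 0

    def weight(self, label):
        # Update the running class distribution with the newly seen label.
        self.class_counts[label] += 1
        self.total += 1
        freq = self.class_counts[label] / self.total
        # Rare classes (small freq) get a proportionally larger Poisson rate.
        lam = self.base_lambda / max(freq, 1e-3)
        return int(min(self.rng.poisson(lam), self.max_weight))

# Usage inside a stream loop: present the instance k times to the learner.
# for x, y in stream:
#     k = resampler.weight(y)
#     for _ in range(k):
#         learner.partial_fit([x], [y])
```

The effect is that minority-class instances are presented to the base learners more often, which is one plausible reading of "resampling based on the current class label distribution".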


Citations
Journal ArticleDOI

Incremental learning imbalanced data streams with concept drift: The dynamic updated ensemble algorithm

TL;DR: A chunk-based incremental ensemble algorithm called Dynamic Updated Ensemble (DUE) for learning imbalanced data streams with concept drift, which can react in a timely manner to multiple kinds of concept drift while keeping a limited number of classifiers to ensure high efficiency.
Journal ArticleDOI

A comprehensive active learning method for multiclass imbalanced data streams with concept drift

TL;DR: In this paper, an ensemble classifier, a drift detector, a label sliding window, sample sliding windows, and an initialization training sample sequence are designed to comprehensively address the problem that a given class can simultaneously be a majority with respect to some classes while being a minority with respect to others.
Journal ArticleDOI

Lessons learned from data stream classification applied to credit scoring

TL;DR: Traditional batch machine learning algorithms are compared with data stream algorithms under different validation schemes, using both the Kolmogorov–Smirnov and Population Stability Index metrics, showing the efficiency of data stream classification for the credit scoring task.
Journal ArticleDOI

Application of Gradient Boosting Algorithms for Anti-money Laundering in Cryptocurrencies

TL;DR: In this paper, the authors proposed Adaptive Stacked eXtreme Gradient Boosting (ASXGB), an adaptation of XGBoost to better handle dynamic environments, and presented a comparative analysis of various offline decision-tree-based ensembles and heuristic-based data-sampling techniques.
Journal ArticleDOI

ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams

Alberto Cano et al.
20 Apr 2022
TL;DR: In this paper, a robust online self-adjusting ensemble (ROSE) classifier is proposed to detect concept drift and create a background ensemble for faster adaptation to changes in data streams.
References
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal ArticleDOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
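As a quick illustration of the bootstrap-and-vote idea behind bagging, the sketch below fits several decision trees on bootstrap samples of the training set and aggregates their predictions by majority vote. It assumes integer class labels and array-like inputs, and uses scikit-learn's DecisionTreeClassifier only as a convenient base learner; the function name and parameters are illustrative, not taken from the cited paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Bagging sketch: fit each tree on a bootstrap sample of the training
    data and combine the ensemble's predictions by majority vote."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n indices drawn with replacement
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 30)))
        votes.append(tree.fit(X_train[idx], y_train[idx]).predict(X_test))
    votes = np.asarray(votes, dtype=int)
    # Majority vote across ensemble members for each test instance.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```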
Journal ArticleDOI

Learning from Imbalanced Data

TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
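A central point of the imbalanced-learning literature summarized above is that plain accuracy is a misleading assessment metric when classes are skewed. The sketch below computes per-class recall and its geometric mean (G-mean), two figures commonly reported in this setting; the function names and the toy example are illustrative.

```python
import numpy as np

def per_class_recall(y_true, y_pred):
    """Recall per class; under imbalance, overall accuracy can hide a
    near-zero recall on the minority class, so per-class figures matter."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls: drops to 0 if any class is
    missed entirely, penalizing classifiers that ignore the minority."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Toy example: always predicting the majority class gives 95% accuracy
# but a G-mean of 0, because the minority class is never recognized.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(per_class_recall(y_true, y_pred))  # class 0 recall 1.0, class 1 recall 0.0
print(g_mean(y_true, y_pred))            # 0.0
```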
Journal Article

Supervised Machine Learning: A Review of Classification Techniques

TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.