Concept drift detection for online class imbalance learning

doi:10.1109/IJCNN.2013.6706768

Proceedings Article

Shuo Wang, +5 more

pp. 1-10


TLDR

The analysis reveals that detecting drift in imbalanced data streams is more difficult than in balanced ones; the paper proposes a new detection method, DDM-OCI, derived from the existing method DDM.

Abstract:

Concept drift detection methods are crucial components of many online learning approaches. Accurate drift detection allows prompt reaction to drifts and helps maintain the high performance of online models over time. Although many methods have been proposed, no attention has been given to data streams with imbalanced class distributions, which are common in real-world applications such as fault diagnosis of control systems and intrusion detection in computer networks. This paper studies the concept drift problem for online class imbalance learning. We examine the impact of concept drift on the single-class performance of online models based on three types of classifiers, under seven different scenarios involving class imbalance. The analysis reveals that detecting drift in imbalanced data streams is more difficult than in balanced ones. Minority-class recall drops significantly after a drift involving the minority class. Overall accuracy is not suitable for drift detection. Based on these findings, we propose a new detection method, DDM-OCI, derived from the existing method DDM. DDM-OCI monitors minority-class recall online to capture the drift. The results show that an online model working with DDM-OCI responds quickly to the new concept.
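Two technical points in the abstract, that overall accuracy hides a collapse in minority-class recall and that DDM-OCI instead monitors minority-class recall online, can be illustrated with a small sketch. Overall accuracy is the class-prior-weighted average of per-class recalls, π·R_min + (1 − π)·R_maj, where π is the minority proportion; with π = 0.01, a fall in minority recall from 0.9 to 0.1 lowers accuracy by only 0.01 × 0.8 = 0.008. The detector below is a hypothetical, DDM-style rule applied to minority-class recall R with s = sqrt(R(1 − R)/n): it records R* and s* where R − s peaks, warns when R − s ≤ R* − 2s*, and flags drift when R − s ≤ R* − 3s*, which mirrors DDM's 2- and 3-standard-deviation thresholds on the error rate 1 − R. The class name `MinorityRecallDDM`, the 30-example burn-in, the use of cumulative (rather than time-decayed) recall, and the toy stream are all assumptions for illustration; the paper's exact DDM-OCI formulation may differ.

```python
import math


class MinorityRecallDDM:
    """Hypothetical DDM-style drift detector driven by minority-class recall.

    R is the cumulative recall on minority-class examples and
    s = sqrt(R * (1 - R) / n). The detector stores (R*, s*) where R - s peaks,
    warns when R - s <= R* - 2 s*, and flags drift when R - s <= R* - 3 s*.
    """

    def __init__(self, minority: int = 1, min_samples: int = 30) -> None:
        self.minority = minority        # label of the minority class (assumed)
        self.min_samples = min_samples  # burn-in before any alarm (assumed 30)
        self.reset()

    def reset(self) -> None:
        self.n = 0              # minority examples seen since the last reset
        self.correct = 0        # minority examples predicted correctly
        self.best = -math.inf   # largest R - s recorded so far
        self.best_r = 0.0       # R* at that point
        self.best_s = 0.0       # s* at that point

    def update(self, y_true: int, y_pred: int) -> str:
        """Feed one labelled prediction; return 'stable', 'warning' or 'drift'."""
        if y_true != self.minority:
            return "stable"  # majority examples do not change minority recall
        self.n += 1
        self.correct += int(y_pred == y_true)
        r = self.correct / self.n
        s = math.sqrt(r * (1.0 - r) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if r - s > self.best:  # new peak: record it, nothing to compare against
            self.best, self.best_r, self.best_s = r - s, r, s
            return "stable"
        if r - s <= self.best_r - 3.0 * self.best_s:
            self.reset()  # forget the old concept's statistics
            return "drift"
        if r - s <= self.best_r - 2.0 * self.best_s:
            return "warning"
        return "stable"


def toy_stream(n_before=200, n_after=100, majority_per_minority=99):
    """Yield (phase, y_true, y_pred) for a made-up 1:99 imbalanced stream.

    The model is always right on the majority class (label 0). Minority recall
    is 90% before the change and 10% after it.
    """
    for i in range(n_before + n_after):
        phase = "before" if i < n_before else "after"
        for _ in range(majority_per_minority):
            yield phase, 0, 0
        if phase == "before":
            hit = i % 10 != 9               # 9 of every 10 minority examples right
        else:
            hit = (i - n_before) % 10 == 0  # 1 of every 10 minority examples right
        yield phase, 1, (1 if hit else 0)


detector = MinorityRecallDDM()
acc = {"before": [0, 0], "after": [0, 0]}  # [correct, total] per phase
first_drift = None
for t, (phase, y, y_hat) in enumerate(toy_stream()):
    acc[phase][0] += int(y == y_hat)
    acc[phase][1] += 1
    if detector.update(y, y_hat) == "drift" and first_drift is None:
        first_drift = (t, phase)

for phase, (ok, total) in acc.items():
    print(f"overall accuracy {phase}: {ok / total:.3f}")
print("first drift alarm (position, phase):", first_drift)
```

Run as a script, the sketch reports an overall accuracy of 0.999 before the change and 0.991 after it, a difference small enough to be lost in noise, while the recall-based detector raises its first drift alarm shortly after position 20000, where the change begins, and none before it.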

Citations


Journal Article

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

Alberto Fernández, +3 more

- 01 Jan 2018 -

Journal of Artificial Intelligence Resea...

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of its simplicity of design and its robustness when applied to different types of problems.


Journal Article

Characterizing concept drift

Geoffrey I. Webb, +4 more

- 01 Jul 2016 -

Data Mining and Knowledge Discovery

TL;DR: This work presents the first comprehensive framework for quantitative analysis of drift, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.


Journal Article

Resampling-Based Ensemble Methods for Online Class Imbalance Learning

Shuo Wang, +2 more

- 01 May 2015 -

IEEE Transactions on Knowledge and Data ...

TL;DR: This paper gives the first comprehensive analysis of class imbalance in data streams, in terms of data distributions, imbalance rates and changes in class imbalance status, and proposes two new ensemble methods, called WEOB1 and WEOB2, that maintain both OOB and UOB with adaptive weights for final predictions.


Posted Content

A Systematic Study of Online Class Imbalance Learning with Concept Drift

Shuo Wang, +2 more

- 20 Mar 2017 -

arXiv: Learning

TL;DR: This paper first provides a comprehensive review of current research progress in this field, including its current focuses and open challenges, and then performs an in-depth experimental study with the goal of understanding how best to overcome concept drift in online learning with class imbalance.


Journal Article

A Survey of Stealth Malware Attacks, Mitigation Measures, and Steps Toward Autonomous Open World Solutions

Ethan M. Rudd, +3 more

- 22 Jan 2017 -

IEEE Communications Surveys and Tutorial...

TL;DR: In this paper, the authors present a formalized adaptive open world framework for stealth malware recognition, relate it mathematically to research from other machine learning domains, and suggest that several flawed assumptions inherent to most recognition algorithms prevent a direct mapping between the stealth malware detection problem and a machine learning solution.

