Proceedings Article

Concept drift detection for online class imbalance learning

TLDR
The analysis reveals that detecting drift in imbalanced data streams is more difficult than in balanced ones. The paper proposes a new detection method, DDM-OCI, derived from the existing method DDM.
Abstract
Concept drift detection methods are crucial components of many online learning approaches. Accurate drift detection allows prompt reaction to drifts and helps to maintain high performance of online models over time. Although many methods have been proposed, no attention has been given to data streams with imbalanced class distributions, which are common in real-world applications such as fault diagnosis of control systems and intrusion detection in computer networks. This paper studies the concept drift problem for online class imbalance learning. We examine the impact of concept drift on the single-class performance of online models based on three types of classifiers, under seven different scenarios in the presence of class imbalance. The analysis reveals that detecting drift in imbalanced data streams is more difficult than in balanced ones. Minority-class recall drops significantly after a drift involving the minority class, whereas overall accuracy is not a suitable indicator for drift detection. Based on these findings, we propose a new detection method, DDM-OCI, derived from the existing method DDM. DDM-OCI monitors minority-class recall online to capture the drift. The results show that an online model working with DDM-OCI responds quickly to the new concept.
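The idea of monitoring minority-class recall with a DDM-style test can be illustrated with a minimal sketch. This is not the paper's exact procedure: the class name `RecallDriftDetector`, the plain running recall since the last reset, the warm-up length, and the 2-sigma and 3-sigma thresholds (mirrored from the usual DDM error-rate rule) are all assumptions made for illustration, as is the synthetic stream.

```python
import math


class RecallDriftDetector:
    """DDM-style test applied to minority-class recall (hypothetical sketch).

    Assumptions, not taken from the paper: recall is a plain running
    average since the last reset, and the warning/drift thresholds mirror
    DDM's 2-sigma / 3-sigma rule for the error rate.
    """

    def __init__(self, minority_label=1, warn_k=2.0, drift_k=3.0, warmup=30):
        self.minority_label = minority_label
        self.warn_k = warn_k
        self.drift_k = drift_k
        self.warmup = warmup  # minority examples required before testing
        self.reset()

    def reset(self):
        self.n = 0            # minority examples seen since last reset
        self.hits = 0         # minority examples classified correctly
        self.best_r = float("-inf")
        self.best_s = 0.0

    def update(self, y_true, y_pred):
        # Only minority-class examples carry information about recall.
        if y_true != self.minority_label:
            return "stable"
        self.n += 1
        self.hits += int(y_pred == y_true)
        r = self.hits / self.n                 # running recall
        s = math.sqrt(r * (1.0 - r) / self.n)  # binomial standard deviation
        if self.n < self.warmup:
            return "stable"
        # Remember the point where recall minus its deviation was highest.
        if r - s > self.best_r - self.best_s:
            self.best_r, self.best_s = r, s
        # A large drop below the best recorded level signals a drift.
        if r - s < self.best_r - self.drift_k * self.best_s:
            self.reset()
            return "drift"
        if r - s < self.best_r - self.warn_k * self.best_s:
            return "warning"
        return "stable"


def synthetic_stream(n_blocks=200, drift_block=100):
    """Yield (y_true, y_pred) pairs: 19 majority + 1 minority per block."""
    for b in range(n_blocks):
        for _ in range(19):
            yield 0, 0  # majority class, always predicted correctly
        if b < drift_block:
            # Before the drift: 9 of every 10 minority examples are recognised.
            y_pred = 0 if (b + 1) % 10 == 0 else 1
        else:
            # After the drift: the minority class is always missed.
            y_pred = 0
        yield 1, y_pred


detector = RecallDriftDetector()
drift_at = None
for t, (y_true, y_pred) in enumerate(synthetic_stream()):
    if detector.update(y_true, y_pred) == "drift":
        drift_at = t
        break
print("first drift signalled at stream index:", drift_at)
```

In this toy stream the minority class makes up 5% of the data, so overall accuracy only moves from about 0.995 to 0.95 when minority recall collapses from 0.9 to 0. That illustrates why a per-class statistic is the more sensitive quantity to monitor.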

