Journal ArticleDOI

Incremental Learning of Concept Drift from Streaming Imbalanced Data

01 Oct 2013-IEEE Transactions on Knowledge and Data Engineering (IEEE Computer Society)-Vol. 25, Iss: 10, pp 2283-2301
TL;DR: Two ensemble-based approaches for learning concept drift from imbalanced data are described, each with its own specific areas of strength, and results in comparison to other approaches indicate that both approaches are able to address this challenging problem.
Abstract: Learning in nonstationary environments, also known as learning concept drift, is concerned with learning from data whose statistical characteristics change over time. Concept drift is further complicated if the data set is class imbalanced. While these two issues have been independently addressed, their joint treatment has been mostly underexplored. We describe two ensemble-based approaches for learning concept drift from imbalanced data. Our first approach is a logical combination of our previously introduced Learn++.NSE algorithm for concept drift, with the well-established SMOTE for learning from imbalanced data. Our second approach makes two major modifications to Learn++.NSE-SMOTE integration by replacing SMOTE with a subensemble that makes strategic use of minority class data; and replacing Learn++.NSE and its class-independent error weighting mechanism with a penalty constraint that forces the algorithm to balance accuracy on all classes. The primary novelty of this approach is in determining the voting weights for combining ensemble members, based on each classifier's time and imbalance-adjusted accuracy on current and past environments. Favorable results in comparison to other approaches indicate that both approaches are able to address this challenging problem, each with its own specific areas of strength. We also release all experimental data as a resource and benchmark for future research.
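To make the weighting idea concrete, the sketch below computes a class-balanced error for each ensemble member and turns time-decayed errors into voting weights. It is a simplified illustration under assumed decay parameters (a, b) and an assumed balanced-error definition, not the authors' exact Learn++.NSE/NIE update.

```python
import numpy as np

def balanced_error(y_true, y_pred, classes):
    """Average of the per-class error rates, so minority-class mistakes
    count as much as majority-class mistakes (illustrative definition)."""
    per_class = [np.mean(y_pred[y_true == c] != c) for c in classes if np.any(y_true == c)]
    return float(np.mean(per_class))

def voting_weights(errors_over_time, a=0.5, b=10):
    """errors_over_time[k] lists classifier k's balanced errors on past batches
    (oldest first). Recent batches get larger sigmoid weights, and the voting
    weight is log(1/beta) of the time-weighted normalized error -- a sketch of
    the general idea, not the published formulation."""
    weights = []
    for errs in errors_over_time:
        errs = np.clip(np.asarray(errs, dtype=float), 1e-6, 0.499)  # keep errors below chance
        age = np.arange(len(errs))[::-1]              # 0 = most recent batch
        omega = 1.0 / (1.0 + np.exp(a * (age - b)))   # sigmoid time decay
        omega /= omega.sum()
        beta = float(np.sum(omega * errs / (1.0 - errs)))  # normalized, time-weighted error
        weights.append(np.log(1.0 / beta))
    return np.asarray(weights)
```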
Citations
Journal ArticleDOI
TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
Abstract: Highlights: 527 articles related to imbalanced data and rare events are reviewed; the reviewed papers are examined from both technical and practical perspectives; existing methods and the corresponding statistics are summarized under a new taxonomy; 162 application papers are categorized into 13 domains and introduced; some open questions are discussed at the end of the manuscript. Rare events, especially those that could potentially negatively impact society, often require human decision-making responses. Detecting rare events can be viewed as a prediction task in data mining and machine learning communities. As these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers that have been published in the past decade were collected for the study. The initial statistics suggested that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. Modeling methods discussed include techniques such as data preprocessing, classification algorithms, and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then we detail the applications for each category. Finally, some suggestions from the reviewed papers are combined with our own experiences and judgments to offer further research directions for the imbalanced learning and rare event detection fields.

1,448 citations

Journal ArticleDOI
TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of the simplicity of its design and its robustness when applied to different types of problems.
Abstract: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data and is featured in a number of different software packages -- from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges to extend SMOTE for Big Data problems.
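As one example of the open-source packages mentioned above, imbalanced-learn exposes SMOTE through a scikit-learn-style interface. The snippet below is a minimal usage sketch and assumes imbalanced-learn and scikit-learn are installed; the toy data set and parameters are illustrative.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# A toy binary problem with roughly a 10:1 class imbalance.
X, y = make_classification(n_samples=1100, weights=[10 / 11], random_state=0)
print("class counts before:", Counter(y))

# SMOTE synthesizes minority examples by interpolating between a minority
# sample and one of its k nearest minority neighbors.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print("class counts after: ", Counter(y_res))
```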

905 citations


Cites background or methods from "Incremental Learning of Concept Dri..."

  • ...…both obstacles from the point of view of preprocessing (Nguyen et al., 2011; He & Chen, 2011; Wang, Minku, & Yao, 2015), particularly using SMOTE (Ditzler & Polikar, 2013), and/or cost-sensitive learning via ensembles of classifiers (Mirza, Lin, & Liu, 2015; Ghazikhani, Monsefi, & Sadoghi Yazdi,…...

  • ...As we mentioned in Section 4.1, Ditzler and Polikar (2013) integrated the SMOTE preprocessing within a novel ensemble boosting approach that applies distribution weights among the instances depending on their distribution at each time step....

  • ...The first is Learn++.NSE-SMOTE (Ditzler & Polikar, 2013), which is an extension of Learn++.SMOTE (Ditzler et al., 2010)....

Journal ArticleDOI
TL;DR: This paper surveys research on ensembles for data stream classification as well as regression tasks and discusses advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs.

757 citations


Cites background or methods from "Incremental Learning of Concept Dri..."

  • ...Another example of a passive online learning ensemble approach for non-stationary environments is Stanley’s Concept Drift Committee (CDC) [168]....

  • ...CDS (Concept Drift with SMOTE), which employs oversampling of the minority class....

  • ...The current use of AUC for data streams has been limited only to estimations on periodical holdout sets [77] or entire streams of a limited length [44]....

  • ...[77] T.R. Hoens, R. Polikar, N.V. Chawla, Learning from streaming data with concept drift and imbalance: an overview, Prog....

  • ...[50] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Trans....

Journal ArticleDOI
TL;DR: In nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete, performing sub-optimally at best or failing catastrophically at worst.
Abstract: The prevalence of mobile phones, the internet-of-things technology, and networks of sensors has led to an enormous and ever increasing amount of data that are now more commonly available in a streaming fashion [1]-[5]. Often, it is assumed - either implicitly or explicitly - that the process generating such a stream of data is stationary, that is, the data are drawn from a fixed, albeit unknown probability distribution. In many real-world scenarios, however, such an assumption is simply not true, and the underlying process generating the data stream is characterized by an intrinsic nonstationary (or evolving or drifting) phenomenon. The nonstationarity can be due, for example, to seasonality or periodicity effects, changes in the users' habits or preferences, hardware or software faults affecting a cyber-physical system, thermal drifts or aging effects in sensors. In such nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete in time, and perform sub-optimally at best, or fail catastrophically at worst.

640 citations


Cites background from "Incremental Learning of Concept Dri..."

  • ...learning algorithm for imbalanced-nonstationary data streams that does not require access to historical data [106], [107]....

Journal ArticleDOI
TL;DR: A high quality, instructive review of current research developments and trends in the concept drift field is conducted, and a framework of learning under concept drift is established including three main components: concept drift detection, concept drift understanding, and concept drift adaptation.
Abstract: Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding, and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, it is necessary that a high quality, instructive review of current research developments and trends in the concept drift field is conducted. In addition, due to the rapid development of concept drift in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in literature. This paper reviews over 130 high quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aiming at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.
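As a minimal illustration of the detection component, the sketch below monitors a classifier's running error rate and flags drift when it rises well above the best level seen so far. It is a simplified, DDM-style detector written purely for illustration; the warm-up length and the three-standard-deviation threshold are assumptions, not prescriptions from the survey.

```python
class ErrorRateDriftDetector:
    """Signals drift when the running error rate climbs well above its best
    observed level (a simplified, DDM-style sketch; thresholds are illustrative)."""

    def __init__(self, warmup=30, n_std=3.0):
        self.warmup = warmup
        self.n_std = n_std
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                  # running error rate
        self.best = float("inf")      # lowest p + s observed so far

    def update(self, error):
        """error is 1 if the latest prediction was wrong, else 0. Returns True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = (self.p * (1.0 - self.p) / self.n) ** 0.5   # std. dev. of a Bernoulli mean
        if self.n < self.warmup:
            return False
        self.best = min(self.best, self.p + s)
        if self.p + s > self.best + self.n_std * s:      # error rate has degraded
            self.reset()                                 # start tracking the new concept
            return True
        return False
```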

557 citations


Cites methods from "Incremental Learning of Concept Dri..."

  • ...[119] presented two ensemble methods for learning under concept drift with class imbalance....

References
Journal ArticleDOI
01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Abstract: In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
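The multiplicative weight-update rule is compact enough to sketch directly. The function below is a generic Hedge-style allocation over options with losses in [0, 1]; the learning rate eta and the exp(-eta * loss) update are one common parameterization, not necessarily the paper's exact constants, and the losses in the example are made up.

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """loss_matrix[t, i] is the loss of option i at round t, assumed to lie in [0, 1].
    Returns the per-round allocation over options and the final weights."""
    n_rounds, n_options = loss_matrix.shape
    w = np.ones(n_options)
    allocation = np.zeros_like(loss_matrix, dtype=float)
    for t in range(n_rounds):
        allocation[t] = w / w.sum()              # apportion resources proportionally to weights
        w = w * np.exp(-eta * loss_matrix[t])    # multiplicatively shrink weights of lossy options
    return allocation, w

# Example: three options over five rounds with made-up losses.
losses = np.array([[0.1, 0.5, 0.9],
                   [0.2, 0.4, 0.8],
                   [0.1, 0.6, 0.7],
                   [0.0, 0.5, 0.9],
                   [0.1, 0.4, 0.8]])
alloc, final_w = hedge(losses)
print(alloc[-1])   # most of the allocation has shifted to the first (lowest-loss) option
```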

15,813 citations

Journal ArticleDOI
TL;DR: A method of over-sampling the minority class by creating synthetic minority class examples is described and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
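The core over-sampling step can be sketched in a few lines: pick a minority example, pick one of its k nearest minority neighbors, and place a synthetic point somewhere on the segment between them. The function below is a bare-bones illustration of that step (no edge-case handling, and it assumes at least k + 1 minority samples), not the reference implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples in X_min
    and their k nearest minority neighbors (bare-bones illustration of SMOTE)."""
    rng = np.random.default_rng(seed)
    # k + 1 because each point is returned as its own nearest neighbor.
    _, neighbors = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))                    # a random minority sample...
        j = neighbors[i][rng.integers(1, k + 1)]        # ...and one of its k minority neighbors
        gap = rng.random()                              # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```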

11,512 citations

Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
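Both recommended tests are available in SciPy. The snippet below is a minimal usage sketch on made-up accuracy scores for three classifiers over six data sets; the numbers are illustrative only, and a post-hoc test (e.g., Nemenyi) would follow a significant Friedman result.

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Accuracies of three classifiers on the same six data sets (illustrative numbers).
clf_a = [0.81, 0.77, 0.90, 0.66, 0.72, 0.84]
clf_b = [0.79, 0.75, 0.91, 0.62, 0.70, 0.80]
clf_c = [0.74, 0.71, 0.88, 0.60, 0.69, 0.78]

# Wilcoxon signed-ranks test: paired comparison of two classifiers over data sets.
stat, p = wilcoxon(clf_a, clf_b)
print(f"Wilcoxon: statistic={stat:.2f}, p-value={p:.3f}")

# Friedman test: omnibus comparison of several classifiers over multiple data sets.
stat, p = friedmanchisquare(clf_a, clf_b, clf_c)
print(f"Friedman: statistic={stat:.2f}, p-value={p:.3f}")
```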

10,306 citations

Journal ArticleDOI
TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
Abstract: With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
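To make the point about assessment metrics concrete, the toy example below contrasts plain accuracy with metrics that remain informative under severe class skew. The labels and predictions are fabricated purely for illustration and assume scikit-learn is available.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, recall_score)

# A 95:5 imbalanced problem and a classifier that almost always predicts the majority class.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 95 + [1, 0, 0, 0, 0])

print("accuracy:         ", accuracy_score(y_true, y_pred))           # 0.96 -- looks excellent
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.60 -- exposes the skew
print("minority recall:  ", recall_score(y_true, y_pred, pos_label=1))
print("minority F1:      ", f1_score(y_true, y_pred, pos_label=1))
```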

6,320 citations

Proceedings ArticleDOI
26 Aug 2001
TL;DR: An efficient algorithm for mining decision trees from continuously changing data streams, called CVFDT and based on the ultra-fast VFDT decision tree learner, is proposed; it stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate.
Abstract: Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
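The split decisions in VFDT and CVFDT rest on the Hoeffding bound, which limits how far an observed mean of n samples can lie from its true mean. The snippet below only evaluates that bound for an illustrative split check; it is not an implementation of CVFDT, and the gain values, delta, and example count are assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n i.i.d. samples of a random
    variable with range value_range lies within this epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# VFDT-style check: split on the best attribute once the observed gain gap between the
# two best attributes exceeds epsilon (all numbers below are illustrative).
best_gain, second_gain = 0.42, 0.37
eps = hoeffding_bound(value_range=1.0, delta=1e-6, n=2000)
print(f"epsilon = {eps:.4f}; split now: {best_gain - second_gain > eps}")
```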

1,790 citations


"Incremental Learning of Concept Dri..." refers background in this paper

  • ...’s concept adapting very fast decision tree (CVFDT) [20] or Cohen et al....

  • ...Many of these approaches also include a FLORA-like windowing mechanism, including Hulten et al.’s concept adapting very fast decision tree (CVFDT) [20] or Cohen et al.’s incremental online-information network (IOLIN) algorithms [21], [22]....
