SciSpace (formerly Typeset)
Topic

Concept drift

About: Concept drift is a research topic. Over its lifetime, 2,304 publications have been published within this topic, receiving 53,287 citations. The topic is also known as data drift.


Papers

Journal Article
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.


Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.


1,763 citations
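The survey's abstract mentions strategies for handling drift without detailing them here. As a hedged illustration of one common family, detectors that monitor a classifier's online error rate, the sketch below follows the style of the Drift Detection Method (DDM). It tracks the running error rate p and its binomial standard deviation s, remembers the lowest p + s seen so far, and signals a warning at two, and a drift at three, of those standard deviations above that level. The class name, the 30-sample warm-up and the reset-on-drift policy are assumptions of this sketch, not code from the survey.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style drift detector over a stream of prediction outcomes.

    Illustrative sketch: the 2-sigma warning / 3-sigma drift levels and the
    30-sample warm-up are assumed settings, not taken from the survey.
    """

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf   # lowest error rate seen (paired with s_min)
        self.s_min = math.inf

    def update(self, mistake):
        """Record one outcome (True = misclassified); return the detector state."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                  # running error rate
        s = math.sqrt(p * (1 - p) / self.n)       # binomial standard deviation
        if self.n < self.min_samples:
            return "stable"                       # too few samples to judge
        if p + s < self.p_min + self.s_min:       # new best operating point
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:
            self.reset()                          # drift: start monitoring afresh
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

A caller feeds `True` for every misclassified example. On `"drift"`, an adaptive system would typically retrain its model on recent data; on `"warning"`, it might start buffering examples for that retraining.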




Journal Article
Gerhard Widmer, Miroslav Kubat
01 Apr 1996, Machine Learning
TL;DR: A family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear are described, including a heuristic that constantly monitors the system's behavior.


Abstract: On-line learning in domains where the target concept depends on some hidden context poses serious problems. A changing context can induce changes in the target concepts, producing what is known as concept drift. We describe a family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear. The general approach underlying all these algorithms consists of (1) keeping only a window of currently trusted examples and hypotheses; (2) storing concept descriptions and reusing them when a previous context reappears; and (3) controlling both of these functions by a heuristic that constantly monitors the system's behavior. The paper reports on experiments that test the systems' performance under various conditions, such as different levels of noise and different extents and rates of concept drift.


1,533 citations
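The three ingredients listed in the Widmer and Kubat abstract can be illustrated with a deliberately tiny learner. The sketch below is hypothetical and is not the authors' algorithm. A "concept" is just the majority label per attribute value, and the monitoring heuristic is "recent accuracy below a threshold". The parameters (`max_window`, `recent`, `threshold`, `keep`) are assumed values chosen for the demo.

```python
from collections import Counter, deque


def learn_concept(examples):
    """Toy concept description: the majority label for each attribute value."""
    votes = {}
    for x, y in examples:
        votes.setdefault(x, Counter())[y] += 1
    return {x: c.most_common(1)[0][0] for x, c in votes.items()}


def accuracy(concept, examples):
    """Fraction of examples the concept labels correctly."""
    return sum(concept.get(x) == y for x, y in examples) / max(len(examples), 1)


class WindowedLearner:
    """Hypothetical sketch of (1) a trusted window, (2) a store of old concepts,
    (3) a monitoring heuristic. Parameter names and defaults are assumptions."""

    def __init__(self, max_window=20, recent=6, threshold=0.5, keep=2):
        self.window = deque(maxlen=max_window)  # (1) currently trusted examples
        self.recent = deque(maxlen=recent)      # hits/misses on recent predictions
        self.threshold = threshold              # minimum acceptable recent accuracy
        self.keep = keep                        # examples kept after suspected drift
        self.store = []                         # (2) archived concept descriptions
        self.fallback = {}                      # reused concept, if any
        self.concept = {}

    def predict(self, x):
        return self.concept.get(x)

    def learn(self, x, y):
        if x in self.concept:                   # only judge predictions we could make
            self.recent.append(self.predict(x) == y)
        self.window.append((x, y))
        full = len(self.recent) == self.recent.maxlen
        if full and sum(self.recent) / len(self.recent) < self.threshold:
            self._on_drift()                    # (3) heuristic reaction
        # Current concept = fresh window knowledge, backed by any reused concept.
        self.concept = {**self.fallback, **learn_concept(self.window)}

    def _on_drift(self):
        if self.concept and self.concept not in self.store:
            self.store.append(self.concept)     # archive the outgoing concept
        tail = list(self.window)[-self.keep:]   # trust only the newest examples
        self.window = deque(tail, maxlen=self.window.maxlen)
        self.recent.clear()
        # Reuse a stored concept that explains the newest examples perfectly.
        self.fallback = next((c for c in self.store if accuracy(c, tail) == 1.0), {})
```

Predictions are only scored for attribute values the learner has already seen, so the empty start-up concept does not raise a false alarm. When a stored concept explains the newest examples perfectly, it supplies labels for values the shrunken window has not seen yet. In this sketch, that is how a reappearing context is recognized.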


Proceedings Article
Yehuda Koren
28 Jun 2009
TL;DR: Two leading collaborative filtering recommendation approaches are revamped with a more sensitive approach that can better distinguish transient effects from long-term patterns.


Abstract: Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the ecosystem intersecting multiple products and customers, many different characteristics are shifting simultaneously, many of them influence each other, and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long-term patterns. The paradigm we offer is creating a model tracking the time-changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset released by Netflix. Results are encouraging and better than those previously reported on this dataset.


1,463 citations
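To make "tracking time-changing behavior" concrete, here is a hedged sketch of a time-aware baseline predictor. It is not the paper's model. The rating estimate adds a global mean, a user bias with a drift term alpha_u * dev_u(t), an item bias, and an item bias per time bin. Several choices are assumptions of this sketch: the form of dev_u(t) (a signed power of the distance from the user's mean rating day), the exponent beta = 0.4, the bin width, and fitting by stage-wise averages and a one-dimensional least-squares slope rather than regularized, jointly learned parameters.

```python
import math
from collections import defaultdict


def dev(t, t_mean, beta=0.4):
    """Signed, damped distance of day t from the user's mean rating day."""
    d = t - t_mean
    return math.copysign(abs(d) ** beta, d)


def mean(xs):
    return sum(xs) / len(xs)


class TimeAwareBaseline:
    """Hypothetical baseline:
    r_hat(u, i, t) = mu + b_u + alpha_u * dev_u(t) + b_i + b_{i,bin(t)}
    Fitted stage by stage with averages and a 1-D least-squares slope."""

    def __init__(self, bin_width=5.0, beta=0.4):
        self.bin_width, self.beta = bin_width, beta

    def _bin(self, t):
        return int(t // self.bin_width)

    def _item_part(self, i, t):
        return (self.mu + self.b_i.get(i, 0.0)
                + self.b_ibin.get((i, self._bin(t)), 0.0))

    def fit(self, ratings):
        """ratings: iterable of (user, item, day, rating) tuples."""
        ratings = list(ratings)
        self.mu = mean([r for _, _, _, r in ratings])
        by_item, by_bin, by_user = defaultdict(list), defaultdict(list), defaultdict(list)
        for _, i, _, r in ratings:
            by_item[i].append(r - self.mu)
        self.b_i = {i: mean(v) for i, v in by_item.items()}
        for _, i, t, r in ratings:                 # item popularity per time bin
            by_bin[i, self._bin(t)].append(r - self.mu - self.b_i[i])
        self.b_ibin = {k: mean(v) for k, v in by_bin.items()}
        for u, i, t, r in ratings:                 # what the item terms leave over
            by_user[u].append((t, r - self._item_part(i, t)))
        self.t_mean, self.b_u, self.alpha = {}, {}, {}
        for u, rows in by_user.items():
            self.t_mean[u] = mean([t for t, _ in rows])
            self.b_u[u] = mean([res for _, res in rows])
            ds = [dev(t, self.t_mean[u], self.beta) for t, _ in rows]
            num = sum(d * (res - self.b_u[u]) for d, (_, res) in zip(ds, rows))
            den = sum(d * d for d in ds)
            self.alpha[u] = num / den if den else 0.0  # user drift slope
        return self

    def predict(self, u, i, t):
        drift = self.alpha.get(u, 0.0) * dev(t, self.t_mean.get(u, t), self.beta)
        return self._item_part(i, t) + self.b_u.get(u, 0.0) + drift
```

For example, suppose one user rates an item 2, 2, 4, 4 on days 0, 1, 8 and 9. A static item bias would predict 3.0 on every date, while the binned term recovers the rise from 2 to 4. Unlike a time window, no rating is discarded; old ratings simply inform older bins.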


Proceedings Article
Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han
24 Aug 2003
TL;DR: This paper proposes a general framework for mining concept-drifting data streams using weighted ensemble classifiers, and shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.


Abstract: Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications, including credit card fraud protection, target marketing, network intrusion detection, etc. Conventional knowledge discovery tools are facing two challenges: the overwhelming volume of the streaming data and the concept drifts. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.


1,330 citations
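A hedged sketch of chunk-wise weighted ensembling follows. Each chunk trains a new model. Every model is then weighted by how much its mean squared error on the newest chunk beats that of a guesser using the chunk's class priors. Here MSE_i is computed from the probability the model gives the true class, MSE_r = sum over classes c of p(c) * (1 - p(c))^2, and the weight is MSE_r - MSE_i. Only the best `capacity` models with positive weight are kept. Several details are assumptions of this sketch rather than taken from the abstract: the toy base learner, `capacity`, and scoring the newest model on its own training chunk (a real system would estimate this out of sample, for example by cross-validation).

```python
from collections import Counter, defaultdict


class ChunkModel:
    """Toy base learner (hypothetical): class frequencies per attribute value."""

    def __init__(self, chunk, classes):
        self.classes = classes
        self.counts = defaultdict(Counter)
        for x, y in chunk:
            self.counts[x][y] += 1

    def proba(self, x):
        c = self.counts.get(x)
        if not c:                                   # unseen value: uniform guess
            return {k: 1 / len(self.classes) for k in self.classes}
        total = sum(c.values())
        return {k: c[k] / total for k in self.classes}


def mse(model, chunk):
    """MSE_i: squared error of the probability given to the true class."""
    return sum((1 - model.proba(x)[y]) ** 2 for x, y in chunk) / len(chunk)


def mse_random(chunk, classes):
    """MSE_r: error of guessing from the chunk's class priors."""
    prior = Counter(y for _, y in chunk)
    n = len(chunk)
    return sum(prior[c] / n * (1 - prior[c] / n) ** 2 for c in classes)


class WeightedEnsemble:
    def __init__(self, classes, capacity=3):
        self.classes, self.capacity = classes, capacity
        self.members = []                           # (weight, model) pairs

    def update(self, chunk):
        """Train on the new chunk, then re-weight every model on that chunk."""
        models = [m for _, m in self.members] + [ChunkModel(chunk, self.classes)]
        baseline = mse_random(chunk, self.classes)
        scored = sorted(((baseline - mse(m, chunk), m) for m in models),
                        key=lambda wm: wm[0], reverse=True)
        # keep the top `capacity` models that beat the prior-based guesser
        self.members = [(w, m) for w, m in scored[: self.capacity] if w > 0]

    def predict(self, x):
        votes = Counter()
        for w, m in self.members:
            for c, p in m.proba(x).items():
                votes[c] += w * p
        return max(self.classes, key=lambda c: votes[c])
```

Re-weighting every member on each new chunk lets models trained before a drift lose influence immediately. In this sketch they are dropped outright once they do worse than guessing from the priors.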


Proceedings Article
W. Nick Street, Yong Seog Kim
26 Aug 2001
TL;DR: A fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift is presented.


Abstract: Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.


1,077 citations
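The chunk-and-replace loop described by Street and Kim can be sketched as follows. This is a simplified, hypothetical version. The base learner is a one-feature decision stump rather than a full decision tree, and members vote with equal weight. A candidate trained on the previous chunk replaces the weakest member only if it is more accurate on the current chunk. The paper's own replacement heuristic may score classifiers differently.

```python
from collections import Counter


def stump_predict(rule, x):
    """rule = (threshold, sign): sign 1 predicts class 1 when x >= threshold."""
    threshold, sign = rule
    return int((x >= threshold) == (sign == 1))


def accuracy_of(rule, chunk):
    return sum(stump_predict(rule, x) == y for x, y in chunk) / len(chunk)


def train_stump(chunk):
    """Toy base learner (hypothetical): best single threshold on one feature."""
    candidates = [(t, s) for t in sorted({x for x, _ in chunk}) for s in (1, -1)]
    return max(candidates, key=lambda rule: accuracy_of(rule, chunk))


class StreamingEnsemble:
    """Fixed-size ensemble with heuristic replacement (simplified sketch)."""

    def __init__(self, size=3):
        self.size = size
        self.members = []          # current voters
        self.candidate = None      # model trained on the previous chunk

    def update(self, chunk):
        if self.candidate is not None:
            if len(self.members) < self.size:
                self.members.append(self.candidate)
            else:
                # judge candidate and members on the newest chunk, which none trained on
                worst = min(self.members, key=lambda m: accuracy_of(m, chunk))
                if accuracy_of(self.candidate, chunk) > accuracy_of(worst, chunk):
                    self.members[self.members.index(worst)] = self.candidate
        self.candidate = train_stump(chunk)

    def predict(self, x):
        if not self.members:
            return 0               # arbitrary default before the first member
        votes = Counter(stump_predict(m, x) for m in self.members)
        return votes.most_common(1)[0][0]
```

A candidate is only judged on the chunk after the one it was trained on, so the ensemble follows a drift with a lag of about one chunk. The fixed `size` keeps memory roughly constant, matching the abstract's claim in spirit.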


Network Information
Related Topics (5)
Supervised learning: 20.8K papers, 710.5K citations, 91% related
Recommender system: 27.2K papers, 598K citations, 91% related
Knowledge extraction: 20.2K papers, 413.4K citations, 90% related
Collaborative filtering: 14.7K papers, 470.4K citations, 90% related
Semi-supervised learning: 12.1K papers, 611.2K citations, 90% related

Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2022    11
2021    276
2020    322
2019    246
2018    209
2017    163

Top Attributes

Top 5 most impactful authors in this topic

João Gama: 44 papers, 5.1K citations
Jie Lu: 34 papers, 441 citations
Guangquan Zhang: 34 papers, 441 citations
Robi Polikar: 21 papers, 1.6K citations
Bartosz Krawczyk: 20 papers, 402 citations