Journal ArticleDOI

Learning in the presence of concept drift and hidden contexts

01 Apr 1996-Machine Learning (Kluwer Academic Publishers)-Vol. 23, Iss: 1, pp 69-101
TL;DR: A family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear is described, including a heuristic that constantly monitors the system's behavior.
Abstract: On-line learning in domains where the target concept depends on some hidden context poses serious problems. A changing context can induce changes in the target concepts, producing what is known as concept drift. We describe a family of learning algorithms that flexibly react to concept drift and can take advantage of situations where contexts reappear. The general approach underlying all these algorithms consists of (1) keeping only a window of currently trusted examples and hypotheses; (2) storing concept descriptions and reusing them when a previous context reappears; and (3) controlling both of these functions by a heuristic that constantly monitors the system's behavior. The paper reports on experiments that test the systems' performance under various conditions such as different levels of noise and different extents and rates of concept drift.
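
To make the windowing idea concrete, here is a minimal sketch in the spirit of the approach, not the authors' FLORA implementation: a base classifier is retrained on a window of trusted examples, and a simple accuracy-monitoring heuristic (a stand-in for the paper's window-adjustment heuristic) shrinks the window when drift is suspected. All names, thresholds, and the batch-retraining strategy are illustrative.

```python
from collections import deque

class WindowedLearner:
    """Minimal sketch of the windowing idea behind the FLORA family:
    keep only a window of recent, trusted examples and let a simple
    heuristic grow or shrink that window from observed accuracy.
    Illustrative only; FLORA itself maintains rule sets incrementally."""

    def __init__(self, base_learner, min_size=20, max_size=200):
        self.base_learner = base_learner        # any classifier with fit/predict
        self.window = deque()                   # currently trusted examples
        self.min_size, self.max_size = min_size, max_size
        self.recent_correct = deque(maxlen=30)  # rolling record of prediction hits

    def observe(self, x, y):
        # Test-then-train: predict first, so accuracy reflects drift.
        if self.window:
            self.recent_correct.append(self.predict(x) == y)
        self.window.append((x, y))
        self._adjust_window()
        X, Y = zip(*self.window)
        self.base_learner.fit(list(X), list(Y))  # retrain on the trusted window

    def predict(self, x):
        return self.base_learner.predict([x])[0]

    def _adjust_window(self):
        # Heuristic: falling accuracy suggests drift -> shrink aggressively;
        # stable performance -> let the window grow toward max_size.
        acc = (sum(self.recent_correct) / len(self.recent_correct)
               if self.recent_correct else 1.0)
        if acc < 0.6:
            target = max(self.min_size, len(self.window) // 2)
            while len(self.window) > target:   # drop the oldest examples
                self.window.popleft()
        while len(self.window) > self.max_size:
            self.window.popleft()
```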


Citations
Journal ArticleDOI
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.
Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.

2,374 citations


Cites background or methods from "Learning in the presence of concept..."

  • ...…Size [Gama et al. 2004], [Zhang et al. 2008], [Kuncheva and Zliobaite 2009]; Forgetting Mechanisms, Temporal Sequences: [Salganicoff 1997], [Widmer and Kubat 1996]; Abrupt Forgetting: [Forman 2006], [Klinkenberg 2004], [Pechenizkiy …]; … are computed as S_i = G(X_i, αS_{i−1}), where α ∈ (0, 1) is the… (a flattened table excerpt; the S_i recursion is illustrated in the sketch after this list)...

    [...]

  • ...The FLORA2 algorithm [Widmer and Kubat 1996] includes a window adjustment heuristic for a rule-based classifier....

    [...]

  • ...…and Hulten 2000], [Kuncheva and Plumpton 2008], [Kelly et al. 1999], [Bouchachia 2011a], [Ikonomovska et al. 2011], [Salganicoff 1997], [Widmer and Kubat 1996]; Fixed Size: [Syed et al. 1999], [Hulten et al. 2001], [Lazarescu et al. 2004]; Multiple Examples: [Bifet and Gavalda 2006, … (a flattened table excerpt)...

    [...]

  • ...One of the first algorithms using an adaptive window size was the FLORA2 [Widmer and Kubat 1996]....

    [...]

  • ...A typical detection strategy monitors the evolution of the performance indicators [Widmer and Kubat 1996; Zeira et al. 2004] or raw data and statistically compares them to a fixed baseline....

    [...]
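
The recursion quoted in the first excerpt above, S_i = G(X_i, αS_{i−1}) with α ∈ (0, 1), describes gradual forgetting. A minimal sketch, assuming G is a weighted sum, which yields an exponentially faded mean, one common instantiation:

```python
def fading_average(stream, alpha=0.9):
    """Exponentially faded mean: one common instantiation of the
    recursion S_i = G(X_i, alpha * S_{i-1}) quoted above; alpha in
    (0, 1) controls how quickly old values are forgotten."""
    s, n = 0.0, 0.0
    for x in stream:
        s = x + alpha * s   # faded sum
        n = 1.0 + alpha * n # faded count
        yield s / n         # faded mean at time i

# The faded mean tracks the level shift in the stream instead of
# averaging it away: [1.0, 1.0, 1.0, 3.13, 3.97, 4.46] (alpha = 0.5).
print([round(v, 2) for v in fading_average([1, 1, 1, 5, 5, 5], alpha=0.5)])
```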

Proceedings ArticleDOI
26 Aug 2001
TL;DR: An efficient algorithm for mining decision trees from continuously changing data streams, called CVFDT and based on the ultra-fast VFDT decision tree learner, is proposed; it stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable and replacing the old with the new when the new becomes more accurate.
Abstract: Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.
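
A toy sketch of the bookkeeping that gives CVFDT its O(1)-per-example cost: sufficient statistics are kept in sync with the sliding window by incrementing counts when an example arrives and decrementing when it leaves, instead of retraining on all w window examples. The Hoeffding-bound split logic and alternative-subtree machinery are omitted; the class below is illustrative, not the paper's implementation.

```python
from collections import Counter, deque

class SlidingWindowStats:
    """Toy sketch of CVFDT-style bookkeeping: per-(attribute, value,
    class) counts are kept in sync with a sliding window by adding on
    arrival and subtracting on departure. Each example costs
    O(#attributes), independent of the window size w."""

    def __init__(self, window_size):
        self.w = window_size
        self.window = deque()
        self.counts = Counter()   # (attr_index, attr_value, label) -> count

    def add(self, x, y):
        self.window.append((x, y))
        for i, v in enumerate(x):
            self.counts[(i, v, y)] += 1
        if len(self.window) > self.w:      # forget the oldest example
            old_x, old_y = self.window.popleft()
            for i, v in enumerate(old_x):
                self.counts[(i, v, old_y)] -= 1
```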

1,790 citations

Journal ArticleDOI
TL;DR: This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Abstract: The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.

1,634 citations


Cites background from "Learning in the presence of concept..."

  • ...Concept drift (Widmer and Kubat, 1996) in RL has not been directly addressed by any work in this survey....

    [...]

Proceedings ArticleDOI
Yehuda Koren1
28 Jun 2009
TL;DR: Two leading collaborative filtering recommendation approaches are revamped to model temporal dynamics, since a more sensitive approach is required that can make better distinctions between transient effects and long-term patterns.
Abstract: Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be a key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the eco-system intersecting multiple products and customers, many different characteristics are shifting simultaneously, while many of them influence each other and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns. The paradigm we offer is creating a model tracking the time changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset by Netflix. Results are encouraging and better than those previously reported on this dataset.
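
One flavor of the paper's "track time-changing behavior" idea is a time-drifting user bias of the form b_u(t) = b_u + α_u · dev_u(t), with dev_u(t) = sign(t − t_u) · |t − t_u|^β, where t_u is the user's mean rating date: old ratings are explained with a bias appropriate to their date rather than discarded. A minimal sketch; the exponent and parameter values here are illustrative.

```python
def time_drifting_user_bias(b_u, alpha_u, t, t_mean, beta=0.4):
    """Time-drifting user bias b_u(t) = b_u + alpha_u * dev_u(t), with
    dev_u(t) = sign(t - t_mean) * |t - t_mean| ** beta. b_u and alpha_u
    are learned per user; beta is a global shape parameter (the value
    0.4 is illustrative)."""
    dev = (1 if t >= t_mean else -1) * abs(t - t_mean) ** beta
    return b_u + alpha_u * dev

# A user whose taste drifts upward: the bias 100 days after the user's
# mean rating date exceeds the bias 100 days before it.
print(time_drifting_user_bias(0.2, 0.01, t=300, t_mean=200))
print(time_drifting_user_bias(0.2, 0.01, t=100, t_mean=200))
```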

1,621 citations


Cites background from "Learning in the presence of concept..."

  • ...A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns....

    [...]

Journal ArticleDOI
TL;DR: Characteristics of process industry data that are critical for the development of data-driven Soft Sensors are discussed.

1,399 citations


Cites background from "Learning in the presence of concept..."

  • ...For detailed treatment and some solutions see Widmer and Kubat (1996); Gama et al. (2004)....

    [...]

References
Proceedings ArticleDOI
05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.
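
For reference, Valiant's criterion in its now-standard PAC formulation; the notation below is the modern convention, not necessarily the paper's own:

```latex
% PAC learnability (standard formulation of Valiant's criterion):
% a concept class C is PAC-learnable if there is an algorithm A such
% that for every target c in C, every distribution D, and every
% epsilon, delta in (0,1), A draws polynomially many i.i.d. examples
% from D and outputs a hypothesis h satisfying
\[
  \Pr\big[\,\mathrm{err}_D(h) \le \varepsilon\,\big] \;\ge\; 1-\delta,
  \qquad
  \mathrm{err}_D(h) \;=\; \Pr_{x \sim D}\big[\,h(x) \ne c(x)\,\big].
\]
```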

5,311 citations

Journal ArticleDOI
TL;DR: This paper extends the nearest neighbor algorithm, which has large storage requirements, and describes how those requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy.
Abstract: Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
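
A minimal sketch of the confidence-test idea at the heart of IB3, which the FLORA paper adapts for deciding which stored concept descriptions to trust: an exemplar is accepted while the confidence interval on its classification accuracy clears the interval on its class's observed base frequency, and dropped when it falls significantly below. The Wilson interval and the confidence levels follow the commonly cited IB3 setup, but IB3's exact bookkeeping is omitted and the function names and z values are illustrative.

```python
import math

def wilson_interval(successes, trials, z):
    """Wilson score confidence interval on a Bernoulli proportion;
    z is the normal quantile for the chosen confidence level."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials
                           + z * z / (4 * trials * trials))
    return (center - spread) / denom, (center + spread) / denom

def exemplar_status(correct, used, class_successes, class_n):
    """Keep an exemplar while its accuracy is significantly above its
    class's base frequency; drop it when significantly below (possibly
    noisy or outdated). Levels roughly follow IB3: a stricter test for
    acceptance (z ~ 1.645) than for dropping (z ~ 1.15)."""
    acc_lo, _ = wilson_interval(correct, used, z=1.645)
    _, frq_hi = wilson_interval(class_successes, class_n, z=1.645)
    if acc_lo > frq_hi:
        return "accept"
    _, acc_hi = wilson_interval(correct, used, z=1.15)
    frq_lo, _ = wilson_interval(class_successes, class_n, z=1.15)
    if acc_hi < frq_lo:
        return "drop"
    return "undecided"
```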

4,499 citations


"Learning in the presence of concept..." refers background or methods in this paper

  • ...The more sophisticated variant IB3 (Aha et al., 1991) possesses a mechanism similar to FLORA4's for deciding which of the exemplars are 'trustworthy' predictors, which of them should be discarded as possibly noisy or outdated, and which are as yet undecided....

    [...]

  • ...This general approach to deciding which hypotheses to trust has been adopted from the instance-based learning method IB3 (Aha et al., 1991), which also uses statistical confidence measures to distinguish between reliable and unreliable predictors (exemplars in IB3)....

    [...]

  • ...Simple Instance-Based Learning (sometimes called memory-based learning) algorithms like IB1 (Aha et al., 1991) can be viewed as incremental on-line learners that first classify each newly arrived example by some nearest-neighbor method and then store it as a new exemplar....

    [...]

  • ...Confidence intervals are computed as in (Aha et al., 1991)....

    [...]

Book
03 Oct 2013
TL;DR: This book contains tutorial overviews and research papers on contemporary trends in the area of machine learning viewed from an AI perspective, including learning from examples, modeling human learning strategies, knowledge acquisition for expert systems, learning heuristics, discovery systems, and conceptual data analysis.
Abstract: This book contains tutorial overviews and research papers on contemporary trends in the area of machine learning viewed from an AI perspective. Research directions covered include: learning from examples, modeling human learning strategies, knowledge acquisition for expert systems, learning heuristics, discovery systems, and conceptual data analysis.

2,824 citations

Journal ArticleDOI
TL;DR: This paper shows that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned.
Abstract: Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space En. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
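
The sample-complexity bound from this paper is the one the FLORA analysis quotes in the excerpts below, where PDF extraction has garbled the formula; reconstructed:

```latex
% Sample-complexity bound of Blumer et al. (1989): any consistent
% learner that sees m(epsilon, delta) i.i.d. examples PAC-learns a
% concept class of VC-dimension d, where
\[
  m(\varepsilon, \delta) \;=\;
  \max\!\left( \frac{4}{\varepsilon}\,\log\frac{2}{\delta},\;
               \frac{8d}{\varepsilon}\,\log\frac{13}{\varepsilon} \right).
\]
% The FLORA analysis uses this as a minimum window size,
% w(epsilon, delta) = m(epsilon, delta/2).
```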

1,967 citations


"Learning in the presence of concept..." refers background in this paper

  • ...Again, this result assumes a minimum (fixed) window size of w(ε, δ) = m(ε, δ/2), where m(ε, δ) is derived from the general bound on the number of training examples that guarantee PAC-learning (Blumer et al., 1989): m(ε, δ) = max((4/ε) log(2/δ), (8d/ε) log(13/ε))....

    [...]

  • ...In the following statement of results, the c_i's are positive constants, ε is the maximum allowed probability of misclassifying the next incoming example, n is the number of available attributes, and d is the Vapnik-Chervonenkis dimension (see, e.g., Blumer et al., 1989) of the target class....

    [...]

  • ...Again, this result assumes a minimum (fixed) window size, which in this case turns out to be w(ε, δ) = m(ε, δ/2), where m(ε, δ) is derived from the general bound on the number of training examples that guarantee PAC-learning (Blumer et al., 1989): m(ε, δ) = max((4/ε) log(2/δ), (8d/ε) log(13/ε)) (where d is the VC-dimension of the class of target concepts)....

    [...]


Journal ArticleDOI
Dana Angluin1
TL;DR: This work considers the problem of using queries to learn an unknown concept, and several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries.
Abstract: We consider the problem of using queries to learn an unknown concept. Several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries. Examples are given of efficient learning methods using various subsets of these queries for formal domains, including the regular languages, restricted classes of context-free languages, the pattern languages, and restricted types of propositional formulas. Some general lower bound techniques are given. Equivalence queries are compared with Valiant's criterion of probably approximately correct identification under random sampling.
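
A toy illustration of learning with membership queries, the simplest of the query types studied: a threshold concept over {0, …, n} is identified exactly with O(log n) queries. An equivalence query would instead submit the final hypothesis and receive either "yes" or a counterexample. The oracle interface below is illustrative, not from the paper.

```python
def learn_threshold(member, n):
    """Exactly learn a threshold concept c(x) = (x >= t) on {0, ..., n}
    using O(log n) membership queries (binary search on the boundary).
    member(x) answers the query 'is x in the target concept?'."""
    lo, hi = 0, n + 1   # invariant: c is 0 on [0, lo), 1 on [hi, n]
    while lo < hi:
        mid = (lo + hi) // 2
        if member(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo           # the hidden threshold t

# Membership oracle for the hidden target t = 42.
t = 42
assert learn_threshold(lambda x: x >= t, 1000) == t
```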

1,797 citations