Proceedings ArticleDOI

A streaming ensemble algorithm (SEA) for large-scale classification

26 Aug 2001-pp 377-382
TL;DR: A fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift is presented.
Abstract: Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
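
For concreteness, the chunk-based procedure described above can be sketched in a few lines of Python. This is a hedged sketch, not the authors' implementation: it assumes scikit-learn decision trees as base learners, plain majority voting, and a simplified replacement rule (swap the new tree in for the weakest member, judged by accuracy on the newest chunk) standing in for the paper's quality-based heuristic.

    # Sketch of a SEA-style streaming ensemble (not the authors' code).
    # Assumptions: scikit-learn trees, majority voting over small integer
    # labels, and "replace the weakest member on the newest chunk" as a
    # simplified stand-in for the paper's quality-based replacement heuristic.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    class ChunkEnsemble:
        def __init__(self, max_members=25):
            self.max_members = max_members
            self.members = []

        def predict(self, X):
            votes = np.array([m.predict(X) for m in self.members])
            # Plurality vote down each column (one column per test point).
            return np.apply_along_axis(
                lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

        def update(self, X_chunk, y_chunk):
            new_tree = DecisionTreeClassifier().fit(X_chunk, y_chunk)
            if len(self.members) < self.max_members:
                self.members.append(new_tree)
                return
            # Replace the member that scores worst on the newest chunk,
            # but only if the new tree actually beats it.
            scores = [m.score(X_chunk, y_chunk) for m in self.members]
            worst = int(np.argmin(scores))
            if new_tree.score(X_chunk, y_chunk) > scores[worst]:
                self.members[worst] = new_tree

Memory stays roughly constant because at most max_members trees are retained, and adaptation to drift comes from newer chunks displacing members that no longer predict well.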
Citations
Journal ArticleDOI
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
Abstract: Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
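
As a toy illustration of the scenario the survey formalizes, where the relation between inputs and target changes over time, the snippet below (an assumed example, not taken from the survey) flips the labeling rule halfway through a synthetic stream and compares a model frozen on old data with one retrained on a recent window.

    # Toy concept-drift illustration (assumed example, not from the survey):
    # the labeling rule flips at t = 2000; a frozen model degrades while a
    # model retrained on a recent window recovers.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 2))
    y = (X[:, 0] > 0).astype(int)
    y[2000:] = 1 - y[2000:]                      # abrupt drift: the rule inverts

    frozen = LogisticRegression().fit(X[:1000], y[:1000])
    windowed = LogisticRegression().fit(X[2500:3000], y[2500:3000])

    X_test, y_test = X[3000:], y[3000:]
    print("frozen model after drift:  ", frozen.score(X_test, y_test))
    print("windowed model after drift:", windowed.score(X_test, y_test))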

2,374 citations

Proceedings ArticleDOI
Yehuda Koren
28 Jun 2009
TL;DR: Two leading collaborative filtering recommendation approaches are revamped to model temporal dynamics, since a more sensitive approach is required that can better distinguish transient effects from long-term patterns.
Abstract: Customer preferences for products are drifting over time. Product perception and popularity are constantly changing as new selection emerges. Similarly, customer inclinations are evolving, leading them to ever redefine their taste. Thus, modeling temporal dynamics should be a key when designing recommender systems or general customer preference models. However, this raises unique challenges. Within the eco-system intersecting multiple products and customers, many different characteristics are shifting simultaneously, while many of them influence each other and often those shifts are delicate and associated with a few data instances. This distinguishes the problem from concept drift explorations, where mostly a single concept is tracked. Classical time-window or instance-decay approaches cannot work, as they lose too much signal when discarding data instances. A more sensitive approach is required, which can make better distinctions between transient effects and long term patterns. The paradigm we offer is creating a model tracking the time changing behavior throughout the life span of the data. This allows us to exploit the relevant components of all data instances, while discarding only what is modeled as being irrelevant. Accordingly, we revamp two leading collaborative filtering recommendation approaches. Evaluation is made on a large movie rating dataset by Netflix. Results are encouraging and better than those previously reported on this dataset.
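
One building block behind such time-aware models is letting baseline effects drift instead of discarding old ratings. The sketch below is a heavily simplified stand-in rather than the paper's model: it estimates an item bias per coarse time bin, shrunk toward the item's overall bias so that sparse bins do not overreact to transient effects; the function name, bin count, and shrinkage constant are all assumptions.

    # Simplified stand-in for a time-aware rating baseline (not the paper's
    # model): the item bias varies per coarse time bin, shrunk toward the
    # item's overall bias so sparse bins do not overreact to transient effects.
    import numpy as np
    from collections import defaultdict

    def time_binned_item_bias(ratings, n_bins=10, shrinkage=25.0):
        """ratings: iterable of (item_id, timestamp, rating) tuples."""
        items, ts, r = zip(*ratings)
        ts, r = np.asarray(ts, float), np.asarray(r, float)
        span = np.ptp(ts) + 1e-9
        bins = np.minimum((n_bins * (ts - ts.min()) / span).astype(int), n_bins - 1)
        mu = r.mean()

        sums, counts = defaultdict(float), defaultdict(int)
        for item, b, rating in zip(items, bins, r):
            for key in (item, (item, int(b))):   # overall and per-bin totals
                sums[key] += rating - mu
                counts[key] += 1

        def bias(item, time_bin):
            overall = sums[item] / (counts[item] + shrinkage)
            dev = sums[(item, time_bin)] - counts[(item, time_bin)] * overall
            return overall + dev / (counts[(item, time_bin)] + shrinkage)

        return mu, bias   # predicted rating ~ mu + bias(item, time_bin) (+ user terms)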

1,621 citations


Cites background from "A streaming ensemble algorithm (SEA..."

  • ...[7, 16, 24]) is based on maintaining an ensemble of models capable of capturing various states of the data....


Proceedings ArticleDOI
24 Aug 2003
TL;DR: This paper proposes a general framework for mining concept-drifting data streams using weighted ensemble classifiers, and shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
Abstract: Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target marketing, network intrusion detection, etc. Conventional knowledge discovery tools are facing two challenges, the overwhelming volume of the streaming data, and the concept drifts. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, naive Bayesian, etc., from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have substantial advantage over single-classifier approaches in prediction accuracy, and the ensemble framework is effective for a variety of classification models.
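
A minimal sketch of the chunk-trained, weighted-voting idea appears below. It is an assumed stand-in, not the authors' implementation: members are plain scikit-learn trees trained one per chunk, and each is weighted by its accuracy on the newest chunk (the paper derives weights from expected classification error, which this simplification ignores).

    # Assumed stand-in for a weighted chunk ensemble (not the authors' code):
    # one tree per chunk, weighted by accuracy on the newest chunk, combined
    # by weighted voting. Labels are assumed to be integers 0..C-1.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def train_weighted_ensemble(chunks, k=8):
        """chunks: list of (X, y) pairs in arrival order; keeps the k best members."""
        members = [DecisionTreeClassifier(max_depth=5).fit(X, y) for X, y in chunks]
        X_ref, y_ref = chunks[-1]                    # newest chunk as reference
        weights = np.array([m.score(X_ref, y_ref) for m in members])
        keep = np.argsort(weights)[-k:]              # retain the k highest-weighted
        return [members[i] for i in keep], weights[keep]

    def weighted_vote(members, weights, X):
        n_classes = int(max(m.classes_.max() for m in members)) + 1
        scores = np.zeros((len(X), n_classes))
        for m, w in zip(members, weights):
            scores[np.arange(len(X)), m.predict(X).astype(int)] += w
        return scores.argmax(axis=1)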

1,403 citations


Cites background from "A streaming ensemble algorithm (SEA..."

  • ...Much work has been done on modeling [1], querying [2, 14, 18], and mining data streams, for instance, several papers have been published on classification [7, 21, 27], regression analysis [5], and clustering [19]....


01 Jan 2004
TL;DR: This paper considers different types of concept drift, peculiarities of the problem, and gives a critical review of existing approaches to the problem.
Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland (tsymbalo@tcd.ie), April 29, 2004
Abstract: In the real world concepts are often not stable but change with time. Typical examples of this are weather prediction rules and customers' preferences. The underlying data distribution may change as well. Often these changes make the model built on old data inconsistent with the new data, and regular updating of the model is necessary. This problem, known as concept drift, complicates the task of learning a model from data and requires special approaches, different from commonly used techniques, which treat arriving instances as equally important contributors to the final concept. This paper considers different types of concept drift, peculiarities of the problem, and gives a critical review of existing approaches to the problem.
1. Definitions and peculiarities of the problem
A difficult problem with learning in many real-world domains is that the concept of interest may depend on some hidden context, not given explicitly in the form of predictive features. A typical example is weather prediction rules that may vary radically with the season. Another example is the patterns of customers' buying preferences that may change with time, depending on the current day of the week, availability of alternatives, inflation rate, etc. Often the cause of change is hidden, not known a priori, making the learning task more complicated. Changes in the hidden context can induce more or less radical changes in the target concept, which is generally known as concept drift (Widmer and Kubat, 1996). An effective learner should be able to track such changes and to quickly adapt to them. A difficult problem in handling concept drift is distinguishing between true concept drift and noise. Some algorithms may overreact to noise, erroneously interpreting it as concept drift, while others may be highly robust to noise, adjusting to the changes too slowly. An ideal learner should combine robustness to noise and sensitivity to concept drift (Widmer and Kubat, 1996). In many domains, hidden contexts may be expected to recur. Recurring contexts may be due to cyclic phenomena, such as seasons of the year, or may be associated with irregular phenomena, such as inflation rates or market mood (Harries and Sammut, 1998). In such domains, in order to adapt more quickly to concept drift, concept …

987 citations


Cites background from "A streaming ensemble algorithm (SEA..."

  • ...…and Kubat, 1993, 1996; Wang et al., 2003), decision trees, including their incremental versions (Harries and Sammut, 1998; Hulten et al., 2001; Street and Kim, 2001; Kolter and Maloof, 2003; Stanley, 2003; Wang et al., 2003), Naïve Bayes (Kolter and Maloof, 2003; Wang et al., 2003), SVMs…...


  • ...This is caused by the fact that data in many current data processing systems is organized in the form of a data stream rather than a static data repository, reflecting the natural flow of data (Street and Kim, 2001; Wang et al., 2003; Hulten and Spencer, 2003)....


  • ...Another popular benchmark problem is represented by a moving hyperplane (Hulten et al., 2001; Street and Kim, 2001; Kolter and Maloof, 2003; Wang et al., 2003)....


  • ...…page access data (Hulten et al., 2001), the Text Retrieval Conference (TREC) data (Lanquillon, 1999; Klinkenberg, 2004), credit card fraud data (Wang et al., 2003), breast cancer, anonymous Web browsing, and US Census Bureau data (Street and Kim, 2001), and e-mail data (Cunningham et al., 2003)....


  • ...Street and Kim (2001) and Wang et al. (2001) suggest that simply dividing the data into sequential chunks of fixed size and building an ensemble on those chunks may be effective for handling concept drift....


Journal ArticleDOI
TL;DR: The results of the analyses suggest that disease progression to distant sites does not occur exclusively via the axillary lymph nodes, but rather that lymph node status serves as an indicator of the tumor's ability to spread.
Abstract: Two of the most important prognostic indicators for breast cancer are tumor size and extent of axillary lymph node involvement. Data on 24,740 cases recorded in the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute were used to evaluate the breast cancer survival experience in a representative sample of women from the United States. Actuarial (life table) methods were used to investigate the 5-year relative survival rates in cases with known operative/pathologic axillary lymph node status and primary tumor diameter. Survival rates varied from 45.5% for tumor diameters equal to or greater than 5 cm with positive axillary nodes to 96.3% for tumors less than 2 cm and with no involved nodes. The relation between tumor size and lymph node status was investigated in detail. Tumor diameter and lymph node status were found to act as independent but additive prognostic indicators. As tumor size increased, survival decreased regardless of lymph node status; and as lymph node involvement increased, survival status also decreased regardless of tumor size. A linear relation was found between tumor diameter and the percent of cases with positive lymph node involvement. The results of our analyses suggest that disease progression to distant sites does not occur exclusively via the axillary lymph nodes, but rather that lymph node status serves as an indicator of the tumor's ability to spread.

960 citations

References
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
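
C4.5 itself ships as C source code; as a rough, assumed illustration of the same workflow (induce a tree from labeled cases, inspect it as human-readable rules, classify new cases), the snippet below uses scikit-learn, whose CART-style trees are a different algorithm from C4.5.

    # Rough illustration of the induce-then-classify workflow (assumed example;
    # scikit-learn grows CART-style trees, not C4.5 trees).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

    print(export_text(tree))       # human-readable splits, akin to if-then rules
    print(tree.predict(X[:5]))     # classify some (here, already seen) cases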

21,674 citations

Journal ArticleDOI
01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
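
The procedure described above, bootstrap replicates of the learning set, one predictor per replicate, plurality vote for classification, is easy to sketch; the version below is an assumed minimal implementation using scikit-learn trees, not Breiman's original code.

    # Minimal bagging sketch (assumed implementation, not Breiman's code):
    # train one tree per bootstrap replicate and combine by plurality vote.
    # X and y are assumed to be NumPy arrays with small integer class labels.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bag(X, y, n_estimators=50, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(n_estimators):
            idx = rng.integers(0, n, size=n)         # bootstrap replicate
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def plurality_vote(models, X):
        votes = np.stack([m.predict(X) for m in models])   # (n_models, n_samples)
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)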

16,118 citations

01 Jan 1998

12,940 citations


"A streaming ensemble algorithm (SEA..." refers methods in this paper

  • ...The first and third data sets are publicly available from the UCI machine learning repository [2]....


Proceedings Article
Yoav Freund, Robert E. Schapire
03 Jul 1996
TL;DR: This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
Abstract: In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss" which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman's "bagging" method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.
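
For concreteness, a compact textbook-style sketch of the binary AdaBoost reweighting loop is given below; it is not the authors' experimental code, and it assumes labels in {-1, +1} with decision stumps as the weak learner.

    # Compact binary AdaBoost sketch (textbook-style, not the authors' code).
    # Assumes labels in {-1, +1}; decision stumps serve as the weak learner.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost(X, y, n_rounds=50):
        n = len(X)
        w = np.full(n, 1.0 / n)                    # example weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)  # classifier weight
            w *= np.exp(-alpha * y * pred)         # upweight misclassified points
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, np.array(alphas)

    def boosted_predict(stumps, alphas, X):
        agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(agg)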

7,601 citations


"A streaming ensemble algorithm (SEA..." refers methods in this paper


  • ...Boosting [20], and its variants such as AdaBoost [10] and Arcing [5], uses a weighted resampling technique, creating a series of classifiers in which later individuals focus on classifying the more difficult points....


01 Jan 1996

7,386 citations