
Showing papers by "Thomas G. Dietterich published in 2000"


Book ChapterDOI
21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
Abstract: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that AdaBoost does not overfit rapidly.
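The (weighted) vote at the core of these ensemble methods can be sketched as follows; the function name and array layout are illustrative assumptions, not code from the paper.

```python
import numpy as np

def weighted_vote(predictions, weights):
    """Combine class predictions from several classifiers by weighted vote.

    predictions: (n_classifiers, n_samples) array of integer class labels.
    weights: (n_classifiers,) array of non-negative classifier weights.
    Returns the (n_samples,) array of winning class labels per sample.
    """
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    n_classes = predictions.max() + 1
    n_samples = predictions.shape[1]
    scores = np.zeros((n_classes, n_samples))
    # Each classifier adds its weight to the bin of its predicted class.
    for preds, w in zip(predictions, weights):
        scores[preds, np.arange(n_samples)] += w
    return scores.argmax(axis=0)
```

With uniform weights this reduces to a simple majority vote; AdaBoost's combination rule corresponds to weights derived from each classifier's training error.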

5,679 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5 and found that in situations with little or no classification noise, randomization is competitive with bagging but not as accurate as boosting.
Abstract: Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
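The training-data manipulation that bagging performs can be sketched as below; the `train` callback is a generic stand-in for any base learning algorithm (such as C4.5), an assumption for illustration only.

```python
import random

def bootstrap_sample(data, rng):
    """Draw one bootstrap replicate: len(data) examples sampled with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def bagging_ensemble(data, train, n_learners=10, seed=0):
    """Train n_learners base classifiers, each on its own bootstrap replicate.

    `train` is any base learning algorithm: a function from a training set
    to a fitted classifier (a placeholder interface, not the paper's code).
    """
    rng = random.Random(seed)
    return [train(bootstrap_sample(data, rng)) for _ in range(n_learners)]
```

Randomization, by contrast, leaves the training data intact and instead perturbs the learner's internal tie-breaking (e.g., choosing randomly among the top-ranked split tests when growing a tree).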

2,919 citations


Journal ArticleDOI
TL;DR: The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction.
Abstract: This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics--as a subroutine hierarchy--and a declarative semantics--as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. 
The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this nonhierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
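The heart of the MAXQ decomposition is the recursive identity Q(i, s, a) = V(a, s) + C(i, s, a), where C is the "completion" value of finishing subtask i after invoking child a. A toy sketch of evaluating this recursion, with the primitive values and completion values assumed already learned (e.g., by MAXQ-Q):

```python
def maxq_value(node, s, v_prim, comp, children):
    """Recursive MAXQ value V(node, s) for a toy hierarchy.

    v_prim[(a, s)]: expected one-step reward of primitive action a in s.
    comp[(i, s, a)]: learned completion value C(i, s, a).
    children[i]: child actions of composite subtask i; nodes absent from
    `children` are primitive.  (All three are illustrative data layouts.)
    """
    if node not in children:                       # primitive action
        return v_prim[(node, s)]
    # Composite subtask: V(i, s) = max_a [ V(a, s) + C(i, s, a) ]
    return max(maxq_value(a, s, v_prim, comp, children) + comp[(node, s, a)]
               for a in children[node])
```

State abstraction enters by letting v_prim and comp ignore state variables irrelevant to their subtask, which is what shrinks the tables each node must learn.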

1,486 citations


Book ChapterDOI
26 Jul 2000
TL;DR: An overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction is given.
Abstract: Reinforcement learning addresses the problem of learning optimal policies for sequential decision-making problems involving stochastic operators and numerical reward functions rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement learning research is recapitulating the development of classical research in planning and problem solving. After studying the problem of solving "flat" problem spaces, researchers have recently turned their attention to hierarchical methods that incorporate subroutines and state abstractions. This paper gives an overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction.

102 citations


Proceedings Article
29 Jun 2000
TL;DR: This paper presents two statistical methods for the cost-sensitive setting, and shows experimentally that these bootstrap tests work better than applying standard z tests based on the normal distribution.
Abstract: Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult to tell whether these new cost-sensitive methods are better than existing methods that ignore costs, and it is also difficult to tell whether one cost-sensitive method is better than another. To rectify this problem, this paper presents two statistical methods for the cost-sensitive setting. The first constructs a confidence interval for the expected cost of a single classifier. The second constructs a confidence interval for the expected difference in costs of two classifiers. In both cases, the basic idea is to separate the problem of estimating the probabilities of each cell in the confusion matrix (which is independent of the cost matrix) from the problem of computing the expected cost. We show experimentally that these bootstrap tests work better than applying standard z tests based on the normal distribution.
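A percentile-bootstrap confidence interval for a single classifier's expected cost, in the spirit of the first method above, can be sketched as follows; the function name and the representation of test outcomes are assumptions, not the paper's exact procedure.

```python
import random

def bootstrap_cost_ci(outcomes, cost, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a classifier's expected cost.

    outcomes: list of (true_label, predicted_label) pairs from a test set,
              i.e., one confusion-matrix cell per test example.
    cost: dict mapping (true, predicted) cells to misclassification costs.
    Resamples the test set with replacement, recomputes the mean cost each
    time, and returns the (alpha/2, 1 - alpha/2) percentile interval.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    means = []
    for _ in range(n_boot):
        resample = (outcomes[rng.randrange(n)] for _ in range(n))
        means.append(sum(cost[o] for o in resample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Note how the resampling operates on confusion-matrix cells, so the same bootstrap replicates can be re-scored under any cost matrix without retesting the classifier.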

93 citations


Proceedings ArticleDOI
01 Aug 2000
TL;DR: A decision-theoretic approach to DLFT is described in which historical test data is mined to create a probabilistic model of patterns of die failure and this model is combined with greedy value-of-information computations to decide in real time which die to test next and when to stop testing.
Abstract: We describe an application of data mining and decision analysis to the problem of die-level functional test in integrated circuit manufacturing. Integrated circuits are fabricated on large wafers that can hold hundreds of individual chips (“die”). In current practice, large and expensive machines test each of these die to check that they are functioning properly (die-level functional test; DLFT), and then the wafers are cut up, and the good die are assembled into packages and connected to the package pins. Finally, the resulting packages are tested to ensure that the final product is functioning correctly. The purpose of die-level functional test is to avoid the expense of packaging bad die and to provide rapid feedback to the fabrication process by detecting die failures. The challenge for a decision-theoretic approach is to reduce the amount of DLFT (and the associated costs) while still providing process feedback. We describe a decision-theoretic approach to DLFT in which historical test data is mined to create a probabilistic model of patterns of die failure. This model is combined with greedy value-of-information computations to decide in real time which die to test next and when to stop testing. We report the results of several experiments that demonstrate the ability of this procedure to make good testing decisions, good stopping decisions, and to detect anomalous die. Based on experiments with historical test data from Hewlett Packard Company, the resulting system has the potential to
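The greedy test/stop loop described above has the following general shape; the `voi` and `run_test` interfaces are hypothetical stand-ins for the paper's probabilistic failure model and test equipment, introduced here only to make the control flow concrete.

```python
def greedy_voi_testing(die_ids, voi, run_test, test_cost):
    """Greedy value-of-information testing loop (illustrative sketch).

    die_ids: identifiers of the untested die on a wafer.
    voi(d, results): estimated value of information from testing die d,
        given the test results observed so far (hypothetical model API).
    run_test(d): performs the physical test and returns its outcome.
    Repeatedly tests the die with the highest expected information value,
    stopping as soon as no remaining test is worth its cost.
    """
    results = {}
    remaining = set(die_ids)
    while remaining:
        best = max(remaining, key=lambda d: voi(d, results))
        if voi(best, results) <= test_cost:
            break                      # stop: no test is worth running
        results[best] = run_test(best)
        remaining.discard(best)
    return results
```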

23 citations


Book ChapterDOI
11 Dec 2000
TL;DR: The problem of divide-and-conquer learning is defined and the key research questions that need to be studied are identified in order to develop practical, general-purpose learning algorithms for divide- and- Conquer problems and an associated theory.
Abstract: Existing machine learning theory and algorithms have focused on learning an unknown function from training examples, where the unknown function maps from a feature vector to one of a small number of classes. Emerging applications in science and industry require learning much more complex functions that map from complex input spaces (e.g., 2-dimensional maps, time series, and strings) to complex output spaces (e.g., other 2-dimensional maps, time series, and strings). Despite the lack of theory covering such cases, many practical systems have been built that work well in particular applications. These systems all employ some form of divide-and-conquer, where the inputs and outputs are divided into smaller pieces (e.g., "windows"), classified, and then the results are merged to produce an overall solution. This paper defines the problem of divide-and-conquer learning and identifies the key research questions that need to be studied in order to develop practical, general-purpose learning algorithms for divide-and-conquer problems and an associated theory.
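The divide/classify/merge pattern the abstract describes can be sketched for sequence inputs; the window representation and the `classify`/`merge` callbacks are illustrative assumptions.

```python
def windows(seq, width):
    """The 'divide' step: slice a sequence into overlapping fixed-width windows."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def divide_and_conquer(seq, width, classify, merge):
    """Classify each window independently, then merge the local predictions.

    classify: maps one window to a local prediction (any base learner).
    merge: combines the per-window predictions into an overall solution.
    """
    return merge([classify(w) for w in windows(seq, width)])
```

The research questions the paper raises live precisely in the two callbacks: how window width and overlap should be chosen, and how `merge` should reconcile conflicting local predictions.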

20 citations


Book ChapterDOI
28 Aug 2000
TL;DR: The even-odd POMDP is introduced: an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step, yielding an approximation to the optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP.
Abstract: This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V2MDP, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.

16 citations


BookDOI
01 Jan 2000
TL;DR: Proceedings of the 5th International Symposium on Abstraction, Reformulation, and Approximation (SARA 2002).
Abstract: Abstraction, Reformulation, and Approximation: 5th International Symposium, SARA 2002, Kananaskis, Alberta, Canada, August 2-4, 2002, Proceedings. Edited by Sven Koenig and Robert C. Holte. Paperback, xi + 352 pages. ISBN 978 3 540 43941 7.

7 citations


Proceedings Article
29 Jun 2000
TL;DR: A new divide-and-conquer method is presented that analyzes the model to identify a series of smaller optimization problems whose sequential solution solves the global calibration problem.
Abstract: This paper introduces a new machine learning task, model calibration, and presents a method for solving a particularly difficult model calibration task that arose as part of a global climate change research project. The model calibration task is the problem of training the free parameters of a scientific model in order to optimize the accuracy of the model for making future predictions. It is a form of supervised learning from examples in the presence of prior knowledge. An obvious approach to solving calibration problems is to formulate them as global optimization problems in which the goal is to find values for the free parameters that minimize the error of the model on training data. Unfortunately, this global optimization approach becomes computationally infeasible when the model is highly nonlinear. This paper presents a new divide-and-conquer method that analyzes the model to identify a series of smaller optimization problems whose sequential solution solves the global calibration problem. This paper argues that methods of this kind, rather than global optimization techniques, will be required in order for agents with large amounts of prior knowledge to learn efficiently.

7 citations