
Showing papers by "Thomas G. Dietterich published in 2002"


Book ChapterDOI
TL;DR: This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems, including sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks.
Abstract: Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.
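The sliding-window reduction named above can be sketched in a few lines: each sequence position is turned into a fixed-size feature window and handed to an ordinary classifier. This is an illustrative sketch only; the function name, padding convention, and toy sequence are ours, not from the paper.

```python
# Hedged sketch of the sliding-window method for sequential supervised
# learning: position t is classified from the window x[t-k], ..., x[t+k].

def make_windows(xs, k=1, pad=None):
    """Turn a sequence of observations into fixed-size feature windows."""
    padded = [pad] * k + list(xs) + [pad] * k
    return [tuple(padded[t:t + 2 * k + 1]) for t in range(len(xs))]

# Toy usage: a 5-element observation sequence with window half-width 1.
windows = make_windows(["a", "b", "c", "d", "e"], k=1)
# Each window can now be paired with its label y[t] and fed to any
# conventional classifier -- the essence of the sliding-window reduction.
```

The recurrent variant mentioned in the abstract would additionally append the classifier's own recent predictions to each window.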

698 citations


Proceedings Article
08 Jul 2002
TL;DR: A statistical pruning heuristic is introduced, based on the principle that if the values of two policies are statistically indistinguishable on the training data, then one of the policies can be pruned from the AO* search space.
Abstract: This paper addresses cost-sensitive classification in the setting where there are costs for measuring each attribute as well as costs for misclassification errors. We show how to formulate this as a Markov Decision Process in which the transition model is learned from the training data. Specifically, we assume a set of training examples in which all attributes and the true class have been measured. We describe a learning algorithm based on the AO* heuristic search procedure that searches for the classification policy with minimum expected cost. We provide an admissible heuristic for AO* that substantially reduces the number of nodes that need to be expanded, particularly when attribute measurement costs are high. To further prune the search space, we introduce a statistical pruning heuristic based on the principle that if the values of two policies are statistically indistinguishable on the training data, then we can prune one of the policies from the AO* search space. Experiments with realistic and synthetic data demonstrate that these heuristics can substantially reduce the memory needed for AO* search without significantly affecting the quality of the learned policy. Hence these heuristics expand the range of cost-sensitive learning problems for which AO* is feasible.
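The core trade-off the paper searches over can be illustrated with a one-step lookahead: classify now at the minimum expected misclassification cost, or pay to measure an attribute first. This is a toy sketch of that decision, not the paper's AO* algorithm; all costs, probabilities, and names are invented.

```python
# Illustrative sketch of the cost-sensitive diagnosis trade-off:
# classify now, or pay an attribute-measurement cost and decide after.

def expected_cost_classify_now(posterior, cost_matrix):
    """Min over predictions of the expected misclassification cost."""
    n = len(cost_matrix)
    return min(
        sum(posterior[y] * cost_matrix[pred][y] for y in range(n))
        for pred in range(n)
    )

# Two classes; cost_matrix[pred][true]: correct = 0, errors asymmetric.
cost = [[0.0, 10.0],
        [1.0, 0.0]]
posterior = [0.7, 0.3]          # current belief over the two classes

classify_now = expected_cost_classify_now(posterior, cost)

# Measuring attribute A costs 0.5 and splits belief into two outcomes,
# each with an (invented) probability and updated posterior.
measure_cost = 0.5
outcomes = [(0.6, [0.95, 0.05]), (0.4, [0.3, 0.7])]
measure_then_classify = measure_cost + sum(
    p * expected_cost_classify_now(post, cost) for p, post in outcomes
)

best_action = "measure" if measure_then_classify < classify_now else "classify"
```

AO* extends exactly this comparison recursively over sequences of measurements, with the admissible heuristic bounding the cost of unexpanded subtrees.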

87 citations


Dissertation
01 Jan 2002
TL;DR: B-LOTs were shown to be superior to other methods in cases where the classes have very different frequencies—a situation that arises frequently in cost-sensitive classification problems.
Abstract: Many approaches for achieving intelligent behavior of automated (computer) systems involve components that learn from past experience. This dissertation studies computational methods for learning from examples, for classification and for decision making, when the decisions have different non-zero costs associated with them. Many practical applications of learning algorithms, including transaction monitoring, fraud detection, intrusion detection, and medical diagnosis, have such non-uniform costs, and there is a great need for new methods that can handle them. This dissertation discusses two approaches to cost-sensitive classification: input data weighting and conditional density estimation. The first method assigns a weight to each training example in order to force the learning algorithm (which is otherwise unchanged) to pay more attention to examples with higher misclassification costs. The dissertation discusses several different weighting methods and concludes that a method that gives higher weight to examples from rarer classes works quite well. Another algorithm that gave good results was a wrapper method that applies Powell's gradient-free algorithm to optimize the input weights. The second approach to cost-sensitive classification is conditional density estimation. In this approach, the output of the learning algorithm is a classifier that estimates, for a new data point, the probability that it belongs to each of the classes. These probability estimates can be combined with a cost matrix to make decisions that minimize the expected cost. The dissertation presents a new algorithm, bagged lazy option trees (B-LOTs), that gives better probability estimates than any previous method based on decision trees. In order to evaluate cost-sensitive classification methods, appropriate statistical methods are needed. 
The dissertation presents two new statistical procedures: BCOST provides a confidence interval on the expected cost of a classifier, and BDELTACOST provides a confidence interval on the difference in expected costs of two classifiers. These methods are applied to a large set of experimental studies to evaluate and compare the cost-sensitive methods presented in this dissertation. Finally, the dissertation describes the application of the B-LOTs to a problem of predicting the stability of river channels. In this study, B-LOTs were shown to be superior to other methods in cases where the classes have very different frequencies—a situation that arises frequently in cost-sensitive classification problems.
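A confidence interval on expected cost, in the spirit of the BCOST procedure described above, can be sketched with a percentile bootstrap over per-example costs. This is a minimal illustration under our own assumptions (synthetic costs, a simple percentile interval), not the dissertation's exact procedure.

```python
import random

# Hedged sketch: bootstrap a confidence interval on a classifier's
# expected cost from its per-example costs on a test set.

def bootstrap_cost_interval(costs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-example cost."""
    rng = random.Random(seed)
    n = len(costs)
    means = sorted(
        sum(rng.choice(costs) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-example costs: mostly 0 (correct), a few costly errors.
example_costs = [0.0] * 90 + [5.0] * 10
low, high = bootstrap_cost_interval(example_costs)
```

A BDELTACOST-style comparison would apply the same resampling to the paired differences in cost between two classifiers on the same test examples.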

43 citations


Book ChapterDOI
24 Jun 2002
TL;DR: It is shown that the bias-variance decomposition offers a rationale to develop ensemble methods using SVMs as base learners, and two directions for developing SVM ensembles are outlined, exploiting the SVM bias characteristics and the biases in the kernel parameters.
Abstract: Accuracy, diversity, and learning characteristics of base learners critically influence the effectiveness of ensemble methods. Bias-variance decomposition of the error can be used as a tool to gain insights into the behavior of learning algorithms, in order to properly design ensemble methods well-tuned to the properties of a specific base learner. In this work we analyse bias-variance decomposition of the error in Support Vector Machines (SVM), characterizing it with respect to the kernel and its parameters. We show that the bias-variance decomposition offers a rationale to develop ensemble methods using SVMs as base learners, and we outline two directions for developing SVM ensembles, exploiting the SVM bias characteristics and the bias-variance dependence on the kernel parameters.
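The chapter analyses bias and variance for SVM classifiers; as a simpler runnable illustration of the decomposition itself, here is the classical squared-loss version with a toy constant-predictor learner and synthetic data. Nothing below is the paper's SVM setup.

```python
import random

# Hedged illustration: estimate bias^2 and variance at one test point
# by retraining a (deliberately biased) learner on many noisy samples.

rng = random.Random(42)
f = lambda x: x * x          # true target function
x_test = 1.0                 # fixed test point
train_xs = [0.0, 0.5, 1.0, 1.5, 2.0]

preds = []
for _ in range(500):
    # Fresh noisy training set; the learner just predicts the mean label,
    # so it is high-bias / low-variance by construction.
    ys = [f(x) + rng.gauss(0.0, 0.3) for x in train_xs]
    preds.append(sum(ys) / len(ys))

mean_pred = sum(preds) / len(preds)
bias_sq = (mean_pred - f(x_test)) ** 2
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
# Empirical identity: squared error to the true value splits exactly
# into the bias^2 and variance terms computed above.
error_to_f = sum((p - f(x_test)) ** 2 for p in preds) / len(preds)
```

Ensemble design then follows the logic in the abstract: averaging attacks the variance term, so it pays off most for low-bias, high-variance base learners.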

24 citations



Book ChapterDOI
TL;DR: The suitability of reinforcement learning for automatically tuning agents within a MAS to optimize a complex tradeoff, namely camera use, is explored.
Abstract: This paper extends a navigation system implemented as a multi-agent system (MAS). The arbitration mechanism controlling the interactions between the agents was based on manually-tuned bidding functions. A difficulty with hand-tuning is that it is hard to handle situations involving complex tradeoffs. In this paper we explore the suitability of reinforcement learning for automatically tuning the agents within a MAS in order to optimize a complex tradeoff, namely camera use.

5 citations


Proceedings Article
08 Jul 2002
TL;DR: A formula for optimal smoothing is derived, showing that the degree of smoothing should decrease as the amount of data increases; experiments show that probability smoothing outperforms two simpler action refinement methods on a synthetic maze problem.
Abstract: In many reinforcement learning applications, the set of possible actions can be partitioned by the programmer into subsets of similar actions. This paper presents a technique for exploiting this form of prior information to speed up model-based reinforcement learning. We call it an action refinement method, because it treats each subset of similar actions as a single “abstract” action early in the learning process and then later “refines” the abstract action into individual actions as more experience is gathered. Our method estimates the transition probabilities P (s′|s, a) for an action a by combining the results of executions of action a with executions of other actions in the same subset of similar actions. This is a form of “smoothing” of the probability estimates that trades increased bias for reduced variance. The paper derives a formula for optimal smoothing which shows that the degree of smoothing should decrease as the amount of data increases. Experiments show that probability smoothing is better than two simpler action refinement methods on a synthetic maze problem. Action refinement is most useful in problems, such as robotics, where training experiences are expensive.
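The probability-smoothing idea above can be sketched as shrinkage of an action's transition estimate toward the pooled estimate for its subset of similar actions, with less shrinkage as the action accumulates its own data. The weighting rule used here (lambda = n_a / (n_a + m)) is an illustrative choice, not the paper's derived optimal formula.

```python
# Hedged sketch of transition-probability smoothing for action refinement.

def smoothed_transition(counts_a, counts_group, m=10.0):
    """Blend action-specific and group-pooled next-state distributions."""
    n_a = sum(counts_a)
    n_g = sum(counts_group)
    lam = n_a / (n_a + m)          # more own data -> less smoothing
    p_a = ([c / n_a for c in counts_a] if n_a
           else [1.0 / len(counts_a)] * len(counts_a))
    p_g = [c / n_g for c in counts_group]
    return [lam * pa + (1 - lam) * pg for pa, pg in zip(p_a, p_g)]

# With little data for action a, the estimate leans on the group pool.
few = smoothed_transition([1, 0], [50, 50])
# With lots of data, the estimate follows action a's own counts.
many = smoothed_transition([900, 100], [500, 500])
```

This matches the abstract's bias-variance reading: early on the pooled (biased, low-variance) estimate dominates; as counts grow, the estimate converges to the action's own empirical distribution.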

4 citations