
Showing papers by "Thomas G. Dietterich published in 1999"


Proceedings Article
29 Nov 1999
TL;DR: This paper defines five conditions under which state abstraction can be combined with the MAXQ value function decomposition, proves that the MAXQ-Q learning algorithm converges under these conditions, and shows experimentally that state abstraction is important for the successful application of MAXQ-Q learning.
Abstract: Many researchers have explored methods for hierarchical reinforcement learning (RL) with temporal abstractions, in which abstract actions are defined that can perform many primitive actions before terminating. However, little is known about learning with state abstractions, in which aspects of the state space are ignored. In previous work, we developed the MAXQ method for hierarchical RL. In this paper, we define five conditions under which state abstraction can be combined with the MAXQ value function decomposition. We prove that the MAXQ-Q learning algorithm converges under these conditions and show experimentally that state abstraction is important for the successful application of MAXQ-Q learning.
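The decomposition this abstract builds on is easiest to see in code. Below is a minimal, illustrative sketch (not the authors' implementation) of how a MAXQ-style hierarchy might store its value function when each subtask keeps only the state variables relevant to it; the `Subtask` class and the `relevant_vars` and `v_primitive` names are assumptions made for illustration.

```python
# Illustrative sketch of MAXQ-style value decomposition with state abstraction.
# Each subtask projects the full state onto its own relevant variables, so its
# completion-value table C(i, s, a) is indexed only by the abstracted state.

from collections import defaultdict


class Subtask:
    def __init__(self, name, children, relevant_vars):
        self.name = name
        self.children = children            # child subtasks; empty if primitive
        self.relevant_vars = relevant_vars  # state variables this subtask keeps
        self.C = defaultdict(float)         # completion values C(i, s, a)

    def abstract(self, state):
        """Project the full state (a dict) onto this subtask's relevant variables."""
        return tuple(state[v] for v in self.relevant_vars)


def q_value(task, state, action, v_primitive):
    """Q(i, s, a) = V(a, s) + C(i, s, a): value of doing `action` in `state`
    and then completing `task`.  `v_primitive` maps (primitive action name,
    abstracted state) to its learned expected reward; composite values are
    computed recursively down the hierarchy."""
    if not action.children:                           # primitive action
        v = v_primitive[(action.name, action.abstract(state))]
    else:                                             # composite: best child Q
        v = max(q_value(action, state, a, v_primitive) for a in action.children)
    return v + task.C[(task.abstract(state), action.name)]
```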

74 citations


Posted Content
TL;DR: The MAXQ-Q algorithm as mentioned in this paper decomposes the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposes the value function of the target MDP into an additive combination of the value functions of the smaller MDPs, and it is proven to converge with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy.
Abstract: This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
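As a rough illustration of the learning rule the abstract refers to, the sketch below shows one plausible form of the MAXQ-Q update to a parent subtask's completion function. It is a paraphrase under stated assumptions, not the paper's pseudocode; `q_value` and `child_actions` are assumed helpers supplied by the caller.

```python
# Sketch of the MAXQ-Q update for a parent subtask's completion function
# C(i, s, a).  After child subtask `a` executes for n_steps and the state
# moves from s to s_next, C is nudged toward the discounted value of
# finishing subtask `i` from s_next under a greedy choice of next child.

from collections import defaultdict

C = defaultdict(float)  # completion values C(i, s, a)


def maxq_q_update(i, s, a, s_next, n_steps, child_actions, q_value,
                  alpha=0.1, gamma=0.95):
    # q_value(i, s, a) is assumed to return V(a, s) + C(i, s, a), i.e. the
    # MAXQ decomposition of the Q function for subtask i.
    best_next = max(q_value(i, s_next, a_next) for a_next in child_actions)
    target = (gamma ** n_steps) * best_next
    C[(i, s, a)] += alpha * (target - C[(i, s, a)])
```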

20 citations


01 Jan 1999
TL;DR: This work studies methods for modifying C4.5 to incorporate arbitrary loss matrices, comparing a wrapper method against some simple heuristics, and defines a complexity measure for loss matrices that predicts when the more efficient heuristics will suffice and when the wrapper method must be applied.
Abstract: Many machine learning applications require classifiers that minimize an asymmetric loss function rather than the raw misclassification rate. We study methods for modifying C4.5 to incorporate arbitrary loss matrices. One way to incorporate loss information into C4.5 is to manipulate the weights assigned to the examples from different classes. For 2-class problems, this works for any loss matrix, but for k > 2 classes, it is not sufficient. Nonetheless, we ask what is the set of class weights that best approximates an arbitrary k × k loss matrix, and we test and compare several methods: a wrapper method and some simple heuristics. The best method is a wrapper method that directly optimizes the loss using a holdout data set. We define a complexity measure for loss matrices and show that this measure can predict when more efficient methods will suffice and when the wrapper method must be applied.
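To make the two ideas concrete, here is a hedged sketch (not the paper's experimental code) of folding a 2-class loss matrix into class weights and of a brute-force wrapper that picks a weight vector by holdout loss; `train_weighted_tree` is a hypothetical stand-in for a weighted C4.5-style learner, and the candidate weight grid is an arbitrary choice.

```python
# Sketch: cost-sensitive learning via class weights.
# (1) For two classes, any loss matrix (with zero diagonal) can be folded
#     exactly into per-class example weights.
# (2) For k > 2 classes, a wrapper can search candidate weight vectors and
#     keep the one with the lowest loss on a holdout set.

import itertools
import numpy as np


def two_class_weights(loss):
    """loss[i][j] = cost of predicting class j when the truth is class i.
    Weight each class by the cost of misclassifying it (zero-cost diagonal assumed)."""
    return np.array([loss[0][1], loss[1][0]], dtype=float)


def holdout_loss(model, X, y, loss):
    """Average asymmetric loss of `model` on a labeled holdout set."""
    preds = model.predict(X)
    return np.mean([loss[t][p] for t, p in zip(y, preds)])


def wrapper_weights(train, holdout, loss, k, candidates=(0.5, 1.0, 2.0, 4.0)):
    """Brute-force wrapper: choose the class-weight vector minimizing holdout loss."""
    X_tr, y_tr = train
    X_ho, y_ho = holdout
    best_w, best_l = None, float("inf")
    for w in itertools.product(candidates, repeat=k):
        model = train_weighted_tree(X_tr, y_tr, class_weights=w)  # assumed learner
        l = holdout_loss(model, X_ho, y_ho, loss)
        if l < best_l:
            best_w, best_l = w, l
    return best_w
```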

19 citations


Posted Content
TL;DR: In this paper, five conditions under which state abstraction can be combined with the MAXQ value function decomposition are defined, and it is shown experimentally that state abstraction is important for the successful application of MAXQ-Q learning.
Abstract: Many researchers have explored methods for hierarchical reinforcement learning (RL) with temporal abstractions, in which abstract actions are defined that can perform many primitive actions before terminating. However, little is known about learning with state abstractions, in which aspects of the state space are ignored. In previous work, we developed the MAXQ method for hierarchical RL. In this paper, we define five conditions under which state abstraction can be combined with the MAXQ value function decomposition. We prove that the MAXQ-Q learning algorithm converges under these conditions and show experimentally that state abstraction is important for the successful application of MAXQ-Q learning.

7 citations