Open Access Proceedings Article

Model reduction techniques for computing approximately optimal solutions for Markov decision processes

TLDR
Presents a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces using an ε-homogeneous partition, together with algorithms that operate on the resulting bounded-parameter MDPs (BMDPs) to find policies that are approximately optimal with respect to the original MDP.
Abstract
We present a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces. We introduce a property of state space partitions which we call ε-homogeneity. Intuitively, an ε-homogeneous partition groups together states that behave approximately the same under all or some subset of policies. Borrowing from recent work on model minimization in computer-aided software verification, we present an algorithm that takes a factored representation of an MDP and an ε, 0 ≤ ε ≤ 1, and computes a factored ε-homogeneous partition of the state space. This partition defines a family of related MDPs: those MDPs with state space equal to the blocks of the partition, and transition probabilities "approximately" like those of any (original MDP) state in the source block. To formally study such families of MDPs, we introduce the new notion of a "bounded-parameter MDP" (BMDP), which is a family of (traditional) MDPs defined by specifying upper and lower bounds on the transition probabilities and rewards. We describe algorithms that operate on BMDPs to find policies that are approximately optimal with respect to the original MDP. In combination, our method for reducing a large implicit MDP to a possibly much smaller BMDP using an ε-homogeneous partition, and our methods for selecting actions in BMDPs, constitute a new approach for analyzing large implicit MDPs. Among its advantages, this new approach provides insight into existing algorithms for solving implicit MDPs, provides useful connections to work in automata theory and model minimization, and suggests methods, which involve varying ε, to trade time and space (specifically, the size of the corresponding state space) for solution quality.
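To make the bounded-parameter idea concrete, below is a minimal Python sketch of interval value iteration on a BMDP; it is an illustration under assumptions, not the authors' implementation. It computes a pessimistic lower bound on the optimal value by letting an adversary choose, within the stated transition-probability intervals, the distribution that shifts as much mass as allowed toward low-value blocks. The data layout and all names (`worst_case_distribution`, `P_lo`, `P_hi`, `R`) are hypothetical.

```python
# Sketch of interval value iteration for a bounded-parameter MDP (BMDP).
# Transition probabilities are intervals [P_lo[s][a][t], P_hi[s][a][t]];
# rewards R[s][a] are taken at their lower bounds. Illustrative only.

def worst_case_distribution(lo, hi, values):
    """Pick a distribution within the [lo, hi] bounds that minimizes
    expected value: start at the lower bounds, then pour the remaining
    probability mass onto the lowest-value successors first."""
    p = list(lo)                        # start from the lower bounds
    slack = 1.0 - sum(lo)               # mass left to distribute
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(hi[i] - lo[i], slack)
        p[i] += add
        slack -= add
    return p

def interval_value_iteration(n_states, n_actions, P_lo, P_hi, R,
                             gamma=0.9, iters=100):
    """Lower-bound value function: the agent maximizes over actions,
    the adversary minimizes over distributions in the intervals."""
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(
                     p * V[t]
                     for t, p in enumerate(
                         worst_case_distribution(P_lo[s][a], P_hi[s][a], V)))
                 for a in range(n_actions))
             for s in range(n_states)]
    return V

# Tiny hypothetical example: 2 blocks, 1 action. The interval widths
# stand in for the "approximately like" slack introduced by aggregation.
P_lo = [[[0.7, 0.2]], [[0.0, 0.9]]]
P_hi = [[[0.8, 0.3]], [[0.1, 1.0]]]
R = [[1.0], [0.0]]
print(interval_value_iteration(2, 1, P_lo, P_hi, R))
```

Running the same recursion with mass poured onto high-value blocks instead yields the matching upper bound; the gap between the two bounds narrows as ε decreases, which is the time-and-space-versus-solution-quality trade-off described in the abstract.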



Citations
Journal Article

Decision-theoretic planning: structural assumptions and computational leverage

TL;DR: In this article, the authors present an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI.
Proceedings Article

Reinforcement Learning with Hierarchies of Machines

TL;DR: This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.
Journal Article

Stochastic dynamic programming with factored representations

TL;DR: This work uses dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards, and develops versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions.

Towards a Unified Theory of State Abstraction for MDPs

TL;DR: This work provides a unified treatment of state abstraction for Markov decision processes by studying five particular abstraction schemes, some of which have been proposed in the past in different forms, and analyzing their usability for planning and learning.
Journal Article

Equivalence notions and model minimization in Markov decision processes

TL;DR: The generalization of bisimulation to stochastic processes yields a non-trivial notion of state equivalence that guarantees the optimal policy for the reduced model immediately induces a corresponding optimal policy for the original model.
References
Book

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Book

Dynamic Programming

TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Monograph

Markov Decision Processes

TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with the average reward criterion, constrained models, and models with risk-sensitive optimality criteria, and explores several topics that have received little or no attention in other books.
Journal Article

Finite Markov Chains

TL;DR: This lecture reviews the theory of Markov chains and introduces some of the high-quality routines for working with Markov chains available in QuantEcon.jl.