Open Access Book
Markov Decision Processes: Discrete Stochastic Dynamic Programming
TLDR
Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite horizon models, and continuous-time discrete state models.

Abstract:
From the Publisher:
The past decade has seen considerable theoretical and applied research on Markov decision processes, as well as the growing use of these models in ecology, economics, communications engineering, and other fields where outcomes are uncertain and sequential decision making is required. A timely response to this increased activity, Martin L. Puterman's new work provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models. It discusses all major research directions in the field, highlights many significant applications of Markov decision process models, and explores numerous important topics that have previously been neglected or given cursory coverage in the literature.

Markov Decision Processes focuses primarily on infinite horizon discrete time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite horizon models, and continuous-time discrete state models. The book is organized around optimality criteria, using a common framework centered on the optimality (Bellman) equation for presenting results. The results are presented in a "theorem-proof" format and elaborated on through both discussion and examples, including results that are not available in any other book. A two-state Markov decision process model, presented in Chapter 3, is analyzed repeatedly throughout the book and demonstrates many results and algorithms.

Markov Decision Processes covers recent research advances in such areas as countable state space models with the average reward criterion, constrained models, and models with risk-sensitive optimality criteria. It also explores several topics that have received little or no attention in other books, including modified policy iteration, multichain models with the average reward criterion, and sensitive optimality.
In addition, a Bibliographic Remarks section in each chapter comments on the relevant historical literature.
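The optimality (Bellman) equation that organizes the book, and its recurring two-state example, can be illustrated with a small value-iteration sketch. The transition probabilities and rewards below are invented for illustration and are not the numbers from the book's Chapter 3 model:

```python
import numpy as np

# Invented two-state, two-action MDP (illustrative only; not the book's
# Chapter 3 example).
# P[a, s, s'] = transition probability; r[s, a] = expected one-step reward.
P = np.array([
    [[0.5, 0.5],
     [0.0, 1.0]],   # action 0
    [[0.8, 0.2],
     [0.4, 0.6]],   # action 1
])
r = np.array([
    [5.0, 10.0],    # rewards in state 0 for actions 0, 1
    [-1.0, 2.0],    # rewards in state 1 for actions 0, 1
])
gamma = 0.9         # discount factor

def value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """Repeatedly apply the Bellman optimality operator
    (Lv)(s) = max_a [ r(s, a) + gamma * sum_{s'} P(s' | s, a) v(s') ]."""
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        q = r + gamma * np.tensordot(P, v, axes=1).T  # Q-values, shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)

v_star, policy = value_iteration(P, r, gamma)
```

Because the operator is a contraction under the discount factor, the iterates converge to the unique fixed point of the optimality equation, and the greedy policy read off from the final Q-values is optimal.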
Citations
Book
Bridging the Gap Between Planning and Scheduling
TL;DR: This paper gives an overview of AI planning and scheduling techniques, focusing on their similarities, differences, and limitations, and argues that many difficult practical problems lie somewhere between planning and scheduling, and that neither area has the right set of tools for solving these vexing problems.
Journal ArticleDOI
Reinforcement Learning in Finite MDPs: PAC Analysis
TL;DR: This paper summarizes the current state of the art for achieving near-optimal behavior in finite Markov Decision Processes with a polynomial number of samples, presenting bounds for the problem in a unified theoretical framework.
Proceedings Article
FF-Replan: a baseline for probabilistic planning
TL;DR: This paper gives the first technical description of FF-Replan and provides an analysis of its results on all of the recent IPPC-04 and IPPC-06 domains, in the hope that this will inspire extensions and insight into the approach and planning domains themselves that will soon lead to the dethroning of FF-Replan.
Proceedings Article
Error bounds for approximate policy iteration
TL;DR: In this article, the authors provide error bounds for approximate policy iteration using quadratic norms, and illustrate those results in the case of feature-based linear function approximation, where most function approximators (such as linear regression) select the best fit in a given class of parameterized functions by minimizing some (weighted) quadratic norm.
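For context on what the "approximate" variant perturbs, here is a minimal sketch of exact policy iteration on an invented two-state MDP; approximate policy iteration replaces the exact evaluation step below with a fitted (e.g. linear) approximation, which is where quadratic-norm error bounds of the kind the paper derives become relevant:

```python
import numpy as np

# Invented two-state, two-action MDP, for illustration only.
# P[a, s, s'] = transition probability; r[s, a] = expected one-step reward.
P = np.array([
    [[0.9, 0.1],
     [0.2, 0.8]],   # action 0
    [[0.3, 0.7],
     [0.6, 0.4]],   # action 1
])
r = np.array([
    [1.0, 0.0],     # rewards in state 0 for actions 0, 1
    [0.0, 2.0],     # rewards in state 1 for actions 0, 1
])
gamma = 0.95

def policy_iteration(P, r, gamma):
    n_s = r.shape[0]
    policy = np.zeros(n_s, dtype=int)
    while True:
        # Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        # Approximate variants replace this solve with a fitted estimate.
        P_pi = P[policy, np.arange(n_s), :]
        r_pi = r[np.arange(n_s), policy]
        v = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)
        # Greedy improvement against the evaluated value function.
        q = r + gamma * np.tensordot(P, v, axes=1).T
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return v, policy
        policy = new_policy

v_star, pi_star = policy_iteration(P, r, gamma)
```

With exact evaluation the loop terminates at an optimal policy in finitely many steps; when evaluation is only approximate, the error in each fitted value function propagates into the greedy step, which is what the cited bounds quantify.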
Journal ArticleDOI
Model-based and model-free Pavlovian reward learning: Revaluation, revision, and revelation
Peter Dayan, Kent C. Berridge, et al.
TL;DR: This work revisits differences between Pavlovian and instrumental learning in the control of incentive motivation and considers the consequences for the computational landscape of prediction, response, and choice.
References
Book
Dynamic Programming
TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Journal ArticleDOI
Finding Optimal (s, S) Policies Is About As Simple As Evaluating a Single Policy
Yu-Sheng Zheng, Awi Federgruen, et al.
TL;DR: A new algorithm for computing optimal (s, S) policies is derived, based upon a number of new properties of the infinite horizon cost function c, as well as a new upper bound for optimal order-up-to levels S* and a new lower bound for ideal reorder levels s*.
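The (s, S) structure itself is simple to state in code. The following sketch shows the ordering rule whose optimal thresholds the cited algorithm computes; the threshold values used below are invented for illustration, not outputs of that algorithm:

```python
def order_quantity(inventory: int, s: int, S: int) -> int:
    """(s, S) rule: when inventory falls to the reorder level s or below,
    order enough to bring it up to the order-up-to level S; otherwise
    order nothing."""
    return S - inventory if inventory <= s else 0

# Illustrative thresholds (invented, not computed by the paper's algorithm).
print(order_quantity(3, s=4, S=10))   # at or below s: order up to S -> 7
print(order_quantity(7, s=4, S=10))   # above s: no order -> 0
```

The computational content of the paper lies entirely in choosing s and S to minimize long-run cost; evaluating the rule itself, as above, is trivial.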