Journal ArticleDOI

A modified dynamic programming method for markovian decision problems

01 Apr 1966-Journal of Mathematical Analysis and Applications (Academic Press)-Vol. 14, Iss: 1, pp 38-43
TL;DR: This article describes a modified dynamic programming method for the problem of choosing, at the beginning of each period, the action that maximizes future total discounted income; convergence of the method appears to be quite rapid.
About: This article is published in the Journal of Mathematical Analysis and Applications. The article was published on 1966-04-01 and is currently open access. It has received 147 citations to date. The article focuses on the topics: Monotone polygon & Decision problem.
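The cite contexts below credit this article with proposing action elimination: ordinary value iteration is augmented with upper and lower bounds on the optimal value, which both certify convergence and let provably non-optimal actions be dropped from later sweeps. A minimal sketch of that idea for a small tabular discounted MDP follows; the interface (P, r, gamma) and the exact form of the bounds follow the standard textbook presentation of MacQueen-style bounds and are assumptions rather than the article's own notation.

```python
import numpy as np

def value_iteration_with_elimination(P, r, gamma, tol=1e-6, max_iter=10_000):
    """Value iteration with MacQueen-style bounds and action elimination.

    P: transitions, shape (A, S, S); r: rewards, shape (S, A); 0 < gamma < 1.
    Illustrative sketch only -- not the article's original notation.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    active = np.ones((S, A), dtype=bool)              # actions not yet eliminated
    for _ in range(max_iter):
        Q = r + gamma * np.einsum('asp,p->sa', P, V)  # one-step lookahead Q-values
        V_new = np.where(active, Q, -np.inf).max(axis=1)
        delta = V_new - V
        lo = V_new + gamma / (1.0 - gamma) * delta.min()   # lower bound on V*
        up = V_new + gamma / (1.0 - gamma) * delta.max()   # upper bound on V*
        # Drop actions whose optimistic Q-value is below the best pessimistic one;
        # such actions are provably non-optimal and can be skipped in later sweeps.
        Q_up = r + gamma * np.einsum('asp,p->sa', P, up)
        Q_lo = r + gamma * np.einsum('asp,p->sa', P, lo)
        best_lo = np.where(active, Q_lo, -np.inf).max(axis=1, keepdims=True)
        active &= Q_up >= best_lo
        V = V_new
        if gamma / (1.0 - gamma) * (delta.max() - delta.min()) < tol:
            break
    policy = np.where(active, Q, -np.inf).argmax(axis=1)
    return V, policy
```

The elimination test is conservative: an action is dropped only when even its optimistic Q-value cannot reach the best pessimistic one, so the optimal policy is never excluded and the work per sweep shrinks as the bounds tighten.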
Citations
Book
18 Apr 2003
TL;DR: In this book, the authors present the analysis of queueing models together with useful tools in applied probability: useful probability distributions, generating functions, the discrete fast Fourier transform, Laplace transform theory, numerical Laplace inversion, and the root-finding problem.
Abstract: Poisson processes and related processes; renewal-reward processes; discrete-time Markov chains; continuous-time Markov chains; Markov chains and queues; discrete-time Markov decision processes; semi-Markov decision processes; advanced renewal theory; algorithmic analysis of queueing models; useful tools in applied probability; useful probability distributions; generating functions; the discrete fast Fourier transform; Laplace transform theory; numerical Laplace inversion; the root-finding problem.

840 citations

Journal Article
TL;DR: A framework based on learning a confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
Abstract: We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1−δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
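A rough sketch of the arm-elimination idea from this abstract: maintain Hoeffding-style confidence intervals for the arms still in play and drop any arm whose upper bound falls below the best lower bound. The constants, sampling schedule, and stopping rule below are simplified placeholders rather than the paper's exact algorithm.

```python
import numpy as np

def successive_elimination(pull, n_arms, epsilon=0.1, delta=0.05):
    """Find a near-optimal arm by eliminating arms via confidence intervals.

    `pull(arm)` must return a reward in [0, 1].  Simplified sketch; the paper's
    exact constants and stopping condition are not reproduced here.
    """
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    t = 0
    while len(active) > 1:
        t += 1
        for arm in active:                      # pull every surviving arm once per round
            reward = pull(arm)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
        # Hoeffding-style confidence radius for the arms still in play.
        radius = np.sqrt(np.log(4.0 * n_arms * t * t / delta) / (2.0 * counts[active]))
        lower = means[active] - radius
        upper = means[active] + radius
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = [a for a, u in zip(active, upper) if u >= lower.max()]
        if 2.0 * radius.max() <= epsilon:       # survivors are (roughly) epsilon-optimal
            break
    return max(active, key=lambda a: means[a])

# Example: three Bernoulli arms with unknown means.
rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]
print("selected arm:", successive_elimination(lambda a: rng.binomial(1, true_means[a]), n_arms=3))
```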

604 citations


Cites background from "A modified dynamic programming meth..."

  • ...This idea, commonly known as action elimination (AE), was proposed by MacQueen (1966) in the context of planning when the MDP parameters are known....


Journal ArticleDOI
TL;DR: In this paper, a linear superposition of M basis functions is proposed to fit the value function in a Markovian decision process, reducing the problem dimensionality from the number of states down to M.
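The dimensionality reduction in this TL;DR can be illustrated with least-squares projected value iteration: each Bellman backup over all states is projected back onto the span of the M basis functions, so only M weights are ever stored. The sketch below is a generic version of the idea under assumed array shapes, not the cited paper's specific scheme, and like any projected value iteration it is not guaranteed to converge for arbitrary features.

```python
import numpy as np

def fitted_value_iteration(P, r, gamma, phi, n_iters=200):
    """Approximate V* by a linear combination of M basis functions.

    P: (A, S, S) transitions; r: (S, A) rewards; phi: (S, M) feature matrix whose
    columns are the basis functions.  Generic sketch, not the cited paper's method.
    """
    S, M = phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        V = phi @ w                                   # current approximation, length S
        Q = r + gamma * np.einsum('asp,p->sa', P, V)  # one-step lookahead
        target = Q.max(axis=1)                        # Bellman backup at every state
        # Project the backup onto span(phi): only M weights are ever stored.
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return phi @ w, w
```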

385 citations

Book
01 Jan 1993
TL;DR: This review presents an overview of theoretical and computational results, applications, several generalizations of the standard MDP problem formulation, and future directions for research.
Abstract: A review is given of an optimization model of discrete-stage, sequential decision making in a stochastic environment, called the Markov decision process (MDP). This review presents an overview of theoretical and computational results, applications, several generalizations of the standard MDP problem formulation, and future directions for research. The reference list contains seminal papers, key texts, and surveys for the interested reader.

283 citations

Journal ArticleDOI
TL;DR: A set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question is presented, and the use of these criteria under different forms of uncertainty for both the rewards and the transitions is studied.
Abstract: Markov decision processes are an effective tool in modeling decision making in uncertain dynamic environments. Because the parameters of these models typically are estimated from data or learned from experience, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable modeling ambiguity. In this paper, we present a set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question. We study the use of these criteria under different forms of uncertainty for both the rewards and the transitions. Some forms are shown to be efficiently solvable and others highly intractable. In each case, we outline solution concepts that take parametric uncertainty into account in the process of decision making.
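For a fixed policy, a percentile criterion of the kind described here can at least be estimated by sampling MDP parameters from the designer's uncertainty model, evaluating the policy on each sample, and taking the delta-quantile of the resulting values. The sketch below does only that evaluation step; it does not perform the optimization over policies that the paper studies, and the function names and array shapes are assumptions.

```python
import numpy as np

def policy_value(P, r, gamma, policy):
    """Exact value of a fixed policy on one sampled MDP (P: (A, S, S), r: (S, A))."""
    S = r.shape[0]
    P_pi = P[policy, np.arange(S), :]        # (S, S) transitions under the policy
    r_pi = r[np.arange(S), policy]           # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def percentile_value(sample_mdp, policy, gamma, start, delta=0.1, n_samples=1000):
    """Monte-Carlo estimate of a percentile criterion for a fixed policy.

    `sample_mdp()` draws (P, r) from the designer's uncertainty model.  This only
    evaluates the criterion; it does not optimize over policies.
    """
    values = []
    for _ in range(n_samples):
        P, r = sample_mdp()
        values.append(policy_value(P, r, gamma, policy)[start])
    # Largest v such that the policy achieves at least v with probability >= 1 - delta.
    return np.quantile(values, delta)
```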

239 citations


Cites methods from "A modified dynamic programming meth..."

  • ...We apply the idea of action elimination, proposed by MacQueen (1966) in the context of the nominal MDP, to the percentile optimization framework to relax this dependence....


References
Book
21 Oct 1957
TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Abstract: From the Publisher: An introduction to the mathematical theory of multistage decision processes, this text takes a functional equation approach to the discovery of optimum policies. Written by a leading developer of such policies, it presents a series of methods, uniqueness and existence theorems, and examples for solving the relevant equations. The text examines existence and uniqueness theorems, the optimal inventory equation, bottleneck problems in multistage production processes, a new formalism in the calculus of variation, strategies behind multistage games, and Markovian decision processes. Each chapter concludes with a problem set that Eric V. Denardo of Yale University, in his informative new introduction, calls a rich lode of applications and research topics. 1957 edition. 37 figures.

14,187 citations

Book
15 Jun 1960

3,046 citations

Journal ArticleDOI
Alan S. Manne1
TL;DR: In this paper, a typical sequential probabilistic model is formulated in terms of (a) an initial decision rule and (b) a Markov process, and then optimized by means of linear programming.
Abstract: Using an illustration drawn from the area of inventory control, this paper demonstrates how a typical sequential probabilistic model may be formulated in terms of (a) an initial decision rule and (b) a Markov process, and then optimized by means of linear programming. This linear programming technique may turn out to be an efficient alternative to the functional equation approach in the numerical analysis of such problems. Regardless of computational significance, however, it is of interest that there should be such a close relationship between the two traditionally distinct areas of dynamic programming and linear programming.
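A compact way to see the dynamic programming / linear programming connection is the textbook LP over state-action occupation measures for a discounted MDP: maximize expected discounted reward subject to flow-balance constraints, then read the policy off the optimal occupation measure. The sketch below is in the spirit of Manne's construction but is a simplified discounted variant, not a reproduction of the 1960 paper's exact model.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_by_lp(P, r, gamma, mu):
    """Solve a discounted MDP as an LP over state-action frequencies.

    P: (A, S, S) transitions; r: (S, A) rewards; mu: (S,) initial distribution.
    Standard textbook formulation, offered here as an illustration only.
    """
    A, S, _ = P.shape
    # Variables x[s, a] >= 0, flattened row-major (index s * A + a).
    c = -r.reshape(-1)                         # linprog minimizes, so negate rewards
    A_eq = np.zeros((S, S * A))
    for s2 in range(S):                        # flow balance at each state s2
        for s in range(S):
            for a in range(A):
                A_eq[s2, s * A + a] = float(s2 == s) - gamma * P[a, s, s2]
    res = linprog(c, A_eq=A_eq, b_eq=mu, bounds=(0, None), method="highs")
    x = res.x.reshape(S, A)
    policy = x.argmax(axis=1)                  # act where the occupation measure concentrates
    return policy, x
```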

469 citations