Journal ArticleDOI

A modified dynamic programming method for markovian decision problems

01 Apr 1966-Journal of Mathematical Analysis and Applications (Academic Press)-Vol. 14, Iss: 1, pp 38-43
TL;DR: This article describes a modified dynamic programming method for the problem of choosing, at the beginning of each period, the action that maximizes future total discounted income; convergence of the method appears to be quite rapid.
About: This article is published in the Journal of Mathematical Analysis and Applications. The article was published on 1966-04-01 and is currently open access. It has received 147 citations to date. The article focuses on the topics: Monotone polygon & Decision problem.
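The cite contexts below credit this article with proposing action elimination: ordinary value iteration is augmented with upper and lower bounds on the optimal value, which both certify convergence and let provably non-optimal actions be dropped from later sweeps. A minimal sketch of that idea for a small tabular discounted MDP follows; the interface (P, r, gamma) and the exact form of the bounds follow the standard textbook presentation of MacQueen-style bounds and are assumptions rather than the article's own notation.

```python
import numpy as np

def value_iteration_with_elimination(P, r, gamma, tol=1e-6, max_iter=10_000):
    """Value iteration with MacQueen-style bounds and action elimination.

    P: transitions, shape (A, S, S); r: rewards, shape (S, A); 0 < gamma < 1.
    Illustrative sketch only -- not the article's original notation.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    active = np.ones((S, A), dtype=bool)              # actions not yet eliminated
    for _ in range(max_iter):
        Q = r + gamma * np.einsum('asp,p->sa', P, V)  # one-step lookahead Q-values
        V_new = np.where(active, Q, -np.inf).max(axis=1)
        delta = V_new - V
        lo = V_new + gamma / (1.0 - gamma) * delta.min()   # lower bound on V*
        up = V_new + gamma / (1.0 - gamma) * delta.max()   # upper bound on V*
        # Drop actions whose optimistic Q-value is below the best pessimistic one;
        # such actions are provably non-optimal and can be skipped in later sweeps.
        Q_up = r + gamma * np.einsum('asp,p->sa', P, up)
        Q_lo = r + gamma * np.einsum('asp,p->sa', P, lo)
        best_lo = np.where(active, Q_lo, -np.inf).max(axis=1, keepdims=True)
        active &= Q_up >= best_lo
        V = V_new
        if gamma / (1.0 - gamma) * (delta.max() - delta.min()) < tol:
            break
    policy = np.where(active, Q, -np.inf).argmax(axis=1)
    return V, policy
```

The elimination test is conservative: an action is dropped only when even its optimistic Q-value cannot reach the best pessimistic one, so the optimal policy is never excluded and the work per sweep shrinks as the bounds tighten.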
Citations
Book
18 Apr 2003
TL;DR: In this book, the authors present the analysis of queueing models together with useful tools in applied probability: useful probability distributions, generating functions, the discrete fast Fourier transform, Laplace transform theory, numerical Laplace inversion, and the root-finding problem.
Abstract: Poisson processes and related processes; renewal-reward processes; discrete-time Markov chains; continuous-time Markov chains; Markov chains and queues; discrete-time Markov decision processes; semi-Markov decision processes; advanced renewal theory; algorithmic analysis of queueing models; useful tools in applied probability; useful probability distributions; generating functions; the discrete fast Fourier transform; Laplace transform theory; numerical Laplace inversion; the root-finding problem.

840 citations

Journal Article
TL;DR: A framework based on learning a confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
Abstract: We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1−δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide model-based and model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
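A rough sketch of the arm-elimination idea from this abstract: maintain Hoeffding-style confidence intervals for the arms still in play and drop any arm whose upper bound falls below the best lower bound. The constants, sampling schedule, and stopping rule below are simplified placeholders rather than the paper's exact algorithm.

```python
import numpy as np

def successive_elimination(pull, n_arms, epsilon=0.1, delta=0.05):
    """Find a near-optimal arm by eliminating arms via confidence intervals.

    `pull(arm)` must return a reward in [0, 1].  Simplified sketch; the paper's
    exact constants and stopping condition are not reproduced here.
    """
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    t = 0
    while len(active) > 1:
        t += 1
        for arm in active:                      # pull every surviving arm once per round
            reward = pull(arm)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
        # Hoeffding-style confidence radius for the arms still in play.
        radius = np.sqrt(np.log(4.0 * n_arms * t * t / delta) / (2.0 * counts[active]))
        lower = means[active] - radius
        upper = means[active] + radius
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = [a for a, u in zip(active, upper) if u >= lower.max()]
        if 2.0 * radius.max() <= epsilon:       # survivors are (roughly) epsilon-optimal
            break
    return max(active, key=lambda a: means[a])

# Example: three Bernoulli arms with unknown means.
rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]
print("selected arm:", successive_elimination(lambda a: rng.binomial(1, true_means[a]), n_arms=3))
```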

604 citations


Cites background from "A modified dynamic programming meth..."

  • ...This idea, commonly known as action elimination (AE), was proposed by MacQueen (1966) in the context of planning when the MDP parameters are known....


Journal ArticleDOI
TL;DR: In this paper, a linear superposition of M basis functions is proposed to fit the value function in a Markovian decision process, reducing the problem dimensionality from the number of states down to M.
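The dimensionality reduction in this TL;DR can be illustrated with least-squares projected value iteration: each Bellman backup over all states is projected back onto the span of the M basis functions, so only M weights are ever stored. The sketch below is a generic version of the idea under assumed array shapes, not the cited paper's specific scheme, and like any projected value iteration it is not guaranteed to converge for arbitrary features.

```python
import numpy as np

def fitted_value_iteration(P, r, gamma, phi, n_iters=200):
    """Approximate V* by a linear combination of M basis functions.

    P: (A, S, S) transitions; r: (S, A) rewards; phi: (S, M) feature matrix whose
    columns are the basis functions.  Generic sketch, not the cited paper's method.
    """
    S, M = phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        V = phi @ w                                   # current approximation, length S
        Q = r + gamma * np.einsum('asp,p->sa', P, V)  # one-step lookahead
        target = Q.max(axis=1)                        # Bellman backup at every state
        # Project the backup onto span(phi): only M weights are ever stored.
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return phi @ w, w
```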

385 citations

Book
01 Jan 1993
TL;DR: This review presents an overview of theoretical and computational results, applications, several generalizations of the standard MDP problem formulation, and future directions for research.
Abstract: A review is given of an optimization model of discrete-stage, sequential decision making in a stochastic environment, called the Markov decision process (MDP). This review presents an overview of theoretical and computational results, applications, several generalizations of the standard MDP problem formulation, and future directions for research. The reference list contains seminal papers, key texts, and surveys for the interested reader.

283 citations

Journal ArticleDOI
TL;DR: A set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question is presented, and the use of these criteria under different forms of uncertainty for both the rewards and the transitions is studied.
Abstract: Markov decision processes are an effective tool in modeling decision making in uncertain dynamic environments. Because the parameters of these models typically are estimated from data or learned from experience, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable modeling ambiguity. In this paper, we present a set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question. We study the use of these criteria under different forms of uncertainty for both the rewards and the transitions. Some forms are shown to be efficiently solvable and others highly intractable. In each case, we outline solution concepts that take parametric uncertainty into account in the process of decision making.
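For a fixed policy, a percentile criterion of the kind described here can at least be estimated by sampling MDP parameters from the designer's uncertainty model, evaluating the policy on each sample, and taking the delta-quantile of the resulting values. The sketch below does only that evaluation step; it does not perform the optimization over policies that the paper studies, and the function names and array shapes are assumptions.

```python
import numpy as np

def policy_value(P, r, gamma, policy):
    """Exact value of a fixed policy on one sampled MDP (P: (A, S, S), r: (S, A))."""
    S = r.shape[0]
    P_pi = P[policy, np.arange(S), :]        # (S, S) transitions under the policy
    r_pi = r[np.arange(S), policy]           # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def percentile_value(sample_mdp, policy, gamma, start, delta=0.1, n_samples=1000):
    """Monte-Carlo estimate of a percentile criterion for a fixed policy.

    `sample_mdp()` draws (P, r) from the designer's uncertainty model.  This only
    evaluates the criterion; it does not optimize over policies.
    """
    values = []
    for _ in range(n_samples):
        P, r = sample_mdp()
        values.append(policy_value(P, r, gamma, policy)[start])
    # Largest v such that the policy achieves at least v with probability >= 1 - delta.
    return np.quantile(values, delta)
```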

239 citations


Cites methods from "A modified dynamic programming meth..."

  • ...We apply the idea of action elimination, proposed by MacQueen (1966) in the context of the nominal MDP, to the percentile optimization framework to relax this dependence....


References
Book
21 Oct 1957
TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Abstract: From the Publisher: An introduction to the mathematical theory of multistage decision processes, this text takes a functional equation approach to the discovery of optimum policies. Written by a leading developer of such policies, it presents a series of methods, uniqueness and existence theorems, and examples for solving the relevant equations. The text examines existence and uniqueness theorems, the optimal inventory equation, bottleneck problems in multistage production processes, a new formalism in the calculus of variation, strategies behind multistage games, and Markovian decision processes. Each chapter concludes with a problem set that Eric V. Denardo of Yale University, in his informative new introduction, calls a rich lode of applications and research topics. 1957 edition. 37 figures.

14,187 citations

Book
15 Jun 1960

3,046 citations

Journal ArticleDOI
Alan S. Manne1
TL;DR: In this paper, a typical sequential probabilistic model is formulated in terms of (a) an initial decision rule and (b) a Markov process, and then optimized by means of linear programming.
Abstract: Using an illustration drawn from the area of inventory control, this paper demonstrates how a typical sequential probabilistic model may be formulated in terms of (a) an initial decision rule and (b) a Markov process, and then optimized by means of linear programming. This linear programming technique may turn out to be an efficient alternative to the functional equation approach in the numerical analysis of such problems. Regardless of computational significance, however, it is of interest that there should be such a close relationship between the two traditionally distinct areas of dynamic programming and linear programming.
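A compact way to see the dynamic programming / linear programming connection is the textbook LP over state-action occupation measures for a discounted MDP: maximize expected discounted reward subject to flow-balance constraints, then read the policy off the optimal occupation measure. The sketch below is in the spirit of Manne's construction but is a simplified discounted variant, not a reproduction of the 1960 paper's exact model.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_by_lp(P, r, gamma, mu):
    """Solve a discounted MDP as an LP over state-action frequencies.

    P: (A, S, S) transitions; r: (S, A) rewards; mu: (S,) initial distribution.
    Standard textbook formulation, offered here as an illustration only.
    """
    A, S, _ = P.shape
    # Variables x[s, a] >= 0, flattened row-major (index s * A + a).
    c = -r.reshape(-1)                         # linprog minimizes, so negate rewards
    A_eq = np.zeros((S, S * A))
    for s2 in range(S):                        # flow balance at each state s2
        for s in range(S):
            for a in range(A):
                A_eq[s2, s * A + a] = float(s2 == s) - gamma * P[a, s, s2]
    res = linprog(c, A_eq=A_eq, b_eq=mu, bounds=(0, None), method="highs")
    x = res.x.reshape(S, A)
    policy = x.argmax(axis=1)                  # act where the occupation measure concentrates
    return policy, x
```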

469 citations