
Showing papers on "Markov decision process published in 1977"


Journal ArticleDOI
01 Oct 1977

1,016 citations


Journal ArticleDOI
TL;DR: The rate at which Markov decision processes converge as the horizon length increases can be important for computations and for judging the appropriateness of models.
Abstract: The rate at which Markov decision processes converge as the horizon length increases can be important for computations and judging the appropriateness of models. The convergence rate is commonly as...

51 citations


Journal ArticleDOI
TL;DR: A countable stage, countable state, finite action decision problem is considered where the objective is the maximization of the expectation of an arbitrary utility function defined on the sequence of states.
Abstract: A countable stage, countable state, finite action decision problem is considered where the objective is the maximization of the expectation of an arbitrary utility function defined on the sequence of states. Basic concepts are formulated, generalizing the standard notions of the optimality equations, conserving and unimprovable strategies, and strategy and value iteration. Analogues of positive, negative and convergent dynamic programming are analyzed.

45 citations
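
For orientation, in the classical special case where the utility of the state sequence is the expected discounted sum of rewards, the optimality equations that this framework generalizes take the familiar fixed-point form (the notation below is mine, not the paper's):

```latex
% Classical discounted, additive-reward special case of the optimality equations.
\[
  v^{*}(s) \;=\; \max_{a \in A(s)} \Bigl[ r(s,a) + \beta \sum_{s'} p(s' \mid s,a)\, v^{*}(s') \Bigr],
  \qquad s \in S .
\]
% A conserving strategy is one that attains the maximum on the right-hand side at every state.
```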


Journal ArticleDOI
TL;DR: Necessary and sufficient conditions are obtained which guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n → ∞ for each initial state and each final reward vector.
Abstract: This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long run average expected reward has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.

43 citations
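
Written out, with v_n(i) the maximal total expected reward over a planning horizon of n epochs from state i and g(i) the long-run average expected reward (my notation), the property characterized by the paper's conditions is the existence of the finite limit

```latex
\[
  \lim_{n \to \infty} \bigl( v_{n}(i) - n\, g(i) \bigr) = w(i) \in \mathbb{R}
  \qquad \text{for every initial state } i \text{ and every final reward vector } v_{0}.
\]
```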


Dissertation
01 Jan 1977
TL;DR: This dissertation introduces concepts and associated computational procedures that are applicable to a mathematical problem arising in the context of Operations Research and Stochastic Control to design a strategy for real-time decision-making on the basis of imperfect (state) information and finite memory.
Abstract: This dissertation introduces concepts and associated computational procedures that are applicable to a mathematical problem arising in the context of Operations Research and Stochastic Control. Briefly stated, the problem is to design a strategy for real-time decision-making on the basis of imperfect (state) information and finite memory. The plant (i.e. the object to be controlled) is modelled as a finite probabilistic system (FPS) or stationary discrete-time finite-input finite-output finite-state controlled stochastic process, a generalization of the partially-observed Markov decision model initiated by Drake (1962), which itself generalizes the Markov decision model of Bellman (1957a).

38 citations
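
The plant here is partially observed, so real-time decisions must be based on an information state rather than the true state. As a point of reference only, the sketch below shows the standard Bayesian belief (information-state) update used in partially observed Markov decision models; the dissertation's finite-memory strategies are not reproduced, and all names are illustrative.

```python
import numpy as np

def belief_update(belief, action, observation, P, O):
    """Standard Bayes update of the information state in a partially
    observed model (illustrative; not the dissertation's construction).

    belief : length-n array, current probability of each hidden state.
    P[a]   : n x n transition matrix under action a.
    O[a]   : n x m matrix, O[a][s', o] = probability of observing o
             after moving to hidden state s' under action a.
    """
    predicted = belief @ P[action]                        # predicted next-state distribution
    unnormalized = predicted * O[action][:, observation]  # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

# Tiny example: 2 hidden states, 1 action, 2 observations.
P = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
O = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}
b = belief_update(np.array([0.5, 0.5]), action=0, observation=1, P=P, O=O)
print(b)  # posterior probability of each hidden state
```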


Journal ArticleDOI
TL;DR: A number of successive approximation algorithms are presented for the repeated two-person zero-sum game called the Markov game, under the criterion of total expected discounted rewards; stopping times are introduced in order to simplify the proofs.
Abstract: This paper presents a number of successive approximation algorithms for the repeated two-person zero-sum game called the Markov game, using the criterion of total expected discounted rewards. As Wessels [1977] did for Markov decision processes, stopping times are introduced in order to simplify the proofs. It is shown that each algorithm provides upper and lower bounds for the value of the game and nearly optimal stationary strategies for both players.

26 citations
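
The successive-approximation step underlying such algorithms is the Shapley-type operator: at each state, form the one-stage matrix game whose entries add the immediate payoff to the discounted continuation values, then replace the state's value by that matrix game's value. The sketch below is a generic illustration of this step only (each stage game solved by linear programming); the paper's stopping-time constructions, upper and lower bounds, and extraction of nearly optimal stationary strategies are not reproduced, and all names are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game with payoff matrix A (row player
    maximizes), via the standard LP: maximize v s.t. x'A >= v, sum(x) = 1."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # variables: x_1..x_m, v; minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - sum_i x_i A[i, j] <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1]

def markov_game_value(r, p, beta, iters=100):
    """r[s] : payoff matrix of the stage game in state s (row player maximizes).
    p[s][a1][a2] : next-state distribution for state s and action pair (a1, a2).
    Returns an approximation of the value vector of the discounted Markov game."""
    S = len(r)
    v = np.zeros(S)
    for _ in range(iters):
        v_new = np.empty(S)
        for s in range(S):
            m, n = r[s].shape
            Q = np.array([[r[s][a1, a2] + beta * p[s][a1][a2] @ v
                           for a2 in range(n)] for a1 in range(m)])
            v_new[s] = matrix_game_value(Q)
        v = v_new
    return v
```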


Journal ArticleDOI
TL;DR: In the policy-iteration algorithm resulting from this approach, the number of equations to be solved in any iteration step can be substantially reduced, and by its flexibility the algorithm allows us to exploit any structure of the particular problem to be solved.
Abstract: This paper provides a new approach for solving a wide class of Markov decision problems including problems in which the space is general and the system can be continuously controlled. The optimality criterion is the long-run average cost per unit time. We decompose the decision processes into a common underlying stochastic process and a sequence of interventions so that the decision processes can be embedded upon a reduced set of states. Consequently, in the policy-iteration algorithm resulting from this approach the number of equations to be solved in any iteration step can be substantially reduced. Further, by its flexibility, this algorithm allows us to exploit any structure of the particular problem to be solved.

25 citations



01 Jan 1977
TL;DR: In this article, the present state of the art of value-iteration and related successive approximation methods, as well as resulting turnpike properties, in both the discounted and undiscounted version of finite state and action Markov Decision Problems, are surveyed.
Abstract: A survey is given of the present state of the art of value-iteration and related successive approximation methods, as well as of resulting turnpike properties, in both the discounted and undiscounted version of finite state and action Markov Decision Problems.

18 citations
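
As a concrete companion to the survey's subject matter, here is a minimal value-iteration sketch for the discounted finite state-and-action case, in standard textbook form rather than any particular variant treated in the survey; the stopping rule is the usual contraction-based one.

```python
import numpy as np

def value_iteration(P, r, beta, tol=1e-8):
    """P[a] : S x S transition matrix for action a; r[a] : length-S rewards.
    Returns approximately optimal values and a greedy stationary policy."""
    actions = list(P.keys())
    S = len(r[actions[0]])
    v = np.zeros(S)
    while True:
        # One-step lookahead for every action, then take the maximum.
        Q = np.array([r[a] + beta * P[a] @ v for a in actions])
        v_new = Q.max(axis=0)
        # Stop when the greedy policy is guaranteed to be within tol of optimal.
        if np.max(np.abs(v_new - v)) < tol * (1 - beta) / (2 * beta):
            return v_new, [actions[i] for i in Q.argmax(axis=0)]
        v = v_new

# Two-state, two-action toy example.
P = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}
r = {0: np.array([1.0, 0.0]), 1: np.array([0.5, 2.0])}
v_star, policy = value_iteration(P, r, beta=0.9)
print(v_star, policy)
```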


Journal ArticleDOI
TL;DR: Conditions are given for the existence of an average optimal strategy; to prove the continuity of the average costs as a function on the space of strategies, perturbation results for quasi-compact linear operators are used.
Abstract: In this paper stationary Markov decision problems are considered with arbitrary state space and compact space of strategies. Conditions are given for the existence of an average optimal strategy. This is done by using the fact that a continuous function on a compact space attains its minimum. To prove the continuity of the average costs as a function on the space of strategies, some perturbation results for quasi-compact linear operators are used. In a first set of conditions the boundedness of the one-period cost functions and the quasi-compactness of the Markov processes are assumed. In more general conditions, the boundedness of the cost functions is replaced by the boundedness, on a subset A of the state space, of the recurrence time and costs until A, and the quasi-compactness of the Markov processes is replaced by the quasi-compactness of the embedded Markov processes on A.

18 citations


01 Jan 1977
TL;DR: Finite state Markov decision processes with finite decision spaces for each state are considered; the optimality criterion is total expected discounted reward over an infinite time horizon.
Abstract: In this paper we consider finite state Markov decision processes with finite decision spaces for each state. The optimality criterion will be total expected discounted reward over an infinite time horizon. For these problems a great variety of optimization procedures has been developed. We divide them into two classes: policy improvement procedures and policy improvement-value determination procedures.
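
As a sketch of the second class in its simplest setting, here is a generic Howard-type policy improvement-value determination (policy iteration) routine for the discounted finite case; it is an illustration only, not any specific procedure from the paper, and the names are assumptions.

```python
import numpy as np

def policy_iteration(P, r, beta):
    """P[a] : S x S transition matrix for action a; r[a] : length-S rewards.
    Alternates exact value determination with greedy policy improvement."""
    actions = list(P.keys())
    S = len(r[actions[0]])
    policy = [actions[0]] * S
    while True:
        # Value determination: solve (I - beta * P_pi) v = r_pi for the current policy.
        P_pi = np.array([P[policy[s]][s] for s in range(S)])
        r_pi = np.array([r[policy[s]][s] for s in range(S)])
        v = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        Q = np.array([r[a] + beta * P[a] @ v for a in actions])
        new_policy = [actions[i] for i in Q.argmax(axis=0)]
        if new_policy == policy:
            return v, policy
        policy = new_policy
```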

Journal ArticleDOI
TL;DR: In this article, the equivalence of the sensitive optimality criteria as introduced by Veinott is shown, and the Laurent expansion of the total discounted expected return for the various policies is derived.
Abstract: Discrete time Markov decision processes with a countable state space are investigated. Under a condition of Liapunov function type the Laurent expansion of the total discounted expected return for the various policies is derived. Moreover, the equivalence of the sensitive optimality criteria as introduced by Veinott is shown.
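
For reference, the Laurent expansion in question is the classical one (written here in my notation, with interest rate ρ): for a fixed policy π and discount factor β close to one,

```latex
\[
  v_{\beta}^{\pi} \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_{n}^{\pi},
  \qquad \rho = \frac{1-\beta}{\beta},
\]
% where y_{-1}^{\pi} is the gain (long-run average reward) and y_{0}^{\pi} the bias of \pi;
% the sensitive optimality criteria compare policies lexicographically through these terms.
```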

Journal ArticleDOI
TL;DR: The standard models of positive and negative dynamic programming are given in this context; for positive problems, unimprovable strategies are optimal and the optimal value sequence is the least solution of the optimality equations exceeding an obvious lower bound.
Abstract: The analysis of structured countable stage decision processes, initiated in Porteus [11], is continued. The standard models of positive and negative dynamic programming are given in this context, thus extending these results to criteria other than the usual expected sum of rewards, such as expected utility criteria, certain stochastic games, risk sensitive Markov decision processes, and maximin criteria. For positive problems, (what are called) unimprovable strategies are optimal and the optimal value sequence is the least solution of the optimality equations exceeding an obvious lower bound. For negative problems, conserving strategies are optimal, and if one strategy is a one-step improvement on another, then it nets a greater value. (This rules out cycling in the strategy iteration procedure.) Also, transfinite methods are used to prove that the optimal value sequence is the greatest solution of the optimality equations less than an obvious upper bound. We indicate how all these results can be extended ...

Journal ArticleDOI
TL;DR: In the discounted version of this finite-state Markov decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state.
Abstract: A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given.
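
For a fixed pure stationary strategy in the discounted version, the ratio being optimized can be written down directly from the two discounted value vectors, as in the short sketch below (an illustration only; the paper's finite algorithm for optimizing over strategies is not reproduced). The dependence on the starting state noted in the abstract shows up in the final line.

```python
import numpy as np

def discounted_reward_ratio(P_pi, r1_pi, r2_pi, beta, start):
    """Ratio of the two total expected discounted rewards of a fixed
    stationary strategy, as a function of the starting state.

    P_pi : S x S transition matrix induced by the strategy.
    r1_pi, r2_pi : length-S one-step rewards (numerator / denominator).
    """
    S = P_pi.shape[0]
    resolvent = np.linalg.inv(np.eye(S) - beta * P_pi)
    v1 = resolvent @ r1_pi   # discounted value of the first reward
    v2 = resolvent @ r2_pi   # discounted value of the second reward
    return v1[start] / v2[start]
```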


Book ChapterDOI
01 Jan 1977
TL;DR: Finite state Markov decision processes with finite decision spaces for each state are considered; the optimality criterion is total expected discounted reward over an infinite time horizon.
Abstract: In this paper we consider finite state Markov decision processes with finite decision spaces for each state. The optimality criterion will be total expected discounted reward over an infinite time horizon. For these problems a great variety of optimization procedures has been developed. We divide them into two classes: policy improvement procedures and policy improvement-value determination procedures.

Journal ArticleDOI
TL;DR: A fresh perspective on the Markov reward process is presented, and a special case, the mean-variability model's decision rule of maximizing μ/σ, is worked out in detail.
Abstract: This paper presents a fresh perspective on the Markov reward process. In order to bring Howard's [Howard, R. A. 1969. Dynamic Programming and Markov Processes. The M.I.T. Press, 5th printing.] model closer to practical applicability, two very important aspects of the model are restated: (a) we make the rewards random variables instead of known constants, and (b) we allow for any decision rule over the moment set of the portfolio distribution, rather than assuming maximization of the expected value of the portfolio outcome. These modifications provide a natural setting for the rewards to be normally distributed, and thus applying mean-variance models becomes possible. An algorithm for solution is presented, and a special case, the mean-variability model's decision rule of maximizing μ/σ, is worked out in detail.
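
As a deliberately simplified illustration of the μ/σ decision rule (not the paper's algorithm, which works with the moments of the full portfolio distribution), one can score a stationary policy by the mean and standard deviation of its per-step random reward under the stationary distribution; all names below are illustrative.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of an ergodic transition matrix P."""
    S = P.shape[0]
    A = np.vstack([P.T - np.eye(S), np.ones((1, S))])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def mu_over_sigma(P, reward_mean, reward_var):
    """Score a stationary policy by mu/sigma of its per-step reward at
    stationarity (a crude stand-in for the paper's portfolio criterion)."""
    pi = stationary_distribution(P)
    mu = pi @ reward_mean
    second_moment = pi @ (reward_var + reward_mean ** 2)
    sigma = np.sqrt(second_moment - mu ** 2)
    return mu / sigma
```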

Journal ArticleDOI
TL;DR: Finite state and action, discrete time normalized Markov decision chains, i.e., chains with transition matrices that are nonnegative with spectral radius not exceeding one, are considered; it is shown that the reward gained in period N is bounded by a polynomial, uniformly over the set of all policies.
Abstract: In this paper we consider finite state and action, discrete time parameter normalized Markov decision chains, i.e., Markov decision processes with transition matrices that are nonnegative with spectral radius not exceeding one (but not necessarily substochastic). We show that the reward gained in period N is bounded by a polynomial, uniformly over the set of all policies. The degree of this polynomial can be obtained by considering only the set of stationary policies. Extending and improving results of Sladky (1974) for the stochastic case, we obtain necessary and sufficient conditions for n-discount optimality of arbitrary (not necessarily stationary) policies.


Journal ArticleDOI
01 Oct 1977
TL;DR: In this paper, a general discrete decision process is formulated which includes both undiscounted and discounted semi-Markovian decision processes as special cases, and a policy-iteration algorithm is presented and shown to converge to an optimal policy.
Abstract: A general discrete decision process is formulated which includes both undiscounted and discounted semi-Markovian decision processes as special cases. A policy-iteration algorithm is presented and shown to converge to an optimal policy. Properties of the coupled functional equations are derived. Primal and dual linear programming formulations of the optimization problem are also given. An application is given to a Markov ratio decision process.
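
For orientation only (the paper's coupled functional equations cover the more general undiscounted and semi-Markovian cases), the primal linear program for the ordinary discounted Markov decision special case has the familiar form

```latex
\[
\begin{aligned}
  \text{minimize}   \quad & \sum_{s} v(s) \\
  \text{subject to} \quad & v(s) \;\ge\; r(s,a) + \beta \sum_{s'} p(s' \mid s,a)\, v(s')
  \quad \text{for all } s, a ,
\end{aligned}
\]
% whose dual variables can be read as discounted state-action frequencies and
% recover an optimal stationary policy.
```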

01 Jan 1977
TL;DR: The main result in this paper is the characterization of certain strong kinds of equilibrium points in Markov games with a countable set of players and uncountable decision sets.
Abstract: The main result in this paper is the characterization of certain strong kinds of equilibrium points in Markov games with a countable set of players and uncountable decision sets. Two person Markov games are studied beforehand, since this paper gives an extension of the existing theory for two person zero sum Markov games; finally we consider the special cases of N-person Markov games and Markov decision processes.


Journal ArticleDOI
TL;DR: Tarski's Principle for real closed fields is applied to a field of asymptotic expansions, which the authors term the field of real Puiseux series.
Abstract: The authors study two person, zero sum, stochastic games with zero stop probabilities. Two distinct formulations are emphasized: (1) the infinite stage game with payoffs discounted at an interest rate close to zero, and (2) the game with a large but finite number of stages. The authors give a complete theory of such games. The work implies all known existence theorems for optimal policies in Markov decision processes. It also generalizes all previous existence theorems for the value of a stochastic game. The approach differs from previous work in that it is algebraic and makes no use of the theory of Markov chains. The essential idea of this approach is to apply Tarski's Principle on real closed fields to a field of asymptotic expansions, which the authors term the field of real Puiseux series.

Journal ArticleDOI
TL;DR: The decision rules presented in the paper give a policy for estimating the transition probabilities successively from the viewpoint of dual control, leading to an optimal policy for the Markovian decision process with discounted rewards.
Abstract: This paper is concerned with an approach to Markovian decision processes with discounted rewards in which the transition probabilities are unknown. The processes are assumed to be finite-state, discrete-time and stationary. The decision rules presented in the paper give a policy for estimating the transition probabilities successively from the viewpoint of dual control, and the policy leads to an optimal ...
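
One simple reading of "estimating the transition probabilities successively" is a certainty-equivalence scheme: keep transition counts, re-estimate the matrices, and act greedily on the model re-solved with the current estimates. The sketch below implements only that naive reading and deliberately ignores the dual-control (exploration) aspect the paper emphasizes; the class and variable names are illustrative.

```python
import numpy as np

class AdaptiveDiscountedMDP:
    """Certainty-equivalence control of a finite MDP with unknown
    transition probabilities: maintain counts, re-estimate, re-solve."""

    def __init__(self, n_states, n_actions, rewards, beta):
        self.S, self.A, self.r, self.beta = n_states, n_actions, rewards, beta
        # Uniform pseudo-counts keep every estimated matrix a proper stochastic matrix.
        self.counts = np.ones((n_actions, n_states, n_states))

    def estimate(self):
        """Current estimate of the transition matrices, one per action."""
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def greedy_policy(self, iters=200):
        """Solve the discounted MDP for the current estimate and act greedily."""
        P_hat = self.estimate()
        v = np.zeros(self.S)
        for _ in range(iters):
            Q = np.array([self.r[a] + self.beta * P_hat[a] @ v
                          for a in range(self.A)])
            v = Q.max(axis=0)
        return Q.argmax(axis=0)   # greedy action for each state

    def observe(self, state, action, next_state):
        """Update the counts after each observed transition."""
        self.counts[action, state, next_state] += 1
```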

Book ChapterDOI
01 Jan 1977
TL;DR: An axiomatization of discounted dynamic programming with a continuous time parameter (CDP) when Markov policies are used is given, together with necessary and sufficient conditions for the existence of an optimal policy.
Abstract: We consider the problem of discounted dynamic programming with a continuous time parameter (CDP) when Markov policies are used. We give an axiomatization of such discounted CDP. We also give necessary and sufficient conditions for the existence of an optimal policy. Analogously to the discrete case, we formulate improvement theorems and a theorem on the existence of a (p, ε)-optimal policy in a class of semi-Markov policies.

Proceedings ArticleDOI
01 Dec 1977
TL;DR: This paper treats the computational solution of terminating stochastic games using concepts from deterministic matrix games and Markovian decision processes and a suboptimal approach suitable for large problems is presented.
Abstract: This paper treats the computational solution of terminating stochastic games using concepts from deterministic matrix games and Markovian decision processes. The algorithms due to Shapley and Pollatschek/Avi-Itzhak are discussed, and a suboptimal approach suitable for large problems is presented. Numerical results are given for an example comparing the three approaches.

Book ChapterDOI
01 Jan 1977
TL;DR: In this article, a recurrence formula for the difference between expected rewards and sojourn times generated by N transitions of a semi-Markov decision process with finite state space is presented.
Abstract: The paper presents a recurrence formula for the difference between expected rewards and sojourn times generated by N transitions of a semi-Markov decision process with finite state space. Using the recurrence formula, convergence of the policy iteration method can easily be verified, and necessary and sufficient optimality conditions for average optimal and more selective average-overtaking optimal policies are established.

Journal ArticleDOI
R.C.H. Cheng
TL;DR: An alternative approach is proposed which converts a problem involving the optimal control of a distributed parameter system into a so-called Markov decision process, which can be solved by mathematical programming methods.