
Showing papers on "Markov decision process published in 1971"


Book
01 Jan 1971
TL;DR: In this book, the simpler problems are treated in detail, and the student is introduced to the more sophisticated mathematical concepts required for advanced theory through descriptions of their roles and necessity in an intuitive and natural way.
Abstract: The text treats stochastic control problems for Markov chains, discrete time Markov processes, and diffusion models, and discusses methods of putting other problems into the Markovian framework. Computational methods are discussed and compared for Markov chain problems. Other topics include the fixed and free time of control, discounted cost, minimizing the average cost per unit time, and optimal stopping. Filtering and control for linear systems, and stochastic stability for discrete time problems, are discussed thoroughly. The book gives a detailed treatment of the simpler problems, and fills the need to introduce the student to the more sophisticated mathematical concepts required for advanced theory by describing their roles and necessity in an intuitive and natural way. Diffusion models are developed as limits of stochastic difference equations and also via the stochastic integral approach. Examples and exercises are included. (Author)

643 citations
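
One of the computational methods such a text compares for discounted-cost Markov chain problems is successive approximation (value iteration). The sketch below is a generic illustration of that method, not code from the book; the function name and the two-state, two-action model at the bottom are hypothetical.

```python
import numpy as np

def value_iteration(P, c, beta, tol=1e-8):
    """Successive approximation for discounted-cost control of a finite Markov chain.

    P[a] is the transition matrix under action a, c[a] is the one-step cost
    vector under action a, and beta in (0, 1) is the discount factor.
    Returns the (approximate) optimal cost vector and a greedy stationary policy."""
    v = np.zeros(P.shape[1])
    while True:
        # q[a, i]: cost of using action a in state i and then continuing optimally
        q = c + beta * np.einsum("aij,j->ai", P, v)
        v_new = q.min(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=0)
        v = v_new

# Hypothetical two-state, two-action example.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transition matrix under action 0
              [[0.5, 0.5], [0.7, 0.3]]])  # transition matrix under action 1
c = np.array([[1.0, 2.0],                 # one-step costs under action 0
              [1.5, 0.5]])                # one-step costs under action 1
values, policy = value_iteration(P, c, beta=0.95)
```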




Journal ArticleDOI
TL;DR: In this article, the continuous-time optimal stopping problem is considered and an infinitesimal look-ahead procedure is defined; sufficient conditions are then given which ensure that this procedure, the continuous-time analogue of the one-stage look-ahead rule for the discrete-time problem, is optimal.
Abstract: The continuous-time optimal stopping problem is considered and an infinitesimal look-ahead procedure is defined. Sufficient conditions are then given which ensure that this procedure, which is the continuous-time analogue of the one-stage look-ahead rule in the discrete-time problem, is optimal. These results are then applied to a class of continuous-time Markov decision processes.

49 citations
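
For context, the discrete-time one-stage look-ahead rule that this procedure parallels can be stated in generic notation (the symbols $g$, $X_n$, and $B$ are not the paper's): with stopping reward $g$, stop at the first time the chain enters the set

$$B \;=\; \bigl\{\, x : g(x) \ge \mathbb{E}\bigl[\, g(X_{n+1}) \mid X_n = x \,\bigr] \,\bigr\},$$

that is, the set of states where stopping immediately is at least as good as continuing one more step and then stopping. A standard sufficient condition for this rule to be optimal in discrete time is that $B$ be closed under the dynamics (the chain cannot leave $B$ once inside).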


Book
01 Jan 1971

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of maximizing the long-run average (also the long-run average expected) reward per unit time in a semi-Markov decision process with arbitrary state and action space.
Abstract: We consider the problem of maximizing the long-run average (also the long-run average expected) reward per unit time in a semi-Markov decision process with arbitrary state and action space. Our main result states that we need only consider the set of stationary policies in that for each $\varepsilon > 0$ there is a stationary policy which is $\varepsilon$-optimal. This result is derived under the assumptions that (roughly) (i) expected rewards and expected transition times are uniformly bounded over all states and actions, and that (ii) there is a state such that the expected length of time until the system returns to this state is uniformly bounded over all policies. The existence of an optimal stationary policy is established under the additional assumption of countable state and finite action space. Applications to queueing reward systems are given.

32 citations
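
In generic notation (not necessarily the paper's), the criterion being maximized over policies $\pi$, and the $\varepsilon$-optimality property asserted above, can be written

$$\phi(\pi, x) \;=\; \liminf_{t \to \infty} \frac{1}{t}\,\mathbb{E}_{\pi, x}\bigl[Z(t)\bigr], \qquad \phi(\pi_{\varepsilon}, x) \;\ge\; \sup_{\pi'} \phi(\pi', x) - \varepsilon \quad \text{for all states } x,$$

where $Z(t)$ denotes the reward accumulated by the semi-Markov decision process up to time $t$ and $\pi_{\varepsilon}$ is the stationary policy whose existence the paper establishes.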


Journal ArticleDOI
TL;DR: This note shows, for the case of single-chain Markov decision processes, how bounds on the optimal gain can be obtained at each cycle of the Howard and Hastings algorithms.
Abstract: An algorithm for the steady-state solution of Markov decision problems has been proposed by Howard and modified by Hastings. This note shows, for the case of single-chain Markov decision processes, how bounds on the optimal gain can be obtained at each cycle of the foregoing algorithms. The results extend to Markov renewal programming. Related results are the bounds proposed by Odoni for use with White's value-iteration method of optimization.

24 citations
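
The Odoni bounds mentioned at the end of the note take the following form when used with White's value-iteration scheme; this is a generic sketch under the usual single-chain assumptions, not code from the paper, and the function name and arguments are hypothetical.

```python
import numpy as np

def gain_bounds(P, r, sweeps=200, tol=1e-8):
    """Bounds on the optimal gain g* of an undiscounted, single-chain MDP,
    updated at every sweep of value iteration:
        min_i (v_{n+1}(i) - v_n(i))  <=  g*  <=  max_i (v_{n+1}(i) - v_n(i)).
    P[a]: transition matrix under action a, r[a]: reward vector under action a."""
    v = np.zeros(P.shape[1])
    lo, hi = -np.inf, np.inf
    for _ in range(sweeps):
        v_new = (r + np.einsum("aij,j->ai", P, v)).max(axis=0)
        diff = v_new - v
        lo, hi = diff.min(), diff.max()   # current lower/upper bounds on g*
        v = v_new - v_new.min()           # renormalize so the values stay bounded
        if hi - lo < tol:                 # bounds have (numerically) closed on g*
            break
    return lo, hi
```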


Journal ArticleDOI
TL;DR: In this paper, the authors study the control of continuous Markov processes on a semicompactum by two players with conflicting interests and derive Bellman's equations in the case where control is exercised for an infinite time.
Abstract: Problems in the control of continuous Markov processes on a semicompactum by two players with conflicting interests are studied. The basic content of the paper is a derivation of Bellman's equations in the case where control is exercised for an infinite time (Theorem 3), and in the case of a problem of optimal stopping (Theorem 6). The results are illustrated by two examples (Theorems 1 and 2).

23 citations
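
Schematically, and in modern notation rather than the paper's (the generator $L^{a,b}$, running payoff $f$, and discount rate $\lambda$ are generic symbols, not taken from the source), the Bellman (Isaacs) equation for an infinite-horizon game of this kind has the form

$$\sup_{a}\,\inf_{b}\,\bigl[\, L^{a,b} v(x) - \lambda v(x) + f(x, a, b) \,\bigr] = 0,$$

with a corresponding variational (free-boundary) modification in the optimal stopping variant; the paper's Theorems 3 and 6 give precise versions of such equations for processes on a semicompactum.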


Journal ArticleDOI
TL;DR: In this paper, linear programming versions of some control problems on Markov chains are derived, and are studied under conditions which occur in typical problems which arise by discretizing continuous time and state systems or in discrete state systems.
Abstract: Linear programming versions of some control problems on Markov chains are derived, and are studied under conditions which occur in typical problems arising from discretizing continuous time and state systems, or in discrete state systems. Control interpretations of the dual variables and simplex multipliers are given. The formulation allows the treatment of ‘state space’-like constraints which cannot be handled conveniently with dynamic programming. The relation between dynamic programming on Markov chains and the deterministic discrete maximum principle is explored, and some insight is obtained into the problem of singular stochastic controls (with respect to a stochastic maximum principle).

15 citations
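
A minimal sketch of what such a linear program looks like for a discounted finite-state, finite-action problem (an illustrative formulation, not the one in the paper; the function name and arguments are hypothetical): the primal variables are the state values, and the dual variables attached to the inequality constraints carry the control interpretation as expected discounted state-action frequencies.

```python
import numpy as np
from scipy.optimize import linprog

def mdp_lp(P, r, beta, alpha=None):
    """Primal LP for a discounted finite MDP:
        minimize  sum_i alpha_i * v_i
        s.t.      v_i >= r(i, a) + beta * sum_j P(j | i, a) * v_j   for all (i, a).
    P[a]: transition matrix under action a, r[a]: reward vector under action a,
    alpha: positive weights on the initial states."""
    n_actions, n_states, _ = P.shape
    if alpha is None:
        alpha = np.ones(n_states) / n_states
    # One inequality per (action, state) pair, written as A_ub @ v <= b_ub.
    A_ub = np.vstack([beta * P[a] - np.eye(n_states) for a in range(n_actions)])
    b_ub = -np.concatenate([r[a] for a in range(n_actions)])
    res = linprog(c=alpha, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_states, method="highs")
    # res.x is the optimal value vector; with the HiGHS solver the duals of these
    # constraints (res.ineqlin.marginals, up to sign convention) play the role of
    # the discounted state-action frequencies discussed in the abstract.
    return res
```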


Journal ArticleDOI
01 Jan 1971
TL;DR: The framework of an algorithm based on the infinite return optimization algorithms of Howard and Jewell is suggested for computing the optimal policy of a Markov decision process under a practical constraint on admissible control policies.
Abstract: An important practical constraint on admissible control policies is defined for the Markov decision process. The framework of an algorithm based on the infinite return optimization algorithms of Howard and Jewell is suggested to compute the optimal policy under this constraint. Iterative convergence to the optimal policy cannot be guaranteed, but techniques proposed for state-space reduction and rapid resolution of undetermined policies should render many problems tractable.

13 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider Markov processes with continuous time, where the switching of the controls takes place at random (independent of the future) moments of time; they derive Bellman's cost equation, establish the existence of optimal strategies, prove the measurability of the cost, and give an excessive characterization of the cost.
Abstract: We consider Markov processes with continuous time, where the switching of the controls takes place at random (independent of the future) moments of time. We derive Bellman's cost equation, establish the existence of optimal strategies, prove the measurability of the cost, and give an excessive characterization of the cost. Bibliography: 9 items.



Proceedings ArticleDOI
01 Dec 1971
TL;DR: A new approach to the solution of dynamic games of one-on-one combat is developed by introducing two physically motivated assumptions, and a meaningful discretized game formulation of the combat problem is obtained.
Abstract: A new approach to the solution of dynamic games of one-on-one combat is developed. By introducing two physically motivated assumptions, a meaningful discretized game formulation of the combat problem is obtained. Then, concepts of Markov processes and state increment dynamic programming are used to develop a feasible computational scheme for solving this problem. The method is demonstrated by applying it to a classical differential game, the Homicidal Chauffeur problem.

Journal ArticleDOI
TL;DR: In this paper, a direct search type sub-optimal algorithm was proposed for Howard's Taxi-Cab problem and Hamza's problem of coupled Markov processes; where a strictly optimal solution is needed, Howard's algorithm may be pursued with the sub-optimal decision as the starting policy.
Abstract: Studies in the past decade have indicated considerable interest in the problems of optimizing processes having Markovian property. Certain properties of the transition matrix, associated with such processes, regarding the estimates of the steady-state probability distribution, gain estimate, their order of approximation, error in estimate- etc., has been considered in some depth in this paper. The computational ease of these estimates have led to the development of a direct search type sub-optimal algorithm. The algorithm has been employed to Howard's Taxi-Cab problem and Hamza's problem of coupled Markov processes. Often, this sub-optimal solution is itself the optimal solution, which interestingly is the case for the problems solved in this paper. In case whore a strictly optimal solution is necessary, the Howard's algorithm may be persued with the sub-optimal decision as the starting policy. This will still be advantageous from the point of view of computation time for small-scale systems.