
Showing papers on "Markov decision process published in 1971"


Book
01 Jan 1971
TL;DR: In this book, the simpler problems are treated in detail, and the student is introduced to the more sophisticated mathematical concepts required for advanced theory through descriptions of their roles and necessity in an intuitive and natural way.
Abstract: The text treats stochastic control problems for Markov chains, discrete time Markov processes, and diffusion models, and discusses methods of putting other problems into the Markovian framework. Computational methods are discussed and compared for Markov chain problems. Other topics include the fixed and free time of control, discounted cost, minimizing the average cost per unit time, and optimal stopping. Filtering and control for linear systems, and stochastic stability for discrete time problems, are discussed thoroughly. The book gives a detailed treatment of the simpler problems, and fills the need to introduce the student to the more sophisticated mathematical concepts required for advanced theory by describing their roles and necessity in an intuitive and natural way. Diffusion models are developed as limits of stochastic difference equations and also via the stochastic integral approach. Examples and exercises are included. (Author)

643 citations
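
One of the computational methods such a text compares for discounted-cost Markov chain problems is successive approximation (value iteration). The sketch below is a generic illustration of that method, not code from the book; the function name and the two-state, two-action model at the bottom are hypothetical.

```python
import numpy as np

def value_iteration(P, c, beta, tol=1e-8):
    """Successive approximation for discounted-cost control of a finite Markov chain.

    P[a] is the transition matrix under action a, c[a] is the one-step cost
    vector under action a, and beta in (0, 1) is the discount factor.
    Returns the (approximate) optimal cost vector and a greedy stationary policy."""
    v = np.zeros(P.shape[1])
    while True:
        # q[a, i]: cost of using action a in state i and then continuing optimally
        q = c + beta * np.einsum("aij,j->ai", P, v)
        v_new = q.min(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmin(axis=0)
        v = v_new

# Hypothetical two-state, two-action example.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transition matrix under action 0
              [[0.5, 0.5], [0.7, 0.3]]])  # transition matrix under action 1
c = np.array([[1.0, 2.0],                 # one-step costs under action 0
              [1.5, 0.5]])                # one-step costs under action 1
values, policy = value_iteration(P, c, beta=0.95)
```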




Journal ArticleDOI
TL;DR: In this article, the continuous-time optimal stopping problem is considered and an infinitesimal look-ahead procedure is defined; sufficient conditions are then given which ensure that this procedure, the continuous-time analogue of the one-stage look-ahead rule for the discrete-time problem, is optimal.
Abstract: The continuous-time optimal stopping problem is considered and an infinitesimal look-ahead procedure is defined. Sufficient conditions are then given which ensure that this procedure, which is the continuous-time analogue of the one-stage look-ahead rule in the discrete-time problem, is optimal. These results are then applied to a class of continuous-time Markov decision processes.

49 citations
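
For context, the discrete-time one-stage look-ahead rule that this procedure parallels can be stated in generic notation (the symbols $g$, $X_n$, and $B$ are not the paper's): with stopping reward $g$, stop at the first time the chain enters the set

$$B \;=\; \bigl\{\, x : g(x) \ge \mathbb{E}\bigl[\, g(X_{n+1}) \mid X_n = x \,\bigr] \,\bigr\},$$

that is, the set of states where stopping immediately is at least as good as continuing one more step and then stopping. A standard sufficient condition for this rule to be optimal in discrete time is that $B$ be closed under the dynamics (the chain cannot leave $B$ once inside).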


Book
01 Jan 1971

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of maximizing the long-run average (also the long-run average expected) reward per unit time in a semi-Markov decision process with arbitrary state and action space.
Abstract: We consider the problem of maximizing the long-run average (also the long-run average expected) reward per unit time in a semi-Markov decision process with arbitrary state and action space. Our main result states that we need only consider the set of stationary policies in that for each $\varepsilon > 0$ there is a stationary policy which is $\varepsilon$-optimal. This result is derived under the assumptions that (roughly) (i) expected rewards and expected transition times are uniformly bounded over all states and actions, and that (ii) there is a state such that the expected length of time until the system returns to this state is uniformly bounded over all policies. The existence of an optimal stationary policy is established under the additional assumption of countable state and finite action space. Applications to queueing reward systems are given.

32 citations
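
In generic notation (not necessarily the paper's), the criterion being maximized over policies $\pi$, and the $\varepsilon$-optimality property asserted above, can be written

$$\phi(\pi, x) \;=\; \liminf_{t \to \infty} \frac{1}{t}\,\mathbb{E}_{\pi, x}\bigl[Z(t)\bigr], \qquad \phi(\pi_{\varepsilon}, x) \;\ge\; \sup_{\pi'} \phi(\pi', x) - \varepsilon \quad \text{for all states } x,$$

where $Z(t)$ denotes the reward accumulated by the semi-Markov decision process up to time $t$ and $\pi_{\varepsilon}$ is the stationary policy whose existence the paper establishes.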


Journal ArticleDOI
TL;DR: This note shows, for the case of single-chain Markov decision processes, how bounds on the optimal gain can be obtained at each cycle of the Howard and Hastings algorithms.
Abstract: An algorithm for the steady-state solution of Markov decision problems has been proposed by Howard and modified by Hastings. This note shows, for the case of single-chain Markov decision processes, how bounds on the optimal gain can be obtained at each cycle of the foregoing algorithms. The results extend to Markov renewal programming. Related results are the bounds proposed by Odoni for use with White's value-iteration method of optimization.

24 citations
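
The Odoni bounds mentioned at the end of the note take the following form when used with White's value-iteration scheme; this is a generic sketch under the usual single-chain assumptions, not code from the paper, and the function name and arguments are hypothetical.

```python
import numpy as np

def gain_bounds(P, r, sweeps=200, tol=1e-8):
    """Bounds on the optimal gain g* of an undiscounted, single-chain MDP,
    updated at every sweep of value iteration:
        min_i (v_{n+1}(i) - v_n(i))  <=  g*  <=  max_i (v_{n+1}(i) - v_n(i)).
    P[a]: transition matrix under action a, r[a]: reward vector under action a."""
    v = np.zeros(P.shape[1])
    lo, hi = -np.inf, np.inf
    for _ in range(sweeps):
        v_new = (r + np.einsum("aij,j->ai", P, v)).max(axis=0)
        diff = v_new - v
        lo, hi = diff.min(), diff.max()   # current lower/upper bounds on g*
        v = v_new - v_new.min()           # renormalize so the values stay bounded
        if hi - lo < tol:                 # bounds have (numerically) closed on g*
            break
    return lo, hi
```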


Journal ArticleDOI
TL;DR: In this paper, the authors study the control of continuous Markov processes on a semicompactum by two players with conflicting interests and derive Bellman's equations in the case where control is exercised for an infinite time.
Abstract: Problems in the control of continuous Markov processes on a semicompactum by two players with conflicting interests are studied. The basic content of the paper is a derivation of Bellman's equations in the case where control is exercised for an infinite time (Theorem 3), and in the case of a problem of optimal stopping (Theorem 6). The results are illustrated by two examples (Theorems 1 and 2).

23 citations
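
Schematically, and in modern notation rather than the paper's (the generator $L^{a,b}$, running payoff $f$, and discount rate $\lambda$ are generic symbols, not taken from the source), the Bellman (Isaacs) equation for an infinite-horizon game of this kind has the form

$$\sup_{a}\,\inf_{b}\,\bigl[\, L^{a,b} v(x) - \lambda v(x) + f(x, a, b) \,\bigr] = 0,$$

with a corresponding variational (free-boundary) modification in the optimal stopping variant; the paper's Theorems 3 and 6 give precise versions of such equations for processes on a semicompactum.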


Journal ArticleDOI
TL;DR: In this paper, linear programming versions of some control problems on Markov chains are derived, and are studied under conditions which occur in typical problems which arise by discretizing continuous time and state systems or in discrete state systems.
Abstract: Linear programming versions of some control problems on Markov chains are derived, and are studied under conditions which occur in typical problems arising from discretizing continuous time and state systems, or in discrete state systems. Control interpretations of the dual variables and simplex multipliers are given. The formulation allows the treatment of ‘state space’-like constraints which cannot be handled conveniently with dynamic programming. The relation between dynamic programming on Markov chains and the deterministic discrete maximum principle is explored, and some insight is obtained into the problem of singular stochastic controls (with respect to a stochastic maximum principle).

15 citations
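
A minimal sketch of what such a linear program looks like for a discounted finite-state, finite-action problem (an illustrative formulation, not the one in the paper; the function name and arguments are hypothetical): the primal variables are the state values, and the dual variables attached to the inequality constraints carry the control interpretation as expected discounted state-action frequencies.

```python
import numpy as np
from scipy.optimize import linprog

def mdp_lp(P, r, beta, alpha=None):
    """Primal LP for a discounted finite MDP:
        minimize  sum_i alpha_i * v_i
        s.t.      v_i >= r(i, a) + beta * sum_j P(j | i, a) * v_j   for all (i, a).
    P[a]: transition matrix under action a, r[a]: reward vector under action a,
    alpha: positive weights on the initial states."""
    n_actions, n_states, _ = P.shape
    if alpha is None:
        alpha = np.ones(n_states) / n_states
    # One inequality per (action, state) pair, written as A_ub @ v <= b_ub.
    A_ub = np.vstack([beta * P[a] - np.eye(n_states) for a in range(n_actions)])
    b_ub = -np.concatenate([r[a] for a in range(n_actions)])
    res = linprog(c=alpha, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_states, method="highs")
    # res.x is the optimal value vector; with the HiGHS solver the duals of these
    # constraints (res.ineqlin.marginals, up to sign convention) play the role of
    # the discounted state-action frequencies discussed in the abstract.
    return res
```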


Journal ArticleDOI
01 Jan 1971
TL;DR: The framework of an algorithm based on the infinite return optimization algorithms of Howard and Jewell is suggested for computing the optimal policy of a Markov decision process under a practical constraint on admissible control policies.
Abstract: An important practical constraint on admissible control policies is defined for the Markov decision process. The framework of an algorithm based on the infinite return optimization algorithms of Howard and Jewell is suggested to compute the optimal policy under this constraint. Iterative convergence to the optimal policy cannot be guaranteed, but techniques proposed for state-space reduction and rapid resolution of undetermined policies should render many problems tractable.

13 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider Markov processes with continuous time, where the switching of the controls takes place at random (independent of the future) moments of time; they derive Bellman's cost equation, establish the existence of optimal strategies, prove the measurability of the cost, and give an excessive characterization of the cost.
Abstract: We consider Markov processes with continuous time, where the switching of the controls takes place at random (independent of the future) moments of time. We derive Bellman's cost equation, establish the existence of optimal strategies, prove the measurability of the cost, and give an excessive characterization of the cost. Bibliography: 9 items.



Proceedings ArticleDOI
01 Dec 1971
TL;DR: A new approach to the solution of dynamic games of one-on-one combat is developed by introducing two physically motivated assumptions, and a meaningful discretized game formulation of the combat problem is obtained.
Abstract: A new approach to the solution of dynamic games of one-on-one combat is developed. By introducing two physically motivated assumptions, a meaningful discretized game formulation of the combat problem is obtained. Then, concepts of Markov processes and state increment dynamic programming are used to develop a feasible computational scheme for solving this problem. The method is demonstrated by applying it to a classical differential game, the Homicidal Chauffeur problem.

Journal ArticleDOI
TL;DR: In this paper, a direct search type sub-optimal algorithm was proposed for Howard's Taxi-Cab problem and Hamza's problem of coupled Markov processes; where a strictly optimal solution is needed, Howard's algorithm may be pursued with the sub-optimal decision as the starting policy.
Abstract: Studies in the past decade have indicated considerable interest in the problems of optimizing processes having Markovian property. Certain properties of the transition matrix, associated with such processes, regarding the estimates of the steady-state probability distribution, gain estimate, their order of approximation, error in estimate- etc., has been considered in some depth in this paper. The computational ease of these estimates have led to the development of a direct search type sub-optimal algorithm. The algorithm has been employed to Howard's Taxi-Cab problem and Hamza's problem of coupled Markov processes. Often, this sub-optimal solution is itself the optimal solution, which interestingly is the case for the problems solved in this paper. In case whore a strictly optimal solution is necessary, the Howard's algorithm may be persued with the sub-optimal decision as the starting policy. This will still be advantageous from the point of view of computation time for small-scale systems.