
Showing papers on "Markov decision process published in 1988"


Book
01 Jan 1988
TL;DR: The General Theory of Markov Processes, by M. Sharpe, San Diego, 1988, 420 pp.
Abstract: General Theory of Markov Processes. By M. Sharpe. ISBN 0-12-639060-6. Academic Press, San Diego, 1988. 420 pp. $49.50.

493 citations


Journal ArticleDOI
TL;DR: A survey of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return) can be found in this article.
Abstract: This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations in terms solely of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, in terms of the primitive state set I, formulations involving a redefinition of the states are examined.

127 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process.
Abstract: This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact convex. One then associates with each of the usual cost criteria (infinite horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about existence of optimal strategies are recovered from this and several applications to multicriteria and constrained optimization problems are briefly indicated.
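
For orientation, here is a minimal sketch of the occupation-measure idea in generic notation (the symbols below are assumptions, not the paper's): under the discounted criterion the cost is a linear functional of a single measure on the state-action space, so the control problem becomes a linear optimization over a compact convex set of measures.

```latex
% Discounted occupation measure in generic notation (initial law \nu, strategy \pi,
% discount \beta \in (0,1)); the discounted cost is linear in this measure.
\mu^{\pi}_{\nu}(A \times B) \;=\; (1 - \beta) \sum_{t=0}^{\infty} \beta^{t}\,
   P^{\pi}_{\nu}\bigl(X_t \in A,\; U_t \in B\bigr),
\qquad
J_{\beta}(\nu, \pi) \;=\; \frac{1}{1 - \beta} \int_{S \times U} c(x,u)\, \mu^{\pi}_{\nu}(dx\, du) .
```

Analogous measures are defined for the finite-horizon and exit-time criteria; existence of optimal strategies then follows from the compactness of the set of achievable measures and the characterization of its extreme points.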

124 citations


Journal ArticleDOI
TL;DR: The basic theory of hierarchic Markov processes is described and examples are given of applications in replacement models where the replacement decision depends on the quality of the new asset available for replacement.

75 citations


Journal ArticleDOI
TL;DR: This paper extends an earlier paper on real applications of Markov decision processes in which the results of the studies have been implemented, have had some influence on the actual decisions, or in which the analyses are based on real data.
Abstract: This paper extends an earlier paper [White 1985] on real applications of Markov decision processes in which the results of the studies have been implemented, have had some influence on the actual decisions, or in which the analyses are based on real data.

75 citations


Journal ArticleDOI
TL;DR: In this article, the effect of perturbations in the data of a discrete-time Markov reward process on the finite-horizon total expected reward, the infinite-horizon expected discounted and average reward, and the total expected reward up to a first-passage time was studied.
Abstract: We study the effect of perturbations in the data of a discrete-time Markov reward process on the finite-horizon total expected reward, the infinite-horizon expected discounted and average reward and the total expected reward up to a first-passage time. Bounds for the absolute errors of these reward functions are obtained. The results are illustrated for finite as well as infinite queueing systems (the M/M/1/S queue and an infinite-capacity counterpart). Extensions to Markov decision processes and other settings are discussed.
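
As a concrete illustration of the kind of statement involved, here is a hypothetical numerical sketch (not the paper's bounds or examples): a standard a-priori inequality for the discounted value of a Markov reward process when only the transition matrix is perturbed, checked against the exact error.

```python
# Hypothetical numbers, not from the paper: perturb one row of the transition
# matrix of a discounted Markov reward process and compare the exact change in
# the value vector with the standard bound
#     ||v - v~||_inf <= beta * ||P - P~||_inf * ||v~||_inf / (1 - beta).
import numpy as np

beta = 0.9
r = np.array([1.0, 0.0, 2.0])                      # one-step rewards
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])                     # nominal transition matrix
P_pert = P.copy()
P_pert[0] = [0.45, 0.55, 0.0]                       # perturbed first row

def value(P_):
    # total expected discounted reward: v = (I - beta P)^{-1} r
    return np.linalg.solve(np.eye(len(r)) - beta * P_, r)

v, v_pert = value(P), value(P_pert)
actual = np.max(np.abs(v - v_pert))
bound = beta * np.max(np.abs(P - P_pert).sum(axis=1)) * np.max(np.abs(v_pert)) / (1 - beta)
print(f"actual error {actual:.4f} <= bound {bound:.4f}")
```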

73 citations


Journal ArticleDOI
TL;DR: A review of the relevant theoretical results in order to call them to the attention of civil engineers involved with pavement management systems can be found in this article, where a variety of problems involving inspection, repair, and replacement are considered.
Abstract: The problem of scheduling maintenance for pavements in an optimum fashion has been approached in a variety of ways by researchers and practitioners. However, the Markov decision process has found very limited use despite the fact that cumulative damage is readily modeled by a Markov chain and that a wealth of immediately applicable theoretical results exist in the literature. The solutions are known for a variety of problems involving inspection, repair, and replacement, making it possible to solve directly for an optimal policy in the form of a control law. This paper reviews some of the relevant theoretical results in order to call them to the attention of civil engineers involved with pavement management systems.
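
A self-contained sketch of the kind of model the paper points to (the states, costs, and transition probabilities below are invented for illustration): deterioration modeled as a Markov chain with a do-nothing and a resurface action, solved by value iteration; the optimal policy typically comes out as a control-limit rule.

```python
# Invented pavement-deterioration MDP: condition states 0 (new) .. 3 (failed),
# actions "do nothing" or "resurface"; solved by value iteration.
import numpy as np

beta = 0.95
P = {
    "nothing": np.array([[0.7, 0.3, 0.0, 0.0],
                         [0.0, 0.7, 0.3, 0.0],
                         [0.0, 0.0, 0.7, 0.3],
                         [0.0, 0.0, 0.0, 1.0]]),        # damage accumulates
    "resurface": np.tile([1.0, 0.0, 0.0, 0.0], (4, 1)), # restores condition 0
}
cost = {
    "nothing": np.array([0.0, 1.0, 3.0, 10.0]),         # user cost per period by condition
    "resurface": np.full(4, 4.0),                       # agency cost of resurfacing
}

v = np.zeros(4)
for _ in range(1000):                                   # value iteration
    v = np.minimum.reduce([cost[a] + beta * P[a] @ v for a in P])

q = {a: cost[a] + beta * P[a] @ v for a in P}
policy = ["resurface" if q["resurface"][s] < q["nothing"][s] else "nothing" for s in range(4)]
print(policy)   # typically a control-limit rule: resurface once the condition is bad enough
```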

69 citations


Book ChapterDOI
01 Jan 1988
TL;DR: In this article, the authors studied the long run average cost control problem for discrete time Markov chains in an extremely general framework and established the existence of stable stationary strategies which are optimal in the appropriate sense.
Abstract: The long-run average cost control problem for discrete time Markov chains is studied in an extremely general framework. Existence of stable stationary strategies which are optimal in the appropriate sense is established and these are characterized via the dynamic programming equations. The approach here differs from the conventional approach via the discounted cost problem and covers situations not covered by the latter.
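
For reference, the dynamic programming equations referred to take the following standard average-cost form (generic notation; the chapter's contribution lies in the general conditions under which a solution and a stable optimal stationary strategy exist).

```latex
% Average-cost optimality equation in generic notation (state space S, action sets U(x)):
\rho + h(x) \;=\; \min_{u \in U(x)} \Bigl[\, c(x,u) + \sum_{y \in S} p(y \mid x,u)\, h(y) \Bigr],
\qquad x \in S .
```

Here ρ is the optimal long-run average cost and h a relative value function; a stationary strategy attaining the minimum for every state is average-cost optimal.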

53 citations


Book ChapterDOI
01 Jan 1988
TL;DR: This work emphasizes the use of induction on a sequence of successive approximations of the optimal value function (value iteration) to establish the form of optimal control policies.
Abstract: Queueing models are frequently helpful in the analysis and control of communication, manufacturing, and transportation systems. The theory of Markov decision processes and the inductive techniques of dynamic programming have been used to develop normative models for optimal control of admission, servicing, routing, and scheduling of jobs in queues and networks of queues. We review some of these models, beginning with single-facility models and then progressing to models for networks of queues. We emphasize the use of induction on a sequence of successive approximations of the optimal value function (value iteration) to establish the form of optimal control policies.
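
A toy instance of the value-iteration technique emphasized here (model and parameters are illustrative, not taken from the chapter): admission control for a uniformized M/M/1 queue, where the iterates make the threshold structure of the optimal policy visible.

```python
# Illustrative admission control of a uniformized M/M/1 queue (parameters invented):
# value iteration; the greedy decisions form a threshold (control-limit) policy.
import numpy as np

lam, mu = 0.6, 1.0                       # arrival and service rates
N, R, c, beta = 30, 5.0, 1.0, 0.95       # truncation, admission reward, holding cost, discount
p_arr, p_srv = lam / (lam + mu), mu / (lam + mu)
n = np.arange(N + 1)                     # number of customers in the system

v = np.zeros(N + 1)
for _ in range(3000):
    admit = R + v[np.minimum(n + 1, N)]  # value of admitting an arriving customer
    v = -c * n + beta * (p_arr * np.maximum(admit, v)        # admit or reject at arrivals
                         + p_srv * v[np.maximum(n - 1, 0)])   # departures

accept = (R + v[np.minimum(n + 1, N)]) > v
print(accept.astype(int))                # a run of 1's followed by 0's: admit while the queue is short
```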

43 citations


Journal ArticleDOI
01 Sep 1988
TL;DR: It is proven that, by suitable choice of the control parameter values, this scheme becomes epsilon-optimal as well as optimal, in the sense that a relative frequency coefficient of making optimal decisions tends to the maximum.
Abstract: An efficient scheme is presented for a learning control problem of finite Markov chains with unknown dynamics, i.e. with unknown transition probabilities. The scheme is designed to optimize the asymptotic system performance and to be easy to apply to models with relatively many states and decisions. In this scheme a control policy is determined each time through maximization of a simple performance criterion that explicitly incorporates a tradeoff between estimation of the unknown probabilities and control of the system. The policy determination can be easily performed even in the case of large-size models, since the maximizing operation can be greatly simplified by use of the policy-iteration method. It is proven that this scheme becomes epsilon-optimal as well as optimal by suitable choice of control parameter values, in the sense that a relative frequency coefficient of making optimal decisions tends to the maximum.
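
The certainty-equivalence ingredient of such schemes can be sketched as follows (a generic illustration; the authors' specific criterion, with its explicit estimation-control tradeoff term and its policy-iteration shortcut, is not reproduced here): transition counts give estimated probabilities, and the control policy is recomputed from the estimated model.

```python
# Generic certainty-equivalence sketch (not the authors' criterion): estimate the
# unknown transition probabilities from counts and re-derive a policy from the
# estimated model; a practical scheme adds an estimation/control tradeoff term,
# as the paper does, so that the estimates keep improving.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, beta = 3, 2, 0.9
P_true = rng.dirichlet(np.ones(nS), size=(nS, nA))   # unknown dynamics, shape (S, A, S)
r = rng.uniform(size=(nS, nA))                       # known one-step rewards

counts = np.ones((nS, nA, nS))                       # uniform prior over transitions
s = 0
for t in range(5000):
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    v = np.zeros(nS)
    for _ in range(200):                             # value iteration on the estimated model
        v = (r + beta * P_hat @ v).max(axis=1)
    a = int((r[s] + beta * P_hat[s] @ v).argmax())   # certainty-equivalence decision
    s_next = rng.choice(nS, p=P_true[s, a])          # act on the real system, observe
    counts[s, a, s_next] += 1
    s = s_next
```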

38 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the communicating property of Markov Decision Processes is equivalent to satisfaction of sets of linear equations, and a mapping between the "multichain" and "unichain" linear programs for undiscounted MDPs was developed by applying this equivalence.
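
For context, the "unichain" linear program referred to is usually stated as follows (standard textbook form, not quoted from this paper), with variables x(s,a) interpreted as long-run state-action frequencies.

```latex
% Standard unichain LP in state-action frequencies x(s,a):
\max_{x \ge 0} \; \sum_{s,a} r(s,a)\, x(s,a)
\quad \text{s.t.} \quad
\sum_{a} x(j,a) \;=\; \sum_{s,a} p(j \mid s,a)\, x(s,a) \;\; \text{for all } j,
\qquad
\sum_{s,a} x(s,a) \;=\; 1 .
```

An optimal stationary policy is read off by randomizing at state s according to x(s,·); the paper's mapping relates this program to the larger "multichain" program used when such chain structure is absent.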

Journal ArticleDOI
TL;DR: In this article, sufficient conditions for the existence of stationary Blackwell optimal policies are given in terms of the one-step transition probability matrices and their resolvents; these conditions appear to be more natural than those recently introduced by Dekker and Hordijk.

Journal ArticleDOI
TL;DR: In this article, the authors consider stochastic optimal control problems over an infinite horizon in which the reward is discounted, so that there is only one optimality criterion and standard dynamic programming can be applied.

Journal ArticleDOI
TL;DR: In this paper, the authors consider average reward Markov decision processes with discrete time parameter and denumerable state space, and they find necessary and sufficient conditions so that, for arbitrary bounded reward function, the corresponding average reward optimality equation has a bounded solution.

Book ChapterDOI
01 Jan 1988
TL;DR: The problem of steering a long-run average cost functional to a prespecified value is discussed in the context of Markov decision processes with countable state space, and a methodology that was found useful in investigating properties of Certainty Equivalence implementations is outlined.
Abstract: In this paper, the problem of steering a long-run average cost functional to a prespecified value is discussed in the context of Markov decision processes with countable state-space; this problem naturally arises in the study of constrained Markov decision processes by Lagrangian arguments. Under reasonable assumptions, a Markov stationary steering control is shown to exist and to be obtained by fixed memoryless randomization between two Markov stationary policies. The implementability of this randomized policy is investigated in view of the fact that the randomization bias is the solution to a (highly) nonlinear equation, which may not even be available in the absence of full knowledge of the model parameter values. Several proposals for implementation are made and their relative properties discussed. The paper closes with an outline of a methodology that was found useful in investigating properties of Certainty Equivalence implementations.

Proceedings ArticleDOI
07 Dec 1988
TL;DR: Direct sample path arguments are presented for investigating the convergence of the sample average costs under an adaptive policy that alternates between two stationary policies so as to track adaptively a sample average cost to a desired value.
Abstract: A class of adaptive policies is defined for Markov decision processes (MDPs) under some recurrence conditions. The proposed policy alternates between two stationary policies so as to track adaptively a sample average cost to a desired value. Direct sample path arguments are presented for investigating the convergence of the sample average costs under this adaptive policy. The results have applications to MDPs with a single constraint.
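
A toy simulation of the tracking idea (purely illustrative: the random per-period costs below stand in for costs generated under two stationary policies and are not the authors' model); in the paper the costs come from the controlled Markov chain itself and the convergence is established by direct sample-path arguments.

```python
# Purely illustrative tracking sketch: the exponential draws stand in for the
# per-period costs of two stationary policies with long-run average costs 2.0
# and 1.0; the target value lies in between.
import numpy as np

rng = np.random.default_rng(0)
target = 1.5
mean_cost = (2.0, 1.0)
total = 0.0
for t in range(1, 20001):
    use_cheap = total / max(t - 1, 1) > target        # switch policies to steer the average
    total += rng.exponential(mean_cost[1 if use_cheap else 0])
print(total / 20000)                                   # the sample average settles near the target
```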

Book ChapterDOI
07 Dec 1988
TL;DR: The study represents the initial stages of a program to address the adaptive control of partially observable Markov decision processes (POMDPs) with finite state, action, and observation spaces; initial results in the direction of using the ODE method are obtained.
Abstract: The study represents the initial stages of a program to address the adaptive control of partially observable Markov decision processes (POMDPs) with finite state, action, and observation spaces. The authors review the results on the control of POMDPs with known parameters and, in particular, the results on the control of quality control/machine replacement models. They study the adaptive control of a problem with simple structure: the two-state binary replacement problem. An adaptive control algorithm is defined, and initial results in the direction of using the ODE method are obtained.
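
For the quality-control/machine-replacement models mentioned, the sufficient statistic is the posterior probability that the machine is in the bad state; a generic form of its Bayes update is shown below (θ, q0, q1 and the reset convention are assumed notation, not the authors').

```latex
% Generic two-state belief update: p_t = P(\text{machine bad}), deterioration
% probability \theta, defect probabilities q_1 = P(\text{defect} \mid \text{bad}),
% q_0 = P(\text{defect} \mid \text{good}).
\bar p_t = p_t + (1 - p_t)\,\theta, \qquad
p_{t+1} =
\begin{cases}
\dfrac{\bar p_t\, q_1}{\bar p_t\, q_1 + (1 - \bar p_t)\, q_0}, & \text{item defective},\\[1.0em]
\dfrac{\bar p_t\,(1 - q_1)}{\bar p_t\,(1 - q_1) + (1 - \bar p_t)\,(1 - q_0)}, & \text{item not defective},
\end{cases}
```

with the belief reset whenever the replace action is taken; the adaptive problem studied here is to control the system while such parameters are themselves unknown.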

Journal ArticleDOI
TL;DR: In this article, the problem of characterizing the minimum perturbations to parameters in future stages of a discrete dynamic program necessary to change the optimal first policy is considered, and upper and lower bounds on the perturbation ranges are derived and used to establish ranges for the reward functions over which the initial policy is robust.
Abstract: The problem of characterizing the minimum perturbations to parameters in future stages of a discrete dynamic program necessary to change the optimal first policy is considered. Lower bounds on these perturbations are derived and used to establish ranges for the reward functions over which the optimal first policy is robust. A numerical example is presented to illustrate factors affecting the tightness of these bounds.

Journal ArticleDOI
TL;DR: In this article, it was shown that all limit points of discounted optimal stationary policies, as the discount factor goes to 1, are strong 1-optimal.

Journal ArticleDOI
TL;DR: The purpose of this paper is both to present a methodology which takes advantage of the structure of many large scale problems and to provide computational results indicating the value of the approach.

Journal ArticleDOI
TL;DR: In this paper, a discrete-time, infinite-horizon dynamic programming model for the replacement of components in a binary coherent system is studied, and it is shown that under quite general conditions it is optimal to follow a critical component policy (CCP).

Journal ArticleDOI
TL;DR: Algorithms are described for determining optimal policies for finite state, finite action, infinite discrete time horizon Markov decision processes; management implications of certain hypothesized relationships between mallard survival and harvest rates are addressed by applying the optimality procedures to mallard population models.
Abstract: Algorithms are described for determining optimal policies for finite state, finite action, infinite discrete time horizon Markov decision processes. Both value-improvement and policy-improvement techniques are used in the algorithms. Computing procedures are also described. The algorithms are appropriate for processes that are either finite or infinite, deterministic or stochastic, discounted or undiscounted, in any meaningful combination of these features. Computing procedures are described in terms of initial data processing, bound improvements, process reduction, and testing and solution. Application of the methodology is illustrated with an example involving natural resource management. Management implications of certain hypothesized relationships between mallard survival and harvest rates are addressed by applying the optimality procedures to mallard population models.
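
A compact sketch of the policy-improvement component of such algorithms, for a finite discounted MDP (a generic implementation with made-up numbers; the paper's procedures also cover undiscounted problems, bound improvements, and process reduction, which are not shown).

```python
# Generic policy iteration for a finite discounted MDP with made-up data.
import numpy as np

def policy_iteration(P, r, beta):
    """P[a]: |S| x |S| transition matrix of action a;  r[s, a]: one-step reward."""
    nS, nA = r.shape
    policy = np.zeros(nS, dtype=int)
    while True:
        # policy evaluation: solve (I - beta * P_pi) v = r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(nS)])
        r_pi = r[np.arange(nS), policy]
        v = np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)
        # policy improvement: act greedily with respect to v
        q = np.stack([r[:, a] + beta * P[a] @ v for a in range(nA)], axis=1)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

P = [np.array([[0.9, 0.1], [0.4, 0.6]]),     # action 0
     np.array([[0.2, 0.8], [0.1, 0.9]])]     # action 1
r = np.array([[1.0, 0.0],
              [2.0, 1.5]])
print(policy_iteration(P, r, beta=0.9))
```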


Book ChapterDOI
01 Jan 1988
TL;DR: Different optimality criteria for undiscounted infinite horizon optimal control problems are reviewed and special attention is paid to discrete time Markov decision processes with finite state space.
Abstract: Different optimality criteria for undiscounted infinite horizon optimal control problems are reviewed. Special attention is paid to discrete time Markov decision processes with finite state space. The different criteria are compared on an illustrative example.
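
Two of the criteria typically compared are stated below for orientation (standard definitions in generic notation, not quoted from the chapter); finer criteria such as bias, overtaking, and n-discount optimality refine the average-reward criterion by distinguishing policies that tie on it.

```latex
% Generic definitions (standard, not quoted from the chapter):
\text{average reward of } \pi:\quad
  g(\pi) \;=\; \liminf_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}^{\pi}\!\Bigl[\sum_{t=0}^{N-1} r(X_t, U_t)\Bigr],
\qquad
\text{Blackwell optimality:}\quad
  \pi^{*} \text{ is } \beta\text{-discounted optimal for all } \beta \text{ sufficiently close to } 1 .
```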

Journal ArticleDOI
G. Hübner
TL;DR: The classical procedure for the adaptive control of average reward Markov decision processes with an unknown parameter chooses at each stage a decision which is optimal for the average reward problem with the presently estimated parameter, but in many cases it is inefficient or impossible to compute the long-run optimal policy at each stage.
Abstract: The classical procedure for the adaptive control of average reward Markov decision processes with an unknown parameter chooses at each stage a decision which is optimal for the average reward problem with the presently estimated parameter. But in many cases it is inefficient or impossible to compute the long-run optimal policy at each stage. So successive approximation methods were proposed and investigated. We present a unifying and generalizing approach that includes both types of methods mentioned above and generates many new procedures as well.

Journal ArticleDOI
TL;DR: In this article, the authors show that the infinite-horizon value function for a linear/quadratic Markov decision process by policy improvement is exactly equivalent to solution of the equilibrium Riccati equation by the Newton-Raphson method.
Abstract: We show that the calculation of the infinite-horizon value function for a linear/quadratic Markov decision process by policy improvement is exactly equivalent to solution of the equilibrium Riccati equation by the Newton-Raphson method. The assertion extends to risk-sensitive and non-Markov formulations and thus shows, for example, that the Newton-Raphson method provides an iterative algorithm for the canonical factorization of operators which shows second-order convergence and has a variational basis.
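
A scalar numerical sketch of the asserted equivalence (generic discrete-time LQ notation with invented parameters): each policy-evaluation step solves a Lyapunov equation, and the combined evaluation/improvement step reproduces the Newton iteration for the Riccati equation (Hewer's iteration in the discrete-time case), whose second-order convergence is visible in the printed residuals.

```python
# Scalar LQ sketch with invented parameters: x' = a x + b u, cost q x^2 + r u^2.
# Policy iteration (evaluate, then improve) coincides with Newton's method for
# the Riccati equation p = q + a^2 p - (a b p)^2 / (r + b^2 p).
a, b, q, r = 1.2, 1.0, 1.0, 1.0

def evaluate(k):
    """Cost coefficient p of the stationary policy u = -k x (requires |a - b k| < 1)."""
    a_cl = a - b * k
    return (q + r * k * k) / (1.0 - a_cl * a_cl)

def improve(p):
    """Greedy gain for the quadratic value function p x^2."""
    return a * b * p / (r + b * b * p)

def residual(p):
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p

k = a / b                                  # stabilizing initial gain (dead-beat policy)
for i in range(6):
    p = evaluate(k)                        # policy evaluation  = scalar Lyapunov equation
    k = improve(p)                         # policy improvement = one Newton step
    print(i, p, residual(p))               # the residual vanishes at second order
```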

Journal ArticleDOI
TL;DR: In this paper, the authors explore the status of Whittle's reduction principle for Markov decision processes (MDPs) placed in parallel when the necessary and sufficient conditions he gave fail to hold.
Abstract: Whittle enunciated an important reduction principle in dynamic programming when he showed that under certain conditions optimal strategies for Markov decision processes (MDPs) placed in parallel to one another take actions in a way which is consistent with the optimal strategies for the individual MDPs. However, the necessary and sufficient conditions given by Whittle are by no means always satisfied. We explore the status of this computationally attractive reduction principle when these conditions fail.

Book ChapterDOI
01 Jan 1988
TL;DR: In this paper an aggregation-disaggregation method is formulated for a finite horizon Markov decision process with two-dimensional state and action spaces; the second dimension of the state and the action contains a similar type of information, for which aggregation is both natural and simple.
Abstract: In this paper an aggregation-disaggregation method is formulated for a finite horizon Markov decision process with two-dimensional state and action spaces. The second dimension of the state and the action contains a similar type of information, for which aggregation is both natural and simple. The quality of the approach is illustrated by an example.

Journal ArticleDOI
TL;DR: In this article, the authors consider infinite horizon discounted Markov decision processes and conditions under which discount-isotone optimal policies exist and show that the induced partial ordering facilitates the solutions for higher discount factor levels.
Abstract: This paper considers infinite horizon discounted Markov decision processes and conditions under which discount-isotone optimal policies exist. Given partial orders over the state and action spaces, a set of discount-isotone optimal policies is a set of optimal policies, one for each discount factor in a given set, such that, for each state, the optimal actions are partially ordered in such a manner as to match the ordering of the discount factors. It is easier to solve problems with small discount factors and the induced partial ordering facilitates the solutions for higher discount factor levels.

Proceedings ArticleDOI
07 Dec 1988
TL;DR: The authors first introduce a class of adaptive controllers for Markov chains that meet the challenge under fairly relaxed conditions; these are based on the concept of weak contrast functions, an extension of P. Mandl's (1974) contrast functions.
Abstract: The authors consider the control of dynamic systems modeled as a family of either Markov or semi-Markov processes, parameterized by an unknown parameter alpha which takes values in a given finite set A. The true parameter alpha^0, which represents the real system, belongs to A. The problem is to devise a control strategy that, despite the initial ignorance of alpha^0, can still achieve the minimum long-run average cost. The authors first introduce a class of adaptive controllers for Markov chains that meet the challenge under fairly relaxed conditions. These algorithms are based on the concept of weak contrast functions, an extension of P. Mandl's (1974) contrast functions. A class of adaptive controllers for semi-Markov processes is formulated based on the weak contrast function concept. The authors show that these controllers achieve the minimum long-run average cost and illustrate their application to the adaptive multilayer control of Markov chains.