
Showing papers on "Markov decision process published in 1988"


Book
01 Jan 1988
TL;DR: The General Theory of Markov Processes, by M. Sharpe, San Diego, 1988, 420 pp.
Abstract: General Theory of Markov Processes. By M. Sharpe. ISBN 0-12-639060-6. Academic Press, San Diego, 1988. 420 pp. $49.50.

493 citations


Journal ArticleDOI
TL;DR: A survey of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return) can be found in this article.
Abstract: This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations in terms solely of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to the randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, in terms of the primitive state set I, formulations involving a redefinition of the states are examined.

127 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process.
Abstract: This paper develops a new framework for the study of Markov decision processes in which the control problem is viewed as an optimization problem on the set of canonically induced measures on the trajectory space of the joint state and control process. This set is shown to be compact convex. One then associates with each of the usual cost criteria (infinite horizon discounted cost, finite horizon, control up to an exit time) a naturally defined occupation measure such that the cost is an integral of some function with respect to this measure. These measures are shown to form a compact convex set whose extreme points are characterized. Classical results about existence of optimal strategies are recovered from this and several applications to multicriteria and constrained optimization problems are briefly indicated.
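
For orientation, here is a minimal sketch of the occupation-measure idea in generic notation (the symbols below are assumptions, not the paper's): under the discounted criterion the cost is a linear functional of a single measure on the state-action space, so the control problem becomes a linear optimization over a compact convex set of measures.

```latex
% Discounted occupation measure in generic notation (initial law \nu, strategy \pi,
% discount \beta \in (0,1)); the discounted cost is linear in this measure.
\mu^{\pi}_{\nu}(A \times B) \;=\; (1 - \beta) \sum_{t=0}^{\infty} \beta^{t}\,
   P^{\pi}_{\nu}\bigl(X_t \in A,\; U_t \in B\bigr),
\qquad
J_{\beta}(\nu, \pi) \;=\; \frac{1}{1 - \beta} \int_{S \times U} c(x,u)\, \mu^{\pi}_{\nu}(dx\, du) .
```

Analogous measures are defined for the finite-horizon and exit-time criteria; existence of optimal strategies then follows from the compactness of the set of achievable measures and the characterization of its extreme points.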

124 citations


Journal ArticleDOI
TL;DR: The basic theory of hierarchic Markov processes is described and examples are given of applications in replacement models where the replacement decision depends on the quality of the new asset available for replacement.

75 citations


Journal ArticleDOI
TL;DR: This paper extends an earlier paper on real applications of Markov decision processes in which the results of the studies have been implemented, have had some influence on the actual decisions, or in which the analyses are based on real data.
Abstract: This paper extends an earlier paper [White 1985] on real applications of Markov decision processes in which the results of the studies have been implemented, have had some influence on the actual decisions, or in which the analyses are based on real data.

75 citations


Journal ArticleDOI
TL;DR: In this article, the effect of perturbations in the data of a discrete-time Markov reward process on the finite-horizon total expected reward, the infinite-horizon expected discounted and average reward, and the total expected reward up to a first-passage time was studied.
Abstract: We study the effect of perturbations in the data of a discrete-time Markov reward process on the finite-horizon total expected reward, the infinite-horizon expected discounted and average reward and the total expected reward up to a first-passage time. Bounds for the absolute errors of these reward functions are obtained. The results are illustrated for finite as well as infinite queueing systems (the M/M/1/S queue and an infinite-capacity counterpart). Extensions to Markov decision processes and other settings are discussed.
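
As a concrete illustration of the kind of statement involved, here is a hypothetical numerical sketch (not the paper's bounds or examples): a standard a-priori inequality for the discounted value of a Markov reward process when only the transition matrix is perturbed, checked against the exact error.

```python
# Hypothetical numbers, not from the paper: perturb one row of the transition
# matrix of a discounted Markov reward process and compare the exact change in
# the value vector with the standard bound
#     ||v - v~||_inf <= beta * ||P - P~||_inf * ||v~||_inf / (1 - beta).
import numpy as np

beta = 0.9
r = np.array([1.0, 0.0, 2.0])                      # one-step rewards
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])                     # nominal transition matrix
P_pert = P.copy()
P_pert[0] = [0.45, 0.55, 0.0]                       # perturbed first row

def value(P_):
    # total expected discounted reward: v = (I - beta P)^{-1} r
    return np.linalg.solve(np.eye(len(r)) - beta * P_, r)

v, v_pert = value(P), value(P_pert)
actual = np.max(np.abs(v - v_pert))
bound = beta * np.max(np.abs(P - P_pert).sum(axis=1)) * np.max(np.abs(v_pert)) / (1 - beta)
print(f"actual error {actual:.4f} <= bound {bound:.4f}")
```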

73 citations


Journal ArticleDOI
TL;DR: A review of the relevant theoretical results in order to call them to the attention of civil engineers involved with pavement management systems can be found in this article, where a variety of problems involving inspection, repair, and replacement are considered.
Abstract: The problem of scheduling maintenance for pavements in an optimum fashion has been approached in a variety of ways by researchers and practitioners. However, the Markov decision process has found very limited use despite the fact that cumulative damage is readily modeled by a Markov chain and that a wealth of immediately applicable theoretical results exist in the literature. The solutions are known for a variety of problems involving inspection, repair, and replacement, making it possible to solve directly for an optimal policy in the form of a control law. This paper reviews some of the relevant theoretical results in order to call them to the attention of civil engineers involved with pavement management systems.
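
A self-contained sketch of the kind of model the paper points to (the states, costs, and transition probabilities below are invented for illustration): deterioration modeled as a Markov chain with a do-nothing and a resurface action, solved by value iteration; the optimal policy typically comes out as a control-limit rule.

```python
# Invented pavement-deterioration MDP: condition states 0 (new) .. 3 (failed),
# actions "do nothing" or "resurface"; solved by value iteration.
import numpy as np

beta = 0.95
P = {
    "nothing": np.array([[0.7, 0.3, 0.0, 0.0],
                         [0.0, 0.7, 0.3, 0.0],
                         [0.0, 0.0, 0.7, 0.3],
                         [0.0, 0.0, 0.0, 1.0]]),        # damage accumulates
    "resurface": np.tile([1.0, 0.0, 0.0, 0.0], (4, 1)), # restores condition 0
}
cost = {
    "nothing": np.array([0.0, 1.0, 3.0, 10.0]),         # user cost per period by condition
    "resurface": np.full(4, 4.0),                       # agency cost of resurfacing
}

v = np.zeros(4)
for _ in range(1000):                                   # value iteration
    v = np.minimum.reduce([cost[a] + beta * P[a] @ v for a in P])

q = {a: cost[a] + beta * P[a] @ v for a in P}
policy = ["resurface" if q["resurface"][s] < q["nothing"][s] else "nothing" for s in range(4)]
print(policy)   # typically a control-limit rule: resurface once the condition is bad enough
```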

69 citations


Book ChapterDOI
01 Jan 1988
TL;DR: In this article, the authors studied the long run average cost control problem for discrete time Markov chains in an extremely general framework and established the existence of stable stationary strategies which are optimal in the appropriate sense.
Abstract: The long-run average cost control problem for discrete time Markov chains is studied in an extremely general framework. Existence of stable stationary strategies which are optimal in the appropriate sense is established and these are characterized via the dynamic programming equations. The approach here differs from the conventional approach via the discounted cost problem and covers situations not covered by the latter.
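
For reference, the dynamic programming equations referred to take the following standard average-cost form (generic notation; the chapter's contribution lies in the general conditions under which a solution and a stable optimal stationary strategy exist).

```latex
% Average-cost optimality equation in generic notation (state space S, action sets U(x)):
\rho + h(x) \;=\; \min_{u \in U(x)} \Bigl[\, c(x,u) + \sum_{y \in S} p(y \mid x,u)\, h(y) \Bigr],
\qquad x \in S .
```

Here ρ is the optimal long-run average cost and h a relative value function; a stationary strategy attaining the minimum for every state is average-cost optimal.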

53 citations


Book ChapterDOI
01 Jan 1988
TL;DR: This work emphasizes the use of induction on a sequence of successive approximations of the optimal value function (value iteration) to establish the form of optimal control policies.
Abstract: Queueing models are frequently helpful in the analysis and control of communication, manufacturing, and transportation systems. The theory of Markov decision processes and the inductive techniques of dynamic programming have been used to develop normative models for optimal control of admission, servicing, routing, and scheduling of jobs in queues and networks of queues. We review some of these models, beginning with single-facility models and then progressing to models for networks of queues. We emphasize the use of induction on a sequence of successive approximations of the optimal value function (value iteration) to establish the form of optimal control policies.
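
A toy instance of the value-iteration technique emphasized here (model and parameters are illustrative, not taken from the chapter): admission control for a uniformized M/M/1 queue, where the iterates make the threshold structure of the optimal policy visible.

```python
# Illustrative admission control of a uniformized M/M/1 queue (parameters invented):
# value iteration; the greedy decisions form a threshold (control-limit) policy.
import numpy as np

lam, mu = 0.6, 1.0                       # arrival and service rates
N, R, c, beta = 30, 5.0, 1.0, 0.95       # truncation, admission reward, holding cost, discount
p_arr, p_srv = lam / (lam + mu), mu / (lam + mu)
n = np.arange(N + 1)                     # number of customers in the system

v = np.zeros(N + 1)
for _ in range(3000):
    admit = R + v[np.minimum(n + 1, N)]  # value of admitting an arriving customer
    v = -c * n + beta * (p_arr * np.maximum(admit, v)        # admit or reject at arrivals
                         + p_srv * v[np.maximum(n - 1, 0)])   # departures

accept = (R + v[np.minimum(n + 1, N)]) > v
print(accept.astype(int))                # a run of 1's followed by 0's: admit while the queue is short
```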

43 citations


Journal ArticleDOI
01 Sep 1988
TL;DR: It is proven that, by suitable choice of the control parameter values, this scheme becomes epsilon-optimal as well as optimal, in the sense that a relative frequency coefficient of making optimal decisions tends to the maximum.
Abstract: An efficient scheme is presented for a learning control problem of finite Markov chains with unknown dynamics, i.e. with unknown transition probabilities. The scheme is designed to optimize the asymptotic system performance and to be easy to apply to models with relatively many states and decisions. In this scheme a control policy is determined each time through maximization of a simple performance criterion that explicitly incorporates a tradeoff between estimation of the unknown probabilities and control of the system. The policy determination can be easily performed even in the case of large-size models, since the maximizing operation can be greatly simplified by use of the policy-iteration method. It is proven that this scheme becomes epsilon-optimal as well as optimal by suitable choice of control parameter values, in the sense that a relative frequency coefficient of making optimal decisions tends to the maximum.
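
The certainty-equivalence ingredient of such schemes can be sketched as follows (a generic illustration; the authors' specific criterion, with its explicit estimation-control tradeoff term and its policy-iteration shortcut, is not reproduced here): transition counts give estimated probabilities, and the control policy is recomputed from the estimated model.

```python
# Generic certainty-equivalence sketch (not the authors' criterion): estimate the
# unknown transition probabilities from counts and re-derive a policy from the
# estimated model; a practical scheme adds an estimation/control tradeoff term,
# as the paper does, so that the estimates keep improving.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, beta = 3, 2, 0.9
P_true = rng.dirichlet(np.ones(nS), size=(nS, nA))   # unknown dynamics, shape (S, A, S)
r = rng.uniform(size=(nS, nA))                       # known one-step rewards

counts = np.ones((nS, nA, nS))                       # uniform prior over transitions
s = 0
for t in range(5000):
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    v = np.zeros(nS)
    for _ in range(200):                             # value iteration on the estimated model
        v = (r + beta * P_hat @ v).max(axis=1)
    a = int((r[s] + beta * P_hat[s] @ v).argmax())   # certainty-equivalence decision
    s_next = rng.choice(nS, p=P_true[s, a])          # act on the real system, observe
    counts[s, a, s_next] += 1
    s = s_next
```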

38 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the communicating property of Markov Decision Processes is equivalent to satisfaction of sets of linear equations, and a mapping between the "multichain" and "unichain" linear programs for undiscounted MDPs was developed by applying this equivalence.
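
For context, the "unichain" linear program referred to is usually stated as follows (standard textbook form, not quoted from this paper), with variables x(s,a) interpreted as long-run state-action frequencies.

```latex
% Standard unichain LP in state-action frequencies x(s,a):
\max_{x \ge 0} \; \sum_{s,a} r(s,a)\, x(s,a)
\quad \text{s.t.} \quad
\sum_{a} x(j,a) \;=\; \sum_{s,a} p(j \mid s,a)\, x(s,a) \;\; \text{for all } j,
\qquad
\sum_{s,a} x(s,a) \;=\; 1 .
```

An optimal stationary policy is read off by randomizing at state s according to x(s,·); the paper's mapping relates this program to the larger "multichain" program used when such chain structure is absent.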

Journal ArticleDOI
TL;DR: In this article, sufficient conditions for the existence of stationary Blackwell optimal policies are given in terms of the one-step transition probability matrices and their resolvents; these conditions appear to be more natural than those recently introduced by Dekker and Hordijk.

Journal ArticleDOI
TL;DR: In this article, the authors consider stochastic optimal control problems over an infinite horizon in which the reward is discounted, so that there is only one optimality criterion and standard dynamic programming can be applied.

Journal ArticleDOI
TL;DR: In this paper, the authors consider average reward Markov decision processes with discrete time parameter and denumerable state space, and they find necessary and sufficient conditions so that, for arbitrary bounded reward function, the corresponding average reward optimality equation has a bounded solution.

Book ChapterDOI
01 Jan 1988
TL;DR: The problem of steering a long-run average cost functional to a prespecified value is discussed in the context of Markov decision processes with countable state space, and a methodology that was found useful in investigating properties of Certainty Equivalence implementations is outlined.
Abstract: In this paper, the problem of steering a long-run average cost functional to a prespecified value is discussed in the context of Markov decision processes with countable state-space; this problem naturally arises in the study of constrained Markov decision processes by Lagrangian arguments. Under reasonable assumptions, a Markov stationary steering control is shown to exist and to be obtained by fixed memoryless randomization between two Markov stationary policies. The implementability of this randomized policy is investigated in view of the fact that the randomization bias is the solution to a (highly) nonlinear equation, which may not even be available in the absence of full knowledge of the model parameter values. Several proposals for implementation are made and their relative properties discussed. The paper closes with an outline of a methodology that was found useful in investigating properties of Certainty Equivalence implementations.

Proceedings ArticleDOI
07 Dec 1988
TL;DR: Direct sample path arguments are presented for investigating the convergence of the sample average costs under an adaptive policy that alternates between two stationary policies so as to track adaptively a sample average cost to a desired value.
Abstract: A class of adaptive policies is defined for Markov decision processes (MDPs) under some recurrence conditions. The proposed policy alternates between two stationary policies so as to track adaptively a sample average cost to a desired value. Direct sample path arguments are presented for investigating the convergence of the sample average costs under this adaptive policy. The results have applications to MDPs with a single constraint.
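
A toy simulation of the tracking idea (purely illustrative: the random per-period costs below stand in for costs generated under two stationary policies and are not the authors' model); in the paper the costs come from the controlled Markov chain itself and the convergence is established by direct sample-path arguments.

```python
# Purely illustrative tracking sketch: the exponential draws stand in for the
# per-period costs of two stationary policies with long-run average costs 2.0
# and 1.0; the target value lies in between.
import numpy as np

rng = np.random.default_rng(0)
target = 1.5
mean_cost = (2.0, 1.0)
total = 0.0
for t in range(1, 20001):
    use_cheap = total / max(t - 1, 1) > target        # switch policies to steer the average
    total += rng.exponential(mean_cost[1 if use_cheap else 0])
print(total / 20000)                                   # the sample average settles near the target
```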

Book ChapterDOI
07 Dec 1988
TL;DR: The study represents the initial stages of a program to address the adaptive control of partially observable Markov decision processes (POMDPs) with finite state, action, and observation spaces; initial results in the direction of using the ODE method are obtained.
Abstract: The study represents the initial stages of a program to address the adaptive control of partially observable Markov decision processes (POMDPs) with finite state, action, and observation spaces. The authors review the results on the control of POMDPs with known parameters and, in particular, the results on the control of quality control/machine replacement models. They study the adaptive control of a problem with simple structure: the two-state binary replacement problem. An adaptive control algorithm is defined, and initial results in the direction of using the ODE method are obtained.
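
For the quality-control/machine-replacement models mentioned, the sufficient statistic is the posterior probability that the machine is in the bad state; a generic form of its Bayes update is shown below (θ, q0, q1 and the reset convention are assumed notation, not the authors').

```latex
% Generic two-state belief update: p_t = P(\text{machine bad}), deterioration
% probability \theta, defect probabilities q_1 = P(\text{defect} \mid \text{bad}),
% q_0 = P(\text{defect} \mid \text{good}).
\bar p_t = p_t + (1 - p_t)\,\theta, \qquad
p_{t+1} =
\begin{cases}
\dfrac{\bar p_t\, q_1}{\bar p_t\, q_1 + (1 - \bar p_t)\, q_0}, & \text{item defective},\\[1.0em]
\dfrac{\bar p_t\,(1 - q_1)}{\bar p_t\,(1 - q_1) + (1 - \bar p_t)\,(1 - q_0)}, & \text{item not defective},
\end{cases}
```

with the belief reset whenever the replace action is taken; the adaptive problem studied here is to control the system while such parameters are themselves unknown.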

Journal ArticleDOI
TL;DR: In this article, the problem of characterizing the minimum perturbations to parameters in future stages of a discrete dynamic program necessary to change the optimal first policy is considered, and upper and lower bounds on the perturbation ranges are derived and used to establish ranges for the reward functions over which the initial policy is robust.
Abstract: The problem of characterizing the minimum perturbations to parameters in future stages of a discrete dynamic program necessary to change the optimal first policy is considered. Lower bounds on these perturbations are derived and used to establish ranges for the reward functions over which the optimal first policy is robust. A numerical example is presented to illustrate factors affecting the tightness of these bounds.

Journal ArticleDOI
TL;DR: In this article, it was shown that all limit points of discounted optimal stationary policies, as the discount factor goes to 1, are strong 1-optimal.

Journal ArticleDOI
TL;DR: The purpose of this paper is both to present a methodology which takes advantage of the structure of many large scale problems and to provide computational results indicating the value of the approach.

Journal ArticleDOI
TL;DR: In this paper, a discrete-time, infinite-horizon dynamic programming model for the replacement of components in a binary coherent system is studied, and it is shown that under quite general conditions it is optimal to follow a critical component policy (CCP).

Journal ArticleDOI
TL;DR: Algorithms are described for determining optimal policies for finite state, finite action, infinite discrete time horizon Markov decision processes; management implications of certain hypothesized relationships between mallard survival and harvest rates are addressed by applying the optimality procedures to mallard population models.
Abstract: Algorithms are described for determining optimal policies for finite state, finite action, infinite discrete time horizon Markov decision processes. Both value-improvement and policy-improvement techniques are used in the algorithms. Computing procedures are also described. The algorithms are appropriate for processes that are either finite or infinite, deterministic or stochastic, discounted or undiscounted, in any meaningful combination of these features. Computing procedures are described in terms of initial data processing, bound improvements, process reduction, and testing and solution. Application of the methodology is illustrated with an example involving natural resource management. Management implications of certain hypothesized relationships between mallard survival and harvest rates are addressed by applying the optimality procedures to mallard population models.
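
A compact sketch of the policy-improvement component of such algorithms, for a finite discounted MDP (a generic implementation with made-up numbers; the paper's procedures also cover undiscounted problems, bound improvements, and process reduction, which are not shown).

```python
# Generic policy iteration for a finite discounted MDP with made-up data.
import numpy as np

def policy_iteration(P, r, beta):
    """P[a]: |S| x |S| transition matrix of action a;  r[s, a]: one-step reward."""
    nS, nA = r.shape
    policy = np.zeros(nS, dtype=int)
    while True:
        # policy evaluation: solve (I - beta * P_pi) v = r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(nS)])
        r_pi = r[np.arange(nS), policy]
        v = np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)
        # policy improvement: act greedily with respect to v
        q = np.stack([r[:, a] + beta * P[a] @ v for a in range(nA)], axis=1)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

P = [np.array([[0.9, 0.1], [0.4, 0.6]]),     # action 0
     np.array([[0.2, 0.8], [0.1, 0.9]])]     # action 1
r = np.array([[1.0, 0.0],
              [2.0, 1.5]])
print(policy_iteration(P, r, beta=0.9))
```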


Book ChapterDOI
01 Jan 1988
TL;DR: Different optimality criteria for undiscounted infinite horizon optimal control problems are reviewed and special attention is paid to discrete time Markov decision processes with finite state space.
Abstract: Different optimality criteria for undiscounted infinite horizon optimal control problems are reviewed. Special attention is paid to discrete time Markov decision processes with finite state space. The different criteria are compared on an illustrative example.
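
Two of the criteria typically compared are stated below for orientation (standard definitions in generic notation, not quoted from the chapter); finer criteria such as bias, overtaking, and n-discount optimality refine the average-reward criterion by distinguishing policies that tie on it.

```latex
% Generic definitions (standard, not quoted from the chapter):
\text{average reward of } \pi:\quad
  g(\pi) \;=\; \liminf_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}^{\pi}\!\Bigl[\sum_{t=0}^{N-1} r(X_t, U_t)\Bigr],
\qquad
\text{Blackwell optimality:}\quad
  \pi^{*} \text{ is } \beta\text{-discounted optimal for all } \beta \text{ sufficiently close to } 1 .
```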

Journal ArticleDOI
G. Hübner
TL;DR: The classical procedure for the adaptive control of average reward Markov decision processes with an unknown parameter chooses at each stage a decision which is optimal for the average reward problem with the presently estimated parameter, but in many cases it is inefficient or impossible to compute the long-run optimal policy at each stage.
Abstract: The classical procedure for the adaptive control of average reward Markov decision processes with an unknown parameter chooses at each stage a decision which is optimal for the average reward problem with the presently estimated parameter. But in many cases it is inefficient or impossible to compute the long-run optimal policy at each stage. So successive approximation methods were proposed and investigated. We present a unifying and generalizing approach that includes both types of methods mentioned above and generates many new procedures as well.

Journal ArticleDOI
TL;DR: In this article, the authors show that the infinite-horizon value function for a linear/quadratic Markov decision process by policy improvement is exactly equivalent to solution of the equilibrium Riccati equation by the Newton-Raphson method.
Abstract: We show that the calculation of the infinite-horizon value function for a linear/quadratic Markov decision process by policy improvement is exactly equivalent to solution of the equilibrium Riccati equation by the Newton-Raphson method. The assertion extends to risk-sensitive and non-Markov formulations and thus shows, for example, that the Newton-Raphson method provides an iterative algorithm for the canonical factorization of operators which shows second-order convergence and has a variational basis.
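
A scalar numerical sketch of the asserted equivalence (generic discrete-time LQ notation with invented parameters): each policy-evaluation step solves a Lyapunov equation, and the combined evaluation/improvement step reproduces the Newton iteration for the Riccati equation (Hewer's iteration in the discrete-time case), whose second-order convergence is visible in the printed residuals.

```python
# Scalar LQ sketch with invented parameters: x' = a x + b u, cost q x^2 + r u^2.
# Policy iteration (evaluate, then improve) coincides with Newton's method for
# the Riccati equation p = q + a^2 p - (a b p)^2 / (r + b^2 p).
a, b, q, r = 1.2, 1.0, 1.0, 1.0

def evaluate(k):
    """Cost coefficient p of the stationary policy u = -k x (requires |a - b k| < 1)."""
    a_cl = a - b * k
    return (q + r * k * k) / (1.0 - a_cl * a_cl)

def improve(p):
    """Greedy gain for the quadratic value function p x^2."""
    return a * b * p / (r + b * b * p)

def residual(p):
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p

k = a / b                                  # stabilizing initial gain (dead-beat policy)
for i in range(6):
    p = evaluate(k)                        # policy evaluation  = scalar Lyapunov equation
    k = improve(p)                         # policy improvement = one Newton step
    print(i, p, residual(p))               # the residual vanishes at second order
```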

Journal ArticleDOI
TL;DR: In this paper, the authors explore the status of Whittle's reduction principle for Markov decision processes (MDPs) placed in parallel when the necessary and sufficient conditions he gave fail to hold.
Abstract: Whittle enunciated an important reduction principle in dynamic programming when he showed that under certain conditions optimal strategies for Markov decision processes (MDPs) placed in parallel to one another take actions in a way which is consistent with the optimal strategies for the individual MDPs. However, the necessary and sufficient conditions given by Whittle are by no means always satisfied. We explore the status of this computationally attractive reduction principle when these conditions fail.

Book ChapterDOI
01 Jan 1988
TL;DR: In this paper an aggregation-disaggregation method is formulated for a finite horizon Markov decision process with two-dimensional state and action spaces; the second dimension of the state and the action contains a similar type of information, for which aggregation is both natural and simple.
Abstract: In this paper an aggregation-disaggregation method is formulated for a finite horizon Markov decision process with two-dimensional state and action spaces. The second dimension of the state and the action contains a similar type of information, for which aggregation is both natural and simple. The quality of the approach is illustrated by an example.

Journal ArticleDOI
TL;DR: In this article, the authors consider infinite horizon discounted Markov decision processes and conditions under which discount-isotone optimal policies exist and show that the induced partial ordering facilitates the solutions for higher discount factor levels.
Abstract: This paper considers infinite horizon discounted Markov decision processes and conditions under which discount-isotone optimal policies exist. Given partial orders over the state and action spaces, a set of discount-isotone optimal policies is a set of optimal policies, one for each discount factor in a given set, such that, for each state, the optimal actions are partially ordered in such a manner as to match the ordering of the discount factors. It is easier to solve problems with small discount factors and the induced partial ordering facilitates the solutions for higher discount factor levels.

Proceedings ArticleDOI
07 Dec 1988
TL;DR: The authors first introduce a class of adaptive controllers for Markov chains that meet the challenge under fairly relaxed conditions; these are based on the concept of weak contrast functions, an extension of P. Mandl's (1974) contrast functions.
Abstract: The authors consider the control of dynamic systems modeled as a family of either Markov or semi-Markov processes, parameterized by an unknown parameter alpha which takes values in a given finite set A. The true parameter alpha^0, which represents the real system, belongs to A. The problem is to devise a control strategy that, despite the initial ignorance of alpha^0, can still achieve the minimum long-run average cost. The authors first introduce a class of adaptive controllers for Markov chains that meet the challenge under fairly relaxed conditions. These algorithms are based on the concept of weak contrast functions, an extension of P. Mandl's (1974) contrast functions. A class of adaptive controllers for semi-Markov processes is formulated based on the weak contrast function concept. The authors show that these controllers achieve the minimum long-run average cost and illustrate their application to the adaptive multilayer control of Markov chains.