
Showing papers on "Markov decision process" published in 1986



Journal ArticleDOI
TL;DR: It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step.
Abstract: The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented.

102 citations
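The learning scheme referred to in the abstract above is of the stochastic learning automaton type. As a rough illustration (not the paper's exact scheme), the sketch below shows a linear reward-inaction update for a single state's decision maker; the step size `theta` and the assumption of a reward normalized to [0, 1] are mine.

```python
import numpy as np

def lri_update(probs, chosen, reward, theta=0.01):
    """Linear reward-inaction (L_R-I) update for one decision maker.

    probs  : current action-choice probabilities (must sum to 1)
    chosen : index of the action that was just used
    reward : normalized reward in [0, 1] observed for that action
    theta  : small step size; convergence arguments require it to be small
    """
    probs = np.asarray(probs, dtype=float)
    e = np.zeros_like(probs)
    e[chosen] = 1.0
    # Move probability mass toward the chosen action in proportion to the reward.
    return probs + theta * reward * (e - probs)

# Example: a two-action decision maker updates after observing reward 0.8.
p = lri_update([0.5, 0.5], chosen=0, reward=0.8)
```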


Journal ArticleDOI
TL;DR: In this paper, a Lagrange multiplier formulation involving a dynamic programming equation is utilized to relate the constrained optimization to an unconstrained optimization parametrized by the multiplier, leading to a proof for the existence of a semi-simple optimal constrained policy.
Abstract: Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation is utilized to relate the constrained optimization to an unconstrained optimization parametrized by the multiplier. This approach leads to a proof for the existence of a semi-simple optimal constrained policy. That is, there is at most one state for which the action is randomized between two possibilities; at all other states, the action is uniquely chosen. Affine forms for the rewards, costs and transition probabilities further reduce the optimal constrained policy to 'almost bang-bang' form, in which the optimal policy is not randomized, and is bang-bang except perhaps at one state. Under the same assumptions, one can alternatively find an optimal constrained policy that is strictly bang-bang, but may be randomized at one state. Application is made to flow control of a birth-and-death process (e.g., an M/M/s queue); under certain monotonicity restrictions on the reward and cost structure the preceding results apply, and in addition there is a simple acceptance region.

84 citations
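The Lagrange multiplier idea in the abstract above can be illustrated, in the simpler setting of a finite discounted MDP rather than the paper's time-average semi-Markov setting, by solving the unconstrained problem with reward r − λc for a fixed multiplier λ and then searching over λ until the cost budget is met. All names, the array layout, and the bisection stopping rule below are assumptions of this sketch.

```python
import numpy as np

def solve_unconstrained(P, r, c, lam, beta=0.95, iters=2000):
    """Value iteration for reward r - lam * c.
    P[s, a, s'] transitions, r[s, a] rewards, c[s, a] costs."""
    v = np.zeros(r.shape[0])
    for _ in range(iters):
        q = (r - lam * c) + beta * P @ v          # q[s, a]
        v = q.max(axis=1)
    return q.argmax(axis=1), v                    # greedy policy and its value

def discounted_cost(P, c, policy, beta=0.95):
    """Expected discounted cost of a stationary deterministic policy."""
    nS = c.shape[0]
    Ppi = P[np.arange(nS), policy]
    cpi = c[np.arange(nS), policy]
    return np.linalg.solve(np.eye(nS) - beta * Ppi, cpi)

def constrained_policy(P, r, c, budget, lo=0.0, hi=100.0, tol=1e-4, beta=0.95):
    """Bisect on the multiplier until the greedy policy's cost meets the budget
    (checked here at the worst starting state).  The randomization at a single
    state that the paper establishes is omitted in this sketch."""
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        policy, _ = solve_unconstrained(P, r, c, lam, beta)
        if discounted_cost(P, c, policy, beta).max() > budget:
            lo = lam        # constraint violated: penalize cost more heavily
        else:
            hi = lam        # feasible: try a smaller penalty
    return policy
```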


Journal ArticleDOI
TL;DR: In this article, a two-stage optimization framework, which consists of a real-time model followed by a steady state model, is proposed, which is optimized with the generalized policy iteration procedure.
Abstract: In recognition of hydrologic uncertainty and seasonality, reservoir inflows are described as periodic Markov processes. The optimization of reservoir operations involves determination of the optimal release volumes in the successive time periods so that the expected total rewards resulting from the operations are maximized. A two-stage optimization framework, which consists of a real-time model followed by a steady state model, is proposed. The steady state model that describes the convergent nature of the prospective future operations is regarded as a periodic Markov decision process and is optimized with the generalized policy iteration procedure. This result is in turn used as an interim step for deriving the optimal immediate decisions for the current period in the real-time model. Significant computational efficiency results from this framework and the respective optimization procedure.

75 citations


Proceedings ArticleDOI
01 Dec 1986
TL;DR: The (optimal) design of many engineering systems can be adequately recast as a Markov decision process, where requirements on system performance are captured in the form of constraints.
Abstract: The (optimal) design of many engineering systems can be adequately recast as a Markov decision process, where requirements on system performance are captured in the form of constraints. In this paper, various optimality results for constrained Markov decision processes are briefly reviewed; the corresponding implementation issues are discussed and shown to lead to several problems of parameter estimation. Simple situations where such constrained problems naturally arise, are presented in the context of queueing systems, in order to illustrate various points of the theory. In each case, the structure of the optimal policy is exhibited.

65 citations


Journal ArticleDOI
TL;DR: In this paper, a new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain, and conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy.

50 citations


Journal ArticleDOI
TL;DR: It is shown that when the state space is finite the computation of the dynamic allocation indices can be handled by linear programming methods.
Abstract: We consider the multi-armed bandit problem. We show that when the state space is finite the computation of the dynamic allocation indices can be handled by linear programming methods.

48 citations
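One concrete way to obtain a dynamic allocation (Gittins) index from a linear program is the restart-in-state formulation sketched below: the index of state i equals (1 − β) times the value of state i in an auxiliary MDP in which every state offers a choice between continuing and restarting in i. This is offered as an illustration of the linear-programming route, not necessarily the formulation used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def gittins_index(P, r, i, beta=0.9):
    """Gittins (dynamic allocation) index of state i for one bandit arm,
    computed from the LP for the 'restart-in-i' MDP: in every state one may
    either continue (reward r[j], transitions P[j]) or restart in state i."""
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    I = np.eye(n)
    # The value function is the smallest V satisfying, for every state j,
    #   V[j] >= r[j] + beta * P[j] @ V      (continue)
    #   V[j] >= r[i] + beta * P[i] @ V      (restart in i)
    # so minimizing sum_j V[j] over these constraints recovers it.
    A_continue = beta * P - I
    b_continue = -r
    A_restart = np.tile(beta * P[i], (n, 1)) - I
    b_restart = -np.full(n, r[i])
    res = linprog(c=np.ones(n),
                  A_ub=np.vstack([A_continue, A_restart]),
                  b_ub=np.concatenate([b_continue, b_restart]),
                  bounds=[(None, None)] * n,
                  method="highs")
    return (1.0 - beta) * res.x[i]
```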


Journal ArticleDOI
TL;DR: Convergence theorems that, when applied to the case of bounded rewards, give stronger results than those in [9] are proved and bounds on the rates of convergence under several assumptions are given.
Abstract: A finite-state iterative scheme introduced by White [9] to approximate the optimal value function of denumerable-state Markov decision processes with bounded rewards is extended to the case of unbounded rewards. Convergence theorems that, when applied to the case of bounded rewards, give stronger results than those in [9] are proved. Moreover, bounds on the rates of convergence under several assumptions are given and the extended scheme is used to obtain policies with asymptotic optimality properties.

37 citations
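A deliberately naive version of such a finite-state approximation is sketched below: truncate the denumerable state space at N, redirect overflow probability mass to the boundary state, and run value iteration. This is a stand-in for the general idea only; White's scheme and the paper's extension to unbounded rewards involve more careful constructions, and the p_fn/r_fn interfaces are assumptions.

```python
import numpy as np

def truncate(p_fn, r_fn, n_actions, N):
    """Build a finite (N+1)-state approximation of a denumerable-state MDP by
    redirecting all transition probability beyond state N to state N itself.
    p_fn(s, a) -> {next_state: prob} and r_fn(s, a) -> float are assumed
    interfaces describing the original denumerable model."""
    P = np.zeros((N + 1, n_actions, N + 1))
    R = np.zeros((N + 1, n_actions))
    for s in range(N + 1):
        for a in range(n_actions):
            R[s, a] = r_fn(s, a)
            for t, prob in p_fn(s, a).items():
                P[s, a, min(t, N)] += prob
    return P, R

def value_iteration(P, R, beta=0.9, iters=1000):
    """Standard value iteration on the truncated model."""
    v = np.zeros(R.shape[0])
    for _ in range(iters):
        v = (R + beta * P @ v).max(axis=1)
    return v

# Increasing N and comparing the resulting value functions on a fixed set of
# low states gives a practical check on the quality of the approximation.
```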


Journal ArticleDOI
TL;DR: This work examines a finite state, finite action dynamic program having a one-step transition value-function that is affine in an imprecisely known parameter and presents conditions that guarantee the existence of a parameter-independent strategy that maximizes the minimum value of its expected reward function over all possible parameter values.
Abstract: In order to model parameter imprecision associated with a problem's reward or preference structure, we examine a finite state, finite action dynamic program having a one-step transition value-function that is affine in an imprecisely known parameter. For the finite horizon case, we also assume that the terminal value function is affine in the imprecise parameter. We assume that the parameter of interest has no dynamics, no new information about its value is received once the decision process begins, and its imprecision is described by set inclusion. We seek the set of all parameter-independent strategies that are optimal for some value of the imprecisely known parameter. We present a successive approximations procedure for solving the finite horizon case and a policy iteration procedure for determining the solution of the discounted infinite horizon case. These algorithms are then applied to a decision analysis problem with imprecise utility function and to a Markov decision process with imprecise reward structure. We also present conditions that guarantee the existence of a parameter-independent strategy that maximizes, with respect to all other parameter invariant strategies, the minimum value of its expected reward function over all possible parameter values.

35 citations


Journal ArticleDOI
TL;DR: This paper presents a simple successive approximation approach to the characterization of optimal policies for finite horizon, semi-Markov decision processes by analyzing the optimal liquidation of an asset and shows that several aspects of the standard, discrete-time, infinite horizon optimal policy carry over to the continuous- time, finite horizon policy.
Abstract: This paper presents a simple successive approximation approach to the characterization of optimal policies for finite horizon, semi-Markov decision processes. Optimal policies are nonstationary, for in this setting they depend on both time and state. We illustrate this approach by analyzing the optimal liquidation of an asset; we also show that several aspects of the standard, discrete-time, infinite horizon optimal policy carry over to the continuous-time, finite horizon policy.

34 citations


Journal ArticleDOI
TL;DR: An efficient algorithm is developed which suggests a method for approximating g* and an associated average-return optimal policy and can be applied to some special cases, such as control of arrivals to a queue, control of the service rate, and controlled random walks.
Abstract: We consider a Markovian decision process with countable state space (states 0, 1, 2, ...) which is skip-free to the right (a transition from i to j is impossible if j > i + 1). In this type of system it is easy to calculate by forward recursion the maximal total expected reward going from state 0 to state i; the same can be done, of course, for the case where a constant g is subtracted from the one-period reward function (the g-revised reward). Let w_g(i) be the maximal total expected g-revised reward going from state 0 to state i. We show that w_g(.) satisfies the average-reward optimality equation. If w_g(.) satisfies a growth condition, then g = g*, the maximal average reward. For all other g, the function w_g increases or decreases so fast that this cannot be the case. Thus, in principle the solution w_g can be used to check whether g lies above or below g*, which suggests a method for approximating g* and an associated average-return optimal policy. We develop an efficient algorithm based on this idea. In a companion paper we shall show how the algorithm, or modifications of it, can be applied to some special cases, such as control of arrivals to a queue, control of the service rate, and controlled random walks.
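For a fixed policy (so transition probabilities P and one-period rewards r are given), the skip-free structure makes the forward recursion explicit: the expected g-revised reward h(i) collected in passing from state i to i + 1 satisfies a one-dimensional equation, and w_g(i) is the sum of h(0), ..., h(i − 1). The sketch below implements that fixed-policy recursion; the paper's algorithm additionally maximizes over actions at each state, and the names here are mine.

```python
import numpy as np

def forward_recursion(P, r, g, n):
    """Fixed-policy forward recursion for a skip-free-to-the-right chain.

    P[i, j] : transition probabilities with P[i, j] = 0 for j > i + 1
              (P[i, i + 1] > 0 is assumed for i < n)
    r[i]    : one-period reward in state i
    g       : constant subtracted from the reward (the g-revised reward)
    n       : compute w_g(0), ..., w_g(n)

    Returns w with w[i] = expected total g-revised reward from state 0 until
    the first visit to state i (uncontrolled-chain version of the recursion).
    """
    h = np.zeros(n)               # h[i]: expected g-revised reward from i to i+1
    for i in range(n):
        # From i the chain may fall back to j <= i; by skip-freeness it must
        # then pass through j+1, ..., i again before it can reach i+1.
        backlog = sum(P[i, j] * h[j:i].sum() for j in range(i))
        h[i] = (r[i] - g + backlog) / P[i, i + 1]
    return np.concatenate(([0.0], np.cumsum(h)))

# The growth behaviour of w as i increases is what the paper exploits to decide
# whether a candidate g lies above or below the maximal average reward g*.
```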

Journal ArticleDOI
Masami Kurano
TL;DR: A p-step contraction property for the average cost case is introduced and, by use of this property, the validity of the optimality equation and the existence of ε-optimal stationary policies are proved.
Abstract: We consider a Markov decision process with a Borel measurable cost function. We introduce a p-step contraction property for the average cost case. By use of this method, the validity of the optimality equation and the existence of ε-optimal stationary policies are proved. As some applications, the sequential replacement model and the inventory model are considered.

Book ChapterDOI
TL;DR: Various algorithms for the numerical solution of discounted stochastic games are discussed, and a new mathematical programming formulation is presented which permits the numerical solution of a game using a nonlinear programming code.

Book
01 Jan 1986
TL;DR: This book discusses linear programming, game theory, and decision making in the context of management science with a focus on dynamic programming.
Abstract: Introduction to management science. Mathematical review. Breakeven analysis. Forecasting. Introduction to linear programming. Linear programming. Model formulations. LP simplex method. Sensitivity analysis and duality. PERT/CPM. Transportation and assignment models. Other network models. Goal programming. Integer programming. Inventory models. Probability review. Decision making. Decision models. Markov processes. Game theory. Queuing analysis: waiting-line problems. Simulation. Dynamic programming. Calculus review. Nonlinear models. Implementation.


Journal ArticleDOI
TL;DR: In this paper, Hartley et al. extended the finite-state iterative scheme introduced by White for approximating the value function of denumerable-state Markov decision processes to the case of a denumerable multidimensional state space.

Journal ArticleDOI
TL;DR: This paper demonstrates how a Markov decision process (MDP) can be approximated to generate a policy bound, i.e., a function that bounds the optimal policy from below or from above for all states.
Abstract: This paper demonstrates how a Markov decision process (MDP) can be approximated to generate a policy bound, i.e., a function that bounds the optimal policy from below or from above for all states. We present sufficient conditions for several computationally attractive approximations to generate rigorous policy bounds. These approximations include approximating the optimal value function, replacing the original MDP with a separable approximate MDP, and approximating a stochastic MDP with its deterministic counterpart. An example from the field of fisheries management demonstrates the practical applicability of the results.

Journal ArticleDOI
TL;DR: In this article, a new derivation of the linear program corresponding to a Markov Decision Process (MDP) in steady state, which seeks to minimize discounted total expected cost, is given.
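For reference, the classical linear program whose optimal solution is the minimal discounted-cost value function has the form below, where the weights α(s) > 0 are arbitrary positive constants; this is the standard formulation, not necessarily the new derivation given in the paper.

```latex
\begin{aligned}
\max_{v}\quad & \sum_{s} \alpha(s)\, v(s) \\
\text{s.t.}\quad & v(s) \;\le\; c(s,a) + \beta \sum_{s'} p(s' \mid s,a)\, v(s')
\qquad \text{for all } s,\ a .
\end{aligned}
```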

Journal ArticleDOI
Zvi Rosberg, I. Gopal
TL;DR: The optimal control of hop-by-hop flow control in a computer network is shown to be a linear truncated function of the state and the explicit form is found when the arrival process of the messages is a Bernoulli process.
Abstract: The problem of hop-by-hop flow control in a computer network is formulated as a Markov decision process with a cost function composed of the delay of the messages and the buffer constraints. The optimal control is shown to be a linear truncated function of the state and the explicit form is found when the arrival process of the messages is a Bernoulli process. For a renewal arrival process, the long-run average cost of any policy with a linear truncated structure is expressed by a set of linear equations.
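A "linear truncated" control is simply a linear function of the state clipped to an admissible range; the tiny sketch below (with hypothetical parameter names) shows the shape of such a policy.

```python
def window(queue_length, threshold, max_window):
    """Linear truncated control: admit (threshold - queue_length) new messages
    into the hop, clipped to the range [0, max_window].  'threshold' and
    'max_window' are hypothetical parameters standing in for the constants the
    paper derives from the delay cost and the buffer constraints."""
    return max(0, min(threshold - queue_length, max_window))
```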

Journal Article
TL;DR: In this article, the relation between efficient policies in a multiobjective Markov decision process and efficient points in a related multiobjective linear program is clarified.
Abstract: We seek to clarify the relation between efficient policies in a multiobjective Markov decision process and the efficient points in a related multiobjective linear program.

Book ChapterDOI
01 Jan 1986
TL;DR: In Markov decision theory, a distinction is drawn between discrete-time Markov decision processes, semi-Markov decision processes, and continuous-time Markov decision processes.
Abstract: In Markov decision theory we distinguish (a) discrete-time Markov decision processes (b) semi-Markov decision processes (c) continuous time Markov decision processes.


Journal ArticleDOI
TL;DR: In this article, the development of a planning and decision support system (PDSS) to determine the number of communications satellites to purchase and the timing of these purchases for INTELSAT is described; the task was complicated by the length of time necessary to manufacture a spacecraft, the potential for spacecraft failure, uncertain future costs and capacity requirements, multiple, conflicting, and noncommensurate objectives, and various exogenous factors.
Abstract: Developing a planning and decision support system (PDSS) to determine the number of spacecraft (communications satellites) to purchase and the timing of these purchases for INTELSAT was complicated by the length of time necessary to manufacture a spacecraft, the potential for spacecraft failure, uncertain future costs and capacity requirements, multiple, conflicting, and noncommensurate objectives, and various exogenous factors. The PDSS uses the simulation of a large Markov decision process (MDP) to evaluate purchase strategies generated by (1) experts, (2) heuristic procedures, and (3) the solution of an aggregated version of the MDP, thus integrating knowledge engineering and formal reasoning approaches to decision aiding and problem solving.

Journal ArticleDOI
Cheng Kan
TL;DR: This paper gives a brief description of recent O.R. activity in China in four parts: mathematical programming; queueing theory and Markov decision processes; reliability theory; simulation.
Abstract: This paper gives a brief description of recent O.R. activity in China. It consists of four parts: mathematical programming; queueing theory and Markov decision processes; reliability theory; simulation. Emphasis is placed on the current situation of practical O.R.

Journal ArticleDOI
TL;DR: In this article, the existence of a solution to the optimality equation for discounted finite Markov decision processes is established by means of Birkhoff's fixed point theorem, and the proof yields the well-known linear programming formulation for the optimal value function.

Journal ArticleDOI
TL;DR: A computational comparison of the policy iteration algorithms for solving discounted Markov decision processes is described, examining the different forms of iterations, reordering, extrapolation and action elimination.
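As a baseline for the variants the comparison covers (different orderings, extrapolation, action elimination), plain policy iteration for a discounted MDP looks like the sketch below; the array layout and names are assumptions.

```python
import numpy as np

def policy_iteration(P, r, beta=0.95):
    """Howard's policy iteration for a discounted MDP.

    P[s, a, s'] : transition probabilities,  r[s, a] : one-step rewards.
    Exact policy evaluation by solving a linear system, then greedy
    improvement; the paper compares refinements of this basic loop."""
    nS, nA, _ = P.shape
    policy = np.zeros(nS, dtype=int)
    while True:
        # Policy evaluation: v = (I - beta * P_pi)^{-1} r_pi
        Ppi = P[np.arange(nS), policy]
        rpi = r[np.arange(nS), policy]
        v = np.linalg.solve(np.eye(nS) - beta * Ppi, rpi)
        # Policy improvement: act greedily with respect to v
        q = r + beta * P @ v
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```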

Journal ArticleDOI
TL;DR: In this paper, a structural property for policies, the likelihood consistency property, was introduced for partially observed Markov decision problems, where the decision maker must formulate a policy of response to an unobservable transition to an undesirable state.

Journal ArticleDOI
TL;DR: In this paper, the authors derived bounds and variational characterizations for the solutions of variational Markov decision processes, and used them to measure the deviation of the current solution from optimality.