
Showing papers on "Markov decision process" published in 1983


Journal ArticleDOI
TL;DR: This article addresses the long-term average cost control of continuous-time Markov processes and surveys problems and methods from various works on continuous control, optimal stopping, and impulse control.
Abstract: This paper addresses the long-term average cost control of continuous time Markov processes. A survey of problems and methods contained in various works is given for continuous control, optimal stopping, and impulse control.

78 citations


Journal ArticleDOI
TL;DR: This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results.
Abstract: This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results. In particular, we merely require the existence of a finite set of states in which every pair of states can reach each other via some stationary policy, instead of the traditional and restrictive assumption that every stationary policy has a single irreducible set of states. A replacement model and an inventory model illustrate why this extension is essential. Our approach differs fundamentally from classical approaches; we convert the optimality equations into a form suitable for the application of a fixed point theorem.

64 citations


Journal ArticleDOI
TL;DR: In this article, the sets of (Pareto) maximal returns and maximal policies for Markov decision processes with vector-valued returns are defined, and monotonicity conditions are shown to guarantee that a stationary policy is among the maximal policies and that the maximal returns lie in the convex hull of returns of stationary policies.
Abstract: Dynamic programming models with vector-valued returns are investigated. The sets of (Pareto) maximal returns and (Pareto) maximal policies are defined. Monotonicity conditions are shown to be sufficient for the set of maximal policies to include a stationary policy, and for the set of maximal returns to be in the convex hull of returns of stationary policies. In particular, it is shown that these results hold for Markov decision processes.

61 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the existence of stationary average optimal policies for Markov decision drift processes and derived sufficient conditions, which guarantee that a 'limit point' of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy.
Abstract: Recently the authors introduced the concept of Markov decision drift processes. A Markov decision drift process can be seen as a straightforward generalization of a Markov decision process with continuous time parameter. In this paper we investigate the existence of stationary average optimal policies for Markov decision drift processes. Using a well-known Abelian theorem we derive sufficient conditions, which guarantee that a 'limit point' of a sequence of discounted optimal policies with the discounting factor approaching 1 is an average optimal policy. An alternative set of sufficient conditions is obtained for the case in which the discounted optimal policies generate regenerative stochastic processes. The latter set of conditions is easier to verify in several applications. The results of this paper are also applicable to Markov decision processes with discrete or continuous time parameter and to semi-Markov decision processes. In this sense they generalize some well-known results for Markov decision processes with finite or compact action space. Applications to an M/M/1 queueing model and a maintenance replacement model are given. It is shown that under certain conditions on the model parameters the average optimal policy for the M/M/1 queueing model is monotone non-decreasing (as a function of the number of waiting customers) with respect to the service intensity and monotone non-increasing with respect to the arrival intensity. For the maintenance replacement model we prove the average optimality of a bang-bang type policy. Special attention is paid to the computation of the optimal control parameters.

52 citations


Journal ArticleDOI
TL;DR: In this article, a stochastic network problem that includes interconnected queues is described and studied within the framework of controlled Markov chains with average cost criterion and with special cost and transition structures.
Abstract: Controlled Markov chains with average cost criterion and with special cost and transition structures are studied. Existence of optimal stationary strategies is established for the average cost criterion. Corresponding dynamic programming equations are derived. A stochastic network problem that includes interconnected queues as a special case is described and studied within this framework.

41 citations


Journal ArticleDOI
TL;DR: In this article, the authors study Markov jump decision processes with both continuously and instantaneously acting decisions and with deterministic drift between jumps, and obtain necessary and sufficient optimality conditions for these decision processes in terms of equations and inequalities of quasi-variational type.
Abstract: We study Markov jump decision processes with both continuously and instantaneously acting decisions and with deterministic drift between jumps. Such decision processes were recently introduced and studied from the point of view of discrete-time approximations by Van der Duyn Schouten. We obtain necessary and sufficient optimality conditions for these decision processes in terms of equations and inequalities of quasi-variational type. By means of the latter we find simple necessary and sufficient conditions for the existence of stationary optimal policies in such processes with finite state and action spaces, both in the discounted and average-per-unit-time reward cases.

35 citations



Journal ArticleDOI
TL;DR: This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes and concludes that the current state-of-the-art approaches to solving these problems are unsatisfactory.

17 citations
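
To make the object of comparison concrete, the fragment below sketches the basic successive-approximation (value iteration) scheme for a finite discounted MDP in Python. The two-state data and function names are invented for illustration and are not the paper's test problems; the paper benchmarks variants of this basic scheme.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Basic value iteration for a finite discounted MDP.

    P[a] is the |S| x |S| transition matrix for action a,
    R[a] is the length-|S| vector of expected one-step rewards for action a.
    Returns an approximate optimal value function and a greedy policy.
    """
    V = np.zeros(P[0].shape[0])
    for _ in range(max_iter):
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmax(axis=0)

# Invented two-state, two-action example (not from the paper).
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.5, 2.0])]
V_opt, greedy_policy = value_iteration(P, R)
```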


Book ChapterDOI
01 Jan 1983
TL;DR: A Markov decision modeling approach is presented for a large-scale multi-objective problem--the maintenance of a statewide network of roads--integrating management objectives for public safety and comfort with State and Federal budgetary policies and engineering considerations.
Abstract: This article discusses a Markov decision modeling approach to the solution of a large-scale multi-objective problem--the maintenance of a statewide network of roads. This approach integrates management objectives for public safety and comfort, and preservation of the considerable investment in highways, with State and Federal budgetary policies and engineering considerations. The Markov decision model captures the dynamic and probabilistic aspects of the maintenance problem and considers the influence of environmental factors, the type of roads, traffic densities and various engineering factors influencing road deterioration. The model recommends the best maintenance action for each mile of the network of highways, and specifies the minimum funds required to carry out the maintenance program.

13 citations
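
Network-level maintenance models of this kind are commonly posed as linear programs over long-run state-action frequencies, which also makes budget restrictions easy to attach as extra rows. The sketch below illustrates that generic formulation with invented condition states, costs, and transition probabilities, solved with scipy; it is not the authors' model or data.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: states = {good, poor}; actions = {routine, rebuild}.
P = {('good', 'routine'): [0.80, 0.20], ('good', 'rebuild'): [0.95, 0.05],
     ('poor', 'routine'): [0.30, 0.70], ('poor', 'rebuild'): [0.90, 0.10]}
cost = {('good', 'routine'): 1.0, ('good', 'rebuild'): 5.0,
        ('poor', 'routine'): 2.0, ('poor', 'rebuild'): 6.0}
states, actions = ['good', 'poor'], ['routine', 'rebuild']
pairs = [(s, a) for s in states for a in actions]

# Decision variables x[s,a]: long-run fraction of road-miles in state s
# receiving action a.  Minimise average cost subject to chain balance and
# normalisation; a budget restriction could be added as an inequality row.
c = [cost[p] for p in pairs]
A_eq = []
for j, s in enumerate(states):
    # flow into state s must equal flow out of state s
    A_eq.append([(1.0 if p[0] == s else 0.0) - P[p][j] for p in pairs])
A_eq.append([1.0] * len(pairs))          # frequencies sum to one
b_eq = [0.0] * len(states) + [1.0]
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(pairs))
freq = dict(zip(pairs, res.x))           # optimal state-action frequencies
```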


Journal ArticleDOI
David J. Soukup
TL;DR: In this article, the author explores fund-raising strategies for a nonprofit organization to maximize net income by modeling the giving pattern of members as a Markov chain and evaluating the alternatives.
Abstract: The author explores the strategies of fund-raising for a nonprofit organization to maximize net income. The giving pattern of members is modeled as a Markov chain. The alternatives are evaluated us...

13 citations
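
As a rough illustration of the donor-behaviour idea, the fragment below sets up a small Markov chain over giving states and computes the long-run distribution and expected annual gift. The states, transition matrix, and gift amounts are entirely made up and are not Soukup's data or strategy comparison.

```python
import numpy as np

# Hypothetical donor states: 'lapsed', 'small gift', 'large gift'.
P = np.array([[0.70, 0.25, 0.05],    # yearly transition probabilities
              [0.30, 0.55, 0.15],
              [0.20, 0.30, 0.50]])
gift = np.array([0.0, 25.0, 100.0])  # expected gift in each state ($)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

expected_annual_gift = pi @ gift     # long-run average gift per member
```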



ReportDOI
01 Mar 1983
TL;DR: In this article, the authors give an overview of some recent developments in optimal stochastic control theory, and discuss the case of continuously acting control, in which at each time t a control u_t is applied to the system.
Abstract: The purpose of this article is to give an overview of some recent developments in optimal stochastic control theory. Broadly speaking, stochastic control theory deals with models of systems whose evolution is affected both by certain random influences and also by certain inputs chosen by a controller. The authors are concerned here only with state-space formulations of control problems in continuous time. Moreover, the authors consider only Markovian control problems in which the state x_t of the process being controlled is Markov provided the controller follows a Markov control policy. They mainly discuss the case of continuously acting control, in which at each time t a control u_t is applied to the system.

Book ChapterDOI
01 Jan 1983
TL;DR: In this article, the authors prove that in any total reward countable state Markov decision process there exists a Markov strategy π which is uniformly nearly-optimal in the following sense: v(i,π) ≥ v*(i) − ε − εu*(i) for any initial state i.
Abstract: In this paper the following result is proved. In any total reward countable state Markov decision process a Markov strategy π exists which is uniformly nearly-optimal in the following sense: v(i,π) ≥ v*(i) − ε − εu*(i) for any initial state i. Here v* denotes the value function of the process and u* denotes the value of the process if all negative rewards are neglected.

Journal ArticleDOI
TL;DR: In this article, an adaptive policy and a learning policy for Markov Decision Processes with uncertain transition matrices are defined; in the average case a non-Bayesian analysis of the model is studied and an optimal adaptive policy is constructed.
Abstract: This study is concerned with Markov Decision Processes with uncertain transition matrices. In the discounted case, the Bayesian analysis of this model is studied. We define an adaptive policy and a learning policy and show that there exists, for any ε > 0, an ε-optimal and learning policy. In the average case, the non-Bayesian analysis of this model is studied and an optimal adaptive policy is constructed.
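
One standard way to make a Bayesian treatment of an uncertain transition matrix concrete is a Dirichlet prior over each row, updated from observed transitions, with the adaptive policy acting on the posterior mean. The sketch below is a generic illustration of that idea under invented class and method names; it is not the specific construction or policies of the paper.

```python
import numpy as np

class DirichletTransitionModel:
    """Posterior over each row of an unknown transition matrix P(s' | s, a)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # symmetric Dirichlet prior pseudo-counts for every (s, a) row
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[s, a, s_next] += 1.0

    def posterior_mean(self):
        """Point estimate of P used by a certainty-equivalent adaptive policy."""
        return self.counts / self.counts.sum(axis=2, keepdims=True)

model = DirichletTransitionModel(n_states=3, n_actions=2)
model.update(0, 1, 2)                 # observed: state 0, action 1, next state 2
P_hat = model.posterior_mean()
```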

Journal ArticleDOI
TL;DR: In this article, the authors consider continuous-time Markov decision processes where decisions can be made at any time and show that there exists a monotone optimal policy among all the regular policies.
Abstract: By considering continuous-time Markov decision processes where decisions can be made at any time, we show in the case of M/M/1 queues with discounted costs that there exists a monotone optimal policy among all the regular policies.

Book ChapterDOI
01 Jan 1983
TL;DR: Operational planning in a general purpose ship terminal is treated; as a check, simulation is used, which leads to an iterative aggregation-disaggregation approach.
Abstract: Operational planning in a general purpose ship terminal is treated. The decisions to be taken concern the weekly manpower capacity and the assignment of manpower and equipment to ships. As a Markov decision problem the model is very large and aggregation is desirable. As a check, simulation is used, which leads to an iterative aggregation-disaggregation approach.

Proceedings ArticleDOI
01 Dec 1983
TL;DR: A discrete-time model is presented for a system of two queues that compete for the service attention of a single server with infinite buffer capacity, and a fixed prioritization scheme is shown to be optimal under both the expected long-run average criterion and the expected discounted criterion.
Abstract: A discrete-time model is presented for a system of two queues that compete for the service attention of a single server with infinite buffer capacity. The arrivals are modelled by an i.i.d. random sequence of a general type while the service completions are generated by independent Bernoulli streams, and the allocation of service attention is governed by feedback policies which are based on past decisions and buffer content histories. The cost of operation per unit time is a linear function of the queue sizes. Under the model assumptions, a fixed prioritization scheme, known as the µc-rule, is shown to be optimal when the expected long-run average criterion and the expected discounted criterion, over both finite and infinite horizons, are used. The analysis is based on the Dynamic Programming methodology for Markov decision processes and takes advantage of the sample path properties of the adopted state-space model.
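
The µc-rule itself is simple to state and simulate: among nonempty queues, always serve the one with the largest product of service-completion probability µ and holding cost c. The toy simulation below is a generic illustration under Bernoulli service and Bernoulli arrivals with invented parameters; it is not the authors' proof or their exact arrival model.

```python
import random

def mu_c_rule(queue_lengths, mu, c):
    """Return the index of the nonempty queue with the largest mu*c, or None."""
    candidates = [i for i, q in enumerate(queue_lengths) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: mu[i] * c[i])

def simulate(horizon=10_000, arrival_p=(0.2, 0.3), mu=(0.6, 0.5), c=(2.0, 1.0)):
    """Discrete-time two-queue system served according to the mu-c rule."""
    q = [0, 0]
    total_cost = 0.0
    for _ in range(horizon):
        total_cost += c[0] * q[0] + c[1] * q[1]    # linear holding cost
        served = mu_c_rule(q, mu, c)
        if served is not None and random.random() < mu[served]:
            q[served] -= 1                          # Bernoulli service completion
        for i in range(2):                          # independent arrivals
            if random.random() < arrival_p[i]:
                q[i] += 1
    return total_cost / horizon

average_cost = simulate()
```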

Journal ArticleDOI
TL;DR: In this paper, an operational method for solving dynamic programs can be used, in some cases, to solve the problem of maximizing a firm's market value, formulated as a Markov decision problem that can be solved via linear programming.
Abstract: This paper shows how an operational method for solving dynamic programs can be used, in some cases, to solve the problem of maximizing a firm's market value. The problem is formulated as a Markov decision problem that can be solved via linear programming. The paper shows how to calculate or estimate the state-contingent prices that are used to value the firm. In addition, the paper points out how states can be aggregated to make the solution technique more practical. The paper's final section contains a specific example.
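
The linear-programming route mentioned here is, in its generic form, the standard primal LP for a discounted MDP: minimise a sum of state values subject to v(s) ≥ r(s,a) + β Σ p(s'|s,a) v(s') for every state-action pair. The sketch below uses invented data and scipy purely for illustration; the state-contingent prices and aggregation scheme of the paper are not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

# Invented data: 2 states, 2 actions, discount factor beta.
beta = 0.95
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),        # P[a][s, s']
     1: np.array([[0.6, 0.4], [0.5, 0.5]])}
r = {0: np.array([1.0, 0.5]), 1: np.array([0.8, 1.2])}   # r[a][s]
n = 2

# Primal LP: minimise sum_s v(s)  s.t.  v(s) >= r(s,a) + beta * (P[a] v)(s),
# written as  -(I - beta * P[a]) v <= -r[a]  for every action a.
c = np.ones(n)
A_ub = np.vstack([-(np.eye(n) - beta * P[a]) for a in (0, 1)])
b_ub = np.concatenate([-r[a] for a in (0, 1)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
v_star = res.x
# With the default HiGHS solver, res.ineqlin.marginals holds the dual values
# of the (s, a) constraints (discounted state-action frequencies).
```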

Journal ArticleDOI
TL;DR: In this article, the authors consider how partially observable Markov decision processes may be transformed into piecewise linear ones, which have many advantages in that they are easily represented in a computer.
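
Piecewise-linear structure in partially observable problems usually refers to value functions over the belief simplex represented by a finite set of "alpha vectors", V(b) = max over α of α·b. Whether or not that is exactly the transformation studied here, the tiny fragment below illustrates why such a representation is easy to store and evaluate in a computer; the numbers are made up.

```python
import numpy as np

# A piecewise-linear convex value function over beliefs in the 2-simplex,
# stored as a finite set of alpha vectors (one linear piece each).
alpha_vectors = np.array([[3.0, 0.0, 1.0],
                          [1.0, 2.0, 1.5],
                          [0.5, 0.5, 4.0]])   # invented numbers

def value(belief, alphas=alpha_vectors):
    """V(b) = max over alpha vectors of the inner product alpha . b."""
    return float(np.max(alphas @ belief))

b = np.array([0.2, 0.5, 0.3])    # a belief state (entries sum to one)
print(value(b))                  # evaluates the piecewise-linear function at b
```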

Proceedings ArticleDOI
29 Aug 1983
TL;DR: A method based upon the policy iteration technique for Markov decision processes is used to obtain the optimal delayed resolution policy from the class of stationary delayed resolution policies for given values of the parameters.
Abstract: A class of policies called stationary delayed resolution policies have been proposed recently for sharing a finite number of buffers at a store-and-forward node in a message switching network [9]. It has been shown that with respect to the total weighted throughput these policies comprise the optimal class of policies. In this paper, we present methods to obtain an optimal policy from the class of stationary delayed resolution policies for given values of the parameters. A method based upon the policy iteration technique for Markov decision processes is used to obtain the optimal delayed resolution policy. It is shown that the policy iteration technique, while useful in obtaining the exact optimal policy, becomes intractable for practical values of buffer sizes and number of message classes. A class of policies called SRS delayed resolution policies is proposed. It is shown that the best SRS delayed resolution policies closely approximate the performance of the optimal delayed resolution policies.
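
For reference, the policy-iteration technique invoked here alternates exact policy evaluation (solving a linear system) with greedy policy improvement. The sketch below is a generic discounted-reward version with invented data; the paper applies the technique to its buffer-sharing model, which is not reproduced, and the intractability it reports comes from the size of that state space.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard-style policy iteration for a finite discounted MDP.

    P[a]: |S| x |S| transition matrix, R[a]: expected rewards, per action a.
    """
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([R[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        Q = np.array([R[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Invented two-state, two-action example.
P = [np.array([[0.7, 0.3], [0.4, 0.6]]),
     np.array([[0.9, 0.1], [0.2, 0.8]])]
R = [np.array([0.0, 1.0]), np.array([0.5, 0.8])]
policy, v = policy_iteration(P, R)
```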

Journal ArticleDOI
TL;DR: In this paper, the LP formulation for an undiscounted multi-chain Markov decision problem can be put in a block upper-triangular form by a polynomial-time procedure.

01 Jan 1983
TL;DR: In this paper, it was shown that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state.
Abstract: In every finite-state leavable gambling problem and in every finite-state Markov decision process with discounted, negative or positive reward criteria there exists a Markov strategy which is monotonically improving and optimal in the limit along every history. An example is given to show that for the positive and gambling cases such strategies cannot be constructed by simply switching to a "better" action or gamble at each successive return to a state. Key words and phrases: gambling problem, Markov decision process, strategy, stationary strategy, monotonically improving strategy, limit-optimal strategy.

Book ChapterDOI
01 Jan 1983
TL;DR: In this article, a simplex-like algorithm over the field of rational functions in ρ is used to solve the ρ-optimality problem simultaneously for all ρ near enough to 0, thereby producing a Blackwell-optimal policy.
Abstract: In a finite Markov decision process a ρ-optimal policy (ρ being the interest rate) can be found for fixed ρ by solving a linear programming problem. We solve such problems simultaneously for all ρ near enough to 0 by considering the problem not in the field of real numbers but in that of rational functions in ρ and applying a simplex-like algorithm in that field. This finite algorithm produces a Blackwell-optimal policy, its total discounted reward as a rational function in ρ, and also the interval (0, ρ0] in which that policy is ρ-optimal.
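
The idea of computing in the field of rational functions in ρ can be illustrated with a computer-algebra system: for a fixed stationary policy, the total discounted reward v = (I − (1+ρ)⁻¹P)⁻¹ r is a vector of rational functions of ρ, which can then be examined symbolically near ρ = 0. The fragment below (sympy, invented two-state data) shows only this evaluation step, not the simplex-like algorithm of the paper.

```python
import sympy as sp

rho = sp.symbols('rho', positive=True)
beta = 1 / (1 + rho)                     # discount factor for interest rate rho

# Invented two-state chain under one fixed stationary policy.
P = sp.Matrix([[sp.Rational(1, 2), sp.Rational(1, 2)],
               [sp.Rational(1, 4), sp.Rational(3, 4)]])
r = sp.Matrix([1, 2])

# Total discounted reward as a vector of rational functions of rho.
v = sp.simplify((sp.eye(2) - beta * P).inv() * r)

# Behaviour near rho = 0, the regime relevant for Blackwell optimality
# (the leading 1/rho term carries the average reward).
print(sp.series(v[0], rho, 0, 2))
```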

01 Jan 1983
TL;DR: It is demonstrated that the relation between modelling and numerical analysis for Markov decision processes is very close, since numerical possibilities strongly depend on the structure of the model, both for straightforward numerical techniques and for numerical analysis based on aggregation and/or decomposition.
Abstract: The main topic of the paper is the relation between modelling and numerical analysis for Markov decision processes. It is demonstrated that the relation is very close, since numerical possibilities strongly depend on the structure of the model. This is true even for straightforward numerical techniques, and all the more so for numerical analysis based on aggregation and/or decomposition. Examples amplify the arguments.


Journal ArticleDOI
TL;DR: In this article, sufficient conditions for certain functions to be convex are presented; since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem.
Abstract: The paper presents sufficient conditions for certain functions to be convex. Functions of this type often appear in Markov decision processes, where their maximum is the solution of the problem. Since a convex function takes its maximum at an extreme point, the conditions may greatly simplify a problem. In some cases a full solution may be obtained after the reduction is made. Some illustrative examples are discussed. Keywords: optimal policy; convex function.

Book ChapterDOI
01 Jan 1983
TL;DR: A finite semi-Markov decision process is studied to maximize the expected average reward and convergence results are stated in the form of theorems and some examples are given.
Abstract: A finite semi-Markov decision process is studied to maximize the expected average reward. The semi-Markov kernel of the process depends on an unknown parameter taking values in a subset [a, b] of ℝ^S. A controller modelled as a learning automaton updates sequentially the probabilities of generating decisions based on the observed decisions, states, and jump times. Convergence results are stated in the form of theorems and some examples are given.
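
A learning automaton of the kind described maintains a probability vector over the available decisions and nudges it after each observed outcome. The linear reward-inaction update below is one classical scheme, shown only as a generic illustration with invented names and parameters; the paper's precise updating rule and the conditions of its convergence theorems are not reproduced here.

```python
import numpy as np

class LinearRewardInactionAutomaton:
    """Keeps action probabilities p and reinforces actions that performed well."""

    def __init__(self, n_actions, step=0.05):
        self.p = np.full(n_actions, 1.0 / n_actions)
        self.step = step

    def choose(self, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        return rng.choice(len(self.p), p=self.p)

    def update(self, action, normalized_reward):
        """normalized_reward in [0, 1]; move p towards the chosen action."""
        bonus = self.step * normalized_reward
        self.p = (1.0 - bonus) * self.p
        self.p[action] += bonus          # probabilities still sum to one

automaton = LinearRewardInactionAutomaton(n_actions=3)
a = automaton.choose()
automaton.update(a, normalized_reward=0.7)
```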