
Showing papers on "Markov decision process" published in 1984


Journal ArticleDOI
TL;DR: Stochastic calculus for these stochastic processes is developed and a complete characterization of the extended generator is given; this is the main technical result of the paper.
Abstract: A general class of non-diffusion stochastic models is introduced with a view to providing a framework for studying optimization problems arising in queueing systems, inventory theory, resource allocation and other areas. The corresponding stochastic processes are Markov processes consisting of a mixture of deterministic motion and random jumps. Stochastic calculus for these processes is developed and a complete characterization of the extended generator is given; this is the main technical result of the paper. The relevance of the extended generator concept in applied problems is discussed and some recent results on optimal control of piecewise-deterministic processes are described.
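For orientation, a hedged sketch in our own notation (not quoted from the paper): the extended generator of such a piecewise-deterministic process, acting on a sufficiently smooth test function \(f\), combines the deterministic flow with the jump mechanism,

\[
\mathcal{A} f(x) \;=\; \mathfrak{X} f(x) \;+\; \lambda(x) \int_E \bigl( f(y) - f(x) \bigr)\, Q(\mathrm{d}y;\, x),
\]

where \(\mathfrak{X}\) is the vector field driving the deterministic motion, \(\lambda\) the jump rate, and \(Q(\cdot\,;x)\) the post-jump distribution; forced jumps from the active boundary of the state space contribute an additional boundary condition on \(f\).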

954 citations


Journal ArticleDOI
TL;DR: In this paper, the optimal control of a Markov network with two service stations and linear cost is studied; optimal controls described by switching curves in the two-dimensional state space are shown to exist.
Abstract: Optimal controls described by switching curves in the two-dimensional state space are shown to exist for the optimal control of a Markov network with two service stations and linear cost. The controls govern routing and service priorities. Finite horizon and long run average cost problems are considered and value iteration is a key tool. Nonconvex value functions are shown to exist for slightly more general networks. Nonconvex value functions are also shown to arise for a simple single station control problem in which the instantaneous cost is convex but not monotone. Nevertheless, optimality of threshold policies is established for the single station problem. The proof is based on a novel use of stochastic coupling and policy iteration.
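A toy illustration of the switching-curve structure, in Python with invented parameters (a single routing decision per event, not the paper's full routing-and-priority network): finite-horizon value iteration on a uniformized two-queue model with linear holding cost already yields a threshold in each coordinate.

import numpy as np

# Toy uniformized routing model (assumed rates/costs, not from the paper):
# each event is an arrival (routed to queue 1 or 2) or a service completion.
lam, mu1, mu2 = 0.8, 1.0, 0.8      # arrival and service rates
c1, c2 = 1.0, 2.0                  # holding cost per customer per unit time
N, T = 20, 400                     # buffer truncation and number of VI steps
Lam = lam + mu1 + mu2              # uniformization constant

V = np.zeros((N + 1, N + 1))       # V[x1, x2] = cost-to-go from state (x1, x2)

def clip(x):                       # keep indices inside the truncated buffers
    return max(0, min(N, x))

for _ in range(T):                 # backward induction (value iteration)
    Vnew = np.empty_like(V)
    for x1 in range(N + 1):
        for x2 in range(N + 1):
            route1 = V[clip(x1 + 1), x2]       # send the arrival to queue 1
            route2 = V[x1, clip(x2 + 1)]       # send the arrival to queue 2
            Vnew[x1, x2] = (c1 * x1 + c2 * x2
                            + (lam / Lam) * min(route1, route2)
                            + (mu1 / Lam) * V[clip(x1 - 1), x2]
                            + (mu2 / Lam) * V[x1, clip(x2 - 1)])
    V = Vnew

# Switching curve: for each x2, the smallest x1 at which routing to queue 2 wins.
for x2 in range(6):
    threshold = next((x1 for x1 in range(N + 1)
                      if V[x1, clip(x2 + 1)] <= V[clip(x1 + 1), x2]), None)
    print(f"x2 = {x2}: route the arrival to queue 2 once x1 >= {threshold}")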

309 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that the optimal decision is characterized by thresholds as in the decoupled case; however, the thresholds are time-varying and their computation requires the solution of two coupled sets of dynamic programming equations.
Abstract: Two detectors making independent observations must decide when a Markov chain jumps from state 0 to state 1. The decisions are coupled through a common cost function. It is shown that the optimal decision is characterized by thresholds as in the decoupled case. However, the thresholds are time-varying and their computation requires the solution of two coupled sets of dynamic programming equations. A comparison to the decoupled case shows the structure of the coupling.

60 citations


Journal ArticleDOI
TL;DR: A constructive proof is given of the existence of optimal policies, among all policies, under new cumulative average optimality criteria that are more sensitive than maximization of the spectral radius.
Abstract: Previous treatments of multiplicative Markov decision chains, e.g., Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, New Jersey.], Mandl [Mandl, P. 1967. An iterative method for maximizing the characteristic root of positive matrices. Rev. Roumaine Math. Pures Appl. XII 1317--1322.], and Howard and Matheson [Howard, R. A., Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Sci. 8 356--369.], restricted attention to stationary policies and assumed that all transition matrices are irreducible and aperiodic. They also used a “first term” optimality criterion, namely maximizing the spectral radius of the associated transition matrix. We give a constructive proof of the existence of optimal policies among all policies under new cumulative average optimality criteria which are more sensitive than the maximization of the spectral radius. The algorithm for finding an optimal policy first searches for a stationary policy with a nonnilpotent transition matrix, provided such a rule exists. Otherwise, the method still finds an optimal policy, though in this case the set of optimal policies usually does not contain a stationary policy. If a stationary policy with a nonnilpotent transition matrix exists, then we develop a policy improvement algorithm which finds a stationary optimal policy.
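For concreteness, the classical "first term" criterion that the paper refines can be sketched in a few lines of Python (the two-state chain below is invented): among stationary policies, choose the one whose transition matrix has the largest spectral radius. The paper's cumulative average criteria are finer and also discriminate between policies that are tied on this first term.

import itertools
import numpy as np

# Hypothetical 2-state multiplicative chain: action a in state s picks a row
# of nonnegative factors (the transition matrix entries for that state).
P = {
    (0, 0): [0.5, 0.4], (0, 1): [0.2, 0.9],
    (1, 0): [0.3, 0.6], (1, 1): [0.7, 0.1],
}

best = None
for policy in itertools.product([0, 1], repeat=2):      # one action per state
    M = np.array([P[(s, policy[s])] for s in range(2)])
    rho = max(abs(np.linalg.eigvals(M)))                 # spectral radius
    if best is None or rho > best[1]:
        best = (policy, rho)

print("spectral-radius-maximizing stationary policy:", best[0], "rho =", round(best[1], 4))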

55 citations


Journal ArticleDOI
TL;DR: In this article, the authors present the foundations of the theory of nonhomogeneous Markov processes in general state spaces, and give a survey of the fundamental papers in this topic.
Abstract: We present the foundations of the theory of nonhomogeneous Markov processes in general state spaces and we give a survey of the fundamental papers in this topic. We consider the following questions: 1. The existence of transition functions for a Markov process. 2. The construction of regularization of processes. 3. The properties of right and left processes: the strict Markov property, the behavior of excessive functions, etc. 4. The relation of right and left processes with dual homogeneous processes and the application of the results of the nonhomogeneous theory to dual homogeneous processes, etc.

21 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A connected speech recognition system based on Markov models is described, and its performance is analyzed and compared with that of a system which uses prototypes of words instead of Markov models.
Abstract: This paper describes a connected speech recognition system based on Markov models. The performance of this system was analyzed and compared with that of a system which uses prototypes of words instead of Markov models. Some preliminary results are reported with reference to the recognition of connected digits.

21 citations


Journal ArticleDOI
TL;DR: In this paper, a state-space model is presented for a queueing system where two classes of customer compete in discrete-time for the service attention of a single server with infinite buffer capacity.
Abstract: A state-space model is presented for a queueing system where two classes of customer compete in discrete-time for the service attention of a single server with infinite buffer capacity. The arrivals are modelled by an independent identically distributed random sequence of a general type while the service completions are generated by independent Bernoulli streams; the allocation of service attention is governed by feedback policies which are based on past decisions and buffer content histories. The cost of operation per unit time is a linear function of the queue sizes. Under the model assumptions, a fixed prioritization scheme, known as the μc-rule, is shown to be optimal when the expected long-run average criterion and the expected discounted criterion, over both finite and infinite horizons, are used. This static prioritization of the two classes of customers is done solely on the basis of service and cost parameters. The analysis is based on the dynamic programming methodology for Markov decision processes and takes advantage of the sample-path properties of the adopted state-space model.
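A minimal sketch of the μc-rule itself, with assumed parameters rather than the paper's exact model: at every slot the server attends the nonempty class with the largest product μ_i c_i of service-completion probability and holding cost.

import random

mu = {1: 0.7, 2: 0.4}        # per-slot service completion probabilities (assumed)
c = {1: 1.0, 2: 3.0}         # holding cost per customer per slot (assumed)
queue = {1: 5, 2: 5}         # initial buffer contents (arrivals omitted here)
total_cost = 0.0

for t in range(200):
    total_cost += sum(c[k] * queue[k] for k in queue)
    nonempty = [k for k in queue if queue[k] > 0]
    if nonempty:
        k = max(nonempty, key=lambda i: mu[i] * c[i])   # the mu*c rule
        if random.random() < mu[k]:                      # Bernoulli service completion
            queue[k] -= 1

print("class with priority:", max(mu, key=lambda i: mu[i] * c[i]))
print("accumulated holding cost:", round(total_cost, 1))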

21 citations


Proceedings ArticleDOI
01 Dec 1984
TL;DR: A general dynamic programming operator model is proposed that includes, but is not restricted to, optimization problems, and in particular covers sequential non-extremization problems.
Abstract: Several authors (Denardo [6], Karp and Held [18], and Bertsekas [3]) have proposed abstract dynamic programming models encompassing a wide variety of sequential optimization problems. The unifying purpose of these models is to impose sufficient conditions on the recursive definition of the objective function to guarantee the validity of the solution of the optimization problem by a dynamic programming iteration. In this paper we propose a general dynamic programming operator model that includes, but is not restricted to, optimization problems. Any functional satisfying a certain commutativity condition (which reduces to the principle of optimality in extremization problems; see Section 2, B2) with the generating operator of the objective recursive function results in a sequential problem solvable by a dynamic programming iteration. Examples of sequential non-extremization problems fitting this framework are the derivation of marginal distributions in arbitrary probability spaces, iterative computation of stage-separated functions defined on general algebraic systems such as additive commutative semi-groups with distributive products, generation of symbolic transfer functions, and the Chapman-Kolmogorov equations.
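The first of these non-extremization examples is easy to make concrete. The marginal law of a Markov chain propagates by the Chapman-Kolmogorov recursion, a stage-by-stage iteration of exactly the dynamic-programming kind, with summation playing the role that minimization plays in optimization (the two-state chain below is our own illustration).

import numpy as np

P = np.array([[0.9, 0.1],      # transition matrix of a two-state chain
              [0.3, 0.7]])
pi = np.array([1.0, 0.0])      # initial distribution

for n in range(1, 6):
    pi = pi @ P                # Chapman-Kolmogorov step: pi_n = pi_{n-1} P
    print(f"marginal distribution at step {n}: {pi}")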

14 citations



Journal ArticleDOI
TL;DR: In this paper, the applicability of Markov maintenance models is discussed, and a model is proposed to fill the gap between theoretical and practical maintenance problems.
Abstract: The applicability of Markov maintenance models is crucial. We need to fill the gap between theoretical and practical maintenance problems. A model is proposed for that purpose.

13 citations


Journal ArticleDOI
TL;DR: This paper deals with total reward Markov decision processes with countable state space and establishes that if an optimal strategy exists, then an optimal stationary strategy also exists.
Abstract: This paper deals with total reward Markov decision processes with countable state space. Various partial results from the literature are connected and extended in the following theorem. If in each state where the value is nonpositive a conserving action exists, then there exists a stationary strategy f which is uniformly nearly optimal in the following sense: v_f ≥ v* − εu*, where u* is the value of the problem if only the positive rewards are counted. Further, the following result is established: if an optimal strategy exists, then an optimal stationary strategy also exists.

Book ChapterDOI
01 Jan 1984
TL;DR: The problem considered is non-linear filtering of Gaussian observations of a Markov jump-diffusion with an embedded Markov chain, that is described by stochastic differential equations driven by Brownian motions and a random Poisson measure.
Abstract: The problem considered is non-linear filtering of Gaussian observations of a Markov jump-diffusion with an embedded Markov chain, that is described by stochastic differential equations driven by Brownian motions and a random Poisson measure. The modelling potential of this class of Markov processes is illustrated by some simple realistic examples.


Journal ArticleDOI
TL;DR: This paper gives a systematic treatment of results about the existence of various types of nearly-optimal strategies (Markov, stationary) in countable state total reward Markov decision processes.
Abstract: This paper gives a systematic treatment of results about the existence of various types of nearly-optimal strategies (Markov, stationary) in countable state total reward Markov decision processes. For example, the following questions are considered: do there exist optimal stationary strategies, uniformly nearly-optimal stationary strategies, or uniformly nearly-optimal Markov strategies?

Journal ArticleDOI
TL;DR: This paper shows how modified policy iteration methods may be constructed to achieve a preassigned rate-of-convergence and provides impetus for perhaps more computationally efficient procedures than currently exist.
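For context, a hedged sketch of plain modified policy iteration for a discounted MDP, with invented data and a fixed number m of evaluation sweeps (the construction referred to above instead tunes the partial-evaluation step to achieve a preassigned rate of convergence):

import numpy as np

nS, nA, gamma, m = 3, 2, 0.9, 5
rng = np.random.default_rng(0)
R = rng.uniform(0, 1, (nS, nA))                 # rewards r(s, a), made up
P = rng.dirichlet(np.ones(nS), (nS, nA))        # transition kernels P(s' | s, a)

v = np.zeros(nS)
for it in range(200):
    Q = R + gamma * np.einsum('sat,t->sa', P, v)
    policy = Q.argmax(axis=1)                   # improvement step (greedy)
    for _ in range(m):                          # partial evaluation: m sweeps of T_policy
        v = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ v
                      for s in range(nS)])

print("greedy policy:", policy, "value estimate:", np.round(v, 3))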

Journal ArticleDOI
TL;DR: In this article, the authors considered sequential Markov decision models with compact state and action spaces where the law of motion is not completely known, and showed that a policy is asymptotically as good for the true law-of-motion as it is for a consistent estimator.
Abstract: The paper considers sequential Markov decision models with compact state and action spaces where the law of motion is not completely known. It is shown that a policy is asymptotically as good for the true law of motion as it is for a consistent asymptotic estimator (estimating the law of motion). Thus, if there exists such a consistent asymptotic estimator, then there exists an asymptotically optimal policy by the compactness and respective continuity assumptions. Both the discounted reward criterion and the average reward criterion are considered.


Proceedings ArticleDOI
01 Dec 1984
TL;DR: This paper is an attempt to obtain a computable solution to the problem of optimum traffic routing with state information, in the framework of the theory of Markov decision processes.
Abstract: In a modern stored-program-controlled telephone network, a considerable amount of information regarding the state of the network can be made available for operational decisions such as traffic routing. This paper is an attempt to obtain a computable solution to the problem of optimum traffic routing with state information, in the framework of the theory of Markov decision processes.

Journal ArticleDOI
TL;DR: A two-dimensional Markov Decision Process model is used to combine the Production-Inventory problem and the Equipment Replacement problem and provides two sets of sufficient conditions for the combined optimal policy to have a relatively simple monotonic structure.
Abstract: A two-dimensional Markov Decision Process model is used to combine the Production-Inventory problem and the Equipment Replacement problem. We provide two sets of sufficient conditions for the combined optimal policy to have a relatively simple monotonic structure.

Journal ArticleDOI
TL;DR: In a decision process with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly maximizes the average time spent at a goal.

Journal ArticleDOI
TL;DR: In this paper, the control of a finite-state semi-Markov process is investigated and necessary and sufficient conditions for the optimality of a stationary policy in the class of nonstationary policies are given.
Abstract: The control of a finite-state semi-Markov process is investigated. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to the other states. These rates depend on the holding time in the state, and the actions can be changed at any point in time—not just at transition times. The goal is to find a policy that maximizes the expected total or discounted reward. In the infinite-horizon case, necessary and sufficient conditions for the optimality of a stationary policy in the class of nonstationary policies are given. A stationary policy is shown to be optimal in that class, and this policy can be chosen piecewise-constant in the holding time in each state if the rates are piecewise-analytic in the holding time. Several applications are examined in the domains of queueing, inventory and reliability. In the finite-horizon case, necessary and sufficient conditions for optimality are given.

Journal ArticleDOI
01 Oct 1984
TL;DR: The pair of functional equations for undiscounted Markov renewal programs (MRPs) is solved by an iterative procedure which generates geometrically-converging estimates for the gain rate vector g* and the relative value vector.
Abstract: The pair of functional equations for undiscounted Markov renewal programs (MRPs) is solved by an iterative procedure which generates geometrically-converging estimates for the gain rate vector g* and the relative value vector. In addition, monotonically-converging lower bounds on g* are exhibited. The approach employs a hierarchical decomposition of the MRP into a set of communicating MRPs, with Hastings' bounds used to estimate the scalar gain rate for each member of the set.
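As a simpler, hedged analogue of such gain-rate bounds (stated here for an ordinary unichain average-reward MDP with invented data, not for a Markov renewal program), successive value-iteration differences bracket the optimal gain rate g*:

import numpy as np

nS, nA = 4, 3
rng = np.random.default_rng(1)
R = rng.uniform(0, 1, (nS, nA))                 # rewards r(s, a), made up
P = rng.dirichlet(np.ones(nS), (nS, nA))        # strictly positive kernels -> unichain

v = np.zeros(nS)
for n in range(30):
    v_new = (R + np.einsum('sat,t->sa', P, v)).max(axis=1)   # one Bellman step
    lower, upper = (v_new - v).min(), (v_new - v).max()      # bounds on g*
    v = v_new - v_new[0]                        # relative values keep the iterates bounded

print(f"optimal gain rate g* bracketed in [{lower:.4f}, {upper:.4f}]")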

Journal ArticleDOI
TL;DR: Numerical evidence is provided to show that exploiting the structure of a problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures.
Abstract: The paper gives a survey on solution techniques for Markov decision processes with respect to the total reward criterion. We will discuss briefly a number of problem structures which guarantee that the optimal policy possesses a specific structure which can be exploited in numerical solution procedures. However, the main emphasis is on iterative methods. It is shown by examples that the effect of a number of modifications of the standard iterative method, which are advocated in the literature, is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of a problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing the appropriate solution procedure. The appropriate procedure might be composed on one hand by blending several of the acceleration concepts that are described in literature. ...

Book ChapterDOI
01 Jan 1984
TL;DR: This article addresses the Markov decision problem with long-run average reward V_u when there is a global constraint to be satisfied: I_u ≤ α, where I_u is also a long-run average.
Abstract: This article addresses the Markov decision problem with long-run average reward V_u when there is a global constraint to be satisfied: I_u ≤ α, where I_u is also a long-run average. Using Lagrange multiplier techniques, existence of an optimal stationary policy is proven. Unlike the unconstrained theory, optimal stationary policies are in general randomized. Structural properties of an optimal policy are determined and the corresponding dynamic programming equations are derived. Finally, conditions are given for the existence of an optimal pure policy and an optimal “almost” bang-bang policy.
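In symbols (our paraphrase, not the article's exact notation), the constrained problem and its Lagrangian relaxation read

\[
\max_u \; V_u \quad \text{subject to } I_u \le \alpha,
\qquad
J_\gamma(u) \;=\; V_u - \gamma\,(I_u - \alpha), \quad \gamma \ge 0,
\]

and the multiplier approach looks for a \(\gamma^*\) and a (generally randomized) stationary policy \(u^*\) maximizing \(J_{\gamma^*}\) while meeting the constraint with complementary slackness, \(\gamma^* (I_{u^*} - \alpha) = 0\).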

Journal ArticleDOI
TL;DR: In this paper, a continuous-time Markov decision process with a denumerable state space and nonzero terminal rewards is considered and a dynamic programming approximation algorithm for the finite-horizon problem is introduced.
Abstract: In this article we consider a continuous-time Markov decision process with a denumerable state space and nonzero terminal rewards. We first establish the necessary and sufficient optimality condition without any restriction on the cost functions. The necessary condition is derived through the Pontryagin maximum principle and the sufficient condition, by the inherent structure of the problem. We introduce a dynamic programming approximation algorithm for the finite-horizon problem. As the time between discrete points decreases, the optimal policy of the discretized problem converges to that of the continuous-time problem in the sense of weak convergence. For the infinite-horizon problem, a successive approximation method is introduced as an alternative to a policy iteration method.



Journal ArticleDOI
TL;DR: In this paper, the authors consider a class of general dynamic programs which satisfies the monotonicity and contraction assumption, and in which the sets of cost functions and policies are closed under the monotone contraction operators.
Abstract: This paper considers a class of general dynamic programs which satisfies the monotonicity and contraction assumption, and in which the sets of cost functions and policies are closed under the monotone contraction operators. This class of dynamic programs includes piecewise linear and affine dynamic programs, partially observable Markov decision processes, and many sequential decision processes under uncertainty such as machine maintenance control models and search problems with incomplete information. An algorithm based on generalized policy improvement has the property that it only generates cost functions and policies belonging to distinguished subsets of cost functions and policies, respectively. This paper considers a class of dynamic programs, called closed, with the property that the generalized policy improvement algorithm stays within a certain "small" subset of cost functions and policies. In other words, the sets of cost functions and policies generated by the algorithm are closed under the monotone contraction operators. Furthermore, it is possible to keep such sets within "distinguished small" subsets of the sets of all bounded cost functions and all stationary policies, respectively. This property is very important to dynamic programming from a computational aspect. The class of closed dynamic programs includes piecewise linear dynamic programming (16), affine dynamic programming (5), (6), partially observable Markov decision processes (7), (15), (17) and many sequential decision processes with imperfect information such as machine maintenance models (17) and search models. The approximation of dynamic programs is ...

Proceedings ArticleDOI
01 Jan 1984
TL;DR: Application of the DSS to a spacecraft purchase problem of INTELSAT is discussed and is used to motivate the developments.
Abstract: A decision support system (DSS) was developed for determining the number of items to purchase of a capital intensive commodity and when these items should be purchased in order to satisfy projected operational requirements. Such decisionmaking represents a major planning effort of many organizations such as airlines, trucking companies, and satellite telecommunications firms. The determination of these decisions is complicated by the length of time necessary to produce the commodity, the potential for failure, uncertain future costs and capacity requirements, multiple, conflicting, and noncommensurate objectives, and various exogenous factors. The DSS that was developed uses the simulation of a large Markov decision process (MDP) to evaluate purchase strategies generated by (1) domain experts, (2) heuristic procedures, and (3) the solution of an aggregated version of the MDP. Application of the DSS to a spacecraft purchase problem of INTELSAT is discussed and is used to motivate the developments.

ReportDOI
01 Dec 1984
TL;DR: Two deterministic algorithms for the maximum a posteriori estimation of a one-dimensional, binary Markov random field from noisy observations are presented, together with an experimental comparison of the optimal algorithms' performance with a stochastic approximation scheme.
Abstract: This document presents two deterministic algorithms for the maximum a posteriori estimation of a one-dimensional, binary Markov random field from noisy observations. Extensions to other related problems, such as one-dimensional signal matching, and estimation of continuous-valued Markov random fields are also discussed. Finally, the author presents an experimental comparison of the performance of optimal algorithms with a stochastic approximation scheme (simulated annealing). Additional keywords: Mathematical models, Dynamic programming, Gaussian noise, White noise, Army research. (Author)
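A hedged, self-contained sketch of the kind of dynamic-programming (Viterbi-style) MAP estimator the report describes, for a binary first-order field observed in additive Gaussian noise; all parameters below are invented:

import numpy as np

rng = np.random.default_rng(2)
n, sigma, beta = 40, 0.8, 1.2                    # length, noise std, coupling strength
x_true = np.cumsum(rng.random(n) < 0.05) % 2     # piecewise-constant 0/1 signal
y = x_true + sigma * rng.normal(size=n)          # noisy observations

def data_cost(i, s):                             # negative log-likelihood of y[i] given x[i] = s
    return (y[i] - s) ** 2 / (2 * sigma ** 2)

# Forward pass: cost[i, s] = best cost of a labelling of x_0..x_i ending in x_i = s.
cost = np.zeros((n, 2))
back = np.zeros((n, 2), dtype=int)
cost[0] = [data_cost(0, 0), data_cost(0, 1)]
for i in range(1, n):
    for s in (0, 1):
        trans = [cost[i - 1, r] + beta * (r != s) for r in (0, 1)]
        back[i, s] = int(np.argmin(trans))
        cost[i, s] = min(trans) + data_cost(i, s)

# Backward pass: trace back the optimal (MAP) labelling.
x_map = np.zeros(n, dtype=int)
x_map[-1] = int(np.argmin(cost[-1]))
for i in range(n - 2, -1, -1):
    x_map[i] = back[i + 1, x_map[i + 1]]

print("labelling errors vs. the true field:", int(np.sum(x_map != x_true)), "of", n)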