
Showing papers on "Markov decision process" published in 1984


Journal ArticleDOI
TL;DR: Stochastic calculus for these stochastic processes is developed and a complete characterization of the extended generator is given; this is the main technical result of the paper.
Abstract: A general class of non-diffusion stochastic models is introduced with a view to providing a framework for studying optimization problems arising in queueing systems, inventory theory, resource allocation and other areas. The corresponding stochastic processes are Markov processes consisting of a mixture of deterministic motion and random jumps. Stochastic calculus for these processes is developed and a complete characterization of the extended generator is given; this is the main technical result of the paper. The relevance of the extended generator concept in applied problems is discussed and some recent results on optimal control of piecewise-deterministic processes are described.
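For orientation, a hedged sketch in our own notation (not quoted from the paper): the extended generator of such a piecewise-deterministic process, acting on a sufficiently smooth test function \(f\), combines the deterministic flow with the jump mechanism,

\[
\mathcal{A} f(x) \;=\; \mathfrak{X} f(x) \;+\; \lambda(x) \int_E \bigl( f(y) - f(x) \bigr)\, Q(\mathrm{d}y;\, x),
\]

where \(\mathfrak{X}\) is the vector field driving the deterministic motion, \(\lambda\) the jump rate, and \(Q(\cdot\,;x)\) the post-jump distribution; forced jumps from the active boundary of the state space contribute an additional boundary condition on \(f\).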

954 citations


Journal ArticleDOI
TL;DR: In this paper, the optimal control of a Markov network with two service stations and linear cost is studied; optimal controls described by switching curves in the two-dimensional state space are shown to exist.
Abstract: Optimal controls described by switching curves in the two-dimensional state space are shown to exist for the optimal control of a Markov network with two service stations and linear cost. The controls govern routing and service priorities. Finite horizon and long run average cost problems are considered and value iteration is a key tool. Nonconvex value functions are shown to exist for slightly more general networks. Nonconvex value functions are also shown to arise for a simple single station control problem in which the instantaneous cost is convex but not monotone. Nevertheless, optimality of threshold policies is established for the single station problem. The proof is based on a novel use of stochastic coupling and policy iteration.
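A toy illustration of the switching-curve structure, in Python with invented parameters (a single routing decision per event, not the paper's full routing-and-priority network): finite-horizon value iteration on a uniformized two-queue model with linear holding cost already yields a threshold in each coordinate.

import numpy as np

# Toy uniformized routing model (assumed rates/costs, not from the paper):
# each event is an arrival (routed to queue 1 or 2) or a service completion.
lam, mu1, mu2 = 0.8, 1.0, 0.8      # arrival and service rates
c1, c2 = 1.0, 2.0                  # holding cost per customer per unit time
N, T = 20, 400                     # buffer truncation and number of VI steps
Lam = lam + mu1 + mu2              # uniformization constant

V = np.zeros((N + 1, N + 1))       # V[x1, x2] = cost-to-go from state (x1, x2)

def clip(x):                       # keep indices inside the truncated buffers
    return max(0, min(N, x))

for _ in range(T):                 # backward induction (value iteration)
    Vnew = np.empty_like(V)
    for x1 in range(N + 1):
        for x2 in range(N + 1):
            route1 = V[clip(x1 + 1), x2]       # send the arrival to queue 1
            route2 = V[x1, clip(x2 + 1)]       # send the arrival to queue 2
            Vnew[x1, x2] = (c1 * x1 + c2 * x2
                            + (lam / Lam) * min(route1, route2)
                            + (mu1 / Lam) * V[clip(x1 - 1), x2]
                            + (mu2 / Lam) * V[x1, clip(x2 - 1)])
    V = Vnew

# Switching curve: for each x2, the smallest x1 at which routing to queue 2 wins.
for x2 in range(6):
    threshold = next((x1 for x1 in range(N + 1)
                      if V[x1, clip(x2 + 1)] <= V[clip(x1 + 1), x2]), None)
    print(f"x2 = {x2}: route the arrival to queue 2 once x1 >= {threshold}")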

309 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that the optimal decision is characterized by thresholds as in the decoupled case; however, the thresholds are time-varying and their computation requires the solution of two coupled sets of dynamic programming equations.
Abstract: Two detectors making independent observations must decide when a Markov chain jumps from state 0 to state 1. The decisions are coupled through a common cost function. It is shown that the optimal decision is characterized by thresholds as in the decoupled case. However, the thresholds are time-varying and their computation requires the solution of two coupled sets of dynamic programming equations. A comparison to the decoupled case shows the structure of the coupling.

60 citations


Journal ArticleDOI
TL;DR: A constructive proof is given of the existence of optimal policies, among all policies, under new cumulative average optimality criteria that are more sensitive than maximization of the spectral radius.
Abstract: Previous treatments of multiplicative Markov decision chains, e.g., Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, New Jersey.], Mandl [Mandl, P. 1967. An iterative method for maximizing the characteristic root of positive matrices. Rev. Roumaine Math. Pures Appl. XII 1317--1322.], and Howard and Matheson [Howard, R. A., Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Sci. 8 356--369.], restricted attention to stationary policies and assumed that all transition matrices are irreducible and aperiodic. They also used a “first term” optimality criterion, namely maximizing the spectral radius of the associated transition matrix. We give a constructive proof of the existence of optimal policies among all policies under new cumulative average optimality criteria which are more sensitive than the maximization of the spectral radius. The algorithm for finding an optimal policy first searches for a stationary policy with a nonnilpotent transition matrix, provided such a rule exists. Otherwise, the method still finds an optimal policy, though in this case the set of optimal policies usually does not contain a stationary policy. If a stationary policy with a nonnilpotent transition matrix exists, then we develop a policy improvement algorithm which finds a stationary optimal policy.
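For concreteness, the classical "first term" criterion that the paper refines can be sketched in a few lines of Python (the two-state chain below is invented): among stationary policies, choose the one whose transition matrix has the largest spectral radius. The paper's cumulative average criteria are finer and also discriminate between policies that are tied on this first term.

import itertools
import numpy as np

# Hypothetical 2-state multiplicative chain: action a in state s picks a row
# of nonnegative factors (the transition matrix entries for that state).
P = {
    (0, 0): [0.5, 0.4], (0, 1): [0.2, 0.9],
    (1, 0): [0.3, 0.6], (1, 1): [0.7, 0.1],
}

best = None
for policy in itertools.product([0, 1], repeat=2):      # one action per state
    M = np.array([P[(s, policy[s])] for s in range(2)])
    rho = max(abs(np.linalg.eigvals(M)))                 # spectral radius
    if best is None or rho > best[1]:
        best = (policy, rho)

print("spectral-radius-maximizing stationary policy:", best[0], "rho =", round(best[1], 4))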

55 citations


Journal ArticleDOI
TL;DR: In this article, the authors present the foundations of the theory of nonhomogeneous Markov processes in general state spaces, and give a survey of the fundamental papers in this topic.
Abstract: We present the foundations of the theory of nonhomogeneous Markov processes in general state spaces and we give a survey of the fundamental papers in this topic. We consider the following questions: 1. The existence of transition functions for a Markov process. 2. The construction of regularization of processes. 3. The properties of right and left processes: the strict Markov property, the behavior of excessive functions, etc. 4. The relation of right and left processes with dual homogeneous processes and the application of the results of the nonhomogeneous theory to dual homogeneous processes, etc.

21 citations


Proceedings ArticleDOI
01 Mar 1984
TL;DR: A connected speech recognition system based on Markov models is described, and its performance is analyzed and compared with that of a system which uses prototypes of words instead of Markov models.
Abstract: This paper describes a connected speech recognition system based on Markov models. The performance of this system was analyzed and compared with that of a system which uses prototypes of words instead of Markov models. Some preliminary results are reported with reference to the recognition of connected digits.

21 citations


Journal ArticleDOI
TL;DR: In this paper, a state-space model is presented for a queueing system where two classes of customer compete in discrete-time for the service attention of a single server with infinite buffer capacity.
Abstract: A state-space model is presented for a queueing system where two classes of customer compete in discrete-time for the service attention of a single server with infinite buffer capacity. The arrivals are modelled by an independent identically distributed random sequence of a general type while the service completions are generated by independent Bernoulli streams; the allocation of service attention is governed by feedback policies which are based on past decisions and buffer content histories. The cost of operation per unit time is a linear function of the queue sizes. Under the model assumptions, a fixed prioritization scheme, known as the μc-rule, is shown to be optimal when the expected long-run average criterion and the expected discounted criterion, over both finite and infinite horizons, are used. This static prioritization of the two classes of customers is done solely on the basis of service and cost parameters. The analysis is based on the dynamic programming methodology for Markov decision processes and takes advantage of the sample-path properties of the adopted state-space model.
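A minimal sketch of the μc-rule itself, with assumed parameters rather than the paper's exact model: at every slot the server attends the nonempty class with the largest product μ_i c_i of service-completion probability and holding cost.

import random

mu = {1: 0.7, 2: 0.4}        # per-slot service completion probabilities (assumed)
c = {1: 1.0, 2: 3.0}         # holding cost per customer per slot (assumed)
queue = {1: 5, 2: 5}         # initial buffer contents (arrivals omitted here)
total_cost = 0.0

for t in range(200):
    total_cost += sum(c[k] * queue[k] for k in queue)
    nonempty = [k for k in queue if queue[k] > 0]
    if nonempty:
        k = max(nonempty, key=lambda i: mu[i] * c[i])   # the mu*c rule
        if random.random() < mu[k]:                      # Bernoulli service completion
            queue[k] -= 1

print("class with priority:", max(mu, key=lambda i: mu[i] * c[i]))
print("accumulated holding cost:", round(total_cost, 1))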

21 citations


Proceedings ArticleDOI
01 Dec 1984
TL;DR: A general dynamic programming operator model is proposed that includes, but is not restricted to, optimization problems, and in particular covers sequential non-extremization problems.
Abstract: Several authors (Denardo [6], Karp and Held [18], and Bertsekas [3]) have proposed abstract dynamic programming models encompassing a wide variety of sequential optimization problems. The unifying purpose of these models is to impose sufficient conditions on the recursive definition of the objective function to guarantee the validity of the solution of the optimization problem by a dynamic programming iteration. In this paper we propose a general dynamic programming operator model that includes, but is not restricted to, optimization problems. Any functional satisfying a certain commutativity condition (which reduces to the principle of optimality in extremization problems; see Section 2, B2) with the generating operator of the objective recursive function results in a sequential problem solvable by a dynamic programming iteration. Examples of sequential non-extremization problems fitting this framework are the derivation of marginal distributions in arbitrary probability spaces, iterative computation of stage-separated functions defined on general algebraic systems such as additive commutative semi-groups with distributive products, generation of symbolic transfer functions, and the Chapman-Kolmogorov equations.
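The first of these non-extremization examples is easy to make concrete. The marginal law of a Markov chain propagates by the Chapman-Kolmogorov recursion, a stage-by-stage iteration of exactly the dynamic-programming kind, with summation playing the role that minimization plays in optimization (the two-state chain below is our own illustration).

import numpy as np

P = np.array([[0.9, 0.1],      # transition matrix of a two-state chain
              [0.3, 0.7]])
pi = np.array([1.0, 0.0])      # initial distribution

for n in range(1, 6):
    pi = pi @ P                # Chapman-Kolmogorov step: pi_n = pi_{n-1} P
    print(f"marginal distribution at step {n}: {pi}")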

14 citations



Journal ArticleDOI
TL;DR: In this paper, the applicability of Markov maintenance models is discussed, and a model is proposed to fill the gap between theoretical and practical maintenance problems.
Abstract: The applicability of Markov maintenance models is crucial. We need to fill the gap between theoretical and practical maintenance problems. A model is proposed for that purpose.

13 citations


Journal ArticleDOI
TL;DR: This paper deals with total reward Markov decision processes with countable state space and establishes that if an optimal strategy exists, then an optimal stationary strategy also exists.
Abstract: This paper deals with total reward Markov decision processes with countable state space. Various partial results from the literature are connected and extended in the following theorem. If in each state where the value is nonpositive a conserving action exists, then there exists a stationary strategy f which is uniformly nearly optimal in the following sense: v_f ≥ v* − εu*, where u* is the value of the problem if only the positive rewards are counted. Further, the following result is established: if an optimal strategy exists, then an optimal stationary strategy also exists.

Book ChapterDOI
01 Jan 1984
TL;DR: The problem considered is non-linear filtering of Gaussian observations of a Markov jump-diffusion with an embedded Markov chain, that is described by stochastic differential equations driven by Brownian motions and a random Poisson measure.
Abstract: The problem considered is non-linear filtering of Gaussian observations of a Markov jump-diffusion with an embedded Markov chain, that is described by stochastic differential equations driven by Brownian motions and a random Poisson measure. The modelling potential of this class of Markov processes is illustrated by some simple realistic examples.


Journal ArticleDOI
TL;DR: This paper gives a systematic treatment of results about the existence of various types of nearly-optimal strategies (Markov, stationary) in countable state total reward Markov decision processes.
Abstract: This paper gives a systematic treatment of results about the existence of various types of nearly-optimal strategies (Markov, stationary) in countable state total reward Markov decision processes. For example, the following questions are considered: do there exist optimal stationary strategies, uniformly nearly-optimal stationary strategies, or uniformly nearly-optimal Markov strategies?

Journal ArticleDOI
TL;DR: This paper shows how modified policy iteration methods may be constructed to achieve a preassigned rate-of-convergence and provides impetus for perhaps more computationally efficient procedures than currently exist.
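For context, a hedged sketch of plain modified policy iteration for a discounted MDP, with invented data and a fixed number m of evaluation sweeps (the construction referred to above instead tunes the partial-evaluation step to achieve a preassigned rate of convergence):

import numpy as np

nS, nA, gamma, m = 3, 2, 0.9, 5
rng = np.random.default_rng(0)
R = rng.uniform(0, 1, (nS, nA))                 # rewards r(s, a), made up
P = rng.dirichlet(np.ones(nS), (nS, nA))        # transition kernels P(s' | s, a)

v = np.zeros(nS)
for it in range(200):
    Q = R + gamma * np.einsum('sat,t->sa', P, v)
    policy = Q.argmax(axis=1)                   # improvement step (greedy)
    for _ in range(m):                          # partial evaluation: m sweeps of T_policy
        v = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ v
                      for s in range(nS)])

print("greedy policy:", policy, "value estimate:", np.round(v, 3))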

Journal ArticleDOI
TL;DR: In this article, the authors considered sequential Markov decision models with compact state and action spaces where the law of motion is not completely known, and showed that a policy is asymptotically as good for the true law-of-motion as it is for a consistent estimator.
Abstract: The paper considers sequential Markov decision models with compact state and action spaces where the law of motion is not completely known. It is shown that a policy is asymptotically as good for the true law of motion as it is for a consistent asymptotic estimator (estimating the law of motion). Thus, if there exists such a consistent asymptotic estimator, then there exists an asymptotically optimal policy by the compactness and respective continuity assumptions. Both the discounted reward criterion and the average reward criterion are considered.


Proceedings ArticleDOI
01 Dec 1984
TL;DR: This paper is an attempt to obtain a computable solution to the problem of optimum traffic routing with state information, in the framework of the theory of Markov decision processes.
Abstract: In a modern stored-program-controlled telephone network, a considerable amount of information regarding the state of the network can be made available for operational decisions such as traffic routing. This paper is an attempt to obtain a computable solution to the problem of optimum traffic routing with state information, in the framework of the theory of Markov decision processes.

Journal ArticleDOI
TL;DR: A two-dimensional Markov Decision Process model is used to combine the Production-Inventory problem and the Equipment Replacement problem and provides two sets of sufficient conditions for the combined optimal policy to have a relatively simple monotonic structure.
Abstract: A two-dimensional Markov Decision Process model is used to combine the Production-Inventory problem and the Equipment Replacement problem. We provide two sets of sufficient conditions for the combined optimal policy to have a relatively simple monotonic structure.

Journal ArticleDOI
TL;DR: In a decision process with finite state space and arbitrary decision sets (gambles or actions), there is always available a Markov strategy which uniformly maximizes the average time spent at a goal.

Journal ArticleDOI
TL;DR: In this paper, the control of a finite-state semi-Markov process is investigated and necessary and sufficient conditions for the optimality of a stationary policy in the class of nonstationary policies are given.
Abstract: The control of a finite-state semi-Markov process is investigated. In each state, a finite number of actions is available. Each action determines reward rates and transition rates to the other states. These rates depend on the holding time in the state, and the actions can be changed at any point in time—not just at transition times. The goal is to find a policy that maximizes the expected total or discounted reward. In the infinite-horizon case, necessary and sufficient conditions for the optimality of a stationary policy in the class of nonstationary policies are given. A stationary policy is shown to be optimal in that class, and this policy can be chosen piecewise-constant in the holding time in each state if the rates are piecewise-analytic in the holding time. Several applications are examined in the domains of queueing, inventory and reliability. In the finite-horizon case, necessary and sufficient conditions for optimality are given.

Journal ArticleDOI
01 Oct 1984
TL;DR: The pair of functional equations for undiscounted Markov renewal programs (MRPs) is solved by an iterative procedure which generates geometrically-converging estimates for the gain rate vector g* and the relative value vector.
Abstract: The pair of functional equations for undiscounted Markov renewal programs (MRPs) is solved by an iterative procedure which generates geometrically-converging estimates for the gain rate vector g* and the relative value vector. In addition, monotonically-converging lower bounds on g* are exhibited. The approach employs a hierarchical decomposition of the MRP into a set of communicating MRPs, with Hastings' bounds used to estimate the scalar gain rate for each member of the set.
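As a simpler, hedged analogue of such gain-rate bounds (stated here for an ordinary unichain average-reward MDP with invented data, not for a Markov renewal program), successive value-iteration differences bracket the optimal gain rate g*:

import numpy as np

nS, nA = 4, 3
rng = np.random.default_rng(1)
R = rng.uniform(0, 1, (nS, nA))                 # rewards r(s, a), made up
P = rng.dirichlet(np.ones(nS), (nS, nA))        # strictly positive kernels -> unichain

v = np.zeros(nS)
for n in range(30):
    v_new = (R + np.einsum('sat,t->sa', P, v)).max(axis=1)   # one Bellman step
    lower, upper = (v_new - v).min(), (v_new - v).max()      # bounds on g*
    v = v_new - v_new[0]                        # relative values keep the iterates bounded

print(f"optimal gain rate g* bracketed in [{lower:.4f}, {upper:.4f}]")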

Journal ArticleDOI
TL;DR: Numerical evidence is provided to show that exploiting the structure of a problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures.
Abstract: The paper gives a survey on solution techniques for Markov decision processes with respect to the total reward criterion. We will discuss briefly a number of problem structures which guarantee that the optimal policy possesses a specific structure which can be exploited in numerical solution procedures. However, the main emphasis is on iterative methods. It is shown by examples that the effect of a number of modifications of the standard iterative method, which are advocated in the literature, is limited in some realistic situations. Numerical evidence is provided to show that exploiting the structure of a problem under consideration often yields a more substantial reduction of the required computational effort than some of the existing acceleration procedures. We advocate that this structure should be analyzed and used in choosing the appropriate solution procedure. The appropriate procedure might be composed on one hand by blending several of the acceleration concepts that are described in literature. ...

Book ChapterDOI
01 Jan 1984
TL;DR: This article addresses the Markov decision problem with long-run average reward V_u when there is a global constraint to be satisfied: I_u ≤ α, where I_u is also a long-run average.
Abstract: This article addresses the Markov decision problem with long-run average reward V_u when there is a global constraint to be satisfied: I_u ≤ α, where I_u is also a long-run average. Using Lagrange multiplier techniques, existence of an optimal stationary policy is proven. Unlike the unconstrained theory, optimal stationary policies are in general randomized. Structural properties of an optimal policy are determined and the corresponding dynamic programming equations are derived. Finally, conditions are given for the existence of an optimal pure policy and an optimal “almost” bang-bang policy.
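In symbols (our paraphrase, not the article's exact notation), the constrained problem and its Lagrangian relaxation read

\[
\max_u \; V_u \quad \text{subject to } I_u \le \alpha,
\qquad
J_\gamma(u) \;=\; V_u - \gamma\,(I_u - \alpha), \quad \gamma \ge 0,
\]

and the multiplier approach looks for a \(\gamma^*\) and a (generally randomized) stationary policy \(u^*\) maximizing \(J_{\gamma^*}\) while meeting the constraint with complementary slackness, \(\gamma^* (I_{u^*} - \alpha) = 0\).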

Journal ArticleDOI
TL;DR: In this paper, a continuous-time Markov decision process with a denumerable state space and nonzero terminal rewards is considered and a dynamic programming approximation algorithm for the finite-horizon problem is introduced.
Abstract: In this article we consider a continuous-time Markov decision process with a denumerable state space and nonzero terminal rewards. We first establish the necessary and sufficient optimality condition without any restriction on the cost functions. The necessary condition is derived through the Pontryagin maximum principle and the sufficient condition, by the inherent structure of the problem. We introduce a dynamic programming approximation algorithm for the finite-horizon problem. As the time between discrete points decreases, the optimal policy of the discretized problem converges to that of the continuous-time problem in the sense of weak convergence. For the infinite-horizon problem, a successive approximation method is introduced as an alternative to a policy iteration method.



Journal ArticleDOI
TL;DR: In this paper, the authors consider a class of general dynamic programs which satisfies the monotonicity and contraction assumption, and in which the sets of cost functions and policies are closed under the monotone contraction operators.
Abstract: This paper considers a class of general dynamic programs which satisfies the monotonicity and contraction assumption, and in which the sets of cost functions and policies are closed under the monotone contraction operators. This class of dynamic programs includes piecewise linear and affine dynamic programs, partially observable Markov decision processes, and many sequential decision processes under uncertainty such as machine maintenance control models and search problems with incomplete information. An algorithm based on generalized policy improvement has the property that it only generates cost functions and policies belonging to distinguished subsets of cost functions and policies, respectively. This paper considers a class of dynamic programs, called closed, with the property that the generalized policy improvement algorithm stays within a certain "small" subset of cost functions and policies. In other words, the sets of cost functions and policies generated by the algorithm are closed under the monotone contraction operators. Furthermore, it is possible to keep such sets within "distinguished small" subsets of the sets of all bounded cost functions and all stationary policies, respectively. This property is very important to dynamic programming from a computational aspect. The class of closed dynamic programs includes piecewise linear dynamic programming (16), affine dynamic programming (5), (6), partially observable Markov decision processes (7), (15), (17) and many sequential decision processes with imperfect information such as machine maintenance models (17) and search models. The approximation of dynamic programs is ...

Proceedings ArticleDOI
01 Jan 1984
TL;DR: Application of the DSS to a spacecraft purchase problem of INTELSAT is discussed and is used to motivate the developments.
Abstract: A decision support system (DSS) was developed for determining the number of items to purchase of a capital intensive commodity and when these items should be purchased in order to satisfy projected operational requirements. Such decisionmaking represents a major planning effort of many organizations such as airlines, trucking companies, and satellite telecommunications firms. The determination of these decisions is complicated by the length of time necessary to produce the commodity, the potential for failure, uncertain future costs and capacity requirements, multiple, conflicting, and noncommensurate objectives, and various exogenous factors. The DSS that was developed uses the simulation of a large Markov decision process (MDP) to evaluate purchase strategies generated by (1) domain experts, (2) heuristic procedures, and (3) the solution of an aggregated version of the MDP. Application of the DSS to a spacecraft purchase problem of INTELSAT is discussed and is used to motivate the developments.

ReportDOI
01 Dec 1984
TL;DR: Two deterministic algorithms for the maximum a posteriori estimation of a one-dimensional, binary Markov random field from noisy observations are presented, together with an experimental comparison of the optimal algorithms' performance with a stochastic approximation scheme.
Abstract: This document presents two deterministic algorithms for the maximum a posteriori estimation of a one-dimensional, binary Markov random field from noisy observations. Extensions to other related problems, such as one-dimensional signal matching, and estimation of continuous-valued Markov random fields are also discussed. Finally, the author presents an experimental comparison of the performance of optimal algorithms with a stochastic approximation scheme (simulated annealing). Additional keywords: Mathematical models, Dynamic programming, Gaussian noise, White noise, Army research. (Author)
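A hedged, self-contained sketch of the kind of dynamic-programming (Viterbi-style) MAP estimator the report describes, for a binary first-order field observed in additive Gaussian noise; all parameters below are invented:

import numpy as np

rng = np.random.default_rng(2)
n, sigma, beta = 40, 0.8, 1.2                    # length, noise std, coupling strength
x_true = np.cumsum(rng.random(n) < 0.05) % 2     # piecewise-constant 0/1 signal
y = x_true + sigma * rng.normal(size=n)          # noisy observations

def data_cost(i, s):                             # negative log-likelihood of y[i] given x[i] = s
    return (y[i] - s) ** 2 / (2 * sigma ** 2)

# Forward pass: cost[i, s] = best cost of a labelling of x_0..x_i ending in x_i = s.
cost = np.zeros((n, 2))
back = np.zeros((n, 2), dtype=int)
cost[0] = [data_cost(0, 0), data_cost(0, 1)]
for i in range(1, n):
    for s in (0, 1):
        trans = [cost[i - 1, r] + beta * (r != s) for r in (0, 1)]
        back[i, s] = int(np.argmin(trans))
        cost[i, s] = min(trans) + data_cost(i, s)

# Backward pass: trace back the optimal (MAP) labelling.
x_map = np.zeros(n, dtype=int)
x_map[-1] = int(np.argmin(cost[-1]))
for i in range(n - 2, -1, -1):
    x_map[i] = back[i + 1, x_map[i + 1]]

print("labelling errors vs. the true field:", int(np.sum(x_map != x_true)), "of", n)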