
Showing papers on "Markov decision process published in 1989"


Book ChapterDOI
01 Jan 1989
TL;DR: This chapter introduces the stochastic control processes, also known as Markov decision processes or Markov dynamic programs, and discusses (briefly) more general control systems, such as non-stationary CMP’s and semi-Markov control models.
Abstract: The objective of this chapter is to introduce the stochastic control processes we are interested in; these are the so-called (discrete-time) controlled Markov processes (CMP’s), also known as Markov decision processes or Markov dynamic programs. The main part is Section 1.2. It contains some basic definitions and the statement of the optimal and the adaptive control problems studied in this book. In Section 1.3 we present several examples; the idea is to illustrate the main concepts and provide sources for possible applications. Also in Section 1.3 we discuss (briefly) more general control systems, such as non-stationary CMP’s and semi-Markov control models. The chapter is concluded in Section 1.4 with some comments on related references.

399 citations
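For concreteness, the optimal control problem described in this chapter abstract can be written in the standard discounted form; the display below uses generic notation (state space X, admissible actions A(x), transition law P, one-stage reward r, discount factor β), not necessarily the chapter's own symbols.

```latex
% Generic discrete-time controlled Markov process (MDP) and its discounted
% optimal control problem; notation assumed, not taken from the chapter.
\[
V^{*}(x) \;=\; \sup_{\pi}\;\mathbb{E}^{\pi}_{x}\!\Big[\sum_{t=0}^{\infty}\beta^{t}\,r(x_t,a_t)\Big],
\qquad 0<\beta<1,
\]
\[
V^{*}(x) \;=\; \max_{a\in A(x)}\Big\{\,r(x,a)+\beta\sum_{y\in X}P(y\mid x,a)\,V^{*}(y)\Big\},
\qquad x\in X .
\]
```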


Journal ArticleDOI
TL;DR: In this paper, the authors considered infinite state Markov decision processes with unbounded costs and provided sufficient conditions for the existence of a distinguished state of smallest discounted value and a single stationary policy inducing an irreducible, ergodic Markov chain for which the average cost of a first passage from any state to the distinguished state is finite.
Abstract: We deal with infinite state Markov decision processes with unbounded costs. Three simple conditions, based on the optimal discounted value function, guarantee the existence of an expected average cost optimal stationary policy. Sufficient conditions are the existence of a distinguished state of smallest discounted value and a single stationary policy inducing an irreducible, ergodic Markov chain for which the average cost of a first passage from any state to the distinguished state is finite. A result to verify this is also given. Two examples illustrate the ease of applying the criteria.

199 citations
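In generic notation (symbols assumed here, not necessarily the authors'), writing V_α(i) for the optimal α-discounted value from state i, taking state 0 as the distinguished state and d as the single stationary policy, the two structural conditions quoted above amount roughly to:

```latex
\[
V_{\alpha}(0) \;=\; \min_{i} V_{\alpha}(i)\quad\text{for all }\alpha\in(0,1),
\qquad
\mathbb{E}^{\,d}_{i}\Big[\sum_{t=0}^{T_{0}-1} c\bigl(x_t, d(x_t)\bigr)\Big] \;<\; \infty
\quad\text{for all }i,
\]
```

where T_0 is the first passage time to state 0 under d.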


DOI
01 Jan 1989
TL;DR: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes and proves that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm.
Abstract: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only the discounted reward case is considered. For the finite horizon problem, two new algorithms are developed. The first algorithm is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second algorithm, called the linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. It has been shown that these two algorithms are more efficient than the one-pass algorithm. For the infinite horizon problem, it is first shown that the approximation version of the linear support algorithm can be used in place of the policy improvement step in a standard successive approximation method to obtain an $\epsilon$-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDPs can be applied here. Finally, we prove that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with a finite number of signals. Then the result is extended to the case where the signal processes are step functions. Since step functions can easily be used to approximate most probability distributions, this method can be used to approximate most problems with continuous signals. Finally, we present some conditions which guarantee that the linear support can be computed for any given state; the methods developed for the finite-signal case can then be easily modified and applied to problems for which these conditions hold.

173 citations
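As a small illustration of the object both new algorithms manipulate (this is not a reconstruction of the thesis algorithms themselves): a finite-horizon POMDP value function is piecewise linear and convex over belief states, i.e. the upper envelope of a finite set of linear supports (alpha-vectors), and evaluating a belief against that set is the basic operation. All names and numbers below are invented.

```python
import numpy as np

def value(belief, supports):
    """belief: length-n probability vector; supports: list of length-n alpha-vectors."""
    scores = [float(np.dot(alpha, belief)) for alpha in supports]
    best = int(np.argmax(scores))            # support attaining the upper envelope
    return scores[best], best

# Hypothetical 2-state example with three supports (all numbers invented).
supports = [np.array([1.0, 0.0]), np.array([0.2, 0.9]), np.array([0.6, 0.6])]
val, idx = value(np.array([0.4, 0.6]), supports)
print(f"value = {val:.2f}, attained by support {idx}")
```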


Journal ArticleDOI
TL;DR: It is shown that a round-robin type policy is optimal, and the same is conjectured for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.
Abstract: The Markov decision problem of locating a policy to maximize the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with one recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K states. A linear program produces the optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process, but whose implementation requires essentially no more storage than that of a pure policy.

164 citations
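The linear program referred to is the standard one over long-run state-action frequencies; in generic notation (reward r, cost functions c_k, cost bounds γ_k, all symbols assumed here rather than taken from the paper):

```latex
\[
\max_{x\ge 0}\ \sum_{s,a} r(s,a)\,x_{sa}
\quad\text{s.t.}\quad
\sum_{a} x_{ja}-\sum_{s,a} P(j\mid s,a)\,x_{sa}=0 \ \ \forall j,
\qquad
\sum_{s,a} x_{sa}=1,
\]
\[
\sum_{s,a} c_{k}(s,a)\,x_{sa}\;\le\;\gamma_{k},\qquad k=1,\dots,K,
\]
```

with a stationary policy recovered as π(a|s) = x_{sa} / Σ_{a'} x_{sa'}; a basic optimal solution of this program randomizes in only a limited number of states, which is the sense of the "no more than K states" result quoted above.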


Journal ArticleDOI
TL;DR: This work considers a Markov decision process with both the expected limiting average, and the discounted total return criteria, appropriately modified to include a penalty for the variability in the stream of rewards.
Abstract: We consider a Markov decision process with both the expected limiting average, and the discounted total return criteria, appropriately modified to include a penalty for the variability in the stream of rewards. In both cases we formulate appropriate nonlinear programs in the space of state-action frequencies (averaged or discounted) whose optimal solutions are shown to be related to the optimal policies in the corresponding “variance-penalized MDP.” The analysis of one of the discounted cases is facilitated by the introduction of a “Cartesian product of two independent MDPs.”

164 citations
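One common way of writing such a variance-penalized criterion (the paper's exact definition may differ) is, with x the vector of averaged or discounted state-action frequencies and λ > 0 a penalty weight:

```latex
\[
\max_{\pi}\ \ \phi(\pi)\;-\;\lambda\,V(\pi),
\qquad
\phi(\pi)=\sum_{s,a} r(s,a)\,x_{sa}(\pi),
\qquad
V(\pi)=\sum_{s,a}\bigl(r(s,a)-\phi(\pi)\bigr)^{2}\,x_{sa}(\pi),
\]
```

which is a quadratic nonlinear program in the frequencies x, matching the formulation described in the abstract.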


Journal ArticleDOI
TL;DR: This paper first derives instantaneous and cumulative measures of Markov and Markov reward model behavior, and then compares the complexity of several competing algorithms for the computation of these measures.

161 citations


Journal ArticleDOI
TL;DR: Both linear programming and value-iteration MDP algorithms are coupled with a novel state descriptor in order to locate the optimal policy for reasonable-size problems (several T1 carriers in parallel for the access-port case, and small networks of T1 carriers for the network-access case).
Abstract: The problem of determining optimal access policies for circuit-switched networks that support traffic types with varying bandwidth requirements is addressed. The authors suppose that the network supports K classes of calls where each class is determined by a fixed route and a bandwidth requirement. A Markov decision process (MDP) approach is used to obtain optimal access policies for three models: the flexible scheme access-port model where a single link is shared; the contiguous scheme access-port model where wideband calls are required to occupy specific contiguous regions of the TDM frame; and the network-access model where a call holds several channels in different links simultaneously. Both linear programming and value-iteration MDP algorithms are coupled with a novel state descriptor in order to locate the optimal policy for reasonable-size problems (several T1 carriers in parallel for the access-port case, and small networks of T1 carriers for the network-access case).

153 citations


Journal ArticleDOI
01 Jan 1989-Leonardo
TL;DR: A survey of Markov-based efforts in automated composition with a tutorial demonstrating how various theoretical properties associated with Markov processes can be put to practical use, and a contrast with alternative compositional strategies.
Abstract: The author combines a survey of Markov-based efforts in automated composition with a tutorial demonstrating how various theoretical properties associated with Markov processes can be put to practical use. The historical background is traced from A. A. Markov’s original formulation through to the present. A digression into Markov-chain theory introduces ‘waiting counts’ and ‘stationary probabilities’. The author’s Demonstration 4 for solo clarinet illustrates how these properties affect the behavior of a melody composed using Markov chains. This simple example becomes a point of departure for increasingly general interpretations of the Markov process. The interpretation of ‘states’ is reevaluated in the light of recent musical efforts that employ Markov chains of higher-level objects and in the light of other efforts that incorporate relative attributes into the possible interpretations. Other efforts expand Markov’s original definition to embrace ‘Nth-order’ transitions, evolving transition matrices and chains of chains. The remainder of this article contrasts Markov processes with alternative compositional strategies.

130 citations
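The article's "waiting counts" and "stationary probabilities" are properties of ordinary first-order Markov chains, so a toy sketch suffices to show the mechanics; the notes and transition probabilities below are invented, not taken from the article or from Demonstration 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy first-order Markov melody generator (all data invented).
notes = ["C", "D", "E", "G"]
P = np.array([      # P[i, j] = probability of moving from notes[i] to notes[j]
    [0.1, 0.5, 0.3, 0.1],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])

# Stationary probabilities (left eigenvector of P for eigenvalue 1) give the
# long-run frequency of each note, one of the properties the tutorial exploits.
eigvals, eigvecs = np.linalg.eig(P.T)
stat = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stat /= stat.sum()

state = 0
melody = [notes[state]]
for _ in range(15):
    state = rng.choice(len(notes), p=P[state])   # sample the next note
    melody.append(notes[state])
print(" ".join(melody))
print({n: round(float(p), 3) for n, p in zip(notes, stat)})
```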


Journal ArticleDOI
TL;DR: This paper explores the economics of investing in gradual process improvement, a key component, with empirically supported importance, of the well known Just-in-Time and Total Quality Control philosophies, and formulates a Markov decision process, which is applied to the problem of setup reduction and process quality improvement.
Abstract: This paper explores the economics of investing in gradual process improvement, a key component, with empirically supported importance, of the well known Just-in-Time and Total Quality Control philosophies. We formulate a Markov decision process, analyze it, and apply it to the problem of setup reduction and process quality improvement. Instead of a one-time investment opportunity for a large predictable technological advance, we allow many smaller investments over time, with potential process improvements of random magnitude. We use a somewhat nonstandard formulation of the immediate return, which facilitates the derivation of results. The policy that simply maximizes the immediate return, called the last chance policy, provides an upper bound on the optimal investment amount. Furthermore, if the last chance policy invests in process improvement, then so does the optimal policy. Each continues investing until a shared target state is attained. We derive fairly restrictive conditions that must be met for the policy of investing forever in process improvements to be optimal. Decreasing the uncertainty of the process (making the potential improvements more predictable) has a desirable effect: the total return is increased and the target state increases, so the ultimate system is more productive. Numerical examples are presented and analyzed.

128 citations


Journal Article
TL;DR: In this paper, a bridge service life prediction model using a Markov chain was developed to reflect the stochastic nature of bridge condition and service life, and a comparison of service life predictions by the statistical and Markov chain approaches was made.
Abstract: This paper describes the application of Markov chain technique in estimating bridge service life. The change of bridge conditions is a stochastic process and, therefore, the service life of bridges is related to the probabilities of condition transitions. A bridge service life prediction model, using the Markov chain, was developed to reflect the stochastic nature of bridge condition and service life. The paper includes a discussion on the concept of Markov chain, the development and application of the service life prediction model using the Markov chain, and the comparison of service life predictions by statistical and Markov chain approaches.

94 citations
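To show the kind of calculation a Markov-chain deterioration model enables (this is an illustrative sketch only; the transition probabilities below are invented, not the paper's data): treat the worst condition rating as absorbing, and read expected service life off the fundamental matrix of the absorbing chain.

```python
import numpy as np

# Condition ratings 4 (good) .. 1 (poor); state 1 is absorbing.
# Expected service life = expected number of yearly transitions before
# absorption, obtained from the fundamental matrix N = (I - Q)^(-1).
P = np.array([
    [0.85, 0.15, 0.00, 0.00],   # state 4
    [0.00, 0.80, 0.20, 0.00],   # state 3
    [0.00, 0.00, 0.75, 0.25],   # state 2
    [0.00, 0.00, 0.00, 1.00],   # state 1 (absorbing)
])
Q = P[:3, :3]                               # transitions among transient states
N = np.linalg.inv(np.eye(3) - Q)            # fundamental matrix
expected_life = N.sum(axis=1)               # expected steps to absorption
print(dict(zip(["from 4", "from 3", "from 2"], np.round(expected_life, 1))))
```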


Journal ArticleDOI
TL;DR: Assuming that a policy exists that meets the sample-path constraint, it is established that there exist nearly optimal stationary policies for communicating MDPs, and a parametric linear programming algorithm is given to construct nearly optimal stationary policies.
Abstract: We consider time-average Markov decision processes (MDPs), which accumulate a reward and cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value with probability one. The optimization problem is to maximize the expected average reward over all policies that meet the sample-path constraint. The sample-path constraint is compared with the more commonly studied constraint of requiring the average expected cost to be less than a specified value. Although the two criteria are equivalent for certain classes of MDPs, their feasible and optimal policies differ for many nontrivial problems. In general, there do not exist optimal or nearly optimal stationary policies when the expected average-cost constraint is employed. Assuming that a policy exists that meets the sample-path constraint, we establish that there exist nearly optimal stationary policies for communicating MDPs. A parametric linear programming algorithm is given to construct nearly optimal stationary policies. The discussion relies on well known results from the theory of stochastic processes and linear programming. The techniques lead to simple proofs of the existence of optimal and nearly optimal stationary policies for unichain and deterministic MDPs, respectively.
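The contrast drawn in the abstract is between an almost-sure and an in-expectation requirement; in generic notation (cost c, bound γ, symbols assumed here), the two constraints read:

```latex
\[
\text{sample-path constraint:}\quad
\Pr_{\pi}\!\Big(\limsup_{T\to\infty}\tfrac{1}{T}\textstyle\sum_{t=0}^{T-1} c(x_t,a_t)\le \gamma\Big)=1,
\]
\[
\text{expected-cost constraint:}\quad
\limsup_{T\to\infty}\tfrac{1}{T}\,\mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{T-1} c(x_t,a_t)\Big]\le \gamma .
\]
```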

Journal ArticleDOI
TL;DR: Three algorithms to solve the infinite horizon, expected discounted total reward partially observed Markov decision process (POMDP), each integrating a successive approximations algorithm with an appropriately generalized numerical technique that has been shown to reduce CPU time until convergence for the completely observed case.
Abstract: We present three algorithms to solve the infinite horizon, expected discounted total reward partially observed Markov decision process (POMDP). Each algorithm integrates a successive approximations algorithm for the POMDP due to R. Smallwood and E. Sondik with an appropriately generalized numerical technique that has been shown to reduce CPU time until convergence for the completely observed case. The first technique is reward revision. The second technique is reward revision integrated with modified policy iteration. The third is a standard extrapolation. A numerical study indicates the potentially significant computational value of these algorithms.

Journal ArticleDOI
TL;DR: In this article, the existence of an expected average cost optimal stationary policy is proved for infinite state semi-Markov decision processes with nonnegative, unbounded costs and finite action sets.
Abstract: Semi-Markov decision processes underlie the control of many queueing systems. In this paper, we deal with infinite state semi-Markov decision processes with nonnegative, unbounded costs and finite action sets. Axioms for the existence of an expected average cost optimal stationary policy are presented. These conditions generalize the work in Sennott [22] for Markov decision processes. Verifiable conditions for the axioms to hold are obtained. The theory is applied to control of the M/G/1 queue with variable service parameter, with on-off server, and with batch processing, and to control of the G/M/m queue with variable arrival parameter and customer rejection. It is applied to a timesharing network of queues with a single server and finally to optimal routing of Poisson arrivals to parallel exponential servers. The final section extends the existence result to compact action spaces.

Journal ArticleDOI
TL;DR: In this article, the long run average cost control problem for discrete time Markov chains on a countable state space is studied in a very general framework and necessary and sufficient conditions for optimality in terms of the dynamic programming equations are given when an optimal stable stationary strategy is known to exist.
Abstract: The long-run average cost control problem for discrete time Markov chains on a countable state space is studied in a very general framework. Necessary and sufficient conditions for optimality in terms of the dynamic programming equations are given when an optimal stable stationary strategy is known to exist (e.g., for the situations studied in [Stochastic Differential Systems, Stochastic Control Theory and Applications, IMA Vol. Math. App. 10, Springer-Verlag, New York, Berlin, 1988, pp. 57–77]). A characterization of the desired solution of the dynamic programming equations is given in a special case. Also included is a novel convex analytic argument for deducing the existence of an optimal stable stationary strategy when that of a randomized one is known.
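The dynamic programming equations in question are the average-cost optimality equations; in standard notation (cost c, transition probabilities p_ij(a), optimal average cost ρ, relative value function h), they read:

```latex
\[
\rho + h(i) \;=\; \min_{a\in A(i)}\Big\{\,c(i,a)+\sum_{j} p_{ij}(a)\,h(j)\Big\},
\qquad i\in S .
\]
```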

Proceedings ArticleDOI
01 Oct 1989
TL;DR: This paper discusses likelihood ratio derivative estimation techniques for stochastic systems and presents estimators for time homogeneous discrete-time Markov chains, semi-Markov processes, non-time homogeneous continuous- time Markov Chains, and generalized semi- Markov processes.
Abstract: This paper discusses likelihood-ratio-derivative estimation techniques for stochastic systems. After a brief review of the basic concepts, likelihood-ratio-derivative estimators are presented for the following classes of stochastic processes: time-homogeneous discrete-time Markov chains, non-time-homogeneous discrete-time Markov chains, time-homogeneous continuous-time Markov chains, semi-Markov processes, non-time-homogeneous continuous-time Markov chains, and generalized semi-Markov processes.
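For the simplest of the listed classes, a time-homogeneous discrete-time Markov chain, the likelihood-ratio (score-function) derivative estimator multiplies the performance by the accumulated score of the sampled path. The sketch below is a generic illustration of that idea, not the paper's estimators; the chain, reward, and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def P(theta):
    """Two-state transition matrix; only row 0 depends on theta (invented model)."""
    return np.array([[theta, 1 - theta],
                     [0.3,   0.7     ]])

def dlogP(theta, i, j):
    """d/d theta of log P(theta)[i, j]."""
    if i != 0:
        return 0.0
    return 1.0 / theta if j == 0 else -1.0 / (1 - theta)

reward = np.array([1.0, 0.0])            # reward 1 in state 0, 0 in state 1
theta, horizon, n_paths = 0.6, 20, 20000

estimates = []
for _ in range(n_paths):
    x, total, score = 0, 0.0, 0.0
    for _ in range(horizon):
        nxt = rng.choice(2, p=P(theta)[x])
        score += dlogP(theta, x, nxt)    # accumulate the score along the path
        x = nxt
        total += reward[x]
    estimates.append(total * score)      # LR estimate of d/d theta E[total reward]
print("estimated derivative:", np.mean(estimates))
```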

Journal ArticleDOI
TL;DR: The problem of converting a labor-intensive batch production process to one that incorporates flexible automation as a finite-state Markov decision process is formulated and the qualitative characteristics of optimal strategies for acquiring flexible automation are illustrated.
Abstract: We formulate the problem of converting a labor-intensive batch production process to one that incorporates flexible automation as a finite-state Markov decision process. Interest rates and the level of automated technology influence both operating and acquisition costs and are treated as random variables. The model specifies the optimal level of capacity to convert to flexible automation. The optimization criterion is the minimization of the sum of expected, discounted costs incurred over a finite planning horizon. The optimal acquisition strategy depends upon the time period, the current interest rate, the current level of technology, and a measure of the remaining capacity that is not automated. We investigate the structure of optimal acquisition strategies using mathematical analysis and simulation. Our objective is to illustrate the qualitative characteristics of optimal strategies for acquiring flexible automation. As a step toward the implementation of the model, we examine the qualitative consequen...

Journal ArticleDOI
TL;DR: In this article, the authors considered the average-cost Markov decision process with compact state and action spaces and bounded lower semicontinuous cost functions, and proved that under additional weak conditions there exists an optimal stationary policy in the usual sense.
Abstract: This paper studies the average-cost Markov decision process with compact state and action spaces and bounded lower semicontinuous cost functions. Following the idea of Borkar’s excellent papers [SIAM J. Control Optim., 21 (1983), pp. 652–666; 22 (1984), pp. 965–978], the general case where irreducibility is not assumed is considered under the hypothesis of Doeblin and the existence of a minimum pair of state and policy, which attains the infimum of the average expected cost over all initial states and policies, is established. Further, it is proved that under additional weak conditions there exists an optimal stationary policy in the usual sense.

Journal ArticleDOI
TL;DR: The purpose of this paper is to quantitatively formulate the problem of controlling resources in a distributed system so as to optimize a reward function, and to derive optimal control strategies using Markov decision theory.
Abstract: The authors quantitatively formulate the problem of controlling resources in a distributed system so as to optimize a reward function and derive optimal control strategies using Markov decision theory. The control variables treated are quite general; they could be control decisions related to system configuration, repair, diagnostics, files, or data. Two algorithms for resource control in distributed systems are derived for time-invariant and periodic environments, respectively. A detailed example to demonstrate the power and usefulness of the approach is provided.

Proceedings ArticleDOI
13 Dec 1989
TL;DR: Algorithms for adaptive control of unknown finite Markov chains are proposed and are easy to implement and converge to the optimal policy in finite time.
Abstract: Algorithms for adaptive control of unknown finite Markov chains are proposed. The algorithms consist of two parts: part one estimates the unknown parameters; part two computes the optimal policy. In this study the emphasis is on efficient online computation of the optimal policy. No a priori knowledge of the optimal policy is assumed. The optimal policy is computed recursively online. At each step a small amount of computation is required. At each transition of the chain, only the act corresponding to the present state of the chain is updated. The algorithms are easy to implement and converge to the optimal policy in finite time.

Journal ArticleDOI
TL;DR: A procedure for identifying forecast horizons in nonhomogeneous Markov decision processes, based on convergence results for relative value functions, is developed and a closed form expression for computing sufficiently long horizons to guarantee epsilon optimality is presented.
Abstract: A procedure for identifying forecast horizons in nonhomogeneous Markov decision processes, based on convergence results for relative value functions, is developed. Two different algorithmic implementations of this procedure are discussed, and a closed form expression for computing sufficiently long horizons to guarantee epsilon optimality is presented.

Book
01 Jan 1989
TL;DR: Time Series and Econometric Models: Examples.
Abstract: Deterministic Models and Their Control Problems. Stochastic Models. Stochastic Control Problems. Time Series and Econometric Models: Examples. Estimation. Convergence Questions. Adaptive Control Systems and Bayesian Optimal Control Problems. Linear Rational Expectations Models. Approximations in Sequential Decision Processes. References. Appendix: Markov Processes.

Journal ArticleDOI
TL;DR: In this article, an average-reward Markov decision process (MDP) with discrete-time parameter, denumerable state space, and bounded reward function is considered, and necessary and sufficient conditions are given so that the optimality equations have a bounded solution with an additional property.
Abstract: An average-reward Markov decision process (MDP) with discrete-time parameter, denumerable state space, and bounded reward function is considered. With such a model, we associate a family of MDPs. Then, we determine necessary conditions for the existence of a bounded solution to the optimality equation for each one of the models in the family. Moreover, necessary and sufficient conditions are given so that the optimality equations have a bounded solution with an additional property.

Proceedings ArticleDOI
13 Dec 1989
TL;DR: A state-dependent routing policy for a multi-service circuit-switched network is synthesized and it is shown that the proposed model provides good traffic efficiency and automatic flow control, and that by means of the call reward parameters one can almost independently control the grade of service of each call class.
Abstract: A state-dependent routing policy for a multi-service circuit-switched network is synthesized. To meet different requirements, the objective function is defined as the mean value of reward from the network. The theory of Markov decision processes is applied to find the optimal routing policy. It is shown that under the link independence assumption the problem can be decomposed into a set of link analysis problems. In this approach the optimal decision is a function of state-dependent link shadow prices, which are interpreted as prices for using each link from the path. The approach is implementable even for large systems if certain approximations are used. It is shown that the proposed model provides good traffic efficiency and automatic flow control, and that by means of the call reward parameters one can almost independently control the grade of service of each call class.

Journal ArticleDOI
TL;DR: In this article, a self-contained approach based on the Drazin generalized inverse is used to derive many basic results in discrete time, finite state Markov decision processes, including the average reward evaluation equations, Laurent series expansions, as well as the finite test for Blackwell optimality.
Abstract: A new self-contained approach based on the Drazin generalized inverse is used to derive many basic results in discrete time, finite state Markov decision processes. A product form representation for the transition matrix of a stationary policy gives new derivations of the average reward evaluation equations, Laurent series expansions, as well as the finite test for Blackwell optimality. This representation also suggests new computational methods.
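For reference, the average reward evaluation equations mentioned in the abstract are, in standard notation for a stationary policy with transition matrix P and reward vector r (gain g, bias h); the Drazin-inverse product-form representation of the policy's transition matrix gives one route to solving this system, but the display below is the standard form, not the paper's specific derivation.

```latex
\[
(I-P)\,g \;=\; 0,
\qquad
g + (I-P)\,h \;=\; r .
\]
```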

Proceedings ArticleDOI
13 Dec 1989
TL;DR: In this paper, the authors considered partially observable Markov decision processes with finite or countable state and observation spaces and finite control space, and provided sufficient conditions for a bounded solution to the average-cost optimality equation to exist.
Abstract: Consideration is given to partially observable Markov decision processes with finite or countable (core) state and observation spaces and finite control space. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite control space but with an uncountable state space, namely, the space of probability distributions of the original core state space. It is observed that some characteristics induced in the original problem due to the finiteness, or countability, of the spaces involved are retained by the equivalent problem. Sufficient conditions are derived for a bounded solution to the average-cost optimality equation to exist. These results are illustrated in the context of machine replacement problems. Structural properties for average-cost policies are obtained for a two-state replacement problem and are similar to available results for discount optimal policies. The set of assumptions used seems to be significantly less restrictive than others currently available.
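The state of the equivalent completely observed problem is the belief (posterior distribution) over core states; in generic notation (core-state transition probabilities P(j|i,a), observation probabilities O(y|j,a), current belief b — symbols assumed here), the belief after applying control a and observing y is:

```latex
\[
b'(j) \;=\;
\frac{O(y\mid j,a)\,\sum_{i} P(j\mid i,a)\,b(i)}
     {\sum_{j'} O(y\mid j',a)\,\sum_{i} P(j'\mid i,a)\,b(i)}\,,
\]
```

so the equivalent problem is a Markov decision process on the (uncountable) simplex of such distributions, as stated in the abstract.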

Journal ArticleDOI
TL;DR: Optimality problems in infinite horizon, discrete time, vector criterion Markov and semi-Markov decision processes are expressed as standard problems of multi-objective linear programming as discussed by the authors, and methods for solving these problems are overviewed and simple numerical examples are given.
Abstract: Optimality problems in infinite horizon, discrete time, vector criterion Markov and semi-Markov decision processes are expressed as standard problems of multiobjective linear programming. Processes with discounting, absorbing processes and completely ergodic processes without discounting are investigated. The common properties and special structure of derived multiobjective linear programming problems are overviewed. Computational simplicities associated with these problems in comparison with general multiobjective linear programming problems are discussed. Methods for solving these problems are overviewed and simple numerical examples are given.

Proceedings ArticleDOI
13 Dec 1989
TL;DR: In this paper, it was shown that if each block is controlled by only one agent, then it is possible to obtain policies arbitrarily close to the optimal control policy by making use of the fact that the coupling between the blocks is weak.
Abstract: For Markov chains controlled by a team of agents there is no generally applicable method for obtaining the optimal control policy if the delay in information sharing between the agents is more than one step. The authors consider such a problem for a Markov chain whose transition probability matrix consists of blocks, with the coupling between the blocks being on the order of epsilon, where epsilon is a small parameter. It is shown that if each block is controlled by only one agent, then it is possible to obtain policies arbitrarily close to the optimal control policy by making use of the fact that the coupling between the blocks is weak. The authors present a complete set of results for the finite-horizon case and discuss possible extensions to the infinite-horizon case.

Journal ArticleDOI
TL;DR: In this paper, the authors present a general model for the formulation and solution of the risk-sensitive dynamic decision problem that maximizes the certain equivalent of the discounted rewards of a time-varying Markov decision process.
Abstract: This paper presents a general model for the formulation and solution of the risk‐sensitive dynamic decision problem that maximizes the certain equivalent of the discounted rewards of a time‐varying Markov decision process. The problem is solved by applying the principle of optimality and stochastic dynamic programming to the immediate rewards and the certain equivalent associated with the remaining transitions of a time‐varying Markov process over a finite or infinite time horizon, under the assumptions of constant risk aversion and discounting of future cash flows. The solution provides transient and stationary optimal decision policies that depend on the presence or absence of discounting. The construction equipment replacement problem serves as an example application of the model to illustrate the solution methodology and the sensitivity of the optimal policy to the discount factor and the degree of risk aversion.
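Under constant risk aversion, the certain equivalent is typically the exponential-utility one; the display below is a sketch of the kind of recursion the abstract describes (risk-aversion coefficient γ, discount factor β, and the value functions v_t are symbols assumed here, not the paper's own notation, and the exact interaction of discounting with the certain equivalent may differ in the paper).

```latex
\[
\operatorname{CE}_{\gamma}(X) \;=\; -\tfrac{1}{\gamma}\,\ln \mathbb{E}\bigl[e^{-\gamma X}\bigr],
\qquad
v_t(i) \;=\; \max_{a}\Big\{\, r_t(i,a) \;+\; \beta\,\operatorname{CE}_{\gamma}\!\bigl(v_{t+1}(X_{t+1})\mid X_t=i,\,a\bigr)\Big\}.
\]
```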

Journal ArticleDOI
TL;DR: In this article, the authors extend the concept of decision and forecast horizons from classes of stationary to classes of nonstationary Markov decision problems and obtain the horizons explicitly for a family of inventory models.
Abstract: The paper extends the concept of decision and forecast horizons from classes of stationary to classes of nonstationary Markov decision problems. The horizons are explicitly obtained for a family of inventory models. The family is indexed by nonstationary Markov chains and deterministic sequences. For the proof, only reference to earlier work on the stationary case is made.

Proceedings ArticleDOI
13 Dec 1989
TL;DR: In this paper, a stationary policy maximizes one of these criteria, namely, the expected long-run average variability, and an algorithm that produces such an optimal stationary policy is given.
Abstract: Time-average Markov decision processes with finite state and action spaces are considered. Several definitions of variability are introduced and compared. It is shown that a stationary policy maximizes one of these criteria, namely, the expected long-run average variability. An algorithm that produces such an optimal stationary policy is given.