
Showing papers on "Markov decision process published in 1991"


Book
01 Aug 1991
TL;DR: This book presents a model-building approach to operations research, covering topics ranging from the Gauss-Jordan method for solving systems of linear equations and the simplex algorithm for linear programming to the branch-and-bound method for solving pure and mixed integer programming problems.
Abstract: 1. INTRODUCTION TO MODEL BUILDING. An Introduction to Modeling. The Seven-Step Model-Building Process. Examples. 2. BASIC LINEAR ALGEBRA. Matrices and Vectors. Matrices and Systems of Linear Equations. The Gauss-Jordan Method for Solving Systems of Linear Equations. Linear Independence and Linear Dependence. The Inverse of a Matrix. Determinants. 3. INTRODUCTION TO LINEAR PROGRAMMING. What is a Linear Programming Problem? The Graphical Solution of Two-Variable Linear Programming Problems. Special Cases. A Diet Problem. A Work-Scheduling Problem. A Capital Budgeting Problem. Short-term Financial Planning. Blending Problems. Production Process Models. Using Linear Programming to Solve Multiperiod Decision Problems: An Inventory Model. Multiperiod Financial Models. Multiperiod Work Scheduling. 4. THE SIMPLEX ALGORITHM AND GOAL PROGRAMMING. How to Convert an LP to Standard Form. Preview of the Simplex Algorithm. The Simplex Algorithm. Using the Simplex Algorithm to Solve Minimization Problems. Alternative Optimal Solutions. Unbounded LPs. The LINDO Computer Package. Matrix Generators, LINGO, and Scaling of LPs. Degeneracy and the Convergence of the Simplex Algorithm. The Big M Method. The Two-Phase Simplex Method. Unrestricted-in-Sign Variables. Karmarkar's Method for Solving LPs. Multiattribute Decision-Making in the Absence of Uncertainty: Goal Programming. Solving LPs with Spreadsheets. 5. SENSITIVITY ANALYSIS: AN APPLIED APPROACH. A Graphical Introduction to Sensitivity Analysis. The Computer and Sensitivity Analysis. Managerial Use of Shadow Prices. What Happens to the Optimal z-value if the Current Basis is No Longer Optimal? 6. SENSITIVITY ANALYSIS AND DUALITY. A Graphical Introduction to Sensitivity Analysis. Some Important Formulas. Sensitivity Analysis. Sensitivity Analysis When More Than One Parameter is Changed: The 100% Rule. Finding the Dual of an LP. Economic Interpretation of the Dual Problem. The Dual Theorem and Its Consequences. Shadow Prices. Duality and Sensitivity Analysis. 7. TRANSPORTATION, ASSIGNMENT, AND TRANSSHIPMENT PROBLEMS. Formulating Transportation Problems. Finding Basic Feasible Solutions for Transportation Problems. The Transportation Simplex Method. Sensitivity Analysis for Transportation Problems. Assignment Problems. Transshipment Problems. 8. NETWORK MODELS. Basic Definitions. Shortest Path Problems. Maximum Flow Problems. CPM and PERT. Minimum Cost Network Flow Problems. Minimum Spanning Tree Problems. The Network Simplex Method. 9. INTEGER PROGRAMMING. Introduction to Integer Programming. Formulating Integer Programming Problems. The Branch-and-Bound Method for Solving Pure Integer Programming Problems. The Branch-and-Bound Method for Solving Mixed Integer Programming Problems. Solving Knapsack Problems by the Branch-and-Bound Method. Solving Combinatorial Optimization Problems by the Branch-and-Bound Method. Implicit Enumeration. The Cutting Plane Algorithm. 10. ADVANCED TOPICS IN LINEAR PROGRAMMING. The Revised Simplex Algorithm. The Product Form of the Inverse. Using Column Generation to Solve Large-Scale LPs. The Dantzig-Wolfe Decomposition Algorithm. The Simplex Method for Upper-Bounded Variables. Karmarkar's Method for Solving LPs. 11. NONLINEAR PROGRAMMING. Review of Differential Calculus. Introductory Concepts. Convex and Concave Functions. Solving NLPs with One Variable. Golden Section Search. Unconstrained Maximization and Minimization with Several Variables. The Method of Steepest Ascent. Lagrange Multipliers.
The Kuhn-Tucker Conditions. Quadratic Programming. Separable Programming. The Method of Feasible Directions. Pareto Optimality and Tradeoff Curves. 12. REVIEW OF CALCULUS AND PROBABILITY. Review of Integral Calculus. Differentiation of Integrals. Basic Rules of Probability. Bayes' Rule. Random Variables. Mean, Variance, and Covariance. The Normal Distribution. Z-Transforms. Review Problems. 13. DECISION MAKING UNDER UNCERTAINTY. Decision Criteria. Utility Theory. Flaws in Expected Utility Maximization: Prospect Theory and Framing Effects. Decision Trees. Bayes' Rule and Decision Trees. Decision Making with Multiple Objectives. The Analytic Hierarchy Process. Review Problems. 14. GAME THEORY. Two-Person Zero-Sum and Constant-Sum Games: Saddle Points. Two-Person Zero-Sum Games: Randomized Strategies, Domination, and Graphical Solution. Linear Programming and Zero-Sum Games. Two-Person Nonconstant-Sum Games. Introduction to n-Person Game Theory. The Core of an n-Person Game. The Shapley Value. 15. DETERMINISTIC EOQ INVENTORY MODELS. Introduction to Basic Inventory Models. The Basic Economic Order Quantity Model. Computing the Optimal Order Quantity When Quantity Discounts Are Allowed. The Continuous Rate EOQ Model. The EOQ Model with Back Orders Allowed. Multiple Product Economic Order Quantity Models. Review Problems. 16. PROBABILISTIC INVENTORY MODELS. Single Period Decision Models. The Concept of Marginal Analysis. The News Vendor Problem: Discrete Demand. The News Vendor Problem: Continuous Demand. Other One-Period Models. The EOQ with Uncertain Demand: the (r, q) and (s, S) Models. The EOQ with Uncertain Demand: the Service Level Approach to Determining Safety Stock Level. Periodic Review Policy. The ABC Inventory Classification System. Exchange Curves. Review Problems. 17. MARKOV CHAINS. What is a Stochastic Process? What is a Markov Chain? N-Step Transition Probabilities. Classification of States in a Markov Chain. Steady-State Probabilities and Mean First Passage Times. Absorbing Chains. Work-Force Planning Models. 18. DETERMINISTIC DYNAMIC PROGRAMMING. Two Puzzles. A Network Problem. An Inventory Problem. Resource Allocation Problems. Equipment Replacement Problems. Formulating Dynamic Programming Recursions. The Wagner-Whitin Algorithm and the Silver-Meal Heuristic. Forward Recursions. Using Spreadsheets to Solve Dynamic Programming Problems. Review Problems. 19. PROBABILISTIC DYNAMIC PROGRAMMING. When Current Stage Costs are Uncertain but the Next Period's State is Certain. A Probabilistic Inventory Model. How to Maximize the Probability of a Favorable Event Occurring. Further Examples of Probabilistic Dynamic Programming Formulations. Markov Decision Processes. Review Problems. 20. QUEUING THEORY. Some Queuing Terminology. Modeling Arrival and Service Processes. Birth-Death Processes. The M/M/1/GD/∞/∞ Queuing System and the Queuing Formula L = λW. The M/M/1/GD/c/∞ Queuing System. The M/M/s/GD/∞/∞ Queuing System. The M/G/∞/GD/∞/∞ and GI/G/∞/GD/∞/∞ Models. The M/G/1/GD/∞/∞ Queuing System. Finite Source Models: The Machine Repair Model. Exponential Queues in Series and Open Queuing Networks. How to Tell Whether Interarrival Times and Service Times Are Exponential. The M/G/s/GD/s/∞ System (Blocked Customers Cleared). Closed Queuing Networks. An Approximation for the G/G/m Queuing System. Priority Queuing Models. Transient Behavior of Queuing Systems. Review Problems. 21. SIMULATION. Basic Terminology. An Example of a Discrete Event Simulation. Random Numbers and Monte Carlo Simulation.
An Example of Monte Carlo Simulation. Simulations with Continuous Random Variables. An Example of a Stochastic Simulation. Statistical Analysis in Simulations. Simulation Languages. The Simulation Process. 22. SIMULATION WITH PROCESS MODEL. Simulating an M/M/1 Queuing System. Simulating an M/M/2 System. A Series System. Simulating Open Queuing Networks. Simulating Erlang Service Times. What Else Can Process Models Do? 23. SPREADSHEET SIMULATION WITH @RISK. Introduction to @RISK: The Newsperson Problem. Modeling Cash Flows from a New Product. Bidding Models. Reliability and Warranty Modeling. RiskGeneral Function. RiskCumulative Function. RiskTrigen Function. Creating a Distribution Based on a Point Forecast. Forecasting Income of a Major Corporation. Using Data to Obtain Inputs for New Product Simulations. Playing Craps with @RISK. Project Management. Simulating the NBA Finals. 24. FORECASTING. Moving Average Forecasting Methods. Simple Exponential Smoothing. Holt's Method: Exponential Smoothing with Trend. Winter's Method: Exponential Smoothing with Seasonality. Ad Hoc Forecasting. Simple Linear Regression. Fitting Non-Linear Relationships. Multiple Regression. Answers to Selected Problems. Index.
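The linear programming material catalogued above can be made concrete with a tiny worked example. The sketch below solves a hypothetical two-variable product-mix LP of the kind treated graphically in Chapter 3, using SciPy's linprog; the data and variable names are invented for illustration and do not come from the book.

```python
# Hypothetical product-mix LP: maximize 3*x1 + 2*x2 subject to two resource
# constraints; linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

c = [-3.0, -2.0]                      # negated unit profits
A_ub = [[2.0, 1.0],                   # machine hours: 2*x1 + 1*x2 <= 100
        [1.0, 3.0]]                   # labour hours:  1*x1 + 3*x2 <= 90
b_ub = [100.0, 90.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print("optimal plan:", res.x, "maximum profit:", -res.fun)   # (42, 16), profit 158
```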

1,790 citations


Journal ArticleDOI
TL;DR: Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high precision solutions for solving discrete-time, finite POMDPs over both finite and infinite horizons.
Abstract: A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential for such processes remains largely unrealized, due to an historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high precision solutions.
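The uncountably infinite information-state space mentioned above is the probability simplex over the core states, and the object that moves around it is the Bayesian belief. A minimal sketch of that belief update for a generic finite POMDP is given below; the transition and observation matrices are hypothetical.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes update of a POMDP belief b after taking action a and observing o.

    T[a][s, s'] = P(s' | s, a)   (transition model)
    Z[a][s', o] = P(o | s', a)   (observation model)
    """
    predicted = b @ T[a]                   # predict the next-state distribution
    unnormalized = predicted * Z[a][:, o]  # weight by the observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny two-state example with made-up numbers.
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
Z = {0: np.array([[0.7, 0.3], [0.4, 0.6]])}
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, Z=Z))   # a new point in the belief simplex
```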

610 citations


Journal ArticleDOI
TL;DR: This paper explains how to approximate the state space by a finite grid of points, and use that grid to construct upper and lower value function bounds, generate approximate nonstationary and stationary policies, and bound the value loss relative to optimal for using these policies in the decision problem.
Abstract: A partially observed Markov decision process (POMDP) is a sequential decision problem where information concerning parameters of interest is incomplete, and possible actions include sampling, surveying, or otherwise collecting additional information. Such problems can theoretically be solved as dynamic programs, but the relevant state space is infinite, which inhibits algorithmic solution. This paper explains how to approximate the state space by a finite grid of points, and use that grid to construct upper and lower value function bounds, generate approximate nonstationary and stationary policies, and bound the value loss relative to optimal for using these policies in the decision problem. A numerical example illustrates the methodology.
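For a two-core-state POMDP the belief space is just the interval [0, 1], so the finite-grid idea can be illustrated compactly: back up values only at grid points and interpolate between them. The sketch below is a plain fixed-grid approximation under made-up model data; it does not reproduce the paper's upper and lower bounding constructions, only the grid-plus-interpolation mechanism they rest on.

```python
import numpy as np

# Hypothetical two-state, two-action, two-observation POMDP (made-up numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a, s, s'] = P(s' | s, a)
              [[0.6, 0.4], [0.3, 0.7]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],      # Z[a, s', o] = P(o | s', a)
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])       # R[a, s] = expected reward
gamma, grid = 0.95, np.linspace(0.0, 1.0, 101)   # belief = P(state 1)

def value_iteration_on_grid(n_iters=200):
    V = np.zeros_like(grid)
    for _ in range(n_iters):
        newV = np.empty_like(V)
        for k, p in enumerate(grid):
            b = np.array([1.0 - p, p])
            q_values = []
            for a in range(2):
                q = b @ R[a]
                pred = b @ T[a]                      # predicted next-state distribution
                for o in range(2):
                    w = pred * Z[a][:, o]
                    p_o = w.sum()
                    if p_o > 1e-12:
                        b_next = w / p_o
                        # interpolate the value of the updated belief on the grid
                        q += gamma * p_o * np.interp(b_next[1], grid, V)
                q_values.append(q)
            newV[k] = max(q_values)
        V = newV
    return V

print(value_iteration_on_grid()[[0, 50, 100]])   # approximate values at a few beliefs
```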

343 citations


Journal ArticleDOI
TL;DR: This article considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning and shows that perceptual aliasing destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy.
Abstract: This article considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. We then describe a new decision system that overcomes these difficulties for a restricted class of decision problems. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its visual attention in order to collect necessary sensory information.

310 citations


Journal ArticleDOI
TL;DR: Computational procedures for the POMDP, which generalizes the standard, completely observed Markov decision process, are surveyed; several are convergence-accelerating variants of, or approximations to, the Smallwood-Sondik algorithm, and new research directions involving heuristic search are discussed.
Abstract: We survey several computational procedures for the partially observed Markov decision process (POMDP) that have been developed since the Monahan survey was published in 1982. The POMDP generalizes the standard, completely observed Markov decision process by permitting the possibility that state observations may be noise-corrupted and/or costly. Several computational procedures presented are convergence accelerating variants of, or approximations to, the Smallwood-Sondik algorithm. Finite-memory suboptimal design results are reported, and new research directions involving heuristic search are discussed.

188 citations


Book
01 Jan 1991
TL;DR: In this book, controlled Markov chains are reviewed, covering the discounted cost problem, finite time control problems, ergodic control (existence results and dynamic programming), control under partial observations, and adaptive control schemes including the Kumar-Becker-Lin scheme.
Abstract: Markov chains: a review. Controlled Markov chains. The discounted cost problem. Finite time control problems. Ergodic control: existence results. Ergodic control: dynamic programming. Multiobjective control problems. Control under partial observations. Adaptive control: the raw self-tuner. Adaptive control: the Kumar-Becker-Lin scheme. Concluding remarks. Appendices: spaces of probability measures.

137 citations


Journal ArticleDOI
TL;DR: In this paper, a multistage production/inventory system is modelled as a Markov Decision Process (MDP); combinations of Just-In-Time (pull) and MRP (push) policies are used as alternatives in the MDP, and optimal hybrid strategies are developed.
Abstract: A multistage production/inventory system is modelled. The system structure, which has the form of an assembly network, is abstracted from the production process of a typical integrated iron and steel works. The system is modelled as a Markov Decision Process (MDP). Combinations of Just-In-Time (pull) and MRP (push) policies are used as alternatives in the MDP. Optimal hybrid strategies are developed. In part II, we extend our observations to a more general case.

108 citations


Journal ArticleDOI
TL;DR: It is established that if there exists a policy that meets the constraint that the long-run average cost be no greater than a given value with probability one, then there exists an ε-optimal stationary policy.
Abstract: We consider finite-state finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch. We study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. We establish that if there exists a policy that meets the constraint, then there exists an ε-optimal stationary policy. Furthermore, an algorithm is outlined to locate the ε-optimal stationary policy. The proof of the result hinges on a decomposition of the state space into maximal recurrent classes and a set of transient states.

90 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider a controlled Markov chain with countable state and action spaces, and define a set of conditional frequencies, one for each state-action pair, describing the relative number of uses of each action.
Abstract: Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative number of uses of each action. These "conditional frequencies," which are defined pathwise, are shown to determine the "state-action frequencies" that, in the finite case, are known to determine the costs. This is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and the extreme points are identified with stationary deterministic policies. Conditions under which the search for optimality in several optimization problems may be restricted to stationary policies are given. These problems include the standard Markov decision process, as well as constrained optimization (both in terms of average cost functionals) and variability-sensitive optimization. An application to a queueing problem is given, where these results imply the existence and explicit computation of optimal policies in constrained optimization problems. The pathwise definition of the conditional frequencies implies that their values can be controlled directly; moreover, they depend only on the limiting behavior of the control. This has immediate application to adaptive control of Markov chains, including adaptive control under constraints.

76 citations


Book
01 Jan 1991

72 citations


Journal ArticleDOI
TL;DR: Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies.
Abstract: We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.

Journal ArticleDOI
TL;DR: In this article, the authors describe virtually all the recurrence conditions used heretofore for Markov decision processes with Borel state and action spaces, which include some forms of mixing and contraction properties, Doeblin's condition, Harris recurrence, strong ergodicity, and the existence of bounded solutions to the optimality equation for average reward processes.
Abstract: This paper describes virtually all the recurrence conditions used heretofore for Markov decision processes with Borel state and action spaces, which include some forms of mixing and contraction properties, Doeblin's condition, Harris recurrence, strong ergodicity, and the existence of bounded solutions to the optimality equation for average reward processes. The aim is to establish (when possible) implications and equivalences between these conditions.

Journal ArticleDOI
TL;DR: The optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case, as the discount factor goes to one, and the convergence of the optimal value for the discounted constrained finite horizon problem to the optimal value of the corresponding infinite horizon problem is established.
Abstract: We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case, as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish the continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The importance of the last two results is related to the performance of adaptive policies for constrained MDP under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite horizon problem to the optimal value of the corresponding infinite horizon problem.
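The generic linear program referred to above is, in the average-cost case, a program over state-action frequencies: maximize the frequency-weighted reward subject to flow balance, normalization, and the cost constraint. A minimal sketch with invented data for a two-state, two-action constrained MDP follows; x[s, a] plays the role of the long-run frequency of being in state s and choosing action a.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP with one average-cost constraint.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']
              [[0.7, 0.3], [0.4, 0.6]]])
r = np.array([[1.0, 3.0], [0.0, 2.0]])       # rewards r[s, a]
c = np.array([[0.0, 2.0], [0.0, 1.0]])       # costs   c[s, a]
C = 1.0                                      # bound on the long-run average cost
S, A = r.shape
n = S * A

# Equality constraints: flow balance for every state plus normalization.
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] -= P[s, a, sp]   # inflow into state sp
    for a in range(A):
        A_eq[sp, sp * A + a] += 1.0              # outflow from state sp
A_eq[S, :] = 1.0                                 # frequencies sum to one
b_eq = np.append(np.zeros(S), 1.0)

res = linprog(-r.reshape(-1),                      # maximize average reward
              A_ub=c.reshape(1, -1), b_ub=[C],     # average-cost constraint
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)          # randomized stationary policy
print("average reward:", -res.fun)
print("state-action frequencies:\n", x, "\npolicy:\n", policy)
```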

Proceedings Article
14 Jul 1991
TL;DR: This paper outlines an approach to the efficient construction of plans containing explicit sensing operations with the objective of finding nearly optimal cost effective plans with respect to both action and sensing.
Abstract: A primary problem facing real-world robots is the question of which sensing actions should be performed at any given time. It is important that an agent be economical with its allocation of sensing when sensing is expensive or when there are many possible sensing operations available. Sensing is rational when the expected utility from the information obtained outweighs the execution cost of the sensing operation itself. This paper outlines an approach to the efficient construction of plans containing explicit sensing operations with the objective of finding nearly optimal cost effective plans with respect to both action and sensing. The scheduling of sensing operations, in addition to the usual scheduling of physical actions, potentially results in an enormous increase in the computational complexity of planning. Our approach avoids this pitfall through strict adherence to a static sensing policy. The approach, based upon the Markov Decision Process paradigm, handles a significant amount of uncertainty in the outcomes of actions.

Journal ArticleDOI
TL;DR: In this article, a singularly perturbed Markov decision process with the limiting average cost criterion is considered, and the authors prove the validity of the limit control principle, which states that an optimal solution to the perturbed MDP can be approximated by a nonlinear program in the space of long-run state action frequencies.
Abstract: In this paper we consider a singularly perturbed Markov decision process with the limiting average cost criterion. We assume that the underlying process is composed of n separate irreducible processes, and that the small perturbation is such that it “unites” these processes into a single irreducible process. We formulate the underlying control problem for the singularly perturbed MDP, and call it the “limit Markov control problem” (limit MCP). We prove the validity of the “limit control principle”, which states that an optimal solution to the perturbed MDP can be approximated by an optimal solution of the limit MCP for any sufficiently small perturbation. We also demonstrate that the limit Markov control problem is equivalent to a suitably constructed nonlinear program in the space of long-run state-action frequencies. This approach combines the solutions of the original separated irreducible MDPs with the stationary distribution of a certain “aggregated MDP” and creates a framework for future algorithmic approaches.

Journal Article
TL;DR: A modified method using the Markov decision process (MDP), which overcomes most of the drawbacks of the LOS-based systems, is described; it introduces measures of performance benefits that are less subjective than those used in the NCHRP LOS model, which rely on attributes and utility functions.
Abstract: Systems based on level of service (LOS) currently implemented to manage highway maintenance use extensive subjective data. Collection of these data is tedious and expensive and the inherent uncertainties in the data render the results imprecise. A modified method using the Markov decision process (MDP), which overcomes most of the drawbacks of the LOS-based systems, is described. The adoption of the MDP is consistent with progressive evolution in the field of highway maintenance management. It introduces measures of performance benefits that are less subjective than those used in the NCHRP LOS model, which rely on attributes and utility functions. The modified model uses three types of key input data: transition probabilities, costs, and relative-importance weights. The transition probabilities are computed analytically using sample deterioration models and quality standards. An approach is described for computing the cost of each alternative from historical data. An analytic approach is also described for computing relative-importance weights using simple ranking of and comparison scores for the highway elements. This method was tested with 58 highway elements in 12 strata and 3 levels of service each. The resulting problem, which had 2,088 variables and 697 constraints, required less than 15 min on an IBM PC using an off-the-shelf linear programming package. The results of the test were consistent with the input data and demonstrated that the objectives set for the method were being met. Although the method was tested with mostly roadside elements, it can generally be used with any LOS-based system.

Journal ArticleDOI
TL;DR: In this article, the authors show that the conditions in [11] do not imply the existence of a solution to the average cost optimality equation, and they use a simple example to prove that such a result was obtained via an optimality inequality.

Journal ArticleDOI
TL;DR: In this paper, a new criterion of optimality in Markov decision processes is discussed, which is relevant in some production models where a limitation is imposed on the physical output (production quota) or on an input factor (scarce resources).
Abstract: A new criterion of optimality in Markov decision processes is discussed. The objective is to maximize the average net revenue per unit of physical output (or input). The criterion is relevant in some production models where a limitation is imposed on the physical output (production quota) or on an input factor (scarce resources). An obvious application is in dairy cow replacement models under milk quotas. Iteration cycles are presented for ordinary completely ergodic Markov decision processes and for hierarchic Markov processes. The consequences of the new criterion are illustrated by a numerical example. Copyright 1991 by Oxford University Press.
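Under a fixed stationary policy the new criterion reduces to a ratio of two ordinary long-run averages: expected net revenue per stage over expected physical output per stage, both taken with respect to the stationary distribution of the induced chain. The sketch below evaluates that ratio for a hypothetical three-state replacement-style model (all numbers are invented); the paper's iteration cycles then improve the policy against this quantity.

```python
import numpy as np

def ratio_criterion(P, reward, output):
    """Long-run average net revenue per unit of physical output for a fixed policy.

    P:      transition matrix of the chain induced by the policy
    reward: expected net revenue earned per stage in each state
    output: expected physical output (e.g. milk under a quota) per stage
    """
    n = P.shape[0]
    # Stationary distribution: solve pi (I - P) = 0 together with sum(pi) = 1.
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return (pi @ reward) / (pi @ output)

# Hypothetical three-state replacement model (numbers are illustrative only).
P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.6, 0.4],
              [1.0, 0.0, 0.0]])            # state 2 = replace, return to state 0
reward = np.array([120.0, 90.0, -300.0])
output = np.array([30.0, 22.0, 0.0])
print("net revenue per unit of output:", ratio_criterion(P, reward, output))
```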

Journal ArticleDOI
TL;DR: In this article, the authors considered countable state Markov decision processes with finite action sets and (possibly) unbounded costs and gave assumptions guaranteeing the convergence of a quantity related to the minimum expected n-stage cost when the process starts in state i.
Abstract: We deal with countable state Markov decision processes with finite action sets and (possibly) unbounded costs. Assuming the existence of an expected average cost optimal stationary policy f, with expected average cost g, when can f and g be found using undiscounted value iteration? We give assumptions guaranteeing the convergence of a quantity related to ng − ν_n(i), where ν_n(i) is the minimum expected n-stage cost when the process starts in state i. The theory is applied to a queueing system with variable service rates and to a queueing system with variable arrival parameter.
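The convergence studied here can be observed numerically by running undiscounted value iteration and tracking the differences v_{n+1}(i) − v_n(i), which approach the optimal average cost g, while v_n(i) − v_n(0) approaches a relative value function. Below is a minimal sketch on a truncated single-server queue with two service rates; all rates and costs are invented, and the truncation to a finite state space sidesteps the countable-state issues the paper actually addresses.

```python
import numpy as np

# Truncated single-server queue, states 0..N, two service rates (made-up data).
N, lam = 10, 0.3
mus, service_cost = [0.2, 0.6], [0.0, 2.0]   # slow/free vs. fast/costly service
holding_cost = lambda i: 1.0 * i

def step(v):
    """One stage of undiscounted value iteration: v_{n+1}(i) = min_a [c(i,a) + E v_n]."""
    new = np.empty_like(v)
    for i in range(N + 1):
        best = np.inf
        for a, mu in enumerate(mus):
            up = lam if i < N else 0.0            # arrival (blocked at N)
            down = mu if i > 0 else 0.0           # service completion
            stay = 1.0 - up - down                # self-transition probability
            cost = holding_cost(i) + service_cost[a]
            best = min(best, cost + up * v[min(i + 1, N)]
                                   + down * v[max(i - 1, 0)] + stay * v[i])
        new[i] = best
    return new

v = np.zeros(N + 1)
for n in range(2000):
    v_next = step(v)
    g_est = v_next[0] - v[0]        # v_{n+1}(0) - v_n(0)  ->  average cost g
    v = v_next
print("estimated average cost g:", g_est)
print("relative values v_n(i) - v_n(0):", np.round(v - v[0], 3))
```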

Journal ArticleDOI
TL;DR: In this article, the authors consider a discrete-time Markov Decision Process (MDP) with Borel state and action spaces X and A, respectively, and the long-run expected average cost criterion.

Journal ArticleDOI
TL;DR: Turnpike improvement as discussed by the authors exploits the structure of Markov decision processes with continuous state and action spaces that can be associated with piecewise deterministic control systems and is applicable whenever a turnpike property holds for some associated infinite horizon deterministic control problem.
Abstract: This paper proposes a numerical technique, called turnpike improvement, for the approximation of the solution of a class of piecewise deterministic control problems typically associated with manufacturing flow control models. This algorithm exploits the structure of Markov decision processes with continuous state and action spaces that can be associated with piecewise deterministic control systems. The numerical method is applicable whenever a turnpike property holds for some associated infinite horizon deterministic control problem. To illustrate the approach, we use a simple model fully studied from an analytic point of view in the literature. We compare the turnpike improvement technique with a direct approximation of the solution of the continuous-time Hamilton-Jacobi dynamic programming equations inspired by Kushner's work. The two approaches agree remarkably on this simple problem. We conclude with a discussion of the relative advantages of the two approaches.

Journal ArticleDOI
Viên Nguyen
TL;DR: The optimal policy is shown to be a “generalized trunk reservation policy”; in other words, the optimal policy accepts higher-paying customers whenever possible and accepts lower-paying customers only if fewer than c_i servers are busy, where i is the number of busy servers in the overflow queue.
Abstract: This paper discusses an optimal dynamic policy for a queueing system with M servers, no waiting room, and two types of customers. Customer types differ with respect to the reward that is paid on commencement of service, but service times are exponentially distributed with the same mean for both types of customers. The arrival stream of one customer type is generated by a Poisson process, and the other customer type arrives according to the overflow process of an M/M/m/m queue. The objective is to determine a policy for admitting customers to maximize the expected long-run average reward. By posing the problem in the framework of Markov decision processes and exploiting properties of submodular functions, the optimal policy is shown to be a “generalized trunk reservation policy”; in other words, the optimal policy accepts higher-paying customers whenever possible and accepts lower-paying customers only if fewer than c_i servers are busy, where i is the number of busy servers in the overflow queue. Computational issues are also discussed. More specifically, approximations of the overflow process by an interrupted Poisson process and a Poisson process are investigated.
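The threshold structure can be reproduced numerically in a simplified version of the model: replace the overflow stream by an independent Poisson stream (the paper itself studies such an approximation), uniformize, and run discounted value iteration over the number of busy servers. All rates and rewards below are invented; the computation only illustrates the accept/reject structure, not the exact model with overflow arrivals.

```python
import numpy as np

# Simplified admission control: M servers, no waiting room, two Poisson classes.
# Class 1 (reward r1) is always admitted when a server is free; the decision is
# whether to admit a class-2 customer (reward r2 < r1) when i servers are busy.
M, lam1, lam2, mu = 10, 0.4, 0.4, 0.1
r1, r2 = 10.0, 3.0
Lambda = lam1 + lam2 + M * mu          # uniformization constant
gamma = 0.999                          # discount factor close to one

v = np.zeros(M + 1)
for _ in range(20000):
    new = np.empty_like(v)
    for i in range(M + 1):
        if i < M:
            a1 = lam1 * (r1 + gamma * v[i + 1])                    # admit class 1
            a2 = lam2 * max(gamma * v[i], r2 + gamma * v[i + 1])   # admit or reject class 2
        else:
            a1, a2 = lam1 * gamma * v[i], lam2 * gamma * v[i]      # all servers busy
        dep = i * mu * gamma * v[max(i - 1, 0)]                    # service completion
        idle = (Lambda - lam1 - lam2 - i * mu) * gamma * v[i]      # fictitious self-loop
        new[i] = (a1 + a2 + dep + idle) / Lambda
    v = new

accept2 = [i for i in range(M) if r2 + gamma * v[i + 1] >= gamma * v[i]]
print("admit a class-2 customer when the number of busy servers is in:", accept2)
# Under a trunk reservation policy this set has the form {0, 1, ..., c-1}.
```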

Journal ArticleDOI
TL;DR: In this paper, countable state space Markov decision processes endowed with a (long-run expected) average reward criterion are considered. But the main focus of this paper is the equivalence of average optimality criteria.
Abstract: This paper concerns countable state space Markov decision processes endowed with a (long-run expected) average reward criterion. For these models we summarize and, in some cases, extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii) necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed.

Journal ArticleDOI
TL;DR: In this article, necessary and sufficient conditions for the existence of a bounded solution to the optimality equation arising in Markov decision processes under a long-run, expected average cost criterion are given.

Journal ArticleDOI
TL;DR: The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain; the improvement step is modified to select only unichain policies, and consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided.
Abstract: This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) with average reward criterion. The algorithm is based on the result that for communicating MDPs there is an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently the nested optimality equations of Howard's multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than unichain we recommend use of this algorithm instead of unichain policy iteration.
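For a unichain policy the evaluation step amounts to solving the equations g + h(s) = r(s, π(s)) + Σ_{s'} p(s'|s, π(s)) h(s') with one relative value pinned to zero, and the improvement step maximizes the right-hand side state by state. The sketch below runs this on a made-up MDP in which every stationary policy is unichain (all transition probabilities are positive), so the paper's modified improvement step, which is what handles general communicating MDPs, is not needed here.

```python
import numpy as np

# Hypothetical MDP where every stationary policy is unichain (all P entries > 0).
P = np.array([[[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]],
              [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
              [[0.3, 0.3, 0.4], [0.4, 0.5, 0.1]]])   # P[s, a, s']
r = np.array([[1.0, 4.0],
              [0.0, 2.0],
              [5.0, 1.0]])                            # r[s, a]
S, A = r.shape

def evaluate(policy):
    """Solve g + h(s) = r(s, pi(s)) + sum_s' P h(s') with h(0) pinned to 0."""
    Ppi = np.array([P[s, policy[s]] for s in range(S)])
    rpi = np.array([r[s, policy[s]] for s in range(S)])
    M = np.zeros((S, S))
    M[:, 0] = 1.0                                   # coefficient of the gain g
    M[:, 1:] = np.eye(S)[:, 1:] - Ppi[:, 1:]        # coefficients of h(1), ..., h(S-1)
    sol = np.linalg.solve(M, rpi)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    return g, h

policy = np.zeros(S, dtype=int)
while True:
    g, h = evaluate(policy)
    improved = np.array([np.argmax(r[s] + P[s] @ h) for s in range(S)])
    if np.array_equal(improved, policy):
        break
    policy = improved
print("optimal gain:", g, "optimal policy:", policy)
```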

Journal ArticleDOI
TL;DR: This work considers several applications of two state, finite action, infinite horizon, discrete-time Markov decision processes with partial observations, and shows that in each of these cases the optimal cost function is piecewise linear.
Abstract: We consider several applications of two state, finite action, infinite horizon, discrete-time Markov decision processes with partial observations, for two special cases of observation quality, and show that in each of these cases the optimal cost function is piecewise linear. This in turn allows us to obtain either explicit formulas or simplified algorithms to compute the optimal cost function and the associated optimal control policy. Several examples are presented.

Journal ArticleDOI
TL;DR: In this paper, the theoretical framework of forwards induction/Gittins indexation is used to develop approaches to strategy evaluation for quite general (J, Γ) jobs. And the performance of both forwards induction strategies and a class of quasi-myopic heuristics is assessed.
Abstract: A single machine is available to process a collection J of jobs. The machine is free to switch between jobs at any time, but processing must respect a set Γ of precedence constraints. Jobs evolve stochastically and earn rewards as they are processed, not otherwise. The theoretical framework of forwards induction/Gittins indexation is used to develop approaches to strategy evaluation for quite general (J, Γ). The performance of both forwards induction strategies and a class of quasi-myopic heuristics is assessed.

Journal ArticleDOI
TL;DR: For the problems tested it was found that the criterion of minimum variance is most effective for Markov models, while a hybrid use of both criteria yields best results when solving semi-Markov decision problems.

Journal ArticleDOI
TL;DR: In this article, the authors consider finite-state finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch and study the problem of finding a policy that maximizes the expected expected reward.
Abstract: We consider finite-state finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch. We study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one.

Journal ArticleDOI
TL;DR: Existing theory based on the Gittins index is extended to give bounds on R(S*) − R(S) for this important class of processes, which are used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments.
Abstract: A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose that S is a stationary strategy for such a family, that S* is an optimal strategy and that R(S), R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds on R(S*) − R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning.
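One standard way to compute the indices that these bounds are built from is the restart-in-state characterization: for each state i of an arm, solve an auxiliary discounted MDP whose two actions are to continue the arm or to restart it in i, and take (1 − β) times the optimal value at i. The sketch below assumes that characterization (in the style of Katehakis and Veinott) and uses invented data for a single three-state arm.

```python
import numpy as np

def gittins_indices(P, r, beta, n_iters=5000):
    """Gittins indices of a single Markov reward arm via the restart-in-state
    formulation: nu(i) = (1 - beta) * V_i(i), where V_i is the optimal value of
    the auxiliary MDP that may either continue the arm or restart it in state i.
    """
    n = len(r)
    nu = np.zeros(n)
    for i in range(n):
        V = np.zeros(n)
        for _ in range(n_iters):
            cont = r + beta * P @ V                 # keep operating the arm
            restart = r[i] + beta * P[i] @ V        # jump back to state i
            V = np.maximum(cont, restart)           # value iteration on the restart MDP
        nu[i] = (1.0 - beta) * V[i]
    return nu

# Hypothetical three-state arm (numbers are illustrative only).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
r = np.array([1.0, 0.4, 0.1])
print("Gittins indices:", np.round(gittins_indices(P, r, beta=0.9), 4))
```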