
Showing papers on "Markov decision process published in 1974"




Journal ArticleDOI
TL;DR: In this paper, a method is introduced for transforming Markov models of dynamic programming while preserving the Markov property; the method applies to relatively general sets of states.
Abstract: If a set of states is given in a problem of dynamic programming in which each state can be observed only partially, the given model is generally transformed into a new model with completely observed states. In this article a method is introduced with which Markov models of dynamic programming can be transformed and which preserves the Markov property. The method applies to relatively general sets of states.
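
The standard device for such a transformation is to replace the partially observed state by a probability distribution (belief) over the hidden states, which is itself a completely observed Markov state. The sketch below illustrates that belief update for a small hypothetical model; the matrices P and O are illustrative placeholders, not data from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-observation model; values are illustrative only.
P = np.array([[0.7, 0.3],      # P[s, s'] = state transition probability
              [0.4, 0.6]])
O = np.array([[0.9, 0.1],      # O[s', o] = probability of observing o in state s'
              [0.2, 0.8]])

def belief_update(b, o):
    """One step of the completely observed surrogate model: propagate the
    belief b over hidden states, then condition on the observation o."""
    b_pred = b @ P                  # predict the next-state distribution
    b_post = b_pred * O[:, o]       # weight by the observation likelihood
    return b_post / b_post.sum()    # renormalize to a probability vector

b = np.array([0.5, 0.5])            # initial belief over the hidden states
print(belief_update(b, o=1))        # updated belief after observing o = 1
```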

89 citations


Journal ArticleDOI
TL;DR: An example is given which demonstrates that using a decision theory analysis for the basic chance-constrained model of stochastic linear programming may lead to an apparent dilemma, namely, 0 > EVSI > EVPI.
Abstract: An example is given which demonstrates that using a decision theory analysis for the basic chance-constrained model of stochastic linear programming may lead to an apparent dilemma, namely, 0 > EVSI > EVPI. The problem is discussed and a resolution suggested.

32 citations


Journal ArticleDOI
TL;DR: This paper examines real-time decision rules for a U.S. Air Force inventory system where items are repaired rather than “used up” and a heuristic rule is presented, justified theoretically by showing that the rule is optimal for a modified model.
Abstract: This paper examines real-time decision rules for a U.S. Air Force inventory system where items are repaired rather than “used up.” The problem is to decide which user in the system has the greatest need for the newly available inventory items coming out of repair. The system is modeled as a Markov decision process and a heuristic rule is presented. This rule, the Transportation Time Look Ahead policy, is justified theoretically by showing that the rule is optimal for a modified model. Thus we have a theoretical justification of a decision rule in a large-scale dynamic programming application.

29 citations


Journal ArticleDOI
TL;DR: For continuous time Markov decision chains of finite duration, this article showed that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$.
Abstract: For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies which are both simultaneously $\varepsilon$-optimal for all durations $t$ and are stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$, but not on $t$ for large enough $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.

21 citations



Journal ArticleDOI
TL;DR: For countable state, finite action dynamic programming problems with bounded rewards, the authors give an example of a policy that is optimal under Blackwell's criterion (maximizing the expected discounted total return for all discount factors sufficiently close to 1) yet not optimal under Derman's average cost criterion.
Abstract: We consider countable state, finite action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example where a policy meets that optimality criterion, but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.

18 citations


Journal ArticleDOI
TL;DR: A theory of the design process, based on an analogy with the well-known Markov process in probability theory, is developed and applied to the classic highway location problem first discussed by Alexander and Manheim (1962).
Abstract: A theory of the design process, based on an analogy with the well-known Markov process in probability theory, is developed and applied in this paper. Design is considered as a process of averaging a set of conflicting factors, and the sequential averaging characteristic of the Markov process is presented algebraically with an emphasis upon the weight of each factor in the final solution. A classification of Markov chains and an interpretation using linear graph theory serves to delimit the set of relevant design problems, and a particular group of such problems based on symmetric structures is specifically described. A second analogy between the choice of design method and the theory of Markov decision processes exists, and the problem of selecting an optimal method using this decision theory is solved using a dynamic programming algorithm due to Howard (1960). The theory is then applied to the classic highway location problem first discussed by Alexander and Manheim (1962), and some comparisons between the different results are attempted. Finally the place of the theory in the wider context of design is briefly alluded to.
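
The dynamic programming algorithm due to Howard (1960) referred to above is policy iteration, which alternates exact policy evaluation with greedy policy improvement. A minimal sketch for a generic discounted finite MDP follows; it illustrates Howard's general scheme, not the specific highway-location formulation, and the data layout (P, R, gamma) is an assumption of the sketch.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard-style policy iteration for a discounted finite MDP.
    P[a] is the transition matrix and R[a] the expected reward vector
    under action a; gamma is the discount factor."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step look-ahead on v.
        Q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v        # a stable policy is optimal
        policy = new_policy
```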

18 citations



01 Jan 1974
TL;DR: Some new algorithms for solving discounted Markov decision problems are introduced, and it is shown how the several successive approximation (S.A.) algorithms may be combined.
Abstract: Successive Approximation (S.A.) methods, for solving discounted Markov decision problems, have been developed to avoid the extensive computations that are connected with linear programming and policy iteration techniques for solving large-scale problems. Several authors give such an S.A. algorithm. In this paper we introduce some new algorithms, and furthermore it will be shown how the several S.A. algorithms may be combined. For each algorithm, converging sequences of upper and lower bounds for the optimal value will be given.
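
A typical successive approximation scheme of this kind iterates the Bellman operator and extracts converging upper and lower bounds on the optimal value from the extrema of the successive differences (MacQueen-type bounds). The sketch below shows one generic variant under those assumptions; it is not any particular algorithm from the paper.

```python
import numpy as np

def successive_approximation(P, R, gamma=0.9, tol=1e-6):
    """Value iteration for a discounted finite MDP with converging
    upper and lower bounds on the optimal value (MacQueen-type bounds)."""
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        Q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        v_new = Q.max(axis=0)                              # one Bellman step
        diff = v_new - v
        lower = v_new + gamma * diff.min() / (1 - gamma)   # lower bound on v*
        upper = v_new + gamma * diff.max() / (1 - gamma)   # upper bound on v*
        if (upper - lower).max() < tol:
            return lower, upper, Q.argmax(axis=0)          # bounds and greedy policy
        v = v_new
```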

16 citations


Journal ArticleDOI
TL;DR: A stationary discrete dynamic programming model that is a generalization of the finite state and finite action Markov programming problem; the parameters of the problem are allowed to be random variables, and it is indicated when the expected values of these random variables are certainty equivalents.
Abstract: This paper considers a stationary discrete dynamic programming model that is a generalization of the finite state and finite action Markov programming problem. We specify conditions under which an optimal stationary linear decision rule exists and show how this optimal policy can be calculated using linear programming, policy iteration, or value iteration. In addition we allow the parameters of the problem to be random variables and indicate when the expected values of these random variables are certainty equivalents.
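
For the finite state, finite action special case, the linear programming route mentioned above can be written as minimizing a sum of state values subject to the Bellman inequalities. The sketch below shows that generic formulation with scipy; the data layout (P, R, gamma) is an assumption of the sketch, not the paper's generalized model.

```python
import numpy as np
from scipy.optimize import linprog

def solve_discounted_mdp_lp(P, R, gamma=0.9):
    """Linear program for a discounted finite MDP:
    minimize sum_s v(s) subject to v >= R[a] + gamma * P[a] @ v for every action a."""
    n_actions, n_states = len(P), P[0].shape[0]
    c = np.ones(n_states)                        # objective weights on the state values
    A_ub, b_ub = [], []
    for a in range(n_actions):
        # Rewrite v >= R[a] + gamma * P[a] v as -(I - gamma * P[a]) v <= -R[a].
        A_ub.append(-(np.eye(n_states) - gamma * P[a]))
        b_ub.append(-np.asarray(R[a]))
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                 # the optimal value function
```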


Journal ArticleDOI
TL;DR: This work characterizes decision rules, called preferred, which may be used in the initially stationary part of nearly optimal policies, and then, under conditions involving state recurrence and accessibility, considers finding such rules.
Abstract: Motivated by a planning horizon result for continuous time Markov decision chains, we study decision rules, called preferred, which may be used in the initially stationary part of nearly optimal policies. We characterize these rules and then, under conditions involving state recurrence and accessibility, consider finding such rules. We also discuss the connection between preferred rules and certain discounted process decision rules, and the role of preferred rules in optimal policies.


Journal ArticleDOI
TL;DR: In this paper, the problem of computing optimal policies for Markov decision processes by iterative methods is considered; the policies computed are optimal for all sufficiently small interest rates.
Abstract: This paper considers the problem of computing, by iterative methods, optimal policies for Markov decision processes. The policies computed are optimal for all sufficiently small interest rates.

01 Jan 1974
TL;DR: This paper estimates the human capital associated with an organization.
Abstract: (1975). Estimating the Human Capital Associated with an Organization. Accounting and Business Research: Vol. 6, No. 21, pp. 48-56.

Book ChapterDOI
01 Jan 1974
TL;DR: In this article, a class of Markovian decision processes is characterized using a weak row sum criterion, which is shown to be necessary and sufficient for the absolute convergence of present values, and a modified successive value iteration procedure is obtained.
Abstract: A class of Markovian decision processes is characterized using a weak row sum criterion. The criterion is shown to be necessary and sufficient for the absolute convergence of present values. A modified successive value iteration procedure is obtained.

01 Jan 1974
TL;DR: This paper considers a completely ergodic Markov decision process with finite state and decision spaces using the average return per unit time criterion and derives an algorithm which approximates the optimal solution.
Abstract: In this paper we consider a completely ergodic Markov decision process with finite state and decision spaces using the average return per unit time criterion. An algorithm is derived which approximates the optimal solution. It will be shown that this algorithm is finite and supplies upper and lower bounds for the maximal average return and a near optimal policy with average return between these bounds.
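
For a completely ergodic finite MDP under the average return per unit time criterion, a standard finite approximation of this flavour is relative value iteration, in which the extrema of the successive differences bound the maximal average return and the greedy rule at termination is near optimal. The sketch below illustrates that generic scheme under the usual aperiodicity assumptions; it is not claimed to be the paper's exact algorithm.

```python
import numpy as np

def relative_value_iteration(P, R, tol=1e-6):
    """Approximate the maximal average return of an ergodic finite MDP.
    The extrema of the successive differences give lower and upper bounds
    on the optimal gain, and the greedy rule at termination is near optimal."""
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        Q = np.array([R[a] + P[a] @ v for a in range(n_actions)])
        v_new = Q.max(axis=0)
        diff = v_new - v
        lower, upper = diff.min(), diff.max()    # bounds on the maximal average return
        if upper - lower < tol:
            return lower, upper, Q.argmax(axis=0)
        v = v_new - v_new[0]                     # keep values relative to a reference state
```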


Journal ArticleDOI
TL;DR: In this paper, the question of whether there always exists an initially stationary optimal policy for continuous time Markov decision chains was raised, and the authors gave an example in which there is no such policy.
Abstract: Results in another paper in this issue may raise the question of whether for finite state and action, continuous time parameter Markov decision chains there always exists an initially stationary optimal policy. We give an example in which there is no such policy.

Journal ArticleDOI
TL;DR: In this article, an expression for the expected return for each available action is developed, as a perturbation to the basic process, and the optimal action and value of the forecast are obtained by combining a Policy Iteration solution of the imperfect information process and evaluation of the policies for the perturbed process.
Abstract: The Markov Decision Process formulation and its application to processes in which there is uncertainty as to process state, or imperfect state information, are reviewed. The question of how to determine an optimal action if some form of intelligence or “forecast” of the process state is available for a single process stage is posed. An expression for the expected return for each available action is developed, as a perturbation to the basic process. The optimal action and value of the forecast are obtained by combining a Policy Iteration solution of the imperfect information process and evaluation of the policies for the perturbed process based on these imperfect information process parameters.
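
Read generically, the calculation weights each action's return by a probability distribution over the unobserved state, once for the baseline estimate and once for the forecast, and values the forecast by the resulting improvement. The sketch below shows that generic computation with made-up numbers; the matrix Q and the particular definition of the forecast's value are assumptions of the sketch, not the paper's perturbation expressions.

```python
import numpy as np

def action_values_under_belief(belief, Q):
    """Expected return of each action when the unobserved state is described
    by the probability vector `belief`; Q[a, s] is the return of action a
    when the underlying state is s (hypothetical values below)."""
    return Q @ belief

def value_of_forecast(prior, forecast, Q):
    """Gain from acting on the forecast distribution rather than on the
    action that looks best under the prior, both evaluated under the forecast."""
    best_with_forecast = action_values_under_belief(forecast, Q).max()
    prior_action = action_values_under_belief(prior, Q).argmax()
    return best_with_forecast - Q[prior_action] @ forecast

Q = np.array([[5.0, 1.0],                # 2 actions x 2 states, illustrative returns
              [2.0, 4.0]])
print(value_of_forecast(prior=np.array([0.3, 0.7]),
                        forecast=np.array([0.9, 0.1]), Q=Q))   # prints 2.4
```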


Proceedings ArticleDOI
01 Jan 1974
TL;DR: A program for simulating Markov processes (MARKOV) intended for use by someone with programming experience, which establishes the initial state of the process, the present state, the previous state, and the number of transitions from each state to all states.
Abstract: This paper describes, in language-free flow diagram form, a program for simulating Markov processes (MARKOV) intended for use by someone with programming experience. The program establishes the initial state of the process (if it is unknown), the present state, the previous state, and the number of transitions from each state to all states. These parameters can be used to determine various characteristics of Markov processes of interest to the systems analyst.
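
A compact modern analogue of such a simulator draws successive states from a transition matrix while recording the previous state, the present state, and the transition counts. The sketch below illustrates this with an assumed two-state example chain; it does not reproduce the original MARKOV program's interface.

```python
import numpy as np

def simulate_markov(P, n_steps, initial_state=None, seed=None):
    """Simulate a finite Markov chain: track the previous state, the present
    state, and the number of transitions from each state to all states."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.zeros((n, n), dtype=int)
    state = initial_state if initial_state is not None else int(rng.integers(n))
    for _ in range(n_steps):
        previous, state = state, int(rng.choice(n, p=P[state]))
        counts[previous, state] += 1             # tally the observed transition
    return state, counts

P = np.array([[0.9, 0.1],                        # illustrative two-state chain
              [0.5, 0.5]])
final_state, counts = simulate_markov(P, n_steps=1000, initial_state=0, seed=42)
print(final_state, counts)
```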