
Showing papers on "Markov decision process published in 1982"


Journal ArticleDOI
TL;DR: A wide range of models in such areas as quality control, machine maintenance, internal auditing, learning, and optimal stopping are discussed within the POMDP-framework.
Abstract: This paper surveys models and algorithms dealing with partially observable Markov decision processes. A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process which permits uncertainty regarding the state of a Markov process and allows for state information acquisition. A general framework for finite state and action POMDP's is presented. Next, there is a brief discussion of the development of POMDP's and their relationship with other decision processes. A wide range of models in such areas as quality control, machine maintenance, internal auditing, learning, and optimal stopping are discussed within the POMDP-framework. Lastly, algorithms for computing optimal solutions to POMDP's are presented.
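Since the surveyed POMDP models all reduce to maintaining a probability distribution (a belief) over the unobserved state, a minimal sketch of the standard Bayes belief update may be useful; the two-state machine-maintenance numbers and the array layout below are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard Bayes update of a POMDP belief state.

    b : current belief over states, shape (S,)
    a : action index
    o : observation index
    T : transition probabilities, T[a, s, s'] = P(s' | s, a)
    O : observation probabilities, O[a, s', o] = P(o | s', a)
    """
    predicted = b @ T[a]                   # P(s' | b, a)
    unnormalized = predicted * O[a][:, o]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny two-state machine-maintenance example (illustrative numbers only).
T = np.array([[[0.9, 0.1], [0.0, 1.0]],    # action 0: keep running
              [[1.0, 0.0], [1.0, 0.0]]])   # action 1: repair
O = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P(obs | next state, action)
              [[0.8, 0.2], [0.3, 0.7]]])
b = np.array([1.0, 0.0])                   # start known to be in the good state
b = belief_update(b, a=0, o=1, T=T, O=O)   # observe a "bad" signal
print(b)
```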

703 citations


Journal ArticleDOI
TL;DR: Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process, and similar formulae are exhibited for a semi-Markov decision process.
Abstract: Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
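The paper's own formulae are not reproduced here, but the following sketch shows one standard way the first two moments of the present value under a fixed policy can be obtained from two linear systems; the three-state chain, rewards, and discount factor are invented for illustration.

```python
import numpy as np

# Illustrative fixed-policy data (not from the paper): 3 states,
# state-dependent single-stage reward r, discount factor beta.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])
beta = 0.9
I = np.eye(3)

# First moment of the present value: v = r + beta * P v
v = np.linalg.solve(I - beta * P, r)

# Second moment: s_i = E[Y_i^2], where Y_i = r_i + beta * Y_{next state}
s = np.linalg.solve(I - beta**2 * P,
                    r**2 + 2 * beta * r * (P @ v))
variance = s - v**2
print(v, variance)
```

The second system follows from Y_i = r_i + beta * Y_J with J drawn from row i of P, which gives E[Y_i^2] = r_i^2 + 2*beta*r_i*(P v)_i + beta^2*(P s)_i.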

243 citations



Journal ArticleDOI
TL;DR: Global convergence of the algorithm is proven under very weak assumptions and the proof relates this technique to other iterative methods that have been suggested for general linear programs.
Abstract: An iterative aggregation procedure is described for solving large scale, finite state, finite action Markov decision processes (MDPs). At each iteration, an aggregate master problem and a sequence of smaller subproblems are solved. The weights used to form the aggregate master problem are based on the estimates from the previous iteration. Each subproblem is a finite state, finite action MDP with a reduced state space and unequal row sums. Global convergence of the algorithm is proven under very weak assumptions. The proof relates this technique to other iterative methods that have been suggested for general linear programs.
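As a rough, hedged illustration of the aggregation idea (not the paper's master/subproblem scheme), the snippet below forms an aggregate transition matrix and reward vector for one fixed policy from a partition of the states and per-cluster weights; in the algorithm described above these weights would come from the previous iteration rather than being uniform, and all data are made up.

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.1, 0.6, 0.3, 0.0],
              [0.0, 0.2, 0.5, 0.3],
              [0.0, 0.0, 0.4, 0.6]])
r = np.array([1.0, 2.0, 0.0, 3.0])
clusters = [[0, 1], [2, 3]]            # illustrative partition of the states
w = np.array([0.5, 0.5, 0.5, 0.5])     # weights, summing to one within each cluster

K = len(clusters)
P_agg = np.zeros((K, K))
r_agg = np.zeros(K)
for a, Sa in enumerate(clusters):
    for b, Sb in enumerate(clusters):
        # weighted probability of moving from cluster a into cluster b
        P_agg[a, b] = sum(w[i] * P[i, Sb].sum() for i in Sa)
    r_agg[a] = sum(w[i] * r[i] for i in Sa)
print(P_agg, r_agg)   # small aggregate chain; its rows still sum to one
```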

65 citations



Journal ArticleDOI
TL;DR: It is shown that the algorithms are asymptotically optimal in the sense that the probability of selecting an optimal policy converges to unity.
Abstract: For a Markovian decision problem in which the transition probabilities are unknown, two learning algorithms are devised from the viewpoint of asymptotic optimality. At each stage, the algorithms select the decisions to be used on the basis not only of the estimates of the unknown probabilities but also of the uncertainty in those estimates. It is shown that the algorithms are asymptotically optimal in the sense that the probability of selecting an optimal policy converges to unity.
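The paper's algorithms are not reproduced here; as a generic sketch of the idea of acting on estimates plus a measure of their uncertainty, one can add an exploration bonus that shrinks as an action is observed more often, as below. The bonus form and the constant c are assumptions chosen for illustration only.

```python
import numpy as np

def select_action(q_estimates, n_visits, t, c=1.0):
    """Generic sketch (not the paper's rule): pick an action using value
    estimates plus an uncertainty bonus.

    q_estimates : estimated value of each action, computed from the current
                  estimates of the unknown transition probabilities
    n_visits    : how often each action has been tried in the current state
    t           : total number of decision epochs so far
    c           : assumed exploration constant
    """
    bonus = c * np.sqrt(np.log(max(t, 2)) / np.maximum(n_visits, 1))
    return int(np.argmax(q_estimates + bonus))

# Example: an action with a slightly lower estimate but little data is chosen.
print(select_action(np.array([1.0, 0.9]), np.array([50, 2]), t=52))
```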

31 citations


Journal ArticleDOI
TL;DR: In this article, families of alternative bandit processes are used as models for problems in a variety of areas; optimal strategies for these decision processes are determined by dynamic allocation indices, which are shown to play an important role in the evaluation of suboptimal strategies.
Abstract: Families of alternative bandit processes have been used as models for problems in a variety of areas. Optimal strategies for these decision processes are determined by dynamic allocation indices. These indices are here shown to play an important role in the evaluation of suboptimal strategies. Keywords: bandit problem; dynamic allocation index; Gittins index; Markov decision process; suboptimal strategies.

27 citations



Book ChapterDOI
01 Jan 1982
TL;DR: A sufficient condition for this to occur in the case where the problem can be modelled by a Markov decision process with costs depending only on the state of the process is presented.
Abstract: Some problems of stochastic allocation and scheduling have the property that there is a single strategy which minimizes the expected value of the costs incurred up to every finite time horizon. We present a sufficient condition for this to occur in the case where the problem can be modelled by a Markov decision process with costs depending only on the state of the process. The condition is used to establish the nature of the optimal strategies for problems of customer assignment, dynamic memory allocation, optimal gambling, maintenance and scheduling.

22 citations



Journal ArticleDOI
TL;DR: Gittins has shown that for a class of Markov decision processes called alternative bandit processes, optimal policies can easily be determined once the dynamic allocation indices (DAIs) for the constituent bandit processes are computed.

Journal ArticleDOI
TL;DR: In this paper, the authors consider several classes of control problems for Markov processes (continuous control, optimal stopping, impulse control) and study the discrete time approximation of the dynamic programming equation, using mainly an analytical approach.
Abstract: We consider several classes of control problems for Markov processes (continuous control, optimal stopping, impulse control). The formulation we use is valid for general Markov semigroups. We study the discrete time approximation of the dynamic programming equation, using mainly an analytical approach. Probabilistic interpretation is given for some of the results.
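As a hedged, finite-state illustration of what a discrete-time approximation of the dynamic programming equation looks like for optimal stopping (the paper's general Markov semigroup setting is not reproduced here), the recursion V_{n+1} = max(g, r + beta * P V_n) can be iterated to convergence; all numbers below are made up.

```python
import numpy as np

P = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])   # transition matrix of the uncontrolled chain
g = np.array([0.0, 1.0, 3.0])     # reward collected upon stopping
r = np.zeros(3)                   # running reward while continuing
beta = 0.95                       # discount factor

V = np.zeros(3)
for _ in range(1000):
    V_new = np.maximum(g, r + beta * (P @ V))
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
print(V)   # it is optimal to stop in the states where g attains the maximum
```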

Book ChapterDOI
TL;DR: In this chapter, the authors discuss Markov processes and functional analysis; the corresponding important analytical data are the infinitesimal generators (of transition semigroups) of Markov processes.
Abstract: This chapter discusses Markov processes and functional analysis. For the theory of Markov processes, the corresponding important analytical data are infinitesimal generators (of transition semigroups). Equivalent roles are played by Dirichlet forms in a large class of Markov processes. These notions, being relevant to diverse spaces of functions defined on the state space, may well be objects of independent interest without reference to the associated Markov processes on the state space. The Hille–Yosida theory of semigroups and the Beurling–Deny theory of Dirichlet spaces are also discussed in the chapter. The chapter outlines a difference between the formulation of a Markov process and that of other important stochastic processes, for example, a martingale.

Journal ArticleDOI
TL;DR: In this paper, a generalization of Markov Decision Processes with discreet time is presented, where the immediate rewards in every period are not deterministic but random, with the two first moments of the distribution given.
Abstract: In this article we present a generalization of Markov Decision Processes with discreet time where the immediate rewards in every period are not deterministic but random, with the two first moments of the distribution given. Formulas are developed to calculate the expected value and the variance of the reward of the process, which formulas generalize and partially correct other results. We make some observations about the distribution of rewards for processes with limited or unlimited horizon and with or without discounting. Applications with risk sensitive policies are possible; this is illustrated in a numerical example where the results are revalidated by simulation.



Journal ArticleDOI
TL;DR: Regardless of the method used, a straight application of one step of Howard's policy space method will give the desired results.
Abstract: In the general area of Markov decision processes, a lot of attention has been given to deriving upper and lower bounds for approximating the optimal performance level. These are, in themselves, not useful unless they can be used to derive an approximately optimal policy. The existing literature does this specifically in the context of the computational methods being used at the time. However, irrespective of the method used, a straight application of one step of Howard's policy space method will give the desired results.
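A hedged sketch of the point being made: given any approximate value function (for instance the midpoint of computed upper and lower bounds), one application of the policy-improvement step of Howard's method yields the policy that is greedy with respect to it. The two-action, three-state data below are illustrative, not from the paper.

```python
import numpy as np

def improve_policy(P, r, v, beta):
    """One policy-improvement step: in each state, choose the action that is
    greedy with respect to the (possibly approximate) value function v.

    P : transitions, P[a, s, s'] = p(s' | s, a)
    r : rewards,     r[a, s]
    v : approximate value function, shape (S,)
    """
    q = r + beta * (P @ v)           # q[a, s] = one-step lookahead value
    return np.argmax(q, axis=0)      # greedy action in every state

# Illustrative data (assumed for this sketch).
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
              [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]])
r = np.array([[1.0, 0.5, 0.0],
              [2.0, 0.0, 1.0]])
v_approx = np.array([3.0, 2.0, 4.0])   # e.g. midpoint of upper/lower bounds
print(improve_policy(P, r, v_approx, beta=0.9))
```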

Journal ArticleDOI
TL;DR: In this article, a random walk type Markov decision process is considered, where the state space is an integer subset of IR and the action space is independent of i eI. The natural order is assumed, overI, and a quasi order, overK, together with aconditional convexity assumption on the returns.
Abstract: This paper considers a random walk type Markov decision process in which the state spaceI is an integer subset of IR m , and the action spaceK is independent ofi eI. The natural order , overI, and a quasi order, ′, overK, is assumed, together with aconditional convexity assumption on the returns {r i k }, and certain other assumptions about these rewards and the transition probabilities in relationship to the orders and ′.A negatively isotone policy is one for whichi i′→δ(i)⊁′)δ(i′) (i.e.δ(i) ′δ(i)′ orδ(i′) ′δi)). It is shown that, under specified conditions, a negatively isotone optimal policy exists. Some consideration is given to computational implications in particular relationship to Howard's policy space method.



Journal ArticleDOI
01 Dec 1982
TL;DR: It is shown how to treat the problems of time optimization and cost optimization of STEOR networks (GERT networks with only nodes of the “stochastic exclusive-or” type) within the scope of Markov decision processes and the related dynamic programming techniques.
Abstract: We show how to treat the problems of time optimization and cost optimization of STEOR networks (GERT networks with only nodes of the “stochastic exclusive-or” type) within the scope of Markov decision processes and present the related dynamic programming techniques.

Proceedings ArticleDOI
30 Aug 1982
TL;DR: A Markov decision process model is developed to analyze buffer assignment at the transport level of the ARPAnet protocol, and the result is a method for obtaining an assignment policy which is optimal with respect to a delay/throughput/overhead reward function.
Abstract: A Markov decision process model is developed to analyze buffer assignment at the transport level of the ARPAnet protocol. The result of the analysis is a method for obtaining an assignment policy which is optimal with respect to a delay/throughput/overhead reward function. The nature of the optimal policy is investigated by varying parameters of the reward function.

Journal ArticleDOI
TL;DR: It is shown that the multilayer control scheme of the above paper can be constructed by using available results on Markov renewal theory and semi-Markov decision processes.
Abstract: It is shown that the multilayer control scheme of the above paper can be constructed by using available results on Markov renewal theory and semi-Markov decision processes.



01 Jan 1982
TL;DR: It is demonstrated that significant improvements in system throughput can be realized by the use of optimal delayed resolution policies, and some simple procedures which use the values of model parameters can solve the analogue of the identification problem in adaptive control systems.
Abstract: A multi-class queue with typed servers has been used as an analytic model for a subnetwork node of a message switching network. In the context of such a model, this dissertation addresses the problem of designing an optimal policy for sharing finite buffers. Namely, given the values of the model parameters (i.e., the number of job classes, the buffer size, and the arrival and service functions), what buffer sharing policy should be used to obtain the optimal performance? Unlike past work in this area, no policy is assumed a priori. The search space of permissible buffer sharing policies is defined in terms of primitive actions and decisions. It is shown that an optimal policy must belong to the class of policies termed stationary delayed resolution policies. An iterative procedure based upon policy iteration methods for Markov decision processes is used to obtain the optimal delayed resolution policy. It is demonstrated that significant improvements in system throughput can be realized by the use of optimal delayed resolution policies. The policy iteration technique involves solving linear systems of equations. The order of the number of equations to be solved is a function of B^K, where K and B are the number of classes and the number of buffers, respectively. It therefore becomes intractable to solve for the optimal policy for reasonable values of B and K. A class of policies called SRS delayed resolution policies is proposed, and it is shown that the performance of the best SRS delayed resolution policies is close to that of the optimal delayed resolution policies. The buffer allocation model is applied to model a node of a store-and-forward network, and an analysis of the message delays with respect to a link level protocol is presented. It is shown that under some situations a non-delayed resolution policy may provide smaller message delays than a delayed resolution policy, and analysis is presented to determine the network parameters for which one class of policies is likely to provide smaller delays than the other. In networks where the traffic does not remain constant, the buffer sharing policies must be able to adapt to such a changing environment. Using the analogy of adaptive control systems, the adaptive policies may be obtained by generalizing the policies that are known to be optimal for static environments. It is shown that some simple procedures which use the values of the model parameters can solve the analogue of the identification problem in adaptive control systems.

Journal ArticleDOI
TL;DR: In this article, the authors consider a continuous time Markov decision process with a finite state space and show that the expected reward always tends to a limit as the time parameter $t \to \infty $.
Abstract: We consider a continuous time Markov decision process with a finite state space. There is a specified terminal reward, but the reward or cost rate is always zero. The maximum expected final gain can then be obtained by means of the exponential of a certain sublinear operator on $R^n$. This representation allows us to describe the asymptotic properties of the reward vector. We prove that the expected reward always tends to a limit as the time parameter $t \to \infty$. If we assume that it is allowed to stop the process in any state, then we can construct an almost optimal stationary control. Finally, we characterize the case where the asymptotic gain is independent of the initial state.
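A hedged sketch of the structure described above, with notation assumed here rather than taken from the paper: with zero running reward, action-dependent generator matrices Q(a), and terminal reward vector h, the maximal expected terminal reward satisfies a Bellman differential equation whose solution can be written as a nonlinear "exponential".

```latex
% Sketch under assumed notation: Q(a) are the controlled generator matrices,
% h is the terminal reward vector, and V_i(t) is the maximal expected reward
% over horizon t when starting in state i.
\[
  \frac{d}{dt}\,V(t) \;=\; \max_{a}\, Q(a)\,V(t), \qquad V(0) = h,
\]
% where the maximum is taken componentwise, so that
\[
  V(t) \;=\; e^{tA} h, \qquad (A v)_i \;=\; \max_{a}\,\bigl(Q(a)\,v\bigr)_i ,
\]
% with A a sublinear operator on R^n; the behaviour of V(t) as t -> infinity
% gives the asymptotic properties of the reward vector discussed above.
```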

Book ChapterDOI
01 Jan 1982
TL;DR: This paper discusses some aspects of the numerical analysis of discounted Markov decision processes, in particular the extent to which aggregation and disaggregation are advantageous.
Abstract: This paper discusses some aspects of the numerical analysis of discounted Markov decision processes. In particular, an attempt is made to exploit the special structure of the problem when choosing efficient algorithms. Examples of such special structures are the periodicity of demands and the structure of the action space in inventory models. For the latter, it is investigated to what extent aggregation and disaggregation are advantageous.