
Showing papers on "Markov decision process published in 1992"


Book
18 Dec 1992
TL;DR: In this paper, an introduction to optimal stochastic control for continuous time Markov processes and to the theory of viscosity solutions is given, as well as a concise introduction to two-controller, zero-sum differential games.
Abstract: This book is intended as an introduction to optimal stochastic control for continuous time Markov processes and to the theory of viscosity solutions. The authors approach stochastic control problems by the method of dynamic programming. The text provides an introduction to dynamic programming for deterministic optimal control problems, as well as to the corresponding theory of viscosity solutions. A new Chapter X gives an introduction to the role of stochastic optimal control in portfolio optimization and in pricing derivatives in incomplete markets. Chapter VI of the First Edition has been completely rewritten, to emphasize the relationships between logarithmic transformations and risk sensitivity. A new Chapter XI gives a concise introduction to two-controller, zero-sum differential games. Also covered are controlled Markov diffusions and viscosity solutions of Hamilton-Jacobi-Bellman equations. The authors have tried, through illustrative examples and selective material, to connect stochastic control theory with other mathematical areas (e.g. large deviations theory) and with applications to engineering, physics, management, and finance. In this Second Edition, new material on applications to mathematical finance has been added. Concise introductions to risk-sensitive control theory, nonlinear H-infinity control and differential games are also included.

3,885 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered the multiarmed bandit problem and presented a new proof of the optimality of the Gittins index policy, which does not require an interchange argument.
Abstract: This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.

245 citations
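The index policy above can be made concrete with a toy computation. The following sketch (not from the paper) approximates the Gittins index of a single Bernoulli arm with a Beta(a, b) posterior by the standard calibration idea: binary-search for the constant per-step retirement reward at which retiring and continuing to play have equal value. The discount factor, truncation depth, and prior are illustrative assumptions.

from functools import lru_cache

BETA = 0.9    # discount factor (assumed)
T = 100       # truncation depth of the value recursion (assumed)

def continue_value(a, b, lam):
    """Value of the arm when retirement pays lam per step forever."""
    @lru_cache(maxsize=None)
    def V(a_, b_, t):
        retire = lam / (1.0 - BETA)
        p = a_ / (a_ + b_)
        if t == T:                          # truncate: play forever at the current mean
            return max(retire, p / (1.0 - BETA))
        play = p * (1.0 + BETA * V(a_ + 1, b_, t + 1)) + (1.0 - p) * BETA * V(a_, b_ + 1, t + 1)
        return max(retire, play)
    return V(a, b, 0)

def gittins_index(a, b, tol=1e-4):
    lo, hi = 0.0, 1.0                       # Bernoulli rewards live in [0, 1]
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        # If playing at least once still beats retiring at lam, the index exceeds lam.
        if continue_value(a, b, lam) > lam / (1.0 - BETA) + 1e-12:
            lo = lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

print(gittins_index(1, 1))   # uniform prior: the index exceeds the posterior mean 0.5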


Proceedings ArticleDOI
24 Oct 1992
TL;DR: This paper considers the problem of paging under the assumption that the sequence of pages accessed is generated by a Markov chain, and draws on the theory of Markov decision processes to characterize the paging algorithm that achieves the optimal fault-rate on any Markov chain.
Abstract: This paper considers the problem of paging under the assumption that the sequence of pages accessed is generated by a Markov chain. The authors use this model to study the fault-rate of paging algorithms, a quantity of interest to practitioners. They first draw on the theory of Markov decision processes to characterize the paging algorithm that achieves optimal fault-rate on any Markov chain. They address the problem of efficiently devising a paging strategy with low fault-rate for a given Markov chain. They show that a number of intuitively good approaches fail. Their main result is an efficient procedure that, on any Markov chain, will give a paging algorithm with fault-rate at most a constant times optimal. Their techniques also show that some algorithms that do poorly in practice fail in the Markov setting, despite known (good) performance guarantees when the requests are generated independently from a probability distribution.

92 citations
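As a toy illustration of the first step the abstract describes (casting Markov-chain paging as a Markov decision process and solving for the fault-rate-optimal eviction rule), the sketch below runs relative value iteration for the average-cost criterion on a 3-page chain with a 2-page cache. The request chain P is invented; this is not the authors' efficient procedure, only the exact MDP baseline they compare against.

import itertools
import numpy as np

# Request chain over 3 pages (rows sum to 1; numbers are made up).
P = np.array([[0.1, 0.8, 0.1],
              [0.5, 0.1, 0.4],
              [0.3, 0.6, 0.1]])
PAGES, K = range(3), 2                     # cache holds K of the 3 pages

# States: (cache contents, pending request).
caches = [frozenset(c) for c in itertools.combinations(PAGES, K)]
states = [(c, r) for c in caches for r in PAGES]
idx = {s: i for i, s in enumerate(states)}

def step(cache, r, evict):
    """Serve request r (evicting `evict` on a fault) and draw the next request."""
    fault = r not in cache
    new_cache = frozenset((cache - {evict}) | {r}) if fault else cache
    cost = 1.0 if fault else 0.0
    return cost, [(idx[(new_cache, nr)], P[r, nr]) for nr in PAGES]

h = np.zeros(len(states))
for _ in range(2000):                      # relative value iteration, average-cost criterion
    h_new = np.empty_like(h)
    for i, (cache, r) in enumerate(states):
        actions = [None] if r in cache else sorted(cache)   # choose a victim only on a fault
        h_new[i] = min(cost + sum(p * h[j] for j, p in nxt)
                       for cost, nxt in (step(cache, r, a) for a in actions))
    gain = h_new[0] - h[0]                 # estimate of the optimal long-run fault rate
    h = h_new - h_new[0]
print("approximate optimal fault rate:", gain)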


Journal ArticleDOI
TL;DR: Provides a cost error bound for a general rolling horizon algorithm applied to infinite horizon nonhomogeneous Markov decision processes, and shows that the error goes to zero, for any fixed rolling horizon, as a Doeblin measure of control over the future decreases.
Abstract: By far the most common planning procedure found in practice is to approximate the solution to an infinite horizon problem by a series of rolling finite horizon solutions. Although many empirical studies have been done, this so-called rolling horizon procedure has been the subject of few analytic studies. We provide a cost error bound for a general rolling horizon algorithm when applied to infinite horizon nonhomogeneous Markov decision processes, both in the discounted and average cost cases. We show that a Doeblin coefficient of ergodicity acts much like a discount factor to reduce this error. In particular, we show that the error goes to zero for any fixed rolling horizon as this Doeblin measure of control over the future decreases. The theory is illustrated through an application to vehicle deployment.

81 citations
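A minimal sketch of the rolling horizon procedure analyzed in the paper, for a small finite nonhomogeneous MDP: at each period the controller solves a short finite-horizon dynamic program with the time-varying data and commits only to the first action. The state/action counts, discount factor, and randomly generated transition and reward data are placeholders.

import numpy as np

rng = np.random.default_rng(0)
S, A, HORIZON = 4, 2, 12                    # small example sizes (assumed)

# Nonhomogeneous data: a different transition kernel and reward matrix for every period t.
P = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(HORIZON)]   # P[t][s, a, s']
R = [rng.uniform(0.0, 1.0, size=(S, A)) for _ in range(HORIZON)]

def first_action(t0, s0, roll=3, beta=0.95):
    """Solve a `roll`-step dynamic program starting at period t0; return only its first action."""
    V = np.zeros(S)
    policy = None
    for t in reversed(range(t0, min(t0 + roll, HORIZON))):
        Q = R[t] + beta * P[t] @ V          # Q[s, a]
        V = Q.max(axis=1)
        policy = Q.argmax(axis=1)           # after the loop this is the period-t0 policy
    return policy[s0]

# Rolling horizon control of one trajectory.
s, total = 0, 0.0
for t in range(HORIZON):
    a = first_action(t, s)
    total += R[t][s, a]
    s = rng.choice(S, p=P[t][s, a])
print("realized reward along the trajectory:", total)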


Journal ArticleDOI
TL;DR: The objective of the study is to endogenize rework and scrap decisions in a multistage production process, via a Markov decision process model that is developed and solved with dynamic programming techniques.
Abstract: This study is motivated by a make-to-order marketing environment where an order is met from a single production lot size. The objective of the study is to endogenize rework and scrap decisions in a multistage production process. A Markov decision process model is developed and solved using dynamic programming techniques. The model assumes that demand is given, and material, processing and rework costs are linear in the production lot size. Modeling random yield at each stage of the production process is of key interest. The solution to the problem is characterized and the sensitivity of the solution to the parameters of the model is examined.

70 citations
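A toy backward recursion in the spirit of the model described (not the paper's exact formulation): after each stage the yield is random, and the controller decides whether to rework the defective units or scrap them, trading rework cost against the shortage penalty of missing the ordered quantity. All numbers and the specific yield model are illustrative assumptions.

from functools import lru_cache
from math import comb

# Illustrative data: 3 processing stages, ordered quantity D, production lot LOT.
STAGES, D, LOT = 3, 8, 12
Q_GOOD, Q_REWORK = 0.8, 0.6        # per-stage survival prob.; recovery prob. of rework
C_REWORK, C_SHORT = 1.0, 10.0      # rework cost per defective unit; shortage penalty per unit

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

@lru_cache(maxsize=None)
def V(stage, good):
    """Minimal expected remaining cost with `good` conforming units entering `stage`."""
    if stage == STAGES:
        return C_SHORT * max(0, D - good)
    expected = 0.0
    for g in range(good + 1):                       # random yield of this stage
        defective = good - g
        scrap = V(stage + 1, g)
        rework = C_REWORK * defective + sum(
            binom_pmf(r, defective, Q_REWORK) * V(stage + 1, g + r)
            for r in range(defective + 1))
        expected += binom_pmf(g, good, Q_GOOD) * min(scrap, rework)
    return expected

print("expected cost under the optimal rework/scrap policy:", V(0, LOT))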


Journal ArticleDOI
TL;DR: Two algorithms for the solution of the underlying limit Markov control problem are presented, a linear program possessing the Wolfe-Dantzig structure inherited from the ergodic 'nearly decomposable' assumption in the model and an aggregation-disaggregation policy improvement algorithm.
Abstract: A singularly perturbed Markov decision process with the limiting average reward criterion is considered. It is assumed that the underlying process is composed of n separate irreducible processes, and that the small perturbation is such that it unites these processes into a single irreducible process. Two algorithms for the solution of the underlying limit Markov control problem are presented. The first of these is a linear program possessing the Wolfe-Dantzig structure inherited from the ergodic 'nearly decomposable' assumption in the model. The second is an aggregation-disaggregation policy improvement algorithm.

69 citations
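Not the paper's perturbation-specific program, but a reminder of the underlying object: the standard occupation-measure linear program for a unichain average-reward MDP, which is the kind of problem the Wolfe-Dantzig structured LP specializes. The sketch below solves it with scipy.optimize.linprog on made-up data.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
S, A = 4, 3                                     # state and action counts (assumed)
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a, j]: transition probabilities
R = rng.uniform(0.0, 1.0, size=(S, A))          # one-step rewards

# Variables x[s, a]: long-run state-action frequencies (an occupation measure).
n = S * A
c = -R.reshape(n)                               # linprog minimizes, so negate the reward

# Balance constraints: sum_a x[j, a] - sum_{s, a} P[s, a, j] x[s, a] = 0 for every j,
# plus the normalization sum_{s, a} x[s, a] = 1.
A_eq = np.zeros((S + 1, n))
for j in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[s, a, j]
A_eq[S, :] = 1.0
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
x = res.x.reshape(S, A)
print("optimal average reward:", -res.fun)
print("optimal (possibly randomized) policy:")
print(x / x.sum(axis=1, keepdims=True))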


Journal ArticleDOI
TL;DR: A unified approach to the asymptotic analysis of a Markov decision process disturbed by an ε-additive perturbation is proposed in this article, where the underlying control problem that needs to be understood is the limit Markov control problem.
Abstract: A unified approach to the asymptotic analysis of a Markov decision process disturbed by an ε-additive perturbation is proposed. Irrespective of whether the perturbation is regular or singular, the underlying control problem that needs to be understood is the limit Markov control problem. The properties of this problem are studied.

67 citations


Journal ArticleDOI
TL;DR: This work considers discrete time average cost Markov decision processes with countable state space and finite action sets and concludes that the Sennott conditions are the weakest.

66 citations


Journal ArticleDOI
TL;DR: In this article, the authors reconstruct a proof of a classical result due to Hardy and Littlewood, and provide either examples or complete citations for related cases that are not covered by the Hardy-Littlewood theorem.
Abstract: In this note, we reconstruct a proof of a classical result due to Hardy and Littlewood. While this result has played an important role in the modern theories of Markov decision processes and stochastic games, it is not that easy to find its proof in the literature in the format in which it has been applied. Furthermore, we supply either examples or complete citations for the other related cases which are not covered by the Hardy-Littlewood theorem.

51 citations


Proceedings ArticleDOI
16 Dec 1992
TL;DR: It is shown that there exists an optimal stationary policy (such that the decisions depend only on the actual number of customers in the queue); it is of a threshold type, and it uses randomization in at most one state.
Abstract: The author considers the problem of dynamic flow control of arriving packets into an infinite buffer. The service rate may depend on the state of the system, may change in time, and is unknown to the controller. The goal of the controller is to design an efficient policy which guarantees the best performance under the worst service conditions. The cost is composed of a holding cost, a cost for rejecting customers (packets) and a cost that depends on the quality of the service. The problem is studied in the framework of zero-sum Markov games, and a value iteration algorithm is used to solve it. It is shown that there exists an optimal stationary policy (such that the decisions depend only on the actual number of customers in the queue); it is of a threshold type, and it uses randomization in at most one state.

45 citations
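A generic sketch of the solution machinery the abstract invokes: Shapley-style value iteration for a discounted zero-sum Markov game, with the stage game at each state solved as a small linear program. The two-action, three-state data are invented; the paper's actual flow-control model (queue-length states, threshold structure, randomization in one state) is not reproduced here.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M, where the row player minimizes cost."""
    nA, nB = M.shape
    c = np.zeros(nA + 1)
    c[-1] = 1.0                                           # minimize the guaranteed cost v
    A_ub = np.hstack([M.T, -np.ones((nB, 1))])            # x @ M[:, b] <= v for every column b
    b_ub = np.zeros(nB)
    A_eq = np.zeros((1, nA + 1))
    A_eq[0, :nA] = 1.0                                    # x is a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * nA + [(None, None)])
    return res.x[-1]

rng = np.random.default_rng(2)
S, nA, nB, beta = 3, 2, 2, 0.9                            # toy sizes and discount (assumed)
cost = rng.uniform(0.0, 1.0, size=(S, nA, nB))            # stage cost c(s, a, b)
P = rng.dirichlet(np.ones(S), size=(S, nA, nB))           # P[s, a, b, s']

V = np.zeros(S)
for _ in range(200):                                      # Shapley value iteration
    V = np.array([matrix_game_value(cost[s] + beta * P[s] @ V) for s in range(S)])
print("discounted game value per state:", V)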


Journal ArticleDOI
TL;DR: It is shown that the operator-theoretical approach presented for multichain Markov decision processes with a countable state space, compact action sets and unbounded rewards can also be carried out under recurrence conditions.
Abstract: In a previous paper, Dekker and Hordijk (1988) presented an operator-theoretical approach for multichain Markov decision processes with a countable state space, compact action sets and unbounded rewards. Conditions were presented guaranteeing the existence of a Laurent series expansion for the discounted rewards, the existence of average and Blackwell optimal policies and the existence of solutions for the average and Blackwell optimality equations. While these assumptions were operator-oriented and formulated as conditions for the deviation matrix, we will show in this paper that the same approach can also be carried out under recurrence conditions. These new conditions seem easier to check in general and are especially suited for applications in queueing models.

Proceedings ArticleDOI
01 May 1992
TL;DR: A state-dependent call admission and routing policy for a multiservice circuit-switched network is analyzed; the numerical study shows that convergence of the analyzed strategy is achieved in at most two iterations, and demonstrates the good traffic efficiency of the approach.
Abstract: A state-dependent call admission and routing policy for a multiservice circuit-switched network is analyzed. The policy is based on decomposition of the Markov decision problem into a set of separable link problems. To provide an exact link analysis model, a value iteration algorithm is offered. This allows examination of the accuracy of several approximations used to reduce the complexity of the problem. The numerical study showed that convergence of the analyzed strategy is achieved in at most two iterations. The study also showed the good traffic efficiency of the approach and confirmed the predicted ability to control the distribution of the call classes' grade of service. The approach, together with its sensitivity analysis with respect to the arrival rates, provides a very general framework for studying, constructing, and optimizing other call admission and routing strategies. The results of the sensitivity analysis are used to compare the proposed decomposition approach with the decomposition approach developed by F.P. Kelly (1988) for optimization of a load sharing policy. Also, the relationship to other routing strategies based on Markov decision theory is investigated.
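A stripped-down sketch of the kind of per-link problem such a decomposition produces: one link with a fixed capacity, two call classes with different bandwidths and revenues, and an admission decision for each arriving class, solved by value iteration on the uniformized chain. Capacities, rates, and revenues are placeholders, and the routing layer and sensitivity analysis of the paper are not shown.

# One link with CAP bandwidth units and two call classes (all data assumed).
CAP = 10
b   = [1, 2]          # bandwidth required per call of each class
lam = [0.6, 0.3]      # arrival rates
mu  = [1.0, 0.5]      # per-call departure rates
rev = [1.0, 2.5]      # revenue collected when a call is accepted
GAMMA = 0.1           # continuous-time discount rate

states = [(n1, n2) for n1 in range(CAP + 1) for n2 in range(CAP + 1)
          if b[0] * n1 + b[1] * n2 <= CAP]
V = {s: 0.0 for s in states}

# Uniformization constant: bounds the total event rate in every state.
LAM = sum(lam) + (CAP // b[0]) * mu[0] + (CAP // b[1]) * mu[1]

for _ in range(2000):                      # value iteration on the uniformized chain
    V_new = {}
    for n in states:
        total = 0.0
        for k in range(2):
            up = (n[0] + (k == 0), n[1] + (k == 1))
            # Admission decision: accept (and collect rev[k]) only if feasible and worthwhile.
            accept = rev[k] + V[up] if up in V else float("-inf")
            total += lam[k] * max(accept, V[n])
            if n[k] > 0:                   # a class-k call in progress may depart
                down = (n[0] - (k == 0), n[1] - (k == 1))
                total += n[k] * mu[k] * V[down]
        slack = LAM - (sum(lam) + n[0] * mu[0] + n[1] * mu[1])
        total += slack * V[n]              # fictitious self-transition from uniformization
        V_new[n] = total / (LAM + GAMMA)
    V = V_new
print("expected discounted revenue of an empty link:", V[(0, 0)])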

Journal ArticleDOI
TL;DR: In this paper, the authors investigated semi-Markov games under discounted and limiting average payoff criteria, and proved the existence of a solution to the optimality equation under a natural ergodic condition.
Abstract: Semi-Markov games are investigated under discounted and limiting average payoff criteria. The issues of the existence of the value and of a pair of stationary optimal strategies are settled; the optimality equation is studied, and under a natural ergodic condition the existence of a solution to the optimality equation is proved for the limiting average case. Semi-Markov games provide useful flexibility in constructing recursive game models. All the work on Markov/semi-Markov decision processes and Markov (stochastic) games can be viewed as special cases of the developments in this paper.

Journal ArticleDOI
TL;DR: In this article, an approximate method combining dynamic programming and stochastic simulation in the determination of a set of descriptive parameters is suggested, which is used in the calculation of the multi-component replacement criterion for cows and heifers.

Proceedings ArticleDOI
Gary Bradski
07 Jun 1992
TL;DR: An optimal control solution to change of machine setup scheduling based on dynamic programming average cost per stage value iteration as set forth by M. Caramanis et al. (1991) is demonstrated.
Abstract: An optimal control solution to machine setup-change scheduling is demonstrated, based on the dynamic programming average-cost-per-stage value iteration set forth by M. Caramanis et al. (1991) for the 2-D case. The difficulty with the optimal approach lies in the explosive computational growth of the resulting solution. A method of reducing the computational complexity is developed using ideas from biology and neural networks. A real-time controller is described that uses a linear-log representation of state space with neural networks employed to fit cost surfaces.

Journal ArticleDOI
TL;DR: Two definitions of variability are introduced, namely, the expected time-average variability and time-average expected variability, and a randomized stationary policy is constructed that is ε-optimal for both criteria.
Abstract: Considered are time-average Markov Decision Processes (MDPs) with finite state and action spaces. Two definitions of variability are introduced, namely, the expected time-average variability and time-average expected variability. The two criteria are in general different, although they can both be employed to penalize for variance in the stream of rewards. For communicating MDPs, we construct a randomized stationary policy that is ε-optimal for both criteria; the policy is optimal and pure for a specific variability function. For general multichain MDPs, a state space decomposition leads to a similar result for the expected time-average variability. We also consider the problem of the decision maker choosing the initial state along with the policy.

Journal ArticleDOI
TL;DR: An iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure is presented, thereby allowing the decision maker to place more or less emphasis on the short-term versus the long-term rewards by varying their weights.
Abstract: The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to “neglect” the future, concentrating on the short-term rewards, while the second one tends to do the opposite. We consider a new reward criterion consisting of the weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on the short-term versus the long-term rewards by varying their weights. The mathematical implications of the new criterion include: the deterministic stationary policies can be outperformed by the randomized stationary policies, which in turn can be outperformed by the nonstationary policies; an optimal policy might not exist. We present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
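A small numerical illustration (not the paper's algorithm) of the weighted criterion itself: for a fixed stationary policy on a toy chain, compute its discounted value and its long-run average reward, and blend them with a weight w. The chain, rewards, weight, and the particular normalization of the discounted part are assumptions.

import numpy as np

# A fixed stationary policy induces a Markov chain P and a reward vector r (toy data).
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
r = np.array([1.0, 0.2, 2.0])
beta, w = 0.9, 0.4              # discount factor and weight on the discounted part

# Discounted value: V = (I - beta P)^{-1} r.
V_disc = np.linalg.solve(np.eye(3) - beta * P, r)

# Long-run average reward: g = pi . r, with pi the stationary distribution of P.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
g = float(pi @ r)

# Weighted criterion per starting state: w * (1 - beta) * V_disc + (1 - w) * g.
# (Scaling the discounted part by (1 - beta) puts both terms on a per-step scale;
#  the paper's exact normalization may differ.)
print(w * (1.0 - beta) * V_disc + (1.0 - w) * g)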

Journal ArticleDOI
TL;DR: The result of Sennott [9] on the existence of optimal stationary policies in countable state Markov decision chains with finite action sets is generalized to arbitrary state space Markov decision chains.
Abstract: The result of Sennott [9] on the existence of optimal stationary policies in countable state Markov decision chains with finite action sets is generalized to arbitrary state space Markov decision chains. The assumption of finite action sets occurring in a global countable action space allows a particularly simple theoretical structure for the general state space Markov decision chain. Two examples illustrate the results. Example 1 is a system of parallel queues with stochastic work requirements, a movable server with controllable service rate, and a reject option. Example 2 is a system of parallel queues with stochastic controllable inputs, a movable server with fixed service rates, and a reject option.

Journal ArticleDOI
TL;DR: A Benders decomposition approach to solving this problem is given that evaluates the stopping rule, eliminates some suboptimal combinations of actions, and yields bounds on the maximum error that could result from the selection of a candidate action in the initial stage.
Abstract: We formulate a mixed integer program to determine whether a finite time horizon is a forecast horizon in a nonhomogeneous Markov decision process. We give a Benders decomposition approach to solving this problem that evaluates the stopping rule, eliminates some suboptimal combinations of actions, and yields bounds on the maximum error that could result from the selection of a candidate action in the initial stage. The integer program arising from the decomposition has special properties that allow efficient solution. We illustrate the approach with numerical examples.

Journal ArticleDOI
01 Jan 1992
TL;DR: Competitive Markov Decision Processes in which the controllers/players are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria are considered.
Abstract: We consider Competitive Markov Decision Processes in which the controllers/players are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria. These are either a convex combination of two discounted objectives, or of one discounted and one limiting average reward objective. In both cases we establish the existence of the game-theoretic value vector, and supply a description of ε-optimal non-stationary strategies.

Journal ArticleDOI
TL;DR: In this article, the authors proved the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function.
Abstract: We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory.

Journal ArticleDOI
TL;DR: A knowledge acquisition method that extracts the real-time scheduling rules from the optimal policy of the user-based semi-Markov decision processes and combines the human's knowledge of real-time scheduling with the optimization technique to create a better knowledge resource.
Abstract: This paper proposes a knowledge acquisition method that extracts the real-time scheduling rules from the optimal policy of the user-based semi-Markov decision processes. This method combines the human's knowledge of real-time scheduling with the optimization technique to create a better knowledge resource. A revised rule formation algorithm developed in trace-driven knowledge acquisition (TDKA) is used to generalize the optimal policy derived from semi-Markov decision processes. A hoist scheduling problem in circuit board production lines demonstrates the feasibility and the superior performance of the proposed method.

Journal ArticleDOI
TL;DR: Under a certain penalizing condition on the cost for unstable behavior, the existence of a stable stationary strategy which is strong average optimal is established.

Journal ArticleDOI
TL;DR: In a sense made precise in the paper, the algorithm offered is shown to attain asymptotically optimal performance, and rates are assured.
Abstract: A vigorous branch of automatic learning is directed at the task of locating a global minimum of an unknown multimodal function f(θ) on the basis of noisy observations L(θ(i)) = f(θ(i)) + W(θ(i)) taken at sequentially chosen control points θ(i). In all preceding convergence derivations known to the authors, the noise is postulated to depend on the past only through control selection. Here they allow the observation noise sequence to be stochastically dependent, in particular, a function of an unknown underlying Markov decision process, the observations being the stagewise losses. In a sense made precise in the paper, the algorithm offered is shown to attain asymptotically optimal performance, and rates are assured. A motivating example from queueing theory is offered, and connections with classical problems of Markov control theory and other disciplines are mentioned.
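The problem setting can be illustrated with a deliberately naive scheme (not the authors' algorithm, and with independent rather than Markov-modulated noise): sample candidate control points, average an increasing number of noisy observations at each, and keep the apparent minimizer. The objective and noise level are invented.

import numpy as np

rng = np.random.default_rng(4)

def f(theta):
    """Unknown multimodal objective (chosen here purely for illustration)."""
    return np.sin(3.0 * theta) + 0.1 * theta ** 2

def noisy_loss(theta):
    return f(theta) + rng.normal(scale=0.3)     # observation corrupted by noise

best_theta, best_est = None, np.inf
for i in range(1, 201):
    theta = rng.uniform(-3.0, 3.0)              # sequentially chosen control point
    n_obs = 5 + i // 10                         # average more observations as the search goes on
    est = np.mean([noisy_loss(theta) for _ in range(n_obs)])
    if est < best_est:
        best_theta, best_est = theta, est
print("apparent global minimizer:", best_theta)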

Journal ArticleDOI
TL;DR: Three computational approaches for solving a variance-penalised Markov decision process are developed, viz. parametric linear programming, parametric Lagrangean programming, and a parametric policy space approach.
Abstract: This paper develops three computational approaches for solving a variance-penalised Markov decision process, viz. parametric linear programming, parametric Lagrangean programming, and a parametric policy space approach. For a Markov decision model in which the average reward is reduced by a fraction of the mean variance, three types of solution methods are developed, namely parametric linear programming, parametric Lagrangean optimization, and parametric policy iteration algorithms.

Journal ArticleDOI
Qi Ying Hu
TL;DR: A new condition on the unbounded rewards is presented under which the discounted optimality equation has a unique solution; sufficient conditions are then given for the existence of a solution of the average optimality equation in discrete-time Markov decision processes.

Journal ArticleDOI
TL;DR: In this paper, the existence of an optimal stationary policy under structural restrictions on the model is proved; both the lim inf and lim sup average criteria are considered, and the arguments are based on well-known facts from Renewal Theory.
Abstract: We consider discrete-time average reward Markov decision processes with denumerable state space and bounded reward function. Under structural restrictions on the model the existence of an optimal stationary policy is proved; both the lim inf and lim sup average criteria are considered. In contrast to the usual approach, our results do not rely on the average reward optimality equation. Rather, the arguments are based on well-known facts from Renewal Theory.

Journal ArticleDOI
TL;DR: In this paper, the authors considered the vector-valued Markov decision process and considered the characterization of optimal stationary policies among the set of all (randomized, history-dependent) policies.

Journal ArticleDOI
TL;DR: This article applies an ergodic Markov chain process to teachers' decision making and obtains a measure of a teacher's ability as a decision maker in the process of asking questions in class, which may be helpful to the teacher in developing a better questioning technique that will encourage his students to think at higher levels.
Abstract: We apply an ergodic Markov chain process to teachers’ decision making. Through this we succeed in giving a new approach to the teachers’ ‘behaviour’ in the decision‐making process and obtain a measure of a teacher's ability as a decision maker in the process of asking questions in class. This measure may be helpful to the teacher of mathematics in developing a better questioning technique that will encourage his students to think at higher levels.
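For readers unfamiliar with the tool being applied, here is a minimal sketch of the quantity an ergodic Markov chain analysis rests on: the stationary distribution, which in this setting plays the role of the teacher's long-run profile over questioning levels. The three-state chain and its labels are hypothetical.

import numpy as np

# Hypothetical chain over three questioning levels: recall, comprehension, analysis.
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Stationary distribution: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print("long-run share of time at each questioning level:", pi)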

Proceedings ArticleDOI
16 Dec 1992
TL;DR: Almost sure convergence is established by an indirect argument that blends standard results on stochastic approximations with a version of the law of large numbers for martingale differences; these convergence properties provide an alternative proof of some of the properties of steering policies.
Abstract: The authors consider a specific multidimensional stochastic approximation scheme of the Robbins-Monro type that naturally arises in the study of steering policies for Markov decision processes. The usual convergence results (in the almost sure sense) do not seem to apply to this simple scheme. Almost sure convergence is established by an indirect argument that blends standard results on stochastic approximations with a version of the law of large numbers for martingale differences. These convergence properties provide an alternative proof for some of the properties of steering policies.
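Independently of the steering-policy application, a bare-bones reminder of what a Robbins-Monro scheme does: drive θ toward a root of the expectation E[g(θ, noise)] = 0 using only noisy evaluations and step sizes that sum to infinity but are square-summable. The target function and noise below are invented, and the noise is i.i.d. rather than the dependent noise treated in the paper.

import numpy as np

rng = np.random.default_rng(3)

def noisy_g(theta):
    """Noisy observation of g(theta) = theta - 2; the root being sought is theta* = 2."""
    return (theta - 2.0) + rng.normal(scale=0.5)

theta = 0.0
for n in range(1, 10001):
    a_n = 1.0 / n                      # step sizes: sum a_n diverges, sum a_n^2 converges
    theta -= a_n * noisy_g(theta)
print("Robbins-Monro estimate of the root:", theta)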