
Showing papers on "Markov decision process" published in 2005


Journal ArticleDOI
TL;DR: This work considers a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets, and shows that perfect duality holds for this problem, and that it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm.
Abstract: Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets. We show that perfect duality holds for this problem, and that as a consequence, it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm. We show that a particular choice of the uncertainty sets, involving likelihood regions or entropy bounds, leads to both a statistically accurate representation of uncertainty and a complexity of the robust recursion that is almost the same as that of the classical recursion. Hence, robustness can be added at practically no extra computing cost. We derive similar results for other uncertainty sets, including one with a finite number of possible values for the transition matrices. We describe in a practical path planning example the benefits of using a robust strategy instead of the classical optimal strategy; even if the uncertainty level is only crudely guessed, the robust strategy yields a much better worst-case expected travel time.

740 citations
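
The sketch below is a minimal, hypothetical illustration of the robust ("min-max") value-iteration recursion described above, for the simplest uncertainty set mentioned in the abstract: a finite set of candidate transition matrices. The toy MDP, cost structure, and all names are invented for illustration; the likelihood-region and entropy-bound uncertainty sets would require an inner optimization that is not shown here.

```python
import numpy as np

def robust_value_iteration(P_candidates, cost, gamma=0.95, iters=500, tol=1e-8):
    """Worst-case (min-max) value iteration for a finite uncertainty set.

    P_candidates : list of candidate transition arrays, each of shape (A, S, S);
                   the adversary may pick a different candidate at every
                   state-action pair (a rectangularity-style assumption).
    cost         : array of shape (S, A) with immediate costs.
    """
    S, A = cost.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for a in range(A):
            # Worst-case expected future cost over the candidate models.
            worst = np.max([P[a] @ V for P in P_candidates], axis=0)
            Q[:, a] = cost[:, a] + gamma * worst
        V_new = Q.min(axis=1)            # the controller minimizes cost
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 2
    def random_kernel():
        P = rng.random((A, S, S))
        return P / P.sum(axis=2, keepdims=True)
    candidates = [random_kernel() for _ in range(3)]   # three plausible models
    cost = rng.random((S, A))
    V, policy = robust_value_iteration(candidates, cost)
    print("robust values:", np.round(V, 3), "robust policy:", policy)
```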


Journal ArticleDOI
TL;DR: In this paper, the authors argue that it is more appropriate to view the problem of generating recommendations as a sequential optimization problem and, consequently, that Markov decision processes (MDPs) provide a more appropriate model for recommender systems.
Abstract: Typical recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential optimization problem and, consequently, that Markov decision processes (MDPs) provide a more appropriate model for recommender systems. MDPs introduce two benefits: they take into account the long-term effects of each recommendation and the expected value of each recommendation. To succeed in practice, an MDP-based recommender system must employ a strong initial model, must be solvable quickly, and should not consume too much memory. In this paper, we describe our particular MDP model, its initialization using a predictive model, the solution and update algorithm, and its actual performance on a commercial site. We also describe the particular predictive model we used which outperforms previous models. Our system is one of a small number of commercially deployed recommender systems. As far as we know, it is the first to report experimental analysis conducted on a real commercial site. These results validate the commercial value of recommender systems, and in particular, of our MDP-based approach.

690 citations
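
As a loose illustration of the sequential-optimization view (not the authors' deployed system or predictive model), the sketch below solves a toy MDP recommender by value iteration: states are the user's last purchased item, actions are the items recommended next, and the transition probabilities stand in for the predictive model the paper initializes from. All data and names are hypothetical.

```python
import numpy as np

def recommender_value_iteration(trans, profit, gamma=0.9, iters=200):
    """Solve a toy MDP-based recommender.

    trans[a, s, s'] : probability that a user whose last purchase is item s
                      buys item s' next when item a is recommended
                      (would come from a predictive model in practice).
    profit[s']      : profit obtained when item s' is purchased.
    """
    A, S, _ = trans.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = expected immediate profit + discounted future value.
        Q = np.stack([trans[a] @ (profit + gamma * V) for a in range(A)], axis=1)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V        # recommendation policy and state values

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_items = 5                       # actions == items == states (last purchase)
    trans = rng.random((n_items, n_items, n_items))
    trans /= trans.sum(axis=2, keepdims=True)
    profit = rng.uniform(1.0, 10.0, size=n_items)
    policy, V = recommender_value_iteration(trans, profit)
    print("recommend item", policy, "per last-purchase state; values:", np.round(V, 2))
```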


Journal ArticleDOI
TL;DR: This work presents a randomized point-based value iteration algorithm called PERSEUS, which backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set.
Abstract: Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called PERSEUS. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, PERSEUS backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of PERSEUS in large scale POMDP problems.

674 citations
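
The following is a compact sketch of a single PERSEUS backup stage for a flat, discrete POMDP, assuming standard tabular transition, observation, and reward arrays (the names and shapes are assumptions, not the authors' code): beliefs are backed up in random order, and a belief is dropped from the "still to improve" set as soon as some newly generated alpha-vector improves its value.

```python
import numpy as np

def backup(b, alphas, T, Z, R, gamma):
    """Point-based POMDP backup at belief b.

    T[a, s, s'] : transition probabilities
    Z[a, s', o] : observation probabilities
    R[s, a]     : immediate rewards
    alphas      : current value function, array of shape (n_vectors, S)
    """
    A, S, _ = T.shape
    n_obs = Z.shape[2]
    best_val, best_vec = -np.inf, None
    for a in range(A):
        g_a = R[:, a].astype(float)
        for o in range(n_obs):
            # Project every current alpha-vector back through action a, observation o.
            g_ao = (T[a] * Z[a, :, o][None, :]) @ alphas.T   # shape (S, n_vectors)
            g_a = g_a + gamma * g_ao[:, np.argmax(b @ g_ao)]
        if b @ g_a > best_val:
            best_val, best_vec = b @ g_a, g_a
    return best_vec

def perseus_stage(B, alphas, T, Z, R, gamma, rng):
    """One randomized PERSEUS backup stage over a fixed belief set B (|B| x S)."""
    old_vals = np.max(B @ alphas.T, axis=1)
    new_alphas = []
    todo = np.arange(len(B))
    while len(todo):
        i = rng.choice(todo)                      # back up a randomly chosen belief
        alpha = backup(B[i], alphas, T, Z, R, gamma)
        if B[i] @ alpha < old_vals[i]:            # no improvement: keep the old best vector
            alpha = alphas[np.argmax(B[i] @ alphas.T)]
        new_alphas.append(alpha)
        new_vals = np.max(B @ np.array(new_alphas).T, axis=1)
        todo = todo[new_vals[todo] < old_vals[todo]]   # improved beliefs need no further backups
    return np.array(new_alphas)
```

Repeating perseus_stage over a fixed belief set, starting from a suitably pessimistic initial vector, approximates the value function; the point of the randomized stage is that it typically produces far fewer vectors than there are beliefs in the set.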


Journal ArticleDOI
TL;DR: It is proved that when this set of measures has a certain "rectangularity" property, all of the main results for finite and infinite horizon DP extend to natural robust counterparts.
Abstract: In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each state-action pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a certain "rectangularity" property, all of the main results for finite and infinite horizon DP extend to natural robust counterparts. We discuss techniques from Nilim and El Ghaoui [17] for constructing suitable sets of conditional measures that allow one to efficiently solve for the optimal robust policy. We also show that robust DP is equivalent to stochastic zero-sum games with perfect information.

585 citations


Journal ArticleDOI
TL;DR: In this paper, the authors extend the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space.
Abstract: This paper extends the framework of partially observable Markov decision processes (POMDPs) to multi-agent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian updates to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piece-wise linearity and convexity of the value functions carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria which may be non-unique and do not capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continuously revise models of other agents. Since the agent's beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.

315 citations


Journal ArticleDOI
TL;DR: A model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies by weighting the original value function against the risk; it was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.
Abstract: In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are those states entering which is undesirable or dangerous. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.

283 citations
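
Below is a rough, hypothetical sketch of the weighted value/risk idea on a toy chain world (state 0 is the error state). It keeps separate Q-estimates for the return and for the probability of entering the error state, acts greedily on their weighted combination, and adapts the weight so the observed risk stays near a user threshold. This is only in the spirit of the paper's heuristic; the environment, the adaptation rule, and all constants are invented.

```python
import numpy as np

# A tiny hypothetical chain world: state 0 is the "error" state, state N-1 the goal.
N, GOAL, ERROR = 8, 7, 0

def step(state, action, rng):
    """action 0 moves left, action 1 moves right; the move slips with probability 0.1."""
    move = -1 if action == 0 else 1
    if rng.random() < 0.1:
        move = -move
    nxt = int(np.clip(state + move, 0, N - 1))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt in (GOAL, ERROR)

def risk_sensitive_q_learning(omega=0.1, episodes=5000, alpha=0.1, gamma=0.95,
                              eps=0.1, xi_step=0.01, seed=0):
    """Weighted value/risk Q-learning sketch: act greedily on Q_val - xi * Q_risk
    and adapt the weight xi so the observed risk stays near the threshold omega."""
    rng = np.random.default_rng(seed)
    Q_val = np.zeros((N, 2))
    Q_risk = np.zeros((N, 2))
    xi, risk_estimate = 1.0, 0.0
    for _ in range(episodes):
        state, done, hit_error = N // 2, False, False
        while not done:
            if rng.random() < eps:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(Q_val[state] - xi * Q_risk[state]))
            nxt, reward, done = step(state, action, rng)
            risk_signal = 1.0 if nxt == ERROR else 0.0
            hit_error = hit_error or nxt == ERROR
            greedy_nxt = int(np.argmax(Q_val[nxt] - xi * Q_risk[nxt]))
            # Q-learning update for the original value criterion ...
            target_val = reward + (0.0 if done else gamma * Q_val[nxt].max())
            Q_val[state, action] += alpha * (target_val - Q_val[state, action])
            # ... and an undiscounted update for the risk (probability-of-error) criterion.
            target_risk = risk_signal + (0.0 if done else Q_risk[nxt, greedy_nxt])
            Q_risk[state, action] += alpha * (target_risk - Q_risk[state, action])
            state = nxt
        # Adapt the weight: raise xi when the observed risk exceeds the threshold.
        risk_estimate += 0.05 * (float(hit_error) - risk_estimate)
        xi = max(0.0, xi + xi_step * (risk_estimate - omega))
    return Q_val, Q_risk, xi

if __name__ == "__main__":
    Q_val, Q_risk, xi = risk_sensitive_q_learning()
    s0 = N // 2
    a0 = int(np.argmax(Q_val[s0] - xi * Q_risk[s0]))
    print(f"greedy action at start: {a0}, estimated risk {Q_risk[s0, a0]:.3f}, weight xi {xi:.3f}")
```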


Journal ArticleDOI
TL;DR: This thesis describes a scalable approach to POMDP planning which uses low-dimensional representations of the belief space and demonstrates how to make use of a variant of Principal Components Analysis (PCA) called Exponential family PCA in order to compress certain kinds of large real-world POMDPs, and find policies for these problems.
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.

244 citations
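
The snippet below illustrates only the belief-compression step on synthetic sparse beliefs. The thesis uses Exponential family PCA (Collins, Dasgupta, & Schapire, 2002), which respects the simplex structure of beliefs; since that is not available in standard libraries, ordinary PCA from scikit-learn is used here purely as a stand-in, with a clip-and-renormalize step to recover distributions. Planning in the low-dimensional feature space is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_states, n_beliefs, n_features = 500, 200, 5

# Sample beliefs (probability distributions over a large state space); in the
# thesis these would be collected from simulated trajectories, here they are
# synthetic and sparse.
beliefs = np.zeros((n_beliefs, n_states))
for i in range(n_beliefs):
    support = rng.choice(n_states, size=4, replace=False)
    beliefs[i, support] = rng.dirichlet(np.ones(4))

# NOTE: the thesis uses Exponential family PCA, which respects the
# non-negativity and sum-to-one structure of beliefs.  Ordinary PCA is used
# here only as a readily available stand-in.
pca = PCA(n_components=n_features)
features = pca.fit_transform(beliefs)            # low-dimensional belief features
reconstructed = pca.inverse_transform(features)

# Clip and renormalize so the reconstructions are again distributions.
reconstructed = np.clip(reconstructed, 0.0, None)
reconstructed /= reconstructed.sum(axis=1, keepdims=True)

err = np.abs(beliefs - reconstructed).sum(axis=1).mean()
print(f"compressed {n_states}-dim beliefs to {n_features} features; "
      f"mean L1 reconstruction error {err:.3f}")
```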


01 Jan 2005
TL;DR: This thesis first presents a Bounded Policy Iteration algorithm to robustly find a good policy represented by a small finite state controller, and describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus, and state abstraction with Perseus.
Abstract: Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: Large policy spaces and large state spaces. On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm to robustly find a good policy represented by a small finite state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that effect, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower dimensional representations. In practice, it is critical to simultaneously mitigate the impact of complex policy representations and large state spaces. Hence, this thesis describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of those approaches is demonstrated on two problems with more than 33 million states: synthetic network management and a real-world system designed to assist elderly persons with cognitive deficiencies to carry out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world, sequential decision making problems.

242 citations


Journal ArticleDOI
TL;DR: Demonstrates significant advantages of using real-time traffic information for optimal vehicle routing in a nonstationary stochastic network, in terms of total cost savings and vehicle usage reduction, while satisfying or improving service levels for just-in-time delivery.
Abstract: This paper examines the value of real-time traffic information to optimal vehicle routing in a nonstationary stochastic network. We present a systematic approach to aid in the implementation of transportation systems integrated with real-time information technology. We develop decision-making procedures for determining the optimal driver attendance time, optimal departure times, and optimal routing policies under time-varying traffic flows based on a Markov decision process formulation. With a numerical study carried out on an urban road network in Southeast Michigan, we demonstrate significant advantages when using this information in terms of total cost savings and vehicle usage reduction while satisfying or improving service levels for just-in-time delivery.

235 citations
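
A minimal backward-induction sketch in the spirit of this formulation: on a tiny hypothetical network with time-varying, stochastic arc travel times, it computes the expected remaining travel time for every (node, departure time) pair and then reads off an optimal departure time and first move. The network, congestion pattern, and horizon handling are all invented for illustration.

```python
import numpy as np

T = 60           # planning horizon (time steps)
DEST = 3
succ = {0: [1, 2], 1: [3, 2], 2: [3]}

def arc_distribution(i, j, t):
    """Random travel time on arc i -> j when departing at time t: the arc is
    slower with probability 0.3, and congestion raises the slow value mid-horizon."""
    base = {(0, 1): 5, (0, 2): 8, (1, 3): 10, (1, 2): 3, (2, 3): 6}[(i, j)]
    slow = base + (6 if 20 <= t < 40 else 2)
    return np.array([base, slow]), np.array([0.7, 0.3])

# V[i, t] = expected remaining travel time from node i when departing it at time t
# under the optimal routing policy; a crude penalty is applied past the horizon.
V = np.zeros((4, T + 1))
V[[0, 1, 2], T] = 1e3
policy = np.full((4, T + 1), -1)
for t in range(T - 1, -1, -1):
    for i in succ:                               # all non-destination nodes
        best, best_j = np.inf, -1
        for j in succ[i]:
            taus, probs = arc_distribution(i, j, t)
            arrival = np.minimum(t + taus, T).astype(int)
            expected = np.sum(probs * (taus + V[j, arrival]))
            if expected < best:
                best, best_j = expected, j
        V[i, t], policy[i, t] = best, best_j

dep = int(np.argmin(V[0, :T]))
print(f"best departure time from node 0: t = {dep}, expected travel time "
      f"{V[0, dep]:.2f}, first move -> node {policy[0, dep]}")
```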


Journal ArticleDOI
TL;DR: This work considers the simultaneous seat-inventory control of a set of parallel flights between a common origin and destination with dynamic customer choice among the flights as an extension of the classic multiperiod, single-flight "block demand" revenue management model.
Abstract: We consider the simultaneous seat-inventory control of a set of parallel flights between a common origin and destination with dynamic customer choice among the flights. We formulate the problem as an extension of the classic multiperiod, single-flight "block demand" revenue management model. The resulting Markov decision process is quite complex, owing to its multidimensional state space and the fact that the airline's inventory controls do affect the distribution of demand. Using stochastic comparisons, consumer-choice models, and inventory-pooling ideas, we derive easily computable upper and lower bounds for the value function of our model. We propose simulation-based techniques for solving the stochastic optimization problem and also describe heuristics based upon an extension of a well-known linear programming formulation. We provide numerical examples.

213 citations


Proceedings ArticleDOI
27 Nov 2005
TL;DR: A novel algorithm for the induction of Markov blankets from data, called Fast-IAMB, that employs a heuristic to quickly recover the Markov blanket and performs in many cases faster and more reliably than existing algorithms without adversely affecting the accuracy of the recovered Markov blankets.
Abstract: In this paper we address the problem of learning the Markov blanket of a quantity from data in an efficient manner. Markov blanket discovery can be used in the feature selection problem to find an optimal set of features for classification tasks, and is a frequently-used preprocessing phase in data mining, especially for high-dimensional domains. Our contribution is a novel algorithm for the induction of Markov blankets from data, called Fast-IAMB, that employs a heuristic to quickly recover the Markov blanket. Empirical results show that Fast-IAMB performs in many cases faster and more reliably than existing algorithms without adversely affecting the accuracy of the recovered Markov blankets.
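
For illustration, here is a basic IAMB-style Markov blanket recovery on synthetic discrete data, using empirical conditional mutual information with a fixed threshold as the (in)dependence test. Fast-IAMB's speed-up heuristic of adding several attributes per data pass is omitted; the threshold, test, and data-generating process are assumptions made for this sketch.

```python
import numpy as np

def cond_mutual_info(x, y, z_cols):
    """Empirical conditional mutual information I(X; Y | Z) for discrete data."""
    n = len(x)
    if z_cols.shape[1] == 0:
        z_keys = np.zeros(n, dtype=int)
    else:
        _, z_keys = np.unique(z_cols, axis=0, return_inverse=True)
    mi = 0.0
    for zk in np.unique(z_keys):
        mask = z_keys == zk
        pz, xs, ys = mask.mean(), x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                if pxy > 0:
                    px, py = np.mean(xs == xv), np.mean(ys == yv)
                    mi += pz * pxy * np.log(pxy / (px * py))
    return mi

def iamb(data, target, threshold=0.02):
    """Basic IAMB Markov blanket recovery: a growing phase that repeatedly adds
    the attribute most dependent on the target given the current blanket, and a
    shrinking phase that removes false positives.  (Fast-IAMB additionally adds
    several attributes per data pass; that speed-up is omitted here.)"""
    candidates = [v for v in range(data.shape[1]) if v != target]
    mb = []
    while True:                                   # growing phase
        scores = [(cond_mutual_info(data[:, v], data[:, target], data[:, mb]), v)
                  for v in candidates if v not in mb]
        if not scores:
            break
        best_score, best_v = max(scores)
        if best_score <= threshold:
            break
        mb.append(best_v)
    for v in list(mb):                            # shrinking phase
        rest = [u for u in mb if u != v]
        if cond_mutual_info(data[:, v], data[:, target], data[:, rest]) <= threshold:
            mb.remove(v)
    return sorted(mb)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4000
    a = rng.integers(0, 2, n)                     # parent of the target
    b = rng.integers(0, 2, n)                     # parent of the target
    t = np.where(rng.random(n) < 0.9, a | b, rng.integers(0, 2, n))   # target
    c = np.where(rng.random(n) < 0.85, t, 1 - t)  # noisy child of the target
    d = rng.integers(0, 2, n)                     # irrelevant attribute
    data = np.column_stack([a, b, c, d, t])
    print("estimated Markov blanket of column 4 (the target):", iamb(data, target=4))
```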

Journal ArticleDOI
TL;DR: A semi-Markov decision process (SMDP) is built for maintenance policy optimization in condition-based preventive maintenance problems, and an approach for the joint optimization of the inspection rate and the maintenance policy is presented.

Journal ArticleDOI
TL;DR: A stylized partially observed Markov decision process (POMDP) framework is developed to study a dynamic pricing problem faced by sellers of fashion-like goods and proposes an active-learning heuristic pricing policy.
Abstract: In this paper, we develop a stylized partially observed Markov decision process (POMDP) framework to study a dynamic pricing problem faced by sellers of fashion-like goods. We consider a retailer that plans to sell a given stock of items during a finite sales season. The objective of the retailer is to dynamically price the product in a way that maximizes expected revenues. Our model brings together various types of uncertainties about the demand, some of which are resolvable through sales observations. We develop a rigorous upper bound for the seller's optimal dynamic decision problem and use it to propose an active-learning heuristic pricing policy. We conduct a numerical study to test the performance of four different heuristic dynamic pricing policies in order to gain insight into several important managerial questions that arise in the context of revenue management.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: Presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions, and introduces a new performance metric, average loss, relating it to its less "online" cousins from the literature.
Abstract: Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" cousins from the literature.
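
The sketch below shows the core MBIE idea as commonly described in the related literature: build L1 confidence intervals around the empirical transition estimates and run value iteration that is optimistic within those intervals (probability mass is shifted toward the highest-valued next state, up to half the interval width). The confidence-radius formula is one standard choice, the exploration loop that would interleave acting and re-solving is not shown, and all names and constants are assumptions.

```python
import numpy as np

def optimistic_distribution(p_hat, values, eps):
    """Maximize p @ values over {p : ||p - p_hat||_1 <= eps, p a distribution}
    by shifting up to eps/2 of probability mass onto the best next state."""
    p = p_hat.copy()
    best = np.argmax(values)
    budget = min(eps / 2.0, 1.0 - p[best])
    p[best] += budget
    for s in np.argsort(values):          # take the same mass from the worst states
        if budget <= 0:
            break
        if s == best:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return p

def mbie_q_values(counts, rewards, gamma=0.95, delta=0.05, iters=200):
    """Optimistic Q-values from empirical counts (a sketch of the MBIE idea).

    counts[s, a, s'] : observed transition counts
    rewards[s, a]    : empirical mean rewards
    """
    S, A, _ = counts.shape
    n_sa = counts.sum(axis=2)
    p_hat = counts / np.maximum(n_sa[..., None], 1)
    # One common form of the L1 confidence radius; unvisited pairs get the widest interval.
    eps = np.sqrt(2 * (np.log(2 ** S - 2) - np.log(delta)) / np.maximum(n_sa, 1))
    eps = np.where(n_sa == 0, 2.0, eps)
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        for s in range(S):
            for a in range(A):
                p_opt = optimistic_distribution(p_hat[s, a], V, eps[s, a])
                Q[s, a] = rewards[s, a] + gamma * p_opt @ V
    return Q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    counts = rng.integers(0, 20, size=(S, A, S))   # pretend exploration data
    rewards = rng.random((S, A))
    Q = mbie_q_values(counts, rewards)
    print("optimistic greedy policy:", Q.argmax(axis=1))
```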

Journal ArticleDOI
TL;DR: An actor-critic type reinforcement learning algorithm is proposed and analyzed for constrained controlled Markov decision processes using multiscale stochastic approximation theory and the "envelope theorem" of mathematical economics.

Book ChapterDOI
01 Jan 2005
TL;DR: This chapter describes MDP modeling in the context of medical treatment, discusses when MDPs are an appropriate technique, and examines the challenges and opportunities for applying MDPs to medical treatment decisions.
Abstract: Medical treatment decisions are often sequential and uncertain. Markov decision processes (MDPs) are an appropriate technique for modeling and solving such stochastic and dynamic decisions. This chapter gives an overview of MDP models and solution techniques. We describe MDP modeling in the context of medical treatment and discuss when MDPs are an appropriate technique. We review selected successful applications of MDPs to treatment decisions in the literature. We conclude with a discussion of the challenges and opportunities for applying MDPs to medical treatment decisions.

Journal ArticleDOI
TL;DR: An adaptive sampling algorithm that adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N)/N.
Abstract: Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N)/N, where N is the total number of samples that are used per state sampled in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the size of the state space, where |A| is the size of the action space and H is the horizon length. The algorithm can be used to create an approximate receding horizon control to solve infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.
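
A small sketch of the adaptive multi-stage sampling estimator on a toy inventory problem: at each sampled state, actions are chosen with a UCB-style rule (the multiarmed-bandit connection mentioned in the abstract), each sample recurses one stage deeper, and the value estimate is the count-weighted average of the per-action sample means. The simulator, constants, and exact UCB form are illustrative assumptions.

```python
import math
import random

def adaptive_sampling_value(sim, state, actions, horizon, N, c=1.0):
    """UCB-based adaptive multi-stage sampling estimate of the optimal
    finite-horizon value of `state`, using only a generative model.

    sim(state, action) -> (reward, next_state) draws one sample transition.
    N is the per-state sampling budget at each stage (assumed >= len(actions))."""
    if horizon == 0:
        return 0.0
    counts = {a: 0 for a in actions}
    sums = {a: 0.0 for a in actions}
    for i in range(N):
        if i < len(actions):
            act = actions[i]                      # sample every action once first
        else:
            # then pick the action with the highest upper confidence bound
            act = max(actions, key=lambda a: sums[a] / counts[a]
                      + c * math.sqrt(math.log(i) / counts[a]))
        reward, nxt = sim(state, act)
        q_sample = reward + adaptive_sampling_value(sim, nxt, actions, horizon - 1, N, c)
        sums[act] += q_sample
        counts[act] += 1
    # The estimator is the count-weighted average of the per-action sample means.
    return sum(counts[a] * (sums[a] / counts[a]) for a in actions) / N

if __name__ == "__main__":
    random.seed(0)

    def inventory_sim(stock, order):
        """Toy inventory problem: capacity 5, uniform demand in {0,...,3}."""
        stock = min(stock + order, 5)
        demand = random.randint(0, 3)
        sold = min(stock, demand)
        reward = 4.0 * sold - 1.0 * order - 0.5 * (stock - sold)   # revenue - ordering - holding
        return reward, stock - sold

    est = adaptive_sampling_value(inventory_sim, state=2, actions=[0, 1, 2, 3],
                                  horizon=3, N=12)
    print("estimated optimal 3-stage value from stock level 2:", round(est, 2))
```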

Journal ArticleDOI
21 Nov 2005
TL;DR: This paper presents an efficient algorithm to compute the maximum (or minimum) probability to reach a set of goal states within a given time bound in a uniform CTMDP, and proves that these probabilities coincide for (time-abstract) history-dependent and Markovian schedulers that resolve nondeterminism either deterministically or in a randomized way.
Abstract: A continuous-time Markov decision process (CTMDP) is a generalization of a continuous-time Markov chain in which both probabilistic and nondeterministic choices co-exist. This paper presents an efficient algorithm to compute the maximum (or minimum) probability to reach a set of goal states within a given time bound in a uniform CTMDP, i.e., a CTMDP in which the delay time distribution per state visit is the same for all states. It furthermore proves that these probabilities coincide for (time-abstract) history-dependent and Markovian schedulers that resolve nondeterminism either deterministically or in a randomized way.

16 Sep 2005
TL;DR: This work shows how a dialogue model can be represented as a factored Partially Observable Markov Decision Process (POMDP), and shows how a dialogue manager produced with a POMDP optimisation technique may be directly compared to a handcrafted dialogue manager.
Abstract: This work shows how a dialogue model can be represented as a factored Partially Observable Markov Decision Process (POMDP). The factored representation has several benefits, such as enabling more nuanced reward functions to be specified. Although our dialogue model is significantly larger than past work using POMDPs, experiments on a small testbed problem demonstrate that recent optimisation techniques scale well and produce policies which outperform a traditional fully-observable Markov Decision Process. This work then shows how a dialogue manager produced with a POMDP optimisation technique may be directly compared to a handcrafted dialogue manager. Experiments on the testbed problem show that automatically generated dialogue managers outperform several handcrafted dialogue managers, and that automatically generated dialogue managers for the testbed problem successfully adapt to changes in speech recognition accuracy.

Journal ArticleDOI
TL;DR: In this article, a two-factor real options model of the harvesting decision over infinite rotations assuming a known stochastic price process and using a rigorous Hamilton-Jacobi-Bellman methodology was developed.
Abstract: This article develops a two-factor real options model of the harvesting decision over infinite rotations assuming a known stochastic price process and using a rigorous Hamilton-Jacobi-Bellman methodology. The harvesting problem is formulated as a linear complementarity problem that is solved numerically using a fully implicit finite difference method. This approach is contrasted with the Markov decision process models commonly used in the literature. The model is used to estimate the value of a representative stand in Ontario's boreal forest, both when there is complete flexibility regarding harvesting time and when regulations dictate the harvesting date.

Book ChapterDOI
11 Jul 2005
TL;DR: Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs) are introduced, and the decidability and complexity of algorithms for their analysis and verification are studied.
Abstract: We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), and study the decidability and complexity of algorithms for their analysis and verification. These models extend Recursive Markov Chains (RMCs), introduced in [EY05a, EY05b] as a natural model for verification of probabilistic procedural programs and related systems involving both recursion and probabilistic behavior. RMCs define a class of denumerable Markov chains with a rich theory generalizing that of stochastic context-free grammars and multi-type branching processes, and they are also intimately related to probabilistic pushdown systems. RMDPs & RSSGs extend RMCs with one controller or two adversarial players, respectively. Such extensions are useful for modeling nondeterministic and concurrent behavior, as well as modeling a system’s interactions with an environment. We provide upper and lower bounds for deciding, given an RMDP (or RSSG) A and probability p, whether player 1 has a strategy to force termination at a desired exit with probability at least p. We also address “qualitative” termination, where p=1, and model checking questions.

Journal ArticleDOI
TL;DR: This work proposes using Markov decision processes (MDPs) to model workflow composition and produces workflows that are robust to non-deterministic behaviors of Web services and that adapt to a changing environment.
Abstract: The advent of Web services has made automated workflow composition relevant to Web-based applications. One technique that has received some attention for automatically composing workflows is AI-based classical planning. However, workflows generated by classical planning algorithms suffer from the paradoxical assumption of deterministic behavior of Web services, thus requiring the additional overhead of execution monitoring to recover from the unexpected behavior of services due to service failures and the dynamic nature of real-world environments. To address these concerns, we propose using Markov decision processes (MDPs) to model workflow composition. To account for the uncertainty over the true environmental model, and for dynamic environments, we interleave MDP-based workflow generation and Bayesian model learning. Consequently, our method models both the inherent stochastic nature of Web services and the dynamic nature of the environment. Our algorithm produces workflows that are robust to non-deterministic behaviors of Web services and that adapt to a changing environment. We use a supply chain scenario to demonstrate our method and provide empirical results.

Proceedings Article
30 Jul 2005
TL;DR: It is demonstrated how to find this partition while computing a policy, and how the resulting discretisation of the observation space reveals the relevant features of the application domain.
Abstract: We describe methods to solve partially observable Markov decision processes (POMDPs) with continuous or large discrete observation spaces. Realistic problems often have rich observation spaces, posing significant problems for standard POMDP algorithms that require explicit enumeration of the observations. This problem is usually approached by imposing an a priori discretisation on the observation space, which can be sub-optimal for the decision making task. However, since only those observations that would change the policy need to be distinguished, the decision problem itself induces a lossless partitioning of the observation space. This paper demonstrates how to find this partition while computing a policy, and how the resulting discretisation of the observation space reveals the relevant features of the application domain. The algorithms are demonstrated on a toy example and on a realistic assisted living task.

Journal ArticleDOI
TL;DR: The version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4), called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms.
Abstract: We describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4). This version, called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where the alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state and the greedy policy.

Journal ArticleDOI
TL;DR: The extension of XCS to gradient-based update methods results in a classifier system that is more robust and more parameter independent, solving large and difficult maze problems reliably.
Abstract: The accuracy-based XCS classifier system has been shown to solve typical data mining problems in a machine-learning competitive way. However, successful applications in multistep problems, modeled by a Markov decision process, were restricted to very small problems. Until now, the temporal difference learning technique in XCS was based on deterministic updates. However, since a prediction is actually generated by a set of rules in XCS and Learning Classifier Systems in general, gradient-based update methods are applicable. The extension of XCS to gradient-based update methods results in a classifier system that is more robust and more parameter independent, solving large and difficult maze problems reliably. Additionally, the extension to gradient methods highlights the relation of XCS to other function approximation methods in reinforcement learning.

Journal ArticleDOI
TL;DR: In this article, the utilization of Markov decision processes as a sequential decision algorithm in the management actions of infrastructure (inspection, maintenance and repair) is discussed, and the use of this approach to determine optimal inspection strategies is described, as well as the role of deterioration and maintenance for steel structures.
Abstract: The utilization of Markov decision processes as a sequential decision algorithm in the management actions of infrastructure (inspection, maintenance and repair) is discussed. The realistic issue of partial information from inspection is described, and the classic approach of partially observable Markov decision processes is then introduced. The use of this approach to determine optimal inspection strategies is described, as well as the role of deterioration and maintenance for steel structures. Discrete structural shapes and maintenance actions provide a tractable approach. In-service inspection incorporates Bayesian updating and leads to optimal operation and initial design. Finally, the concept of management policy is described with strategy vectors.

Journal ArticleDOI
TL;DR: It is shown that Markov decision processes and the policy-gradient approach, or perturbation analysis, can be derived easily from two fundamental sensitivity formulas, and such formulas can be flexibly constructed, by first principles, with performance potentials as building blocks.
Abstract: The goal of this paper is two-fold: First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and such formulas can be flexibly constructed, by first principles, with performance potentials as building blocks. Second, with this sensitivity view we propose an event-based optimization approach, including event-based sensitivity analysis and event-based policy iteration. This approach utilizes the special feature of a system characterized by events and illustrates how the potentials can be aggregated using the special feature and how the aggregated potential can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has its advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in the system size, which reduces the policy space and saves computation; the approach does not require actions at different states to be independent; and it utilizes the special feature of a system and does not need to know the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem.

Proceedings Article
09 Jul 2005
TL;DR: Empirical study shows that lazy approximation performs much better than discretization, and this new technique is successfully applied to a more realistic planetary rover planning problem.
Abstract: Solving Markov decision processes (MDPs) with continuous state spaces is a challenge due to, among other problems, the well-known curse of dimensionality. Nevertheless, numerous real-world applications such as transportation planning and telescope observation scheduling exhibit a critical dependence on continuous states. Current approaches to continuous-state MDPs include discretizing their transition models. In this paper, we propose and study an alternative, discretization-free approach we call lazy approximation. Empirical study shows that lazy approximation performs much better than discretization, and we successfully applied this new technique to a more realistic planetary rover planning problem.

Journal ArticleDOI
TL;DR: Simulation results show that the novel hierarchical scheme for adaptive dynamic power management under nonstationary service requests can lead to significant power savings compared to previously proposed heuristic approaches.
Abstract: Dynamic power management aims at extending battery life by switching devices to lower-power modes when there is a reduced demand for service. Static power management strategies can lead to poor performance or unnecessary power consumption when there are wide variations in the rate of requests for service. This paper presents a hierarchical scheme for adaptive dynamic power management (DPM) under nonstationary service requests. As the main theoretical contribution, we model the nonstationary request process as a Markov-modulated process with a collection of modes, each corresponding to a particular stationary request process. Optimal DPM policies are precalculated offline for selected modes using standard algorithms available for stationary Markov decision processes (MDPs). The power manager then switches online among these policies to accommodate the stochastic mode-switching request dynamics using an adaptive algorithm to determine the optimal switching rule based on the observed sample path. As a target application, we present simulations of hierarchical DPM for hard disk drives where the read/write request arrivals are modeled as a Markov-modulated Poisson process. Simulation results show that the power consumption of our approach under highly nonstationary request arrivals is less than that of a previously proposed heuristic approach and is even comparable to that of the optimal policy under stationary Poisson request process with the same arrival rate as the average arrival rate of the nonstationary request process.
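
A toy sketch of the hierarchical idea, under strong simplifying assumptions: each stationary mode's precomputed MDP policy is reduced to a single idle-timeout value, and the online layer estimates the current mode from recent inter-arrival times by maximum likelihood and switches to that mode's policy. The paper's adaptive switching rule based on the observed sample path is more sophisticated; all rates, timeouts, and names here are hypothetical.

```python
import numpy as np

# Hypothetical set-up: three stationary request "modes" (low / medium / high
# arrival rate).  For each mode an optimal DPM policy is assumed to have been
# precomputed offline with a standard MDP solver; here each policy is reduced
# to a single idle-timeout before the device is put to sleep.
MODE_RATES = np.array([0.5, 2.0, 8.0])             # requests per second per mode
PRECOMPUTED_TIMEOUTS = {0: 5.0, 1: 1.0, 2: 0.2}    # seconds of idling before sleep

def estimate_mode(interarrival_times):
    """Pick the mode whose rate best explains the recent inter-arrival times
    (maximum likelihood under an exponential model)."""
    rate_hat = 1.0 / np.mean(interarrival_times)
    return int(np.argmin(np.abs(MODE_RATES - rate_hat)))

def hierarchical_dpm(interarrivals, window=20):
    """Online layer: slide a window over request inter-arrival times, estimate
    the current mode, and switch to that mode's precomputed timeout policy."""
    schedule = []
    for i in range(window, len(interarrivals)):
        mode = estimate_mode(interarrivals[i - window:i])
        schedule.append((i, mode, PRECOMPUTED_TIMEOUTS[mode]))
    return schedule

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A Markov-modulated-style arrival stream: a high-rate burst inside a low-rate background.
    interarrivals = np.concatenate([rng.exponential(1 / 0.5, 100),
                                    rng.exponential(1 / 8.0, 100),
                                    rng.exponential(1 / 0.5, 100)])
    sched = hierarchical_dpm(interarrivals)
    modes = [m for _, m, _ in sched]
    print("fraction of time spent under each mode's policy:",
          np.bincount(modes, minlength=3) / len(modes))
```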

Journal ArticleDOI
TL;DR: Analyzes the routing problem in a system where customers call back when their problems are not completely resolved by the customer service representatives (CSRs), and argues that the concept of call resolution probability constitutes a good proxy for call quality.
Abstract: Traditional research on routing in queueing systems usually ignores service quality related factors. In this paper, we analyze the routing problem in a system where customers call back when their problems are not completely resolved by the customer service representatives (CSRs). We introduce the concept of call resolution probability, and we argue that it constitutes a good proxy for call quality. For each call, both the call resolution probability (p) and the average service time (1/µ) are CSR dependent. We use a Markov decision process formulation to obtain analytical results and insights about the optimal routing policy that minimizes the average total time of call resolution, including callbacks. In particular, we provide sufficient conditions under which it is optimal to route to the CSR with the highest call resolution rate (pµ) among those available. We also develop efficient heuristics that can be easily implemented in practice.
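
Finally, the routing index suggested by the paper's sufficient conditions is easy to state in code: among the available CSRs, send the call to the one with the largest call resolution rate pµ. The numbers in the demo below are made up.

```python
import numpy as np

def route_call(available_csrs, p, mu):
    """Route an incoming call to the available CSR with the highest call
    resolution rate p*mu (the index shown to be optimal under the paper's
    sufficient conditions).

    available_csrs : indices of currently idle CSRs
    p[i]           : call resolution probability of CSR i
    mu[i]          : service rate of CSR i (1/mu[i] is the mean service time)
    """
    rates = p[available_csrs] * mu[available_csrs]
    return int(available_csrs[int(np.argmax(rates))])

if __name__ == "__main__":
    p = np.array([0.95, 0.80, 0.70])     # resolution probabilities (hypothetical)
    mu = np.array([0.8, 1.2, 1.5])       # service rates (calls per minute)
    idle = np.array([1, 2])              # CSR 0 is busy
    print("route the next call to CSR", route_call(idle, p, mu))
```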