
Markov decision process

About: Markov decision process is a research topic. Over the lifetime, 14,258 publications have been published within this topic, receiving 351,684 citations. The topic is also known as: MDP & MDPs.


Papers
Book
15 Dec 2008
TL;DR: This exciting and pioneering new overview of multiagent systems, which are systems composed of multiple interacting intelligent agents (e.g., in online trading), offers a computer science perspective on multiagent systems while integrating ideas from operations research, game theory, economics, logic, and even philosophy and linguistics.
Abstract: This exciting and pioneering new overview of multiagent systems, which are systems composed of multiple interacting intelligent agents (e.g., in online trading), offers a computer science perspective on multiagent systems while integrating ideas from operations research, game theory, economics, logic, and even philosophy and linguistics. The authors emphasize foundations to create a broad and rigorous treatment of their subject, with thorough presentations of distributed problem solving, game theory, multiagent communication and learning, social choice, mechanism design, auctions, cooperative game theory, and modal logics of knowledge and belief. For each topic, basic concepts are introduced, examples are given, proofs of key results are offered, and algorithmic considerations are examined. An appendix covers background material in probability theory, classical logic, Markov decision processes, and mathematical programming. Written by two of the leading researchers in this engaging field, this book will surely serve as THE reference for researchers in the fastest-growing area of computer science, and be used as a text for advanced undergraduate or graduate courses.

2,068 citations

Book
01 Aug 1991
TL;DR: In this book, the authors present a model-building approach to operations research, covering topics ranging from the Gauss-Jordan method for solving systems of linear equations to the branch-and-bound method for solving mixed integer programming problems.
Abstract: 1. INTRODUCTION TO MODEL BUILDING. An Introduction to Modeling. The Seven-Step Model-Building Process. Examples. 2. BASIC LINEAR ALGEBRA. Matrices and Vectors. Matrices and Systems of Linear Equations. The Gauss-Jordan Method for Solving Systems of Linear Equations. Linear Independence and Linear Dependence. The Inverse of a Matrix. Determinants. 3. INTRODUCTION TO LINEAR PROGRAMMING. What is a Linear Programming Problem? The Graphical Solution of Two-Variable Linear Programming Problems. Special Cases. A Diet Problem. A Work-Scheduling Problem. A Capital Budgeting Problem. Short-term Financial Planning. Blending Problems. Production Process Models. Using Linear Programming to Solve Multiperiod Decision Problems: An Inventory Model. Multiperiod Financial Models. Multiperiod Work Scheduling. 4. THE SIMPLEX ALGORITHM AND GOAL PROGRAMMING. How to Convert an LP to Standard Form. Preview of the Simplex Algorithm. The Simplex Algorithm. Using the Simplex Algorithm to Solve Minimization Problems. Alternative Optimal Solutions. Unbounded LPs. The LINDO Computer Package. Matrix Generators, LINGO, and Scaling of LPs. Degeneracy and the Convergence of the Simplex Algorithm. The Big M Method. The Two-Phase Simplex Method. Unrestricted-in-Sign Variables. Karmarkar's Method for Solving LPs. Multiattribute Decision-Making in the Absence of Uncertainty: Goal Programming. Solving LPs with Spreadsheets. 5. SENSITIVITY ANALYSIS: AN APPLIED APPROACH. A Graphical Introduction to Sensitivity Analysis. The Computer and Sensitivity Analysis. Managerial Use of Shadow Prices. What Happens to the Optimal z-value if the Current Basis is No Longer Optimal? 6. SENSITIVITY ANALYSIS AND DUALITY. A Graphical Introduction to Sensitivity Analysis. Some Important Formulas. Sensitivity Analysis. Sensitivity Analysis When More Than One Parameter is Changed: The 100% Rule. Finding the Dual of an LP. Economic Interpretation of the Dual Problem. The Dual Theorem and Its Consequences. Shadow Prices. Duality and Sensitivity Analysis. 7. TRANSPORTATION, ASSIGNMENT, AND TRANSSHIPMENT PROBLEMS. Formulating Transportation Problems. Finding Basic Feasible Solutions for Transportation Problems. The Transportation Simplex Method. Sensitivity Analysis for Transportation Problems. Assignment Problems. Transshipment Problems. 8. NETWORK MODELS. Basic Definitions. Shortest Path Problems. Maximum Flow Problems. CPM and PERT. Minimum Cost Network Flow Problems. Minimum Spanning Tree Problems. The Network Simplex Method. 9. INTEGER PROGRAMMING. Introduction to Integer Programming. Formulating Integer Programming Problems. The Branch-and-Bound Method for Solving Pure Integer Programming Problems. The Branch-and-Bound Method for Solving Mixed Integer Programming Problems. Solving Knapsack Problems by the Branch-and-Bound Method. Solving Combinatorial Optimization Problems by the Branch-and-Bound Method. Implicit Enumeration. The Cutting Plane Algorithm. 10. ADVANCED TOPICS IN LINEAR PROGRAMMING. The Revised Simplex Algorithm. The Product Form of the Inverse. Using Column Generation to Solve Large-Scale LPs. The Dantzig-Wolfe Decomposition Algorithm. The Simplex Method for Upper-Bounded Variables. Karmarkar's Method for Solving LPs. 11. NONLINEAR PROGRAMMING. Review of Differential Calculus. Introductory Concepts. Convex and Concave Functions. Solving NLPs with One Variable. Golden Section Search. Unconstrained Maximization and Minimization with Several Variables. The Method of Steepest Ascent. Lagrange Multipliers.
The Kuhn-Tucker Conditions. Quadratic Programming. Separable Programming. The Method of Feasible Directions. Pareto Optimality and Tradeoff Curves. 12. REVIEW OF CALCULUS AND PROBABILITY. Review of Integral Calculus. Differentiation of Integrals. Basic Rules of Probability. Bayes' Rule. Random Variables. Mean, Variance, and Covariance. The Normal Distribution. Z-Transforms. Review Problems. 13. DECISION MAKING UNDER UNCERTAINTY. Decision Criteria. Utility Theory. Flaws in Expected Utility Maximization: Prospect Theory and Framing Effects. Decision Trees. Bayes' Rule and Decision Trees. Decision Making with Multiple Objectives. The Analytic Hierarchy Process. Review Problems. 14. GAME THEORY. Two-Person Zero-Sum and Constant-Sum Games: Saddle Points. Two-Person Zero-Sum Games: Randomized Strategies, Domination, and Graphical Solution. Linear Programming and Zero-Sum Games. Two-Person Nonconstant-Sum Games. Introduction to n-Person Game Theory. The Core of an n-Person Game. The Shapley Value. 15. DETERMINISTIC EOQ INVENTORY MODELS. Introduction to Basic Inventory Models. The Basic Economic Order Quantity Model. Computing the Optimal Order Quantity When Quantity Discounts Are Allowed. The Continuous Rate EOQ Model. The EOQ Model with Back Orders Allowed. Multiple Product Economic Order Quantity Models. Review Problems. 16. PROBABILISTIC INVENTORY MODELS. Single Period Decision Models. The Concept of Marginal Analysis. The News Vendor Problem: Discrete Demand. The News Vendor Problem: Continuous Demand. Other One-Period Models. The EOQ with Uncertain Demand: The (r, q) and (s, S) Models. The EOQ with Uncertain Demand: The Service Level Approach to Determining Safety Stock Level. Periodic Review Policy. The ABC Inventory Classification System. Exchange Curves. Review Problems. 17. MARKOV CHAINS. What is a Stochastic Process? What is a Markov Chain? N-Step Transition Probabilities. Classification of States in a Markov Chain. Steady-State Probabilities and Mean First Passage Times. Absorbing Chains. Work-Force Planning Models. 18. DETERMINISTIC DYNAMIC PROGRAMMING. Two Puzzles. A Network Problem. An Inventory Problem. Resource Allocation Problems. Equipment Replacement Problems. Formulating Dynamic Programming Recursions. The Wagner-Whitin Algorithm and the Silver-Meal Heuristic. Forward Recursions. Using Spreadsheets to Solve Dynamic Programming Problems. Review Problems. 19. PROBABILISTIC DYNAMIC PROGRAMMING. When Current Stage Costs are Uncertain but the Next Period's State is Certain. A Probabilistic Inventory Model. How to Maximize the Probability of a Favorable Event Occurring. Further Examples of Probabilistic Dynamic Programming Formulations. Markov Decision Processes. Review Problems. 20. QUEUING THEORY. Some Queuing Terminology. Modeling Arrival and Service Processes. Birth-Death Processes. The M/M/1/GD/∞/∞ Queuing System and the Queuing Formula L = λW. The M/M/1/GD/c/∞ Queuing System. The M/M/s/GD/∞/∞ Queuing System. The M/G/∞/GD/∞/∞ and GI/G/∞/GD/∞/∞ Models. The M/G/1/GD/∞/∞ Queuing System. Finite Source Models: The Machine Repair Model. Exponential Queues in Series and Open Queuing Networks. How to Tell Whether Interarrival Times and Service Times Are Exponential. The M/G/s/GD/s/∞ System (Blocked Customers Cleared). Closed Queuing Networks. An Approximation for the G/G/m Queuing System. Priority Queuing Models. Transient Behavior of Queuing Systems. Review Problems. 21. SIMULATION. Basic Terminology. An Example of a Discrete Event Simulation. Random Numbers and Monte Carlo Simulation.
An Example of Monte Carlo Simulation. Simulations with Continuous Random Variables. An Example of a Stochastic Simulation. Statistical Analysis in Simulations. Simulation Languages. The Simulation Process. 22. SIMULATION WITH PROCESS MODEL. Simulating an M/M/1 Queuing System. Simulating an M/M/2 System. A Series System. Simulating Open Queuing Networks. Simulating Erlang Service Times. What Else Can Process Models Do? 23. SPREADSHEET SIMULATION WITH @RISK. Introduction to @RISK: The Newsperson Problem. Modeling Cash Flows from a New Product. Bidding Models. Reliability and Warranty Modeling. The RiskGeneral Function. The RiskCumulative Function. The RiskTrigen Function. Creating a Distribution Based on a Point Forecast. Forecasting Income of a Major Corporation. Using Data to Obtain Inputs for New Product Simulations. Playing Craps with @RISK. Project Management. Simulating the NBA Finals. 24. FORECASTING. Moving Average Forecasting Methods. Simple Exponential Smoothing. Holt's Method: Exponential Smoothing with Trend. Winter's Method: Exponential Smoothing with Seasonality. Ad Hoc Forecasting. Simple Linear Regression. Fitting Non-Linear Relationships. Multiple Regression. Answers to Selected Problems. Index.
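Since the TL;DR singles out the Gauss-Jordan method of Chapter 2, a minimal sketch of it may be useful. The function name and the example system below are illustrative, not from the book; NumPy is assumed:

```python
import numpy as np

def gauss_jordan_solve(A, b):
    """Solve Ax = b by Gauss-Jordan elimination (full reduction to
    reduced row echelon form), with partial pivoting. A sketch for
    illustration; assumes A is square and nonsingular."""
    aug = np.hstack([A.astype(float), np.asarray(b, dtype=float).reshape(-1, 1)])
    n = len(aug)
    for col in range(n):
        # Partial pivoting: bring the largest-magnitude entry to the diagonal.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        aug[col] /= aug[col, col]          # scale the pivot row so the pivot is 1
        for row in range(n):               # eliminate the column in every other row
            if row != col:
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, -1]                      # the solution vector x

# Example: 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3.
print(gauss_jordan_solve(np.array([[2.0, 1.0], [1.0, 3.0]]), [5.0, 10.0]))
```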

1,790 citations

Book
30 Mar 1999
TL;DR: In this paper, a unified approach to the study of constrained Markov decision processes with a countable state space and unbounded costs is presented: a single controller has several objectives, and the goal is to design a controller that minimizes one cost objective subject to inequality constraints on the other cost objectives.
Abstract: This report presents a unified approach for the study of constrained Markov decision processes with a countable state space and unbounded costs. We consider a single controller having several objectives; it is desirable to design a controller that minimizes one cost objective subject to inequality constraints on the other cost objectives. The objectives that we study are both the expected average cost and the expected total cost (of which the discounted cost is a special case). We provide two frameworks: the case where costs are bounded below, as well as the contracting framework. We characterize the set of achievable expected occupation measures as well as performance vectors. This allows us to reduce the original dynamic control problem to an infinite-dimensional linear program. We present a Lagrangian approach that enables us to obtain sensitivity analysis. In particular, we obtain asymptotic results for the constrained control problem: convergence of both the value and the policies in the time horizon and in the discount factor. Finally, we present several state truncation algorithms that make it possible to approximate the solution of the original control problem via finite linear programs.
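For a finite discounted MDP, the reduction to a linear program can be made concrete: the decision variables are the discounted occupation measures ρ(s, a), the dynamics become linear flow constraints, and both cost objectives are linear in ρ. A minimal sketch in Python, assuming scipy; the toy numbers and variable names are illustrative and not taken from the book:

```python
import numpy as np
from scipy.optimize import linprog

# A toy 2-state, 2-action constrained MDP (all numbers illustrative).
n_s, n_a, gamma = 2, 2, 0.9
P = np.array([[[0.8, 0.2],    # P[s, a, s']: next-state distribution
               [0.2, 0.8]],
              [[0.7, 0.3],
               [0.1, 0.9]]])
c = np.array([[1.0, 0.2],     # cost to be minimized, c[s, a]
              [1.0, 0.1]])
d = np.array([[0.0, 1.0],     # constrained cost, d[s, a]
              [0.0, 1.0]])
mu = np.array([0.5, 0.5])     # initial state distribution
d_bound = 0.3                 # constraint: sum_{s,a} rho(s,a) d(s,a) <= d_bound

# Variables: normalized occupation measure rho(s, a) >= 0, flat index s*n_a + a.
# Flow constraints, one per state s':
#   sum_a rho(s', a) - gamma * sum_{s,a} P[s, a, s'] rho(s, a) = (1 - gamma) mu[s'].
# Summing them shows rho sums to 1, so both costs are linear averages under rho.
A_eq = np.zeros((n_s, n_s * n_a))
for s2 in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[s2, s * n_a + a] = float(s == s2) - gamma * P[s, a, s2]
b_eq = (1.0 - gamma) * mu

res = linprog(c=c.ravel(),                # minimize the expected c-cost
              A_ub=d.ravel()[None, :],    # subject to the d-cost bound
              b_ub=[d_bound],
              A_eq=A_eq, b_eq=b_eq,
              bounds=(0.0, None))
rho = res.x.reshape(n_s, n_a)
policy = rho / rho.sum(axis=1, keepdims=True)  # per-state action probabilities
print(policy)
```

Note that the policy recovered from ρ may be randomized; this is characteristic of constrained MDPs, where deterministic optimal policies need not exist.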

1,519 citations

Journal ArticleDOI
TL;DR: The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction.
Abstract: This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics--as a subroutine hierarchy--and a declarative semantics--as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this nonhierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
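The core of the decomposition can be written in a few lines: the value of a composite subtask is the value of its best child plus a learned completion term, V(p, s) = max_a [V(a, s) + C(p, s, a)], with Q(p, s, a) = V(a, s) + C(p, s, a). A schematic sketch of this recursion in Python; the data structures and names are illustrative (MAXQ-Q would learn these tables from experience), not the paper's code:

```python
def maxq_value(task, s, primitive_value, completion, children):
    """Evaluate the MAXQ decomposition V(task, s) recursively.

    primitive_value[task][s] -- learned expected reward of primitive `task` in s
    completion[task][s][a]   -- learned completion term C(task, s, a): value of
                                finishing `task` after child a has run
    children[task]           -- list of child subtasks ([] for primitives)
    """
    if not children[task]:               # primitive action: base case
        return primitive_value[task][s]
    # Composite task: Q(task, s, a) = V(a, s) + C(task, s, a);
    # the task's value is the Q value of its greedy child.
    return max(maxq_value(a, s, primitive_value, completion, children)
               + completion[task][s][a]
               for a in children[task])
```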

1,486 citations

Journal ArticleDOI
TL;DR: All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite horizon discounted, and infinite horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.
Abstract: We investigate the complexity of the classical problem of optimal policy computation in Markov decision processes. All three variants of the problem (finite horizon, infinite horizon discounted, and infinite horizon average cost) were known to be solvable in polynomial time by dynamic programming (finite horizon problems) or by linear programming or successive approximation techniques (infinite horizon). We show that they are complete for P, and therefore most likely cannot be solved by highly parallel algorithms. We also show that, in contrast, the deterministic cases of all three problems can be solved very fast in parallel. The version with partially observed states is shown to be PSPACE-complete, and thus even less likely to be solved in polynomial time than the NP-complete problems; in fact, we show that, most likely, it is not possible to have an efficient on-line implementation (involving polynomial-time on-line computations and memory) of an optimal policy, even if an arbitrary amount of precomputation is allowed. Finally, the variant of the problem in which there are no observations is shown to be NP-complete.
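For reference, the polynomial-time dynamic program for the finite-horizon case that the abstract alludes to is plain backward induction over the Bellman equation. A minimal sketch; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def finite_horizon_dp(P, c, T):
    """Backward induction for a finite-horizon MDP with costs.

    P[s, a, s'] -- transition probabilities
    c[s, a]     -- immediate costs
    T           -- horizon (number of stages)
    Returns optimal values V[t, s] and decisions pi[t, s].
    Runs in O(T * |A| * |S|^2) time -- a polynomial bound of the kind
    the paper contrasts with P-completeness.
    """
    n_s, n_a, _ = P.shape
    V = np.zeros((T + 1, n_s))            # terminal values V[T] = 0
    pi = np.zeros((T, n_s), dtype=int)
    for t in range(T - 1, -1, -1):        # Bellman backup, stage by stage
        Q = c + P @ V[t + 1]              # Q[s, a] = c[s, a] + E[V[t+1, s']]
        V[t] = Q.min(axis=1)
        pi[t] = Q.argmin(axis=1)
    return V, pi
```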

1,466 citations


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 89% related
Markov chain: 51.9K papers, 1.3M citations, 86% related
Robustness (computer science): 94.7K papers, 1.6M citations, 84% related
Probabilistic logic: 56K papers, 1.3M citations, 83% related
Server: 79.5K papers, 1.4M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2023: 961
2022: 1,967
2021: 1,278
2020: 1,352
2019: 1,231
2018: 865