
Markov decision process

About: Markov decision process is a research topic. Over the lifetime, 14,258 publications have been published within this topic, receiving 351,684 citations. The topic is also known as: MDP & MDPs.


Papers
Book
01 Jan 1996
TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

3,665 citations

Journal ArticleDOI
TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.

3,233 citations
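As background for the abstract above, which notes that options may be used interchangeably with primitive actions in Q-learning, here is a minimal tabular Q-learning sketch on a hypothetical two-state MDP. The states, actions, transitions, and rewards are illustrative placeholders, not taken from any paper on this page.

```python
import random

# Minimal sketch: tabular Q-learning on a hypothetical 2-state, 2-action MDP.
random.seed(0)

N_STATES, N_ACTIONS = 2, 2
# TRANSITIONS[(state, action)] = (next_state, reward); only (1, 1) pays off.
TRANSITIONS = {
    (0, 0): (0, 0.0),
    (0, 1): (1, 0.0),
    (1, 0): (0, 0.0),
    (1, 1): (1, 1.0),
}

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

state = 0
for _ in range(20000):
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = TRANSITIONS[(state, action)]
    # Q-learning update: bootstrap from the greedy value of the next state
    Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
    state = next_state

# The greedy policy should prefer action 1 in both states.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # expected: [1, 1]
```

In the options framework, the inner loop would sample a temporally extended option rather than a primitive action, but the update has the same bootstrapped form.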

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work models the expert as maximizing a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
Abstract: We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert's unknown reward function.

3,110 citations
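The abstract above rests on one quantity: if the reward is linear in known features, matching the expert's discounted feature expectations guarantees near-expert performance under any such reward. A minimal sketch of that computation, with hypothetical trajectories and a made-up feature map (none of these values come from the paper):

```python
import numpy as np

# Sketch: discounted feature expectations mu(pi) = E[sum_t gamma^t phi(s_t)].
# If a learner's mu is close to the expert's, then for any reward
# R(s) = w . phi(s) with ||w||_2 <= 1, the learner's return is within
# ||mu_expert - mu_learner|| of the expert's.
GAMMA = 0.9

def feature_expectations(trajectories, phi, gamma=GAMMA):
    """Monte Carlo estimate of mu = E[sum_t gamma^t phi(s_t)]."""
    mu = np.zeros(len(phi(trajectories[0][0])))
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

# Hypothetical scalar states with a 2-d feature map.
phi = lambda s: np.array([1.0, float(s)])

expert_trajs = [[1, 1, 1], [1, 1, 0]]
learner_trajs = [[0, 0, 1], [0, 1, 0]]

mu_expert = feature_expectations(expert_trajs, phi)
mu_learner = feature_expectations(learner_trajs, phi)

# Bound on the performance gap under any unit-norm linear reward.
gap_bound = np.linalg.norm(mu_expert - mu_learner)
print(mu_expert, mu_learner, gap_bound)
```

The algorithm in the paper iteratively proposes rewards that maximally separate the expert's feature expectations from those of the policies found so far, shrinking this gap each round.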

Book
15 Jun 1960

3,046 citations

Book
01 Jan 1981
Abstract: Preface; Preface to A First Course; Preface to First Edition; Contents of A First Course; Algebraic Methods in Markov Chains; Ratio Theorems of Transition Probabilities and Applications; Sums of Independent Random Variables as a Markov Chain; Order Statistics, Poisson Processes, and Applications; Continuous Time Markov Chains; Diffusion Processes; Compounding Stochastic Processes; Fluctuation Theory of Partial Sums of Independent Identically Distributed Random Variables; Queueing Processes; Miscellaneous Problems; Index

2,987 citations


Network Information
Related Topics (5)
- Optimization problem: 96.4K papers, 2.1M citations (89% related)
- Markov chain: 51.9K papers, 1.3M citations (86% related)
- Robustness (computer science): 94.7K papers, 1.6M citations (84% related)
- Probabilistic logic: 56K papers, 1.3M citations (83% related)
- Server: 79.5K papers, 1.4M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  961
2022  1,967
2021  1,278
2020  1,352
2019  1,231
2018  865