Book Chapter
Learning policies for partially observable environments: scaling up
Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling, et al.
pp. 495–503
TLDR
This paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature, but that none is able to solve a slightly larger and noisier problem based on robot navigation.
Abstract
Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none is able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
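The "limited and/or noisy sensor feedback" in the abstract is handled in POMDPs by maintaining a belief state, a probability distribution over hidden states that is updated by Bayes' rule after each action and observation. A minimal sketch, assuming a hypothetical two-state problem with illustrative transition and observation probabilities (none of these numbers come from the paper):

```python
import numpy as np

# Belief update for a toy 2-state POMDP with a single action.
# After taking action a and observing o, the new belief is
#   b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)

# T[a][s, s'] : probability of moving from s to s' under action a
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])
# O[a][s', o] : probability of observing o in state s' after action a
O = np.array([[[0.85, 0.15],
               [0.15, 0.85]]])

def update_belief(b, a, o):
    """Bayes-filter update: predict with T, correct with O, renormalize."""
    predicted = T[a].T @ b              # sum_s T(s' | s, a) * b(s)
    unnormalized = O[a][:, o] * predicted
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                # uniform initial belief
b = update_belief(b, a=0, o=0)          # observation 0 shifts mass to state 0
```

Because the belief is a sufficient statistic for the history of actions and observations, a POMDP can be recast as a (continuous-state) MDP over beliefs, which is why the solution methods discussed in the paper operate on belief states.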
Citations
Journal Article
Deep learning in neural networks
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.
Journal Article
Reinforcement learning: a survey
TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
Journal Article
Planning and Acting in Partially Observable Stochastic Domains
TL;DR: A novel algorithm for solving POMDPs offline is outlined, along with how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP.
Journal Article
Machine-Learning Research
TL;DR: This article summarizes four directions of machine-learning research, the improvement of classification accuracy by learning ensembles of classifiers, methods for scaling up supervised learning algorithms, reinforcement learning, and the learning of complex stochastic models.
Proceedings Article
Point-based value iteration: an anytime algorithm for POMDPs
TL;DR: This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning, and presents results on a robotic laser tag problem as well as three test domains from the literature.
References
Journal Article
The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
TL;DR: In this article, the optimal control problem is formulated for a class of mathematical models in which the system to be controlled is a finite-state, discrete-time Markov process.
Journal Article
The Complexity of Markov Decision Processes
TL;DR: All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite horizon discounted, and infinite horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.
Journal Article
Convergence of Stochastic Iterative Dynamic Programming Algorithms
TL;DR: A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
Proceedings Article
Acting Optimally in Partially Observable Stochastic Domains
TL;DR: The existing algorithms for computing optimal control strategies for partially observable stochastic environments are found to be highly computationally inefficient and a new algorithm is developed that is empirically more efficient.