Book Chapter

Learning policies for partially observable environments: scaling up

Abstract
Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
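The belief-state machinery underlying POMDP solution methods can be made concrete with a short sketch. The code below is an illustrative Bayesian belief update, not taken from the paper itself; the function name `belief_update` and the array layouts of `T` and `O` are assumptions for this sketch:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayesian belief update for a POMDP.

    b: current belief over states, shape (S,)
    a: action taken, o: observation received
    T: T[a][s, s'] = P(s' | s, a)   (transition model)
    O: O[a][s', o] = P(o | s', a)   (observation model)
    """
    predicted = b @ T[a]             # predict the next-state distribution
    unnorm = predicted * O[a][:, o]  # weight by the observation likelihood
    return unnorm / unnorm.sum()     # renormalize to a probability vector

# Tiny two-state example: staying put, with a sensor that is correct 80% of the time.
T = {0: np.eye(2)}
O = {0: np.array([[0.8, 0.2],
                  [0.2, 0.8]])}
b = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O)
```

Starting from a uniform belief and observing `o = 0`, the belief shifts to `(0.8, 0.2)`, illustrating how noisy feedback refines the agent's state estimate.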


Citations
Journal Article

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium; it reviews deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Journal Article

Reinforcement learning: a survey

TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
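The exploration/exploitation trade-off mentioned in that survey is commonly handled with an epsilon-greedy action rule. The sketch below is illustrative and not drawn from the survey itself:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index from estimated action values.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (action with the highest estimated value).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the rule is purely greedy; raising `epsilon` trades some immediate reward for information about under-sampled actions.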
Journal Article

Planning and Acting in Partially Observable Stochastic Domains

TL;DR: A novel algorithm for solving POMDPs offline is presented, along with an outline of how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP.
Journal Article

Machine-Learning Research

Thomas G. Dietterich
15 Dec 1997
TL;DR: This article summarizes four directions of machine-learning research: the improvement of classification accuracy by learning ensembles of classifiers, methods for scaling up supervised learning algorithms, reinforcement learning, and the learning of complex stochastic models.
Proceedings Article

Point-based value iteration: an anytime algorithm for POMDPs

TL;DR: This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning, and presents results on a robotic laser tag problem as well as three test domains from the literature.
References
Journal Article

The Optimal Control of Partially Observable Markov Processes over a Finite Horizon

TL;DR: In this article, the optimal control problem is formulated for a class of mathematical models in which the system to be controlled is characterized by a finite-state, discrete-time Markov process.
Journal Article

The Complexity of Markov Decision Processes

TL;DR: All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite horizon discounted, and infinite horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.
Journal Article

Convergence of Stochastic Iterative Dynamic Programming Algorithms

TL;DR: A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
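The tabular Q-learning rule covered by that convergence theorem can be sketched as a one-step update; the function name and parameter values here are illustrative, not from the cited paper:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q: table of action values, Q[s][a]
    Applies Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    best_next = max(Q[s_next])                        # greedy value of the next state
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q

# Two states, two actions, all values initially zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Under the stochastic-approximation conditions the theorem formalizes (suitably decaying step sizes and sufficient visitation of state-action pairs), repeated application of this update converges to the optimal action values.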
Proceedings Article

Acting Optimally in Partially Observable Stochastic Domains

TL;DR: The existing algorithms for computing optimal control strategies for partially observable stochastic environments are found to be highly computationally inefficient and a new algorithm is developed that is empirically more efficient.