Book Chapter
Learning policies for partially observable environments: scaling up
Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling, et al.
pp. 495–503
TLDR
This paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature, but that none is able to solve a slightly larger and noisier problem based on robot navigation.
Abstract
Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of POMDPs is motivated by a need to address realistic problems, existing techniques for finding optimal behavior do not appear to scale well and have been unable to find satisfactory policies for problems with more than a dozen states. After a brief review of POMDPs, this paper discusses several simple solution methods and shows that all are capable of finding near-optimal policies for a selection of extremely small POMDPs taken from the learning literature. In contrast, we show that none is able to solve a slightly larger and noisier problem based on robot navigation. We find that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
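The "limited and/or noisy sensor feedback" in the abstract is handled in POMDPs by maintaining a belief state, a probability distribution over hidden states that is updated by Bayes' rule after each action and observation. A minimal sketch, assuming a hypothetical two-state problem with illustrative transition and observation probabilities (none of these numbers come from the paper):

```python
import numpy as np

# Belief update for a toy 2-state POMDP with a single action.
# After taking action a and observing o, the new belief is
#   b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)

# T[a][s, s'] : probability of moving from s to s' under action a
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])
# O[a][s', o] : probability of observing o in state s' after action a
O = np.array([[[0.85, 0.15],
               [0.15, 0.85]]])

def update_belief(b, a, o):
    """Bayes-filter update: predict with T, correct with O, renormalize."""
    predicted = T[a].T @ b              # sum_s T(s' | s, a) * b(s)
    unnormalized = O[a][:, o] * predicted
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])                # uniform initial belief
b = update_belief(b, a=0, o=0)          # observation 0 shifts mass to state 0
```

Because the belief is a sufficient statistic for the history of actions and observations, a POMDP can be recast as a (continuous-state) MDP over beliefs, which is why the solution methods discussed in the paper operate on belief states.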
Citations
Journal Article
Deep learning in neural networks
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.
Journal Article
Reinforcement learning: a survey
TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
Journal Article
Planning and Acting in Partially Observable Stochastic Domains
TL;DR: A novel algorithm for solving POMDPs offline is outlined, along with how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP.
Journal Article
Machine-Learning Research
TL;DR: This article summarizes four directions of machine-learning research, the improvement of classification accuracy by learning ensembles of classifiers, methods for scaling up supervised learning algorithms, reinforcement learning, and the learning of complex stochastic models.
Proceedings Article
Point-based value iteration: an anytime algorithm for POMDPs
TL;DR: This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning, and presents results on a robotic laser tag problem as well as three test domains from the literature.
References
Journal Article
The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
TL;DR: In this article, the optimal control problem is formulated for a class of mathematical models in which the system to be controlled is a finite-state, discrete-time Markov process.
Journal Article
The Complexity of Markov Decision Processes
TL;DR: All three variants of the classical problem of optimal policy computation in Markov decision processes (finite horizon, infinite horizon discounted, and infinite horizon average cost) are shown to be complete for P, and therefore most likely cannot be solved by highly parallel algorithms.
Journal Article
Convergence of Stochastic Iterative Dynamic Programming Algorithms
TL;DR: A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
Proceedings Article
Acting Optimally in Partially Observable Stochastic Domains
TL;DR: The existing algorithms for computing optimal control strategies for partially observable stochastic environments are found to be highly computationally inefficient and a new algorithm is developed that is empirically more efficient.