Open Access · Posted Content
Compositional Planning Using Optimal Option Models
David Silver, Kamil Ciosek, et al.
TL;DR: In this article, a unified view of intra- and inter-option model learning is presented, based on a major generalisation of the Bellman equation, which enables compositional planning over many levels of abstraction.
Abstract:
In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning. We present a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation. Our fundamental operation is the recursive composition of option models into other option models. This key idea enables compositional planning over many levels of abstraction. We illustrate our framework using a dynamic programming algorithm that simultaneously constructs optimal option models for multiple subgoals, and also searches over those option models to provide rapid progress towards other subgoals.
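The fundamental operation the abstract describes, recursively composing option models into other option models, can be sketched in a few lines. The representation below (a reward vector `R` and a discount-weighted termination matrix `P` per option, composed as `R_a + P_a @ R_b` and `P_a @ P_b`) is a minimal illustration consistent with the abstract, not the paper's exact formulation; the toy state space, rewards, and discount are invented for the example.

```python
import numpy as np

# Hypothetical sketch: an option model over n states is a pair (R, P), where
# R[s] is the expected discounted reward accumulated while the option runs
# from s, and P[s, s'] is the discounted probability that the option
# terminates in s'. Folding the discount into P makes composition linear.

def compose(model_a, model_b):
    """Model of the composite option 'run a, then run b'."""
    Ra, Pa = model_a
    Rb, Pb = model_b
    return Ra + Pa @ Rb, Pa @ Pb

# A primitive action viewed as a one-step option model (toy 3-state chain,
# discount gamma = 0.9, reward 1 for acting in states 0 and 1).
gamma = 0.9
R_step = np.array([1.0, 1.0, 0.0])
P_step = gamma * np.array([[0.0, 1.0, 0.0],   # state 0 -> 1
                           [0.0, 0.0, 1.0],   # state 1 -> 2
                           [0.0, 0.0, 1.0]])  # state 2 absorbs

one_step = (R_step, P_step)
two_step = compose(one_step, one_step)    # macro-operator: act twice
four_step = compose(two_step, two_step)   # compose models, not just actions

print(two_step[0])    # reward model of the two-step macro
print(four_step[1])   # discounted termination distribution after four steps
```

Because composition operates on models rather than on action sequences, a 2^k-step macro needs only k compositions, which is what lets planning jump directly from start states to end states across many levels of abstraction.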
Citations
Posted Content
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu, et al.
TL;DR: The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, yielding a mean 10× speedup in learning and averaging 87% expert human performance on Labyrinth.
Posted Content
The Option-Critic Architecture
TL;DR: This paper proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, without needing to provide any additional rewards or subgoals.
Journal Article
From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning
TL;DR: The results establish a principled link between high-level actions and abstract representations, a concrete theoretical foundation for constructing abstract representations with provable properties, and a practical mechanism for autonomously learning abstract high-level representations.
Posted Content
Variational Intrinsic Control
TL;DR: In this article, the authors introduce an unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent, which is learned by maximizing the number of different states an agent can reliably reach.
Journal Article
The algorithmic anatomy of model-based evaluation
Nathaniel D. Daw, Peter Dayan, et al.
TL;DR: This work studies the realization of model-based (MB) calculations and the ways they might be woven together with model-free (MF) values and evaluation methods.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book
Dynamic Programming
TL;DR: The more one studies the information-processing aspects of the mind, the more perplexed and impressed one becomes; it will be a very long time before these processes are understood well enough to reproduce them.
Book
Introduction to Reinforcement Learning
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Journal Article
What is dynamic programming?
TL;DR: Sequence alignment methods often use something called a 'dynamic programming' algorithm, which can be a good idea or a bad idea, depending on the method used.
Journal Article
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
Related Papers (5)
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
Amy McGovern, Andrew G. Barto, et al.