Open Access
Posted Content

Compositional Planning Using Optimal Option Models

TLDR
In this article, a unified view of intra- and inter-option model learning is presented, based on a major generalisation of the Bellman equation, which enables compositional planning over many levels of abstraction.
Abstract
In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning. We present a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation. Our fundamental operation is the recursive composition of option models into other option models. This key idea enables compositional planning over many levels of abstraction. We illustrate our framework using a dynamic programming algorithm that simultaneously constructs optimal option models for multiple subgoals, and also searches over those option models to provide rapid progress towards other subgoals.
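As a concrete illustration of the composition at the heart of this framework (a minimal sketch, not the paper's own algorithm): an option model is commonly represented in tabular form as a pair (R, P), where R is a vector of expected discounted cumulative reward for executing the option and P is a discounted matrix over terminal states. Composing two such models then amounts to accumulating the second model's reward through the first model's dynamics. The function name `compose` and the toy chain MDP below are hypothetical, introduced only for illustration.

```python
import numpy as np

def compose(model_a, model_b):
    """Compose two option models into a new option model.

    Each model is a pair (R, P): R is an |S|-vector of expected
    discounted cumulative reward, and P is an |S| x |S| discounted
    terminal-state transition matrix. Executing option a and then
    option b gives:
        R_ab = R_a + P_a @ R_b   (a's reward, plus b's reward
                                  accumulated through a's dynamics)
        P_ab = P_a @ P_b         (chained terminal distributions)
    """
    R_a, P_a = model_a
    R_b, P_b = model_b
    return R_a + P_a @ R_b, P_a @ P_b

# Toy example: a one-step primitive action on a 3-state chain,
# with the discount gamma folded into the transition matrix.
gamma = 0.9
P = gamma * np.array([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.0, 0.0, 1.0]])
R = np.array([0.0, 0.0, 1.0])
step = (R, P)

# Composing the one-step model with itself yields a two-step model;
# recursive composition of this kind is what lets option models be
# built from other option models, across levels of abstraction.
two_step = compose(step, step)
print(two_step[0])  # expected discounted reward over two steps
```

Because the result of `compose` has the same (R, P) form as its inputs, the operation closes over option models, which is what makes the recursive, multi-level composition described in the abstract possible.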

Citations
Posted Content

Reinforcement Learning with Unsupervised Auxiliary Tasks

TL;DR: This paper significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks it achieves a mean 10× speedup in learning and averages 87% expert human performance on Labyrinth.
Posted Content

The Option-Critic Architecture

TL;DR: This paper proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, without the need to provide any additional rewards or subgoals.
Journal ArticleDOI

From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning

TL;DR: The results establish a principled link between high-level actions and abstract representations, a concrete theoretical foundation for constructing abstract representations with provable properties, and a practical mechanism for autonomously learning abstract high-level representations.
Posted Content

Variational Intrinsic Control

TL;DR: In this article, the authors introduce an unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent, a set that is learned by maximizing the number of different states the agent can reliably reach.
Journal ArticleDOI

The algorithmic anatomy of model-based evaluation

TL;DR: This work studies the realization of model-based (MB) calculations, and the ways that these might be woven together with model-free (MF) values and evaluation methods.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book

Dynamic Programming

TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Book

Introduction to Reinforcement Learning

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Journal ArticleDOI

What is dynamic programming?

TL;DR: Sequence alignment methods often use something called a 'dynamic programming' algorithm, which can be a good idea or a bad idea, depending on the method used.
Journal ArticleDOI

Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning

TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.