Open Access · Posted Content
Compositional Planning Using Optimal Option Models
David Silver, Kamil Ciosek, et al.
TL;DR: In this article, a unified view of intra- and inter-option model learning is presented, based on a major generalisation of the Bellman equation, which enables compositional planning over many levels of abstraction.
Abstract:
In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning. We present a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation. Our fundamental operation is the recursive composition of option models into other option models. This key idea enables compositional planning over many levels of abstraction. We illustrate our framework using a dynamic programming algorithm that simultaneously constructs optimal option models for multiple subgoals, and also searches over those option models to provide rapid progress towards other subgoals.
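The fundamental operation the abstract describes, recursively composing option models into other option models, can be sketched in a few lines. The representation below (a reward vector `R` and a discount-weighted termination matrix `P` per option, composed as `R_a + P_a @ R_b` and `P_a @ P_b`) is a minimal illustration consistent with the abstract, not the paper's exact formulation; the toy state space, rewards, and discount are invented for the example.

```python
import numpy as np

# Hypothetical sketch: an option model over n states is a pair (R, P), where
# R[s] is the expected discounted reward accumulated while the option runs
# from s, and P[s, s'] is the discounted probability that the option
# terminates in s'. Folding the discount into P makes composition linear.

def compose(model_a, model_b):
    """Model of the composite option 'run a, then run b'."""
    Ra, Pa = model_a
    Rb, Pb = model_b
    return Ra + Pa @ Rb, Pa @ Pb

# A primitive action viewed as a one-step option model (toy 3-state chain,
# discount gamma = 0.9, reward 1 for acting in states 0 and 1).
gamma = 0.9
R_step = np.array([1.0, 1.0, 0.0])
P_step = gamma * np.array([[0.0, 1.0, 0.0],   # state 0 -> 1
                           [0.0, 0.0, 1.0],   # state 1 -> 2
                           [0.0, 0.0, 1.0]])  # state 2 absorbs

one_step = (R_step, P_step)
two_step = compose(one_step, one_step)    # macro-operator: act twice
four_step = compose(two_step, two_step)   # compose models, not just actions

print(two_step[0])    # reward model of the two-step macro
print(four_step[1])   # discounted termination distribution after four steps
```

Because composition operates on models rather than on action sequences, a 2^k-step macro needs only k compositions, which is what lets planning jump directly from start states to end states across many levels of abstraction.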
Citations
Posted Content
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu, et al.
TL;DR: The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, yielding a mean 10× speedup in learning and averaging 87% expert human performance on Labyrinth.
Posted Content
The Option-Critic Architecture
TL;DR: This paper proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, without needing to provide any additional rewards or subgoals.
Journal Article
From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning
TL;DR: The results establish a principled link between high-level actions and abstract representations, a concrete theoretical foundation for constructing abstract representations with provable properties, and a practical mechanism for autonomously learning abstract high-level representations.
Posted Content
Variational Intrinsic Control
TL;DR: In this article, the authors introduce an unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent, which is learned by maximizing the number of different states an agent can reliably reach.
Journal Article
The algorithmic anatomy of model-based evaluation
Nathaniel D. Daw, Peter Dayan, et al.
TL;DR: This work studies the realization of model-based (MB) calculations and the ways they might be woven together with model-free (MF) values and evaluation methods.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book
Dynamic Programming
TL;DR: The more one studies the information-processing aspects of the mind, the more perplexed and impressed one becomes; it will be a very long time before these processes are understood well enough to reproduce them.
Book
Introduction to Reinforcement Learning
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Journal Article
What is dynamic programming?
TL;DR: Sequence alignment methods often use something called a 'dynamic programming' algorithm, which can be a good idea or a bad idea, depending on the method used.
Journal Article
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
Related Papers (5)
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
Amy McGovern, Andrew G. Barto, et al.