Open Access · Posted Content

The Option-Critic Architecture

TL;DR
This paper proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, without the need to provide any additional rewards or subgoals.
Abstract
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.
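
As a concrete illustration of the learning scheme the abstract describes, below is a minimal tabular sketch combining the three updates of such an architecture: an intra-option critic update, an intra-option policy gradient step, and a termination gradient step. The toy chain environment, the step sizes, the softmax/sigmoid parameterizations, and the use of the TD error in place of the option-value Q_U are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    n_states, n_actions, n_options = 6, 2, 2
    rng = np.random.default_rng(0)

    theta = np.zeros((n_options, n_states, n_actions))  # intra-option policy params
    vartheta = np.zeros((n_options, n_states))          # termination params
    Q = np.zeros((n_states, n_options))                 # critic: Q_Omega(s, omega)
    alpha, gamma = 0.1, 0.99

    def pi(omega, s):
        # softmax intra-option policy
        e = np.exp(theta[omega, s] - theta[omega, s].max())
        return e / e.sum()

    def beta(omega, s):
        # sigmoid termination probability
        return 1.0 / (1.0 + np.exp(-vartheta[omega, s]))

    def step(s, a):
        # toy chain: action 1 moves right; reward 1 at the right end
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return s2, float(s2 == n_states - 1)

    s, omega = 0, int(rng.integers(n_options))
    for t in range(5000):
        a = rng.choice(n_actions, p=pi(omega, s))
        s2, r = step(s, a)

        # Intra-option critic: continue with omega at s2 unless it terminates there.
        b = beta(omega, s2)
        target = r + gamma * ((1 - b) * Q[s2, omega] + b * Q[s2].max())
        delta = target - Q[s, omega]
        Q[s, omega] += alpha * delta

        # Intra-option policy gradient step (the TD error stands in for Q_U here).
        grad_log = -pi(omega, s)
        grad_log[a] += 1.0
        theta[omega, s] += alpha * delta * grad_log

        # Termination gradient step: raise beta at s2 where omega has
        # negative advantage under the max-over-options value.
        advantage = Q[s2, omega] - Q[s2].max()
        vartheta[omega, s2] -= alpha * b * (1 - b) * advantage

        # Terminate stochastically; if so, pick the greedy option.
        if rng.random() < b:
            omega = int(Q[s2].argmax())
        s = 0 if r > 0 else s2

The termination update is the distinctive piece: it increases an option's termination probability precisely in states where that option is disadvantageous, so options end where switching is worthwhile.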


Citations
Journal Article

A brief survey of deep reinforcement learning

TL;DR: This survey covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C), and highlights the unique advantages of deep neural networks, focusing on visual understanding via RL.
Posted Content

Deep Reinforcement Learning: An Overview

Yuxi Li
25 Jan 2017
TL;DR: This work discusses core RL elements, including value functions (in particular the deep Q-network, DQN), policy, reward, model, planning, and exploration, as well as important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.
Proceedings Article

FeUdal Networks for Hierarchical Reinforcement Learning

TL;DR: This work introduces FeUdal Networks (FuNs), a novel architecture for hierarchical reinforcement learning inspired by the feudal reinforcement learning proposal of Dayan and Hinton; FuNs gain power and efficacy by decoupling end-to-end learning across multiple levels, allowing them to operate at different temporal resolutions.
Proceedings Article

Data-Efficient Hierarchical Reinforcement Learning

TL;DR: This paper studies how to develop HRL algorithms that are general, in that they make no onerous assumptions beyond those of standard RL algorithms, and efficient, in that they can be trained with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
Journal Article

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning.

TL;DR: In this article, the authors demonstrate that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input.
References
Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

TL;DR: Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite-horizon discrete-time models with discrete state spaces, while also examining models with arbitrary state spaces, finite-horizon models, and continuous-time discrete-state models.
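
As a pointer to the kind of model the book analyzes, here is a minimal value-iteration sketch for an infinite-horizon discounted MDP; the two-state transition and reward model is an illustrative assumption.

    import numpy as np

    # P[a, s, s2] = transition probabilities, R[s, a] = expected rewards
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.1, 0.9], [0.8, 0.2]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    gamma = 0.95

    V = np.zeros(2)
    for _ in range(1000):
        # one Bellman backup per (state, action) pair
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < 1e-8:   # sup-norm stopping rule
            break
        V = V_new
    print(V, Q.argmax(axis=1))  # optimal values and a greedy policy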
Posted Content

Playing Atari with Deep Reinforcement Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; on the Atari 2600 games tested, it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Journal Article

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units; these algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, without explicitly computing gradient estimates.
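
The core update is compact enough to show directly. Below is a minimal sketch of a Williams-style REINFORCE rule on an assumed three-armed Bernoulli bandit: weights move along (reward - baseline) times the score function grad log pi, with no explicit estimate of the gradient of expected reinforcement; the arm reward probabilities and step sizes are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    w = np.zeros(3)                           # one logit per arm
    baseline, alpha = 0.0, 0.05
    arm_means = np.array([0.2, 0.5, 0.8])     # assumed reward probabilities

    for t in range(2000):
        p = np.exp(w - w.max()); p /= p.sum()      # softmax policy
        a = rng.choice(3, p=p)
        r = float(rng.random() < arm_means[a])     # Bernoulli reward

        grad_log = -p.copy(); grad_log[a] += 1.0   # d/dw log pi(a)
        w += alpha * (r - baseline) * grad_log     # REINFORCE update
        baseline += 0.01 * (r - baseline)          # running-average baseline

    print(p.round(3))  # probability mass should concentrate on arm 2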
Proceedings Article

Asynchronous methods for deep reinforcement learning

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers; the asynchronous actor-critic variant succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.
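
A rough sketch of the asynchronous idea, under strong simplifying assumptions: several worker threads run one-step advantage actor-critic on an invented two-state problem and apply Hogwild-style lock-free updates to shared tabular parameters. The paper's agents use deep networks and n-step returns; everything below is a toy stand-in.

    import threading
    import numpy as np

    n_states, n_actions = 2, 2
    theta = np.zeros((n_states, n_actions))   # shared policy logits
    V = np.zeros(n_states)                    # shared value estimates
    alpha, gamma = 0.05, 0.9

    def worker(seed):
        rng = np.random.default_rng(seed)
        s = 0
        for _ in range(3000):
            p = np.exp(theta[s] - theta[s].max()); p /= p.sum()
            a = rng.choice(n_actions, p=p)
            r = float(a == s)                 # toy reward: action matches state
            s2 = int(rng.integers(n_states))
            adv = r + gamma * V[s2] - V[s]    # one-step advantage estimate
            grad_log = -p; grad_log[a] += 1.0
            theta[s] += alpha * adv * grad_log   # actor update, no locks
            V[s] += alpha * adv                  # critic update, no locks
            s = s2

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(theta.round(2))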
Proceedings Article

Policy Gradient Methods for Reinforcement Learning with Function Approximation

TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
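
For reference, the theorem proved there can be stated as follows, with $\rho$ the performance measure and $d^{\pi}$ the corresponding (discounted or stationary) state distribution:

    \[
      \frac{\partial \rho}{\partial \theta}
        = \sum_{s} d^{\pi}(s) \sum_{a} \frac{\partial \pi(s,a)}{\partial \theta}\, Q^{\pi}(s,a)
    \]

Notably, no term involves the gradient of the state distribution with respect to the policy parameters, which is what makes sample-based estimation of this gradient practical.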