Open Access · Proceedings Article

Off-Policy Deep Reinforcement Learning without Exploration

TLDR
This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.
Abstract
Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.
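To make the batch-constrained idea concrete, the sketch below shows a tabular variant in Python: the Bellman backup maximizes only over actions the batch actually contains at the next state, and skips bootstrapping entirely when the batch contains none, which is the source of the extrapolation error the paper identifies. This is a minimal illustration under assumed names, not the paper's deep algorithm (BCQ), which handles continuous control with a generative model over actions.

```python
# Minimal tabular sketch of batch-constrained Q-learning (illustrative
# names; the paper's deep BCQ algorithm uses a generative action model).
import numpy as np
from collections import defaultdict

def batch_constrained_q(batch, n_states, n_actions,
                        gamma=0.99, lr=0.1, sweeps=1000):
    """batch: list of (state, action, reward, next_state) transitions."""
    Q = np.zeros((n_states, n_actions))
    # The batch constraint: at each state, only actions that appear
    # in the batch are eligible for the max in the Bellman backup.
    seen = defaultdict(set)
    for s, a, _, _ in batch:
        seen[s].add(a)
    for _ in range(sweeps):
        for s, a, r, s2 in batch:
            if seen[s2]:
                target = r + gamma * max(Q[s2, a2] for a2 in seen[s2])
            else:
                # No in-batch action at s2: avoid extrapolating Q-values.
                target = r
            Q[s, a] += lr * (target - Q[s, a])
    return Q
```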


Citations
Posted Content

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.
Posted Content

Conservative Q-Learning for Offline Reinforcement Learning

TL;DR: Conservative Q-learning (CQL) is proposed, which aims to address the limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
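Schematically, the conservative objective can be written as below (notation assumed from the CQL paper; μ is an action-proposal distribution and D the offline dataset): the first term pushes Q-values down on actions sampled from μ, the second pushes them up on actions in the data, and both sit alongside the usual Bellman error.

```latex
\min_Q \; \alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot|s)}\big[Q(s,a)\big]
  - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[Q(s,a)\big] \Big)
  + \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}
    \Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s,a) \big)^2 \Big]
```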
Journal Article

Toward Causal Representation Learning

TL;DR: The authors review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby examining how causality can contribute to modern machine learning research.
Journal Article

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL; it releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms, an evaluation protocol, and an open-source codebase.
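Typical usage looks roughly like the sketch below, based on the project's documented interface; exact task names and dataset keys may differ across D4RL versions.

```python
# Hedged usage sketch of the D4RL benchmark; environment names and
# dataset keys depend on the installed D4RL version.
import gym
import d4rl  # importing registers the offline-RL tasks with gym

env = gym.make('halfcheetah-medium-v2')
data = env.get_dataset()  # dict of numpy arrays for the fixed batch
print(data['observations'].shape, data['actions'].shape)

# Convenience view with (s, a, r, s') tuples aligned for Q-learning:
qdata = d4rl.qlearning_dataset(env)
```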
Posted Content

Decision Transformer: Reinforcement Learning via Sequence Modeling

TL;DR: Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
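The key input transformation is conditioning on returns-to-go: the model autoregressively predicts the action a_t from interleaved (return-to-go, state, action) tokens over a context of the K most recent timesteps. A small illustrative sketch with assumed helper names (model internals omitted):

```python
# Illustrative sketch of the Decision Transformer conditioning signal;
# names here are assumptions, not the authors' code.
import numpy as np

def returns_to_go(rewards):
    # Undiscounted return remaining from each timestep onward,
    # as used by the paper to condition action prediction.
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

rewards = np.array([1.0, 0.0, 2.0])
print(returns_to_go(rewards))  # [3. 2. 2.]
```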
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
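The update itself is compact. Writing g_t for the gradient at step t, Adam maintains exponential moving averages of the first and second moments, corrects their initialization bias, and scales the step accordingly:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```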
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
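At its core, DQN is Q-learning with experience replay and a periodically frozen target network Q_{θ⁻}, minimizing:

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim U(\mathcal{D})}
  \Big[ \big( r + \gamma \max_{a'} Q_{\theta^-}(s', a') - Q_\theta(s,a) \big)^2 \Big]
```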
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
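The algorithm maximizes the evidence lower bound (ELBO), made differentiable through the reparameterization z = μ_φ(x) + σ_φ(x) ⊙ ε with ε ∼ N(0, I):

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z|x) \,\|\, p(z)\big)
```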
Journal Article

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton
01 Aug 1988
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
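The simplest member of the family, TD(0), updates a value estimate toward the one-step bootstrapped target:

```latex
V(s_t) \leftarrow V(s_t) + \alpha \big( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big)
```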

Proceedings Article

Deep reinforcement learning with double Q-learning

TL;DR: The authors show that the DQN algorithm suffers from substantial overestimations in some games in the Atari 2600 domain, and propose a specific adaptation that not only reduces the observed overestimations but also leads to much better performance on several games.
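The adaptation decouples action selection from action evaluation: the online network picks the greedy action and the target network scores it, which curbs the overestimation caused by maximizing over noisy estimates:

```latex
y_t = r_t + \gamma\, Q_{\theta^-}\!\Big(s_{t+1},\, \arg\max_{a'} Q_\theta(s_{t+1}, a')\Big)
```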