Behavior Regularized Offline Reinforcement Learning

Yifan Wu, +2 more

- 25 Sep 2019 -

arXiv: Learning

TLDR

A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.

Abstract:

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However, in many real-world applications, access to the environment is limited to a fixed offline dataset of logged experience. In such settings, standard RL algorithms have been shown to diverge or otherwise yield poor performance. Accordingly, recent work has suggested a number of remedies to these issues. In this work, we introduce a general framework, behavior regularized actor critic (BRAC), to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks. Surprisingly, we find that many of the technical complexities introduced in recent methods are unnecessary to achieve strong performance. Additional ablations provide insights into which design choices matter most in the offline RL setting.

Citations

Posted Content

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, +3 more

- 04 May 2020 -

arXiv: Learning

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.

Posted Content

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, +3 more

- 08 Jun 2020 -

arXiv: Learning

TL;DR: Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.

Journal Article

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

Justin Fu, +4 more

- 04 May 2021 -

arXiv: Learning

TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.

Posted Content

Decision Transformer: Reinforcement Learning via Sequence Modeling

Lili Chen, +8 more

- 02 Jun 2021 -

arXiv: Learning

TL;DR: Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

Proceedings Article

MOPO: Model-based Offline Policy Optimization

Tianhe Yu, +7 more

TL;DR: Model-based offline policy optimization (MOPO) modifies existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics; the authors show theoretically that the algorithm maximizes a lower bound of the policy's return under the true MDP.

References

Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

TL;DR: Puterman provides an up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.

Posted Content

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, +6 more

- 19 Dec 2013 -

arXiv: Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Proceedings Article

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih, +7 more

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Monograph

Markov Decision Processes

P. Whittle, +1 more

- 15 Apr 1994 -

Journal of The Royal Statistical Society...

TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

Posted Content

Continuous control with deep reinforcement learning

Timothy P. Lillicrap, +7 more

- 09 Sep 2015 -

arXiv: Learning

TL;DR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
