Open Access · Posted Content
Benchmarking Batch Deep Reinforcement Learning Algorithms.
TLDR
This paper benchmarks the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy, and finds that many of these algorithms underperform DQN trained online with the same amount of data.
Abstract:
Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting: learning from a fixed data set without interaction with the environment. Following this result, there have been several papers showing reasonable performance under a variety of environments and batch settings. In this paper, we benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy. We find that under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. To introduce a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show that it outperforms all existing algorithms at this task.
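The discrete-action adaptation of Batch-Constrained Q-learning described above restricts the greedy action to those the behavioral policy takes with sufficient probability. A minimal sketch of that action-filtering step, assuming a relative-probability threshold `tau` (function and parameter names are illustrative, not the paper's code):

```python
import numpy as np

def bcq_discrete_action(q_values, behavior_probs, tau=0.3):
    """Pick the greedy action among actions the behavior policy is
    likely enough to take (discrete BCQ-style filtering sketch)."""
    # Allow only actions whose behavior probability is within a
    # fraction `tau` of the most probable behavior action.
    mask = behavior_probs / behavior_probs.max() > tau
    constrained_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(constrained_q))

q = np.array([1.0, 5.0, 2.0])
p = np.array([0.7, 0.05, 0.25])   # behavior policy rarely takes action 1
print(bcq_discrete_action(q, p))  # action 1 is masked out; picks action 2
```

Unconstrained Q-learning would pick action 1 here; the filter rules it out because the data contains too few examples of it for its Q-value to be trusted.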
Citations
Journal Article
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
Posted Content
An Optimistic Perspective on Offline Reinforcement Learning
TL;DR: It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on this replay dataset, outperform the fully trained DQN agent. Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates, is also presented.
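REM's core mechanism, per the summary above, is training against random convex combinations of several Q-value heads. A hedged sketch of just the mixing step (names are illustrative):

```python
import numpy as np

def rem_q_values(q_heads, rng):
    """Mix K Q-value heads with a random convex combination, the core
    idea of Random Ensemble Mixture (illustrative sketch)."""
    alpha = rng.random(len(q_heads))
    alpha = alpha / alpha.sum()              # weights >= 0, summing to 1
    return sum(a * q for a, q in zip(alpha, q_heads))

rng = np.random.default_rng(0)
heads = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
mixed = rem_q_values(heads, rng)  # each entry lies between the heads' values
```

Because the weights form a convex combination, the mixed Q-value for each action is bounded by the per-head minimum and maximum, which is what lets a fresh random mixture serve as the target on every training step.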
Posted Content
Acme: A Research Framework for Distributed Reinforcement Learning
Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alexander Novikov, Sergio Gomez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas, +19 more
TL;DR: It is shown that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.
Posted Content
A Theoretical Analysis of Deep Q-Learning
TL;DR: In this paper, the authors make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives, focusing on a slight simplification of DQN that fully captures its key features.
Proceedings Article
Critic Regularized Regression
Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh Merel, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas, +10 more
TL;DR: This paper proposes a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR), and finds that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
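CRR, as summarized above, learns the policy by regression onto dataset actions, weighted by the critic's advantage estimate. A minimal sketch of such a weighting rule, covering the exponential and binary variants the paper describes (names and the clipping constant are illustrative):

```python
import numpy as np

def crr_weight(q_sa, v_s, mode="exp", beta=1.0, clip=20.0):
    """Weight applied to the behavior-cloning loss for one (s, a) pair,
    in the spirit of critic-regularized regression (sketch)."""
    adv = q_sa - v_s                      # advantage estimate from the critic
    if mode == "binary":
        return float(adv > 0.0)          # imitate only improving actions
    return float(min(np.exp(adv / beta), clip))  # exponential weight, clipped
```

Actions the critic rates above the state value get amplified in the regression target, while the binary variant drops non-improving actions entirely.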
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
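The update rule Adam describes maintains exponential moving averages of the gradient and its element-wise square, applies bias correction, and scales the step accordingly. A single-step sketch of that rule:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased moment estimates, bias correction,
    then an SGD-style step scaled by the second moment."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

On the first step with a unit gradient, the bias-corrected moments are both 1, so the parameter moves by approximately the learning rate regardless of the gradient's raw scale.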
Journal ArticleDOI
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Book
Dynamic Programming
TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Journal ArticleDOI
Robust Estimation of a Location Parameter
TL;DR: In this article, a new approach toward a theory of robust estimation is presented, which treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators that are asymptotically most robust (in a sense to be specified) among all translation-invariant estimators.
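The robust estimator from this reference underlies the Huber loss, which is commonly used in DQN-style training to keep large Bellman errors from dominating the update. A minimal sketch:

```python
def huber_loss(x, delta=1.0):
    """Huber's loss: quadratic for small residuals, linear for large
    ones, so outliers influence the fit less than under squared error."""
    a = abs(x)
    if a <= delta:
        return 0.5 * x * x
    return delta * (a - 0.5 * delta)
```

The two branches meet with matching value and slope at `|x| = delta`, so the loss stays smooth while capping the gradient magnitude at `delta`.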