scispace - formally typeset
Open AccessPosted Content

Benchmarking Batch Deep Reinforcement Learning Algorithms.

Reads0
Chats0
TLDR
This paper benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy, and finds that many of these algorithms underperform DQN trained online with the same amount of data.
Abstract
Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting--learning from a fixed data set without interaction with the environment. Following this result, there have been several papers showing reasonable performances under a variety of environments and batch settings. In this paper, we benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy. We find that under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. To introduce a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show it outperforms all existing algorithms at this task.

read more

Citations
More filters
Journal Article

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
Posted Content

An Optimistic Perspective on Offline Reinforcement Learning

TL;DR: It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on this replay dataset, outperform the fully trained DQN agent and Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates is presented.
Posted Content

Acme: A Research Framework for Distributed Reinforcement Learning

TL;DR: It is shown that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.
Posted Content

A Theoretical Analysis of Deep Q-Learning

TL;DR: In this paper, the authors make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives, focusing on a slight simplification of DQN that fully captures its key features.
Proceedings Article

Critic Regularized Regression

TL;DR: This paper proposes a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR), and finds that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
References
More filters
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: In this article, the adaptive estimates of lower-order moments are used for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimate of lowerorder moments.
Journal ArticleDOI

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Book

Dynamic Programming

TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Journal ArticleDOI

Robust Estimation of a Location Parameter

TL;DR: In this article, a new approach toward a theory of robust estimation is presented, which treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators that are asyptotically most robust (in a sense to be specified) among all translation invariant estimators.
Related Papers (5)