Open Access · Posted Content

Mastering Atari with Discrete World Models

TL;DR
DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model, and exceeds the final performance of the top single-GPU agents IQN and Rainbow.
Abstract
Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and exceeds the final performance of the top single-GPU agents IQN and Rainbow.
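As a rough illustration of the discrete representations mentioned in the abstract, the sketch below samples a vector of categorical latent variables with straight-through gradients, the estimator DreamerV2 is reported to use. The framework choice (PyTorch), tensor shapes, and names are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (assumption, not the official DreamerV2 code): categorical
# latents with straight-through gradient estimation.
import torch
import torch.nn.functional as F

def sample_discrete_latent(logits):
    # logits: (batch, num_variables, num_classes), e.g. 32 variables x 32 classes.
    probs = F.softmax(logits, dim=-1)
    # Sample a one-hot class per latent variable.
    indices = torch.distributions.Categorical(probs=probs).sample()
    one_hot = F.one_hot(indices, num_classes=logits.shape[-1]).float()
    # Straight-through estimator: the forward pass uses the hard one-hot sample,
    # the backward pass uses the gradient of the softmax probabilities.
    return one_hot + probs - probs.detach()

logits = torch.randn(16, 32, 32, requires_grad=True)
z = sample_discrete_latent(logits)  # (16, 32, 32) one-hot vectors
z.sum().backward()                  # gradients flow through the probabilities
```

In the full agent this latent would be produced by the encoder and consumed by the recurrent world model; only the sampling step is shown here.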


Citations
Posted Content

Decision Transformer: Reinforcement Learning via Sequence Modeling

TL;DR: Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
Proceedings Article

Planning with Diffusion for Flexible Behavior Synthesis

TL;DR: This paper considers what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical.
Proceedings Article

Multi-Game Decision Transformers

TL;DR: It is shown that a single transformer-based model – with a single set of weights – trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
Proceedings Article

DayDreamer: World Models for Physical Robot Learning

TL;DR: This paper applies Dreamer to four robots and tasks to learn online and directly in the real world, without any simulators, suggesting that Dreamer is capable of online learning in the real world and establishing a strong baseline.
Journal Article

Can Wikipedia Help Offline Reinforcement Learning?

TL;DR: This work takes advantage of the formulation of reinforcement learning as sequence modeling, investigates the transferability of sequence models pre-trained on other domains when fine-tuned on offline RL tasks (control, games), and proposes techniques to improve transfer between these domains.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
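For reference, the update rule the paper introduces can be summarized as follows; this is an illustrative NumPy sketch using the default hyperparameters reported in the paper, not the reference implementation.

```python
# Minimal sketch of the Adam update rule (illustrative only).
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (first moment)
    # and the squared gradient (second moment).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction for the zero-initialized moment estimates.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Parameter update with an adaptive per-coordinate step size.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```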
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
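The estimator rests on the reparameterization trick; below is a minimal illustrative sketch of that trick and of the analytic KL term of the evidence lower bound (PyTorch; function and variable names are assumptions, not the paper's notation).

```python
# Minimal sketch of the reparameterization trick and the Gaussian KL term
# (illustrative only; the encoder/decoder networks are omitted).
import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu, sigma.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps

def kl_to_standard_normal(mu, log_var):
    # Analytic KL(q(z|x) || N(0, I)) term of the evidence lower bound.
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=-1)
```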
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
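A minimal sketch of such an encoder-decoder, here built from PyTorch GRU layers, is shown below; the layer sizes and names are assumptions and do not reproduce the paper's exact architecture. The decoder's output logits would be trained with cross-entropy to maximize the conditional probability of the target sequence given the source sequence.

```python
# Minimal sketch of an RNN encoder-decoder (illustrative only).
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source sequence into a fixed-length summary vector.
        _, summary = self.encoder(self.src_emb(src))
        # Condition the decoder on that summary and predict the target tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), summary)
        return self.out(dec_out)  # logits for p(target | source)
```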