Exploration by random network distillation

Open AccessProceedings Article

Exploration by random network distillation

- pp 1-17

TLDR

An exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed and a method to flexibly combine intrinsic and extrinsic rewards that enables significant progress on several hard exploration Atari games is introduced.

Abstract:

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level.

Citations

PDF

Open Access

More filters

Posted Content

Solving Rubik's Cube with a Robot Hand.

OpenAI, +18 more

- 16 Oct 2019 -

arXiv: Learning

TL;DR: It is demonstrated that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot, made possible by a novel algorithm, which is called automatic domain randomization (ADR), and a robot platform built for machine learning.

...read moreread less

Journal ArticleDOI

Deep Learning for Anomaly Detection: A Review

Guansong Pang, +3 more

- 05 Mar 2021 -

ACM Computing Surveys

TL;DR: A comprehensive survey of deep anomaly detection with a comprehensive taxonomy is presented in this paper, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods.

...read moreread less

Posted Content

Emergent Tool Use From Multi-Agent Autocurricula.

Bowen Baker, +6 more

- 17 Sep 2019 -

arXiv: Learning

TL;DR: This work finds clear evidence of six emergent phases in agent strategy in the authors' environment, each of which creates a new pressure for the opposing team to adapt, and compares hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.

...read moreread less

Journal ArticleDOI

Deep Learning for Anomaly Detection: A Review

Guansong Pang, +3 more

- 06 Jul 2020 -

arXiv: Learning

TL;DR: This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods and discusses how they address the aforementioned challenges.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

...read moreread less

Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.

...read moreread less