Dota 2 with Large Scale Deep Reinforcement Learning

Open Access

Posted Content

- 01 Jan 2019 -

TLDR

By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

Abstract:

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
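The batch figures in the abstract imply a rough data throughput, which the following minimal sketch works out. It takes "approximately 2 million frames every 2 seconds" at face value and, purely as an assumption not stated in the abstract, treats training as uninterrupted with 30-day months, so the 10-month total is an upper-bound estimate rather than a reported number. The names `FRAMES_PER_BATCH`, `SECONDS_PER_BATCH`, and `frames_consumed` are hypothetical.

```python
# Back-of-the-envelope throughput from the figures quoted in the abstract.
FRAMES_PER_BATCH = 2_000_000   # "approximately 2 million frames"
SECONDS_PER_BATCH = 2          # "every 2 seconds"
SECONDS_PER_DAY = 24 * 60 * 60

# Frames processed per second of training.
frames_per_second = FRAMES_PER_BATCH / SECONDS_PER_BATCH


def frames_consumed(days: float) -> float:
    """Frames processed in `days` of training, ASSUMING no downtime."""
    return frames_per_second * SECONDS_PER_DAY * days


if __name__ == "__main__":
    # ASSUMPTION: 10 months is approximated as 300 days of continuous training.
    print(f"{frames_per_second:,.0f} frames per second")
    print(f"{frames_consumed(300):.3e} frames over 300 days (upper bound)")
```

Under these assumptions the system consumes about one million frames per second; any pauses or restarts during the 10 months would lower the total.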

Citations

Posted Content

Learning without Forgetting

Zhizhong Li, +1 more

- 29 Jun 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.

Journal Article

Toward Causal Representation Learning

Bernhard Schölkopf, +6 more

TL;DR: The authors reviewed fundamental concepts of causal inference and related them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.

Posted Content

Jukebox: A Generative Model for Music

Prafulla Dhariwal, +5 more

- 30 Apr 2020 -

arXiv: Audio and Speech Processing

TL;DR: It is shown that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes, and can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.

Proceedings Article

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey.

Wenshuai Zhao, +2 more

- 24 Sep 2020 -

arXiv: Learning

TL;DR: This survey covers the fundamental background of sim-to-real transfer in deep reinforcement learning and gives an overview of the main methods currently in use: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Posted Content

Distilling the Knowledge in a Neural Network

Geoffrey E. Hinton, +2 more

- 09 Mar 2015 -

arXiv: Machine Learning

TL;DR: This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system, and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

TL;DR: This work proposes a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
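As an illustration of what such a "surrogate" objective can look like, here is a minimal sketch of the clipped variant described in the Proximal Policy Optimization paper. It is not taken from this listing and is not the OpenAI Five implementation; the function name `clipped_surrogate`, the pure-Python list inputs, and the default `epsilon=0.2` are assumptions made for the example.

```python
from typing import Sequence


def clipped_surrogate(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # ASSUMPTION: illustrative clip range
) -> float:
    """Mean clipped surrogate objective over a batch of samples.

    ratios[i] is pi_new(a|s) / pi_old(a|s) for sample i and
    advantages[i] is that sample's advantage estimate.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    print(clipped_surrogate([1.5, 0.5], [1.0, -1.0]))
```

The objective is maximized by gradient ascent on the policy parameters; taking the minimum removes the incentive to move the ratio far outside the clip range in a single update.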

Posted Content

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, +6 more

- 19 Dec 2013 -

arXiv: Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Related Papers (5)

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

Citations

Learning without Forgetting

Toward Causal Representation Learning

Jukebox: A Generative Model for Music

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey.

TensorFlow Quantum: A Software Framework for Quantum Machine Learning

References

Adam: A Method for Stochastic Optimization

Mastering the game of Go with deep neural networks and tree search

Distilling the Knowledge in a Neural Network

Proximal Policy Optimization Algorithms

Playing Atari with Deep Reinforcement Learning

Related Papers (5)

Human-level control through deep reinforcement learning

Proximal Policy Optimization Algorithms

Mastering the game of Go with deep neural networks and tree search

Reinforcement Learning: An Introduction

Mastering the game of Go without human knowledge