Open Access
Posted Content
Dota 2 with Large Scale Deep Reinforcement Learning
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Christopher Hesse, Rafal Jozefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Ponde de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
TL;DR
By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
Abstract:
On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
Citations
Posted Content
Learning without Forgetting
Zhizhong Li, Derek Hoiem
TL;DR: This work proposes the Learning without Forgetting method, which trains the network using only new-task data while preserving its original capabilities, and which performs favorably compared to commonly used feature-extraction and fine-tuning adaptation techniques.
Journal Article
Toward Causal Representation Learning
Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, Yoshua Bengio
TL;DR: The authors reviewed fundamental concepts of causal inference and related them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.
Posted Content
Jukebox: A Generative Model for Music
TL;DR: It is shown that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes, and can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.
Proceedings Article
Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey
TL;DR: This survey covers the fundamental background behind sim-to-real transfer in deep reinforcement learning and gives an overview of the main methods currently in use: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation.
Posted Content
TensorFlow Quantum: A Software Framework for Quantum Machine Learning
Michael Broughton, Guillaume Verdon, Trevor McCourt, Antonio Martinez, Jae Hyeon Yoo, Sergei V. Isakov, Philip Massey, Murphy Yuezhen Niu, Ramin Halavati, Evan Peters, Martin Leib, Andrea Skolik, Michael Streif, David Von Dollen, Jarrod R. McClean, Sergio Boixo, Dave Bacon, Alan K. Ho, Hartmut Neven, Masoud Mohseni
TL;DR: This framework offers high-level abstractions for the design and training of both discriminative and generative quantum models under TensorFlow and supports high-performance quantum circuit simulators.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
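The "adaptive estimates of lower-order moments" in the Adam summary can be made concrete with a minimal sketch of one update step. It assumes plain Python lists as parameters and uses the default hyperparameters suggested in the Adam paper (step size 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). The function name adam_step and the toy quadratic objective are hypothetical illustrations, not part of any library API.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over plain Python lists (hypothetical helper).

    Returns the updated parameters and moment estimates; t is the 1-based step count.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # biased first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # biased second raw-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for zero initialization
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Usage: minimize the toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = [0.0], [0.0], [0.0]
for t in range(1, 1001):
    grad = [2 * (x[0] - 3)]
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
print(x[0])  # approaches 3.0
```

Bias correction matters most in the first few steps. Without it, the zero-initialized moment estimates would shrink the early updates.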
Journal ArticleDOI
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using a new search algorithm that combines Monte Carlo simulation with value and policy networks, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Posted Content
Distilling the Knowledge in a Neural Network
TL;DR: This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system, and it introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes that the full models confuse.
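Distillation trains a small model on the "soft targets" produced by a larger model or ensemble. Those targets are softened by dividing the logits by a temperature T before the softmax. The following is a minimal sketch of that soft-target loss on plain lists. Multiplying by T squared follows the distillation paper's advice for keeping gradient scales comparable when soft and hard targets are mixed. Mixing in the hard-label loss is not shown here. The function names and example logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened predictions against the teacher's soft targets.

    Scaled by temperature**2 so gradient magnitudes stay roughly comparable across temperatures.
    """
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    ce = -sum(p * math.log(q) for p, q in zip(soft_targets, student_probs))
    return (temperature ** 2) * ce


# Usage: higher temperatures expose more of the teacher's relative preferences between classes.
teacher = [5.0, 2.0, -1.0]
print(softmax(teacher, 1.0))  # close to one-hot
print(softmax(teacher, 4.0))  # noticeably softer distribution
print(distillation_loss([4.0, 2.5, -0.5], teacher))
```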
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: This work proposes a new family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
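The "surrogate" objective in the PPO summary is most often used in its clipped form. The probability ratio between the new and old policies is clipped to [1 - epsilon, 1 + epsilon], and the smaller of the clipped and unclipped terms is kept. The sketch below evaluates that clipped objective for a batch of log-probabilities and advantage estimates. It assumes the advantages are already computed, and it picks epsilon = 0.2 as an illustrative default. The function name is hypothetical, and the optimizer step that would ascend this objective is left out.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch (hypothetical helper)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so moving the ratio far from 1 earns no extra objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage with made-up log-probabilities and advantages for two sampled actions.
print(ppo_clip_objective([-0.9, -1.6], [-1.0, -1.2], [0.5, -0.3]))
```

Because the clipping removes any incentive to push the ratio outside the trust band, the same batch of samples can be reused for several epochs of minibatch updates.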
Posted Content
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; applied to seven Atari 2600 games, it outperforms all previous approaches on six of them and surpasses a human expert on three.
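The Atari work fits a Q-network to one-step Q-learning targets computed from minibatches drawn out of an experience-replay memory. The sketch below shows only those two pieces, the replay memory and the target/loss computation. The Q-network itself is abstracted away: next_q_values is assumed to hold its outputs for each next state. The class, function names and toy numbers are hypothetical.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between consecutive frames.
        return random.sample(list(self.buffer), batch_size)


def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'), or just r when the episode ended at s'."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_values, dones)]


def mse_loss(predicted, targets):
    """Mean squared error between Q(s, a) predictions and their targets."""
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)


# Usage with toy transitions; next_q is a hypothetical stand-in for Q-network outputs (2 actions).
memory = ReplayMemory(capacity=1000)
for step in range(5):
    memory.push((step, 0, 1.0, step + 1, step == 4))
batch = memory.sample(3)
rewards = [tr[2] for tr in batch]
dones = [tr[4] for tr in batch]
next_q = [[0.5, 1.0] for _ in batch]
print(q_learning_targets(rewards, next_q, dones))
```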
Related Papers (5)
Human-level control through deep reinforcement learning
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis