Open AccessPosted Content
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver,Thomas Hubert,Julian Schrittwieser,Ioannis Antonoglou,Matthew Lai,Arthur Guez,Marc Lanctot,Laurent Sifre,Dharshan Kumaran,Thore Graepel,Timothy P. Lillicrap,Karen Simonyan,Demis Hassabis +12 more
Reads0
Chats0
TLDR
This paper generalises the approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains, and convincingly defeated a world-champion program in each case.Abstract:
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.read more
Citations
More filters
Proceedings ArticleDOI
Training language models to follow instructions with human feedback
Long Ouyang,Jeffrey Wu,Xu Jiang,Diogo Almeida,Carroll L. Wainwright,Pamela Mishkin,Chong Zhang,Sandhini Agarwal,Katarina Slama,Alex Ray,John Schulman,Jacob Hilton,Fraser Kelton,Luke E. Miller,Maddie Simens,Amanda Askell,Peter Welinder,Paul F. Christiano,Jan Leike,Ryan Lowe +19 more
TL;DR: The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent and showing improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets.
Journal ArticleDOI
Adversarial Examples: Attacks and Defenses for Deep Learning
TL;DR: In this paper, the authors review recent findings on adversarial examples for DNNs, summarize the methods for generating adversarial samples, and propose a taxonomy of these methods.
Posted Content
Solving Rubik's Cube with a Robot Hand.
OpenAI,Ilge Akkaya,Marcin Andrychowicz,Maciek Chociej,Mateusz Litwin,Bob McGrew,Arthur Petron,Alex Paino,Matthias Plappert,Glenn Powell,Raphael Ribas,Jonas Schneider,Nikolas Tezak,Jerry Tworek,Peter Welinder,Lilian Weng,Qiming Yuan,Wojciech Zaremba,Lei M. Zhang +18 more
TL;DR: It is demonstrated that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot, made possible by a novel algorithm, which is called automatic domain randomization (ADR), and a robot platform built for machine learning.
Journal ArticleDOI
Machine learning & artificial intelligence in the quantum domain: a review of recent progress.
TL;DR: In this article, the authors describe the main ideas, recent developments and progress in a broad spectrum of research investigating ML and AI in the quantum domain, and discuss the fundamental issue of quantum generalizations of learning and AI concepts.
References
More filters
Journal ArticleDOI
Mastering the game of Go with deep neural networks and tree search
David Silver,Aja Huang,Chris J. Maddison,Arthur Guez,Laurent Sifre,George van den Driessche,Julian Schrittwieser,Ioannis Antonoglou,Veda Panneershelvam,Marc Lanctot,Sander Dieleman,Dominik Grewe,John Nham,Nal Kalchbrenner,Ilya Sutskever,Timothy P. Lillicrap,Madeleine Leach,Koray Kavukcuoglu,Thore Graepel,Demis Hassabis +19 more
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0.5, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Journal ArticleDOI
Mastering the game of Go without human knowledge
David Silver,Julian Schrittwieser,Karen Simonyan,Ioannis Antonoglou,Aja Huang,Arthur Guez,Thomas Hubert,Lucas Baker,Matthew Lai,Adrian Bolton,Yutian Chen,Timothy P. Lillicrap,Fan Hui,Laurent Sifre,George van den Driessche,Thore Graepel,Demis Hassabis +16 more
TL;DR: An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
Posted Content
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi,Cliff Young,Nishant Patil,David A. Patterson,Gaurav Agrawal,Raminder Bajwa,Sarah Bates,Suresh Bhatia,Nan Boden,Albert T. Borchers,Rick Boyle,Pierre-luc Cantin,Clifford Chao,Christopher Aaron Clark,Jeremy Coriell,Michael J. Daley,Matt Dau,Jeffrey Dean,Ben Gelb,Tara Vazir Ghaemmaghami,Rajendra Gottipati,William John Gulland,Robert Hagmann,C. Richard Ho,Doug Hogberg,John Hu,Robert Hundt,D. Hurt,Julian Ibarz,Aaron Jaffey,Alek Jaworski,Alexander Kaplan,Khaitan Harshit,Andy Koch,Naveen Kumar,Steve Lacy,James Laudon,James Law,Diemthu Le,Chris Leary,Zhuyuan Liu,Kyle Lucke,Alan Lundin,Gordon MacKean,Adriana Maggiore,Maire Mahony,Kieran Miller,Rahul Nagarajan,Ravi Narayanaswami,Ray Ni,Kathy Nix,Thomas Norrie,Mark Omernick,Narayana Penukonda,Andrew Everett Phelps,Jonathan Ross,Matt Ross,Amir Salek,Emad Samadiani,Chris Severn,Gregory Sizikov,Matthew Snelham,Jed Souter,Dan Steinberg,Andy Swing,Mercedes Tan,Gregory Michael Thorson,Bo Tian,Horia Toma,Erick Tuttle,Vijay K. Vasudevan,Richard Walter,Walter Wang,Eric Wilcox,Doe Hyun Yoon +74 more
TL;DR: This paper evaluates a custom ASIC-called a Tensor Processing Unit (TPU)-deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the samedatacenters.
Journal ArticleDOI
Deep Blue
TL;DR: Deep Blue as discussed by the authors is the chess machine that defeated then-reigning World Chess Champion Garry Kasparov in a six-game match in 1997 and won the first World Chess Championship.
Journal ArticleDOI
An analysis of alpha-beta pruning
Donald E. Knuth,Ronald W. Moore +1 more
TL;DR: The alpha-beta procedure for searching game trees is shown to be optimal in a certain sense, and bounds are obtained for its running time with various kinds of random data.
Related Papers (5)
Mastering the game of Go with deep neural networks and tree search
David Silver,Aja Huang,Chris J. Maddison,Arthur Guez,Laurent Sifre,George van den Driessche,Julian Schrittwieser,Ioannis Antonoglou,Veda Panneershelvam,Marc Lanctot,Sander Dieleman,Dominik Grewe,John Nham,Nal Kalchbrenner,Ilya Sutskever,Timothy P. Lillicrap,Madeleine Leach,Koray Kavukcuoglu,Thore Graepel,Demis Hassabis +19 more