Open AccessPosted Content
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
TLDR
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.Abstract:
We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.read more
Citations
More filters
Posted Content
Relational inductive biases, deep learning, and graph networks
Peter W. Battaglia,Jessica B. Hamrick,Victor Bapst,Alvaro Sanchez-Gonzalez,Vinicius Zambaldi,Mateusz Malinowski,Andrea Tacchetti,David Raposo,Adam Santoro,Ryan Faulkner,Caglar Gulcehre,H. Francis Song,Andrew J. Ballard,Justin Gilmer,George E. Dahl,Ashish Vaswani,Kelsey R. Allen,Charlie Nash,Victoria Langston,Chris Dyer,Nicolas Heess,Daan Wierstra,Pushmeet Kohli,Matthew Botvinick,Oriol Vinyals,Yujia Li,Razvan Pascanu +26 more
TL;DR: It is argued that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective.
Journal ArticleDOI
A brief survey of deep reinforcement learning
TL;DR: This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.
Posted Content
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TL;DR: An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
Proceedings Article
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TL;DR: In this article, an actor-critic method was used to learn multi-agent coordination policies in cooperative and competitive multi-player RL games, where agent populations are able to discover various physical and informational coordination strategies.
Proceedings ArticleDOI
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
TL;DR: In this article, the authors demonstrate a simple method to bridge the "reality gap" by randomizing the dynamics of the simulator during training and develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.
References
More filters
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun,Léon Bottou,Léon Bottou,Yoshua Bengio,Yoshua Bengio,Yoshua Bengio,Patrick Haffner +6 more
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe,Christian Szegedy +1 more
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Journal ArticleDOI
Human-level control through deep reinforcement learning
Volodymyr Mnih,Koray Kavukcuoglu,David Silver,Andrei Rusu,Joel Veness,Marc G. Bellemare,Alex Graves,Martin Riedmiller,Andreas K. Fidjeland,Georg Ostrovski,Stig Petersen,Charles Beattie,Amir Sadik,Ioannis Antonoglou,Helen King,Dharshan Kumaran,Daan Wierstra,Shane Legg,Demis Hassabis +18 more
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Posted Content
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe,Christian Szegedy +1 more
TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Related Papers (5)
Human-level control through deep reinforcement learning
Mastering the game of Go with deep neural networks and tree search
David Silver,Aja Huang,Chris J. Maddison,Arthur Guez,Laurent Sifre,George van den Driessche,Julian Schrittwieser,Ioannis Antonoglou,Veda Panneershelvam,Marc Lanctot,Sander Dieleman,Dominik Grewe,John Nham,Nal Kalchbrenner,Ilya Sutskever,Timothy P. Lillicrap,Madeleine Leach,Koray Kavukcuoglu,Thore Graepel,Demis Hassabis +19 more