Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Posted Content (Open Access)

21 May 2016

TL;DR: This work uses deep neural networks to demonstrate end-to-end learning of communication protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.

Abstract:

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
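The abstract does not spell out how error derivatives cross the communication channel in DIAL. What follows is a minimal sketch under stated assumptions, not the authors' code. A single real-valued message is perturbed with Gaussian noise and squashed by a logistic function during centralised learning, then thresholded to one bit during decentralised execution. The toy task (a sender forwarding a private bit to a fixed receiver) and all names (`channel_train`, `channel_execute`, `train_sender`, `sigma`, `w`) are hypothetical illustrations.

```python
import math
import random


def logistic(x: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_train(m: float, sigma: float, rng: random.Random) -> tuple[float, float]:
    """Assumed centralised-learning channel: add Gaussian noise to the message
    logit, then squash it. Returns the continuous message and d(message)/dm.
    The noise is additive, so d(m + noise)/dm = 1 and only the logistic term remains."""
    out = logistic(m + rng.gauss(0.0, sigma))
    return out, out * (1.0 - out)


def channel_execute(m: float) -> int:
    """Assumed decentralised-execution channel: transmit a single discrete bit."""
    return 1 if m > 0 else 0


def train_sender(steps: int = 2000, lr: float = 0.5, sigma: float = 1.0, seed: int = 0) -> float:
    """Hypothetical toy task: the sender must forward a private bit to a fixed receiver."""
    rng = random.Random(seed)
    w = 0.0  # the sender's only parameter
    for _ in range(steps):
        bit = rng.randint(0, 1)  # private observation, visible only to the sender
        x = 2 * bit - 1          # encode the bit as -1 / +1
        msg, dmsg_dm = channel_train(w * x, sigma, rng)
        # The receiver reads the continuous message as its estimate; loss = (msg - bit)^2.
        dloss_dmsg = 2.0 * (msg - bit)
        # Chain rule: the receiver's error flows back through the channel into the sender.
        w -= lr * dloss_dmsg * dmsg_dm * x
    return w


if __name__ == "__main__":
    w = train_sender()
    for bit in (0, 1):
        print(bit, "->", channel_execute(w * (2 * bit - 1)))
```

In this toy every update moves `w` in the same (positive) direction, so after training the thresholded bit sent at execution time matches the sender's private observation, even though the sender was trained only with gradients that passed through the noisy, continuous channel.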

Citations

Journal Article

A brief survey of deep reinforcement learning

Kai Arulkumaran et al.

09 Nov 2017

arXiv: Learning

TL;DR: This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.

Posted Content

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Ryan Lowe et al.

07 Jun 2017

arXiv: Learning

TL;DR: An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.

Proceedings Article

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Ryan Lowe et al.

TL;DR: In this article, an actor-critic method was used to learn multi-agent coordination policies in cooperative and competitive multi-player RL games, where agent populations are able to discover various physical and informational coordination strategies.

Proceedings Article

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

Xue Bin Peng et al.

TL;DR: In this article, the authors demonstrate a simple method to bridge the "reality gap" by randomizing the dynamics of the simulator during training, yielding policies that adapt to dynamics differing significantly from those they were trained on.

References

Journal Article

Long short-term memory

Sepp Hochreiter et al.

01 Nov 1997

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun et al.

TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTN) are proposed for training multi-module recognition systems globally.

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe et al.

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe et al.

11 Feb 2015

arXiv: Learning

TL;DR: Batch Normalization normalizes layer inputs over each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
