Open Access · Posted Content

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

TLDR
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
Abstract
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
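The information flow in the actor-critic adaptation above can be shown with a deliberately tiny sketch. It assumes deterministic actors that are updated through the critic's gradient with respect to their own action. Everything else is hypothetical. Each agent sees one scalar observation and acts with a linear actor `a_i = theta_i * o_i`. A hand-written cooperative function stands in for the centralized critic `Q(x, a_1, ..., a_N)`, although in the actual method the critic is itself learned. Each actor acts on its own observation only. The gradient it follows, however, comes from a critic that sees every agent's observation and action.

```python
import random
from typing import List


def actor(theta_i: float, obs_i: float) -> float:
    # Hypothetical linear, decentralized actor: agent i uses only its own observation.
    return theta_i * obs_i


def critic(obs: List[float], actions: List[float]) -> float:
    # Hypothetical stand-in for a learned centralized critic: it sees every
    # agent's observation and action. Toy cooperative goal: the joint action
    # should sum to the sum of the observations.
    return -(sum(actions) - sum(obs)) ** 2


def critic_grad_action(obs: List[float], actions: List[float]) -> float:
    # dQ/da_i of the toy critic (the same value for every agent i here).
    return -2.0 * (sum(actions) - sum(obs))


def policy_gradients(thetas: List[float], obs: List[float]) -> List[float]:
    # Deterministic policy gradient per agent, evaluated at a = mu(o):
    # dJ/dtheta_i = (d a_i / d theta_i) * dQ/da_i
    actions = [actor(t, o) for t, o in zip(thetas, obs)]
    dq = critic_grad_action(obs, actions)
    return [o_i * dq for o_i in obs]


def train(n_agents: int = 3, steps: int = 1000, lr: float = 0.1,
          seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    for _ in range(steps):
        obs = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
        grads = policy_gradients(thetas, obs)
        # Gradient ascent on the critic's value; each agent updates only its own actor.
        thetas = [t + lr * g for t, g in zip(thetas, grads)]
    return thetas


if __name__ == "__main__":
    print(train())  # every theta_i approaches 1.0
```

The design choice that matters is that `critic` is conditioned on all agents' actions. From any single agent's point of view, the other agents' changing policies make the environment non-stationary. Once every action is an input, the critic's target no longer shifts for that reason, which matches the motivation given in the abstract.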


Citations
Posted Content

Deep Reinforcement Learning: An Overview

Yuxi Li
25 Jan 2017
TL;DR: This work discusses core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration, and important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.
Book Chapter

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

TL;DR: This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.
Posted Content

Counterfactual Multi-Agent Policy Gradients

TL;DR: A new multi-agent actor-critic method, counterfactual multi-agent (COMA) policy gradients, which uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
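The "counterfactual" in COMA refers to the baseline the centralised critic provides. Agent a's advantage compares Q(s, u) for the joint action actually taken against an average over agent a's alternative actions, with every other agent's action held fixed. The following is a minimal tabular sketch for a single state. It assumes a hypothetical two-agent Q table and fixed hypothetical policies in place of learned networks.

```python
from typing import Dict, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_advantage(q: Dict[JointAction, float],
                             joint_action: JointAction,
                             agent: int,
                             policy: Sequence[float]) -> float:
    # A^a(s, u) = Q(s, u) - sum_{u'} pi^a(u' | tau^a) * Q(s, (u^{-a}, u'))
    baseline = 0.0
    for alt_action, prob in enumerate(policy):
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action  # only this agent deviates
        baseline += prob * q[tuple(counterfactual)]
    return q[joint_action] - baseline


# Hypothetical critic values for one state: jointly picking action 1 is best.
Q_TABLE = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

if __name__ == "__main__":
    print(counterfactual_advantage(Q_TABLE, (1, 1), agent=0, policy=[0.5, 0.5]))    # 1.0
    print(counterfactual_advantage(Q_TABLE, (1, 1), agent=1, policy=[0.25, 0.75]))  # 0.5
```

The baseline does not depend on agent a's own action, so it does not bias the policy gradient. At the same time, it credits each agent only with the effect of its own choice.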
Posted Content

Applications of Deep Reinforcement Learning in Communications and Networking: A Survey

TL;DR: In this paper, a comprehensive literature review on applications of deep reinforcement learning in communications and networking is presented, which includes dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation.
Posted Content

Mean Field Multi-Agent Reinforcement Learning

TL;DR: In this paper, mean field Q-learning and mean field actor-critic algorithms are proposed and used to solve the Ising model with model-free reinforcement learning. The authors note that learning an individual agent's optimal policy depends on the dynamics of the population, while those dynamics in turn change according to the collective patterns of the individual policies.
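Mean field methods make this interdependence tractable by approximating agent j's interactions with all of its neighbours as a single interaction with their mean action, the average of the neighbours' one-hot actions. The toy check below assumes hypothetical pairwise payoffs that are linear in the neighbour's one-hot action, using an Ising-like "agree with your neighbours" matrix. In that special case the mean-action form reproduces the average of the pairwise terms exactly. Its input size also stays fixed as the number of neighbours grows.

```python
from typing import List, Sequence


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    # a_bar_j: average of the neighbours' one-hot actions
    # (equivalently, their empirical action distribution).
    counts = [0.0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1.0
    return [c / len(neighbor_actions) for c in counts]


def pairwise_average(payoff: Sequence[Sequence[float]], a_j: int,
                     neighbor_actions: Sequence[int]) -> float:
    # (1 / N_j) * sum_k Q_j(a_j, a_k): one term per neighbour.
    return sum(payoff[a_j][a_k] for a_k in neighbor_actions) / len(neighbor_actions)


def mean_field_value(payoff: Sequence[Sequence[float]], a_j: int,
                     a_bar: Sequence[float]) -> float:
    # Q_j(a_j, a_bar_j): a single evaluation against the mean action. Exact here
    # only because each pairwise term is linear in the neighbour's one-hot vector.
    return sum(p * m for p, m in zip(payoff[a_j], a_bar))


# Hypothetical Ising-like payoff: +1 for matching a neighbour, -1 otherwise.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]

if __name__ == "__main__":
    neighbors = [0, 0, 1, 0]
    a_bar = mean_action(neighbors, 2)               # [0.75, 0.25]
    print(pairwise_average(PAYOFF, 0, neighbors))   # 0.5
    print(mean_field_value(PAYOFF, 0, a_bar))       # 0.5
```

With nonlinear interactions the mean-action form is only an approximation. Its appeal is that an agent's value no longer needs one input per neighbour.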
References
Journal Article

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
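The two models play a minimax game with value function V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]. For a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). The value at D* reaches its minimum, -log 4, exactly when p_g = p_data. The small numerical sketch below assumes hypothetical discrete distributions on a shared finite support, all strictly positive so that every logarithm is finite. The generator is represented directly by its output distribution p_g.

```python
import math
from typing import List, Sequence


def gan_value(d: Sequence[float], p_data: Sequence[float],
              p_g: Sequence[float]) -> float:
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]
    real = sum(pd * math.log(dx) for pd, dx in zip(p_data, d))
    fake = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d))
    return real + fake


def optimal_discriminator(p_data: Sequence[float],
                          p_g: Sequence[float]) -> List[float]:
    # D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator.
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]


if __name__ == "__main__":
    p_data = [0.2, 0.3, 0.5]
    for p_g in ([0.5, 0.3, 0.2], p_data):
        v = gan_value(optimal_discriminator(p_data, p_g), p_data, p_g)
        print(round(v, 4), round(-math.log(4), 4))
```

For the mismatched generator the printed value lies above -log 4. For p_g equal to p_data, the two values coincide.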
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal Article

Mastering the game of Go with deep neural networks and tree search

TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Journal Article

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
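Consider a Bernoulli unit that emits 1 with probability p. With a reinforcement baseline of 0, a single-sample update of this kind follows r * (y - p) / (p(1 - p)), which is the reward times the derivative of log P(y) with respect to p. Its expectation lies along the gradient of expected reinforcement. The sketch below uses a hypothetical toy setup, which is an assumption and not taken from either paper, to relate this estimator to the abstract's remark that policy-gradient variance grows with the number of agents. N Bernoulli units ("agents") share a reward of 1 only when all of them emit 1. Enumerating every joint action gives the exact mean and variance of agent 0's estimate.

```python
from itertools import product
from typing import Tuple


def eligibility(a: int, p: float) -> float:
    # d/dp log P(a) for a Bernoulli unit with P(a = 1) = p.
    return (a - p) / (p * (1.0 - p))


def reinforce_stats(n_agents: int, p: float) -> Tuple[float, float]:
    """Exact mean and variance of agent 0's single-sample estimate g = r * eligibility."""
    mean = second_moment = 0.0
    for joint in product((0, 1), repeat=n_agents):
        prob = 1.0
        for a in joint:
            prob *= p if a == 1 else 1.0 - p
        reward = 1.0 if all(joint) else 0.0  # hypothetical shared reward
        g = reward * eligibility(joint[0], p)
        mean += prob * g
        second_moment += prob * g * g
    return mean, second_moment - mean * mean


if __name__ == "__main__":
    for n in range(1, 7):
        mean, var = reinforce_stats(n, 0.5)
        # mean = p**(n - 1); relative variance var / mean**2 = p**(-n) - 1
        print(n, mean, var / mean ** 2)
```

At p = 0.5 the estimate stays unbiased: its mean is p**(n - 1), the true derivative of expected reward with respect to agent 0's parameter. Its relative variance, however, is 2**n - 1, roughly doubling with each added agent, because agent 0's reward now depends on every other agent's random action.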