Open access · Posted Content

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

02 Mar 2021 · arXiv: Learning
Abstract: Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors that are most influential to MAPPO's practical performance.
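
The abstract describes MAPPO as PPO applied per agent together with a centrally trained value function. As a rough illustration only (the network names, the clipping constant, and the use of a concatenated global state are assumptions for this sketch, not details from the paper), a single MAPPO-style loss computation might look like:

```python
# Illustrative MAPPO-style loss: decentralized actor, centralized critic.
# Hyperparameters and the "global_state" input are assumptions for this sketch.
import torch

def mappo_loss(actor, critic, obs, global_state, actions, old_log_probs,
               returns, advantages, clip_eps=0.2, value_coef=0.5):
    # Decentralized actor: each agent's action distribution from its own observation.
    dist = actor(obs)                                  # e.g. a Categorical over actions
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)       # importance ratio pi_new / pi_old

    # PPO clipped surrogate objective, applied per agent.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Centralized critic: value function conditioned on global (joint) state.
    value_loss = (critic(global_state).squeeze(-1) - returns).pow(2).mean()

    return policy_loss + value_coef * value_loss
```

The only structural difference from single-agent PPO in this sketch is that the critic conditions on information beyond a single agent's own observation.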


Topics: Hyperparameter (53%)
Citations

25 results found


Open access · Posted Content
14 Jun 2020
Abstract: Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.


Topics: Reinforcement learning (58%)

10 Citations


Open access · Journal Article · DOI: 10.1109/JIOT.2021.3078462
Liang Yu, Shuqi Qin, Meng Zhang, Chao Shen +2 more · Institutions (3)
Abstract: Buildings globally account for about 30% of total energy consumption and carbon emissions, raising severe energy and environmental concerns. It is therefore important and urgent to develop novel smart building energy management (SBEM) technologies to advance energy-efficient and green buildings. However, this is a nontrivial task due to the following challenges. First, it is generally difficult to develop an explicit building thermal dynamics model that is both accurate and efficient enough for building control. Second, there are many uncertain system parameters (e.g., renewable generation output, outdoor temperature, and the number of occupants). Third, there are many spatially and temporally coupled operational constraints. Fourth, building energy optimization problems cannot be solved in real time by traditional methods when they have extremely large solution spaces. Fifth, traditional building energy management methods have their own applicable premises, which means they have low versatility when confronted with varying building environments. With the rapid development of Internet of Things technology and computational capability, artificial intelligence has shown significant competence in control and optimization. As a general artificial intelligence technique, deep reinforcement learning (DRL) is promising for addressing the above challenges. Notably, recent years have seen a surge of DRL for SBEM. However, a systematic overview of different DRL methods for SBEM has been lacking. To fill this gap, this article provides a comprehensive review of DRL for SBEM from the perspective of system scale. In particular, we identify existing unresolved issues and point out possible future research directions.


Topics: Efficient energy use (57%), Energy management (55%), Building automation (55%)

9 Citations


Open access · Posted Content
Chi Jin, Qinghua Liu, Tiancheng Yu · Institutions (2)
29 Sep 2021 · arXiv: Learning
Abstract: Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy. While recent progress in RL theory addresses a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting. It remains elusive how to extend these results to multi-agent RL, especially given the new challenges arising from its game-theoretic nature. This paper considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension -- a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
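
For context, the solution concept the algorithm provably approximates is the minimax (Nash) value of the zero-sum MG. A standard statement of the corresponding Bellman equation (notation assumed here, not taken from the paper) is:

```latex
% Minimax Bellman equation for a two-player zero-sum Markov Game:
% \mu is the max-player's mixed strategy, \nu the min-player's.
V^{*}(s) \;=\; \max_{\mu \in \Delta(\mathcal{A})} \; \min_{\nu \in \Delta(\mathcal{B})}
  \; \mathbb{E}_{a \sim \mu,\; b \sim \nu}
  \Big[ r(s, a, b) \;+\; \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a, b)} \big[ V^{*}(s') \big] \Big]
```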


5 Citations


Open access · Posted Content
Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen +3 more · Institutions (1)
Abstract: Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply; this is because agents, even in cooperative games, can have conflicting directions of policy updates. As a result, achieving a guaranteed improvement on the joint policy where each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop the Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO do not require agents to share parameters, nor do they need any restrictive assumptions on the decomposability of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO, and MADDPG on all tested tasks, thereby establishing a new state of the art.
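
The multi-agent advantage decomposition lemma mentioned here states, roughly, that the joint advantage telescopes into per-agent advantages, each conditioned on the actions of the agents handled before it (this is a paraphrase, not the paper's exact statement):

```latex
% Multi-agent advantage decomposition (paraphrased): for any ordering i_1,...,i_n
% of the agents, the joint advantage splits into per-agent advantages, each
% conditioned on the actions already chosen by the preceding agents.
A_{\pi}^{i_{1:n}}\!\left(s, \mathbf{a}^{i_{1:n}}\right)
  \;=\; \sum_{m=1}^{n} A_{\pi}^{i_m}\!\left(s, \mathbf{a}^{i_{1:m-1}}, a^{i_m}\right)
```

This decomposition is what licenses the sequential, agent-by-agent policy update scheme: each agent improves its own term given the agents updated before it, which is how the joint policy can improve monotonically.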


Topics: Reinforcement learning (58%), Trust region (55%)

2 Citations


Open access · Posted Content
19 Aug 2021 · arXiv: Learning
Abstract: Policy gradient (PG) methods are popular reinforcement learning (RL) methods in which a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, first, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. For settings that use deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.
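
As background for the "optimal baseline", the generic per-agent policy-gradient estimator with a baseline is shown below (this is the standard textbook form, not the paper's derivation). Any baseline that does not depend on agent i's own action leaves the estimator unbiased but changes its variance; OB is the variance-minimizing choice.

```latex
% Per-agent MAPG estimator with a baseline b (generic form). Because b does not
% depend on agent i's own action a^i, subtracting it keeps the gradient unbiased
% while altering the variance of the estimate.
\nabla_{\theta_i} J \;=\;
  \mathbb{E}_{s,\,\mathbf{a}}\!\left[
    \nabla_{\theta_i} \log \pi_{\theta_i}\!\left(a^{i} \mid s\right)
    \left( Q^{\pi}\!\left(s, \mathbf{a}\right) - b\!\left(s, \mathbf{a}^{-i}\right) \right)
  \right]
```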


2 Citations


References

32 results found


Open access · Posted Content
20 Jul 2017 · arXiv: Learning
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
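
For reference, the clipped form of the "surrogate" objective proposed in this paper, with probability ratio between the new and old policies and an advantage estimate, is:

```latex
% PPO clipped surrogate objective, with
% r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)
% and advantage estimate \hat{A}_t; \epsilon is the clipping parameter.
L^{\mathrm{CLIP}}(\theta) \;=\;
  \hat{\mathbb{E}}_t\!\left[
    \min\!\left( r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
  \right]
```

Clipping removes the incentive for the ratio to move outside the interval [1 - ε, 1 + ε], which is what allows multiple epochs of minibatch updates on the same data without the destructive policy changes that plain policy gradient would suffer.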


Topics: Gradient descent (58%), Reinforcement learning (55%), Trust region (54%)

5,348 Citations


Proceedings Article · DOI: 10.1109/IROS.2012.6386109
Emanuel Todorov, Tom Erez, Yuval Tassa · Institutions (1)
24 Dec 2012
Abstract: We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 degrees of freedom and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
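
To make the XML model format and the compile-then-step workflow concrete, here is a minimal sketch using the present-day official Python bindings (the mujoco package) rather than the C++ API described in the abstract; the toy model is purely illustrative and not taken from the paper.

```python
# Minimal MuJoCo usage sketch via the official Python bindings (`pip install mujoco`).
# The XML model below is a toy example: a single free-falling sphere.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body name="ball" pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)   # compile the XML into an optimized data structure
data = mujoco.MjData(model)                   # runtime state: qpos, qvel, contacts, ...

for _ in range(1000):
    mujoco.mj_step(model, data)               # one forward-dynamics step

print(data.qpos)                              # generalized coordinates after simulation
```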


Topics: Physics engine (56%), Inverse dynamics (53%), Optimal control (52%)

2,593 Citations


Open access · Book Chapter · DOI: 10.1016/B978-1-55860-335-6.50027-1
Michael L. Littman · Institutions (1)
10 Jul 1994
Abstract: In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
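
The "Q-learning-like algorithm" referred to above replaces the max in ordinary Q-learning with the value of a matrix game solved at each state; a paraphrased form of the value and the update, with the learner's action a, the opponent's action o, and learning rate α, is:

```latex
% Minimax-Q (paraphrased): the state value is the maximin over the learner's
% mixed strategies, which is why the optimal policy can be probabilistic.
V(s) \;=\; \max_{\pi \in \Delta(\mathcal{A})} \; \min_{o \in \mathcal{O}} \;
  \sum_{a \in \mathcal{A}} \pi(a)\, Q(s, a, o)
\qquad
Q(s, a, o) \;\leftarrow\; (1-\alpha)\, Q(s, a, o) \;+\; \alpha \left( r + \gamma\, V(s') \right)
```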


2,171 Citations


Open access · Proceedings Article
01 Jan 2016
Abstract: Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
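
Below is a minimal sketch of the proportional-prioritization scheme the abstract describes, with priorities tied to TD-error magnitude and importance-sampling weights correcting the bias of non-uniform sampling. The sum-tree data structure the paper uses for efficiency is omitted, and the exponents alpha and beta are illustrative defaults rather than the paper's tuned settings.

```python
# Illustrative proportional prioritized replay (no sum-tree; O(n) sampling).
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0); self.priorities.pop(0)
        self.buffer.append(transition); self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities) ** self.alpha
        probs = p / p.sum()                                   # P(i) proportional to p_i^alpha
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps            # priority tracks |TD error|
```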


Topics: Reinforcement learning (53%)

1,859 Citations


Open access · Proceedings Article
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb +2 more · Institutions (3)
07 Jun 2017
Abstract: We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
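
The actor-critic adaptation described above trains, for each agent, a decentralized actor alongside a critic that conditions on all agents' observations and actions; a paraphrased deterministic policy-gradient form of that update (notation assumed, not copied verbatim from the paper) is:

```latex
% Centralized-critic actor update (paraphrased): agent i's actor \mu_i sees only its
% own observation o_i, while the critic Q_i conditions on the joint observations x
% and all agents' actions, which removes the non-stationarity from agent i's view.
\nabla_{\theta_i} J(\mu_i) \;=\;
  \mathbb{E}_{x,\, a \sim \mathcal{D}}\!\left[
    \nabla_{\theta_i} \mu_i\!\left(o_i\right)\,
    \nabla_{a_i} Q_i^{\mu}\!\left(x, a_1, \dots, a_N\right)\Big|_{a_i = \mu_i(o_i)}
  \right]
```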


Topics: Reinforcement learning (57%)

1,269 Citations


Performance Metrics
No. of citations received by the paper in previous years

Year    Citations
2021    23
2020    2
Network Information
Related Papers (5)
Proximal Policy Optimization Algorithms · 20 Jul 2017, arXiv: Learning

John Schulman, Filip Wolski +3 more

Counterfactual Multi-Agent Policy Gradients · 29 Apr 2018

Jakob Foerster, Gregory Farquhar +3 more

Dota 2 with Large Scale Deep Reinforcement Learning · 01 Jan 2019, arXiv: Learning

Christopher Berner, Greg Brockman +23 more