Open Access, Posted Content

The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

TLDR
In this paper, MAPPO (Multi-Agent PPO), a variant of Proximal Policy Optimization (PPO) specialized for multi-agent settings, is studied on the particle-world environments, the StarCraft Multi-Agent Challenge, and the Hanabi challenge.
Abstract
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm, but it is used significantly less than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO that is specialized for multi-agent settings. Using a desktop with a single GPU, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, and the Hanabi challenge. It does so with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that MAPPO achieves strong results compared to off-policy baselines while exhibiting comparable sample efficiency. Finally, through ablation studies, we identify the implementation and algorithmic factors that most influence MAPPO's practical performance.


Citations
Journal Article

A Review of Deep Reinforcement Learning for Smart Building Energy Management

TL;DR: A comprehensive review of DRL for SBEM from the perspective of system scale is provided and the existing unresolved issues are identified and possible future research directions are pointed out.
Posted Content

Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms.

TL;DR: This work evaluates and compares three different classes of MARL algorithms in a diverse range of multi-agent learning tasks and shows that algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks.
Journal Article

Deep reinforcement learning for dynamic control of fuel injection timing in multi-pulse compression ignition engines:

TL;DR: In this article, the authors note that compression-ignition (CI) engines achieve high thermal efficiencies and torque across a wide range of loads, but often require extensive exhaust gas treatment that decreases …
Journal Article

Self-attention-based multi-agent continuous control method in cooperative environments

TL;DR: In this paper, a new structure for a multi-agent actor-critic is proposed: the self-attention mechanism is applied in the critic network, and a value decomposition method is used to solve the uneven problem.
Posted Content

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

TL;DR: In this article, the authors consider two-player zero-sum Markov games (MGs) and propose an algorithm that finds the Nash equilibrium policy using a polynomial number of samples for any MG with low multi-agent Bellman-Eluder dimension.
References
Posted Content

Proximal Policy Optimization Algorithms

TL;DR: A new family of policy gradient methods for reinforcement learning is proposed. These methods alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
Proceedings Article
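The alternation in this entry, sampling a batch and then running several optimization epochs on a surrogate objective, can be illustrated on a toy problem. This is a minimal sketch, not that paper's implementation. It assumes a two-armed bandit with deterministic rewards, a softmax policy over two logits, and a batch-mean baseline as the advantage estimate. It also assumes PPO's clipped surrogate, min(r A, clip(r, 1 - eps, 1 + eps) A), where r is the new-to-old probability ratio, and it uses full-batch gradient ascent instead of stochastic minibatches for brevity. The function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate_grad(theta, actions, advantages, old_probs, eps):
    """Gradient of the mean clipped surrogate w.r.t. the softmax logits theta."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, adv, p_old in zip(actions, advantages, old_probs):
        ratio = probs[a] / p_old
        # Once the ratio leaves [1 - eps, 1 + eps] in the direction the advantage
        # favours, the min() selects the clipped term, which is constant in theta.
        if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
            continue
        # For a softmax policy: d ratio / d theta_k = ratio * (1[k == a] - probs[k]).
        for k in range(len(theta)):
            indicator = 1.0 if k == a else 0.0
            grad[k] += adv * ratio * (indicator - probs[k])
    return [g / len(actions) for g in grad]


def train_bandit_ppo(rewards, iterations=50, batch_size=64, epochs=4,
                     lr=1.0, eps=0.2, seed=0):
    """Toy PPO loop on a bandit (hypothetical setup, for illustration only)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(iterations):
        # 1) Sample a batch of interactions with the current (soon "old") policy.
        old_policy = softmax(theta)
        actions = rng.choices(range(len(rewards)), weights=old_policy, k=batch_size)
        returns = [rewards[a] for a in actions]
        baseline = sum(returns) / batch_size
        advantages = [r - baseline for r in returns]
        old_probs = [old_policy[a] for a in actions]
        # 2) Reuse the same batch for several epochs of gradient ascent.
        for _ in range(epochs):
            grad = clipped_surrogate_grad(theta, actions, advantages, old_probs, eps)
            theta = [t + lr * g for t, g in zip(theta, grad)]
    return softmax(theta)
```

For example, `train_bandit_ppo([0.0, 1.0])` should end with most probability mass on arm 1. The clipping keeps each iteration's policy change modest even though the same batch is reused for several epochs.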

MuJoCo: A physics engine for model-based control

TL;DR: A new physics engine tailored to model-based control is presented. It is based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers, and it can compute both forward and inverse dynamics.
Book Chapter

Markov games as a framework for multi-agent reinforcement learning

TL;DR: A Q-learning-like algorithm for finding optimal policies is presented and demonstrated on a simple two-player game in which the optimal policy is probabilistic.
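The Q-learning-like algorithm in this entry (minimax-Q) replaces the max over actions in the Q-learning backup with a maximin over the agent's mixed strategies. In general, that maximin is found by solving a linear program in each state. The sketch below is a simplification and assumes the learning agent has only two actions, so the maximin can be found exactly by checking a few candidate probabilities. The function names and the dictionary-of-tables layout for Q are hypothetical. Matching pennies shows why the optimal policy can be probabilistic: either deterministic choice can be exploited, while a 50/50 mix guarantees a value of 0.

```python
from itertools import combinations


def maximin_two_actions(q):
    """Maximin mixed strategy for an agent with two actions (assumption: exactly two).

    q[a][o] is the agent's payoff for its action a in {0, 1} against opponent
    action o. Returns (p, value), where p is the probability of playing action 0 and
    value = max over p of min over o of p * q[0][o] + (1 - p) * q[1][o].
    """
    n_opp = len(q[0])

    def worst_case(p):
        # Expected payoff when the opponent best-responds to the mixed strategy p.
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(n_opp))

    # worst_case is concave and piecewise linear in p, so its maximum lies at an
    # endpoint or where two opponent actions' payoff lines intersect.
    candidates = [0.0, 1.0]
    for o1, o2 in combinations(range(n_opp), 2):
        slope1 = q[0][o1] - q[1][o1]
        slope2 = q[0][o2] - q[1][o2]
        if slope1 != slope2:
            p = (q[1][o2] - q[1][o1]) / (slope1 - slope2)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


def minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma):
    """One minimax-Q backup on a tabular Q[s][a][o] (hypothetical layout)."""
    _, v_next = maximin_two_actions(Q[s_next])
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * v_next)


# Matching pennies: the maximin strategy mixes both actions equally.
p, value = maximin_two_actions([[1.0, -1.0], [-1.0, 1.0]])
print(p, value)
```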
Proceedings Article

Prioritized Experience Replay

TL;DR: Prioritized experience replay is a framework for prioritizing experience so that important transitions are replayed more frequently and learning is therefore more efficient; it achieves human-level performance across many Atari games.
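The prioritization idea in this entry can be sketched with the proportional variant. Each transition i gets priority p_i = |delta_i| + eps, where delta_i is its TD error. It is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha, and the importance-sampling weights w_i = (N * P(i))^(-beta) correct the bias that non-uniform sampling introduces. This is a minimal sketch under stated assumptions, not the paper's implementation. The class and method names are hypothetical. Sampling is linear-time, whereas the paper uses a sum-tree. Weights are normalized by the largest weight in the sampled batch, which is one common choice.

```python
import random


class ProportionalReplay:
    """Hypothetical minimal proportional prioritized replay buffer (no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data = []
        self.priorities = []
        self.next_index = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current max priority so each is replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            # Buffer full: overwrite the oldest slot (ring buffer).
            self.data[self.next_index] = transition
            self.priorities[self.next_index] = p
        self.next_index = (self.next_index + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        n = len(self.data)
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights; frequently sampled items get smaller weights.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # only ever scale updates down
        return indices, [self.data[i] for i in indices], weights

    def update_priorities(self, indices, td_errors):
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, the learner would call `sample`, scale each transition's loss by its weight, and then pass the new TD errors to `update_priorities`.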