Open Access, Posted Content
The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
TL;DR: This paper studies MAPPO (Multi-Agent PPO), a variant of Proximal Policy Optimization (PPO) specialized for multi-agent settings, and evaluates it on the particle-world, StarCraft, and Hanabi testbeds.
Abstract:
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the StarCraft Multi-Agent Challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors which are most influential to MAPPO's practical performance.
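The abstract does not spell out the objective that PPO optimizes. As a minimal sketch, assuming the standard clipped surrogate objective from the PPO paper and assuming per-sample log-probabilities and advantage estimates are computed elsewhere, the quantity being maximized can be written as below. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not taken from this paper, and the sketch does not cover any of the multi-agent implementation details that MAPPO adds.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree the ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the clipping range in the direction that would increase the objective, the clipped term takes over and further movement in that direction earns no extra credit.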
Citations
Journal Article
A Review of Deep Reinforcement Learning for Smart Building Energy Management
TL;DR: A comprehensive review of DRL for SBEM from the perspective of system scale is provided and the existing unresolved issues are identified and possible future research directions are pointed out.
Posted Content
Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms.
TL;DR: This work evaluates and compares three different classes of MARL algorithms in a diverse range of multi-agent learning tasks and shows that algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks.
Journal Article
Deep reinforcement learning for dynamic control of fuel injection timing in multi-pulse compression ignition engines
TL;DR: Compression-ignition (CI) engines achieve high thermal efficiencies and torque across a wide range of loads, but often require extensive exhaust gas treatment that decreases ...
Journal Article
Self-attention-based multi-agent continuous control method in cooperative environments
TL;DR: In this paper, a new structure for a multi-agent actor-critic is proposed, in which a self-attention mechanism is applied in the critic network and a value decomposition method is used to solve the uneven problem.
Posted Content
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
Chi Jin, Qinghua Liu, Tiancheng Yu
TL;DR: In this article, the authors consider two-player zero-sum Markov games and propose an algorithm that can find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension.
References
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Proceedings Article
MuJoCo: A physics engine for model-based control
TL;DR: A new physics engine tailored to model-based control, based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers and can compute both forward and inverse dynamics.
Book Chapter
Markov games as a framework for multi-agent reinforcement learning
TL;DR: A Q-learning-like algorithm for finding optimal policies and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.
Journal Article
Grandmaster level in StarCraft II using multi-agent reinforcement learning
Oriol Vinyals, Igor Babuschkin, Wojciech Marian Czarnecki, Michael Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard E. Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom Le Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy P. Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver
TL;DR: The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
Proceedings Article
Prioritized Experience Replay
TL;DR: Prioritized experience replay is a framework for prioritizing experience so that important transitions are replayed more frequently and learning becomes more efficient, achieving human-level performance across many Atari games.