Open Access Posted Content

Mean Field Multi-Agent Reinforcement Learning

TL;DR
Mean field Q-learning and mean field Actor-Critic algorithms are proposed, and the Ising model is solved via model-free reinforcement learning for the first time. The learning of each individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies.
Abstract
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
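As a minimal sketch of the mean field Q-learning idea described above (a tabular toy under assumed notation, not the authors' implementation; the hyperparameters, the action-binning scheme, and every name below are illustrative):

import numpy as np

# Illustrative sketch of mean field Q-learning (MF-Q), not the authors' code.
# Each agent keeps Q(s, a, a_bar): its own action a plus the (binned) mean
# action a_bar of its neighbours, instead of the exponential joint action.
n_states, n_actions = 20, 5
n_bins = n_actions                   # discretisation bins for the mean action
alpha, gamma, beta = 0.1, 0.95, 1.0  # learning rate, discount, Boltzmann temperature

Q = np.zeros((n_states, n_actions, n_bins))

def boltzmann_policy(s, a_bar):
    """Softmax over Q(s, ., a_bar); higher beta means greedier play."""
    logits = beta * Q[s, :, a_bar]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def mean_action_bin(neighbour_actions):
    """Bin the neighbours' mean action index so Q stays tabular."""
    return int(round(float(np.mean(neighbour_actions))))

def mfq_update(s, a, a_bar, r, s2, a_bar2):
    """One backup: bootstrap with the Boltzmann-weighted value at s2."""
    v2 = boltzmann_policy(s2, a_bar2) @ Q[s2, :, a_bar2]
    Q[s, a, a_bar] += alpha * (r + gamma * v2 - Q[s, a, a_bar])

In the full algorithm every agent runs this backup in parallel, and the empirical mean action and the Boltzmann policies are re-computed from each other until consistent; the sketch shows only the single-agent update.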


Citations
Posted Content

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

TL;DR: In this article, the authors propose a value-based method that can train decentralised policies in a centralised, end-to-end fashion, exploiting simulated or laboratory settings in which global state information is available and communication constraints are lifted.
Book Chapter

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

TL;DR: This chapter reviews theoretical results on MARL algorithms, mainly within two representative frameworks, Markov/stochastic games and extensive-form games, organised by the types of tasks they address: fully cooperative, fully competitive, and a mix of the two.
Proceedings Article

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

TL;DR: QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
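The tractable-maximisation claim rests on a simple monotonicity argument; as a sketch under assumed notation (not quoted from the paper): if the mixing network guarantees $\partial Q_{tot} / \partial Q_i \ge 0$ for every agent $i$, then the global argmax over the joint action decomposes into independent per-agent argmaxes,

$\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) = \big( \arg\max_{u_1} Q_1(\tau_1, u_1), \ldots, \arg\max_{u_n} Q_n(\tau_n, u_n) \big),$

so greedy action selection costs $n \cdot |U|$ evaluations rather than $|U|^n$.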
Proceedings Article

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning.

TL;DR: This article explores value-based solutions for multi-agent reinforcement learning (MARL) tasks under the recently popularized centralized training with decentralized execution (CTDE) regime.
Proceedings Article

The StarCraft Multi-Agent Challenge

TL;DR: The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
References
Book

Phase Transitions and Critical Phenomena

TL;DR: The field of phase transitions and critical phenomena continues to be an active research area, producing a steady stream of interesting and fruitful results; the major aim of this serial is to provide review articles that can serve as standard references for researchers in the field.
Book

Introductory functional analysis with applications

TL;DR: This book presents the spectral theory of linear operators on normed spaces, including bounded self-adjoint linear operators, their spectra, and applications in quantum mechanics.
Book Chapter

Markov games as a framework for multi-agent reinforcement learning

TL;DR: A Q-learning-like algorithm for finding optimal policies is presented and demonstrated on a simple two-player game in which the optimal policy is probabilistic.
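The algorithm summarised here is presumably Littman's minimax-Q; as a sketch under assumed notation (not quoted from the chapter), the max of ordinary Q-learning is replaced by a maximin over the opponent's action $o$:

$V(s) = \max_{\pi \in \Delta(A)} \min_{o \in O} \sum_{a \in A} \pi(a)\, Q(s, a, o), \qquad Q(s, a, o) \leftarrow (1 - \alpha)\, Q(s, a, o) + \alpha \big[ r + \gamma V(s') \big].$

The inner maximin is a small linear program, which is why the learned optimal policy can be probabilistic (a mixed strategy), as in the two-player game mentioned above.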
Journal Article

Stochastic Games

TL;DR: In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players; the expected total gain or loss is bounded by M, which depends on the game's N² + N matrices.