arXiv: Multiagent Systems

About: arXiv: Multiagent Systems is an academic journal. It publishes mainly in the areas of reinforcement learning and multi-agent systems. Over its lifetime, 1573 publications have appeared, receiving 9327 citations.

Topics: Reinforcement learning, Multi-agent system, Computer science, Population, Swarm behaviour

Papers

Posted Content

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang¹, Rui Luo¹, Minne Li¹, Ming Zhou², Weinan Zhang³, Jun Wang⁴

University College London¹, Harbin Institute of Technology², Shanghai Jiao Tong University³, East China Normal University⁴

15 Feb 2018-arXiv: Multiagent Systems

TL;DR: This paper proposes mean field Q-learning and mean field Actor-Critic algorithms, in which each agent's interactions with the population are approximated by its interaction with the average effect of the overall population or of neighboring agents. The learning of each agent's optimal policy depends on the population dynamics, which in turn change with the collective patterns of the individual policies. The authors also report the first result on solving the Ising model with model-free reinforcement learning.

Abstract: Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result to solve the Ising model via model-free reinforcement learning methods.
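
As a rough illustration of the "average effect" in the abstract above, the sketch below computes a mean action as the average of the neighbors' one-hot encoded actions and feeds it to a toy value function. The names `mean_action`, `toy_q`, and `boltzmann_policy`, and the alignment-style reward in `toy_q`, are assumptions made for illustration only; they are not taken from the paper's code.

```python
import math
from collections import Counter


def mean_action(neighbor_actions, num_actions):
    """Average of the neighbors' one-hot encoded actions.

    The result is the empirical distribution of actions among neighbors.
    """
    if not neighbor_actions:
        raise ValueError("need at least one neighbor action")
    counts = Counter(neighbor_actions)
    n = len(neighbor_actions)
    return [counts.get(a, 0) / n for a in range(num_actions)]


def toy_q(action, mean_act):
    """Hypothetical Q-value: the fraction of neighbors choosing the same action."""
    return mean_act[action]


def boltzmann_policy(q_values, temperature=1.0):
    """Softmax over Q-values, shifted by the maximum for numerical stability."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# One agent with three neighbors and two possible actions.
m_bar = mean_action([0, 1, 1], num_actions=2)
policy = boltzmann_policy([toy_q(a, m_bar) for a in range(2)], temperature=0.5)
```

The point of the design is that the agent's value function takes a fixed-size mean action as input, so its input size does not grow with the number of neighbors.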

331 citations

Posted Content

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

10 Feb 2017-arXiv: Multiagent Systems

TL;DR: In this paper, the authors introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.

Abstract: Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.

293 citations

Posted Content

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, Enrique Munoz de Cote

28 Jul 2017-arXiv: Multiagent Systems

TL;DR: This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.

Abstract: The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

208 citations

Posted Content

Justified Representation in Approval-Based Committee Voting

Haris Aziz¹, Markus Brill², Vincent Conitzer², Edith Elkind³, Rupert Freeman², Toby Walsh¹

NICTA¹, Duke University², University of Oxford³

31 Jul 2014-arXiv: Multiagent Systems

TL;DR: In this article, a natural axiom for committee voting, called justified representation (JR), was proposed, which requires that if a large enough group of voters exhibits agreement by supporting the same candidate, then at least one voter in this group has an approved candidate in the winning committee.

Abstract: We consider approval-based committee voting, i.e. the setting where each voter approves a subset of candidates, and these votes are then used to select a fixed-size set of winners (committee). We propose a natural axiom for this setting, which we call justified representation (JR). This axiom requires that if a large enough group of voters exhibits agreement by supporting the same candidate, then at least one voter in this group has an approved candidate in the winning committee. We show that for every list of ballots it is possible to select a committee that provides JR. However, it turns out that several prominent approval-based voting rules may fail to output such a committee. In particular, while Proportional Approval Voting (PAV) always outputs a committee that provides JR, Reweighted Approval Voting (RAV), a tractable approximation to PAV, does not have this property. We then introduce a stronger version of the JR axiom, which we call extended justified representation (EJR), and show that PAV satisfies EJR, while other rules we consider do not; indeed, EJR can be used to characterize PAV within the class of weighted PAV rules. We also consider several other questions related to JR and EJR, including the relationship between JR/EJR and core stability, and the complexity of the associated algorithmic problems.
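
The JR condition described above can be checked mechanically. The sketch below is a minimal illustration, assuming the usual formal threshold for "large enough": a group of at least n/k voters, where n is the number of voters and k the committee size. That threshold and the function name `provides_jr` are not stated in the abstract and are labeled here as assumptions.

```python
def provides_jr(ballots, committee, k=None):
    """Return True if `committee` provides justified representation (JR).

    ballots:   list of sets, one set of approved candidates per voter
    committee: iterable of winning candidates
    k:         target committee size (defaults to len(committee))

    Assumed threshold: JR fails if at least n/k voters approve a common
    candidate while none of them approves any committee member.
    """
    committee = set(committee)
    k = len(committee) if k is None else k
    n = len(ballots)

    # Count, per candidate, the supporters who have no approved winner.
    unrepresented_support = {}
    for ballot in ballots:
        ballot = set(ballot)
        if ballot & committee:
            continue  # this voter is already represented
        for candidate in ballot:
            unrepresented_support[candidate] = unrepresented_support.get(candidate, 0) + 1

    # count >= n/k is written as count * k >= n to avoid float division.
    return all(count * k < n for count in unrepresented_support.values())


ballots = [{"a"}, {"a"}, {"b"}, {"c"}]
print(provides_jr(ballots, {"b", "c"}))  # the two "a" voters form an unrepresented group
print(provides_jr(ballots, {"a", "b"}))
```

Checking one candidate at a time suffices because a group that agrees on a candidate and is entirely unrepresented is a subset of that candidate's unrepresented supporters.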

185 citations

Posted Content

Distributed Parameter Estimation in Sensor Networks: Nonlinear Observation Models and Imperfect Communication

Soummya Kar¹, Jose M. F. Moura¹, Kavita Ramanan²

Carnegie Mellon University¹, Brown University²

29 Aug 2008-arXiv: Multiagent Systems

TL;DR: This paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness, and provides convergence rate guarantees in distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy intersensor communication.

Abstract: The paper studies distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy inter-sensor communication. It introduces separably estimable observation models that generalize the observability condition in linear centralized estimation to nonlinear distributed estimation. It studies two distributed estimation algorithms in separably estimable models, the $\mathcal{NU}$ (with its linear counterpart $\mathcal{LU}$) and the $\mathcal{NLU}$. Their update rule combines a consensus step (where each sensor updates the state by weight averaging it with its neighbors' states) and an innovation step (where each sensor processes its local current observation). This makes the three algorithms of the consensus + innovations type, very different from traditional consensus. The paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness. For $\mathcal{LU}$ and $\mathcal{NU}$, it proves asymptotic normality and provides convergence rate guarantees. The three algorithms are characterized by appropriately chosen decaying weight sequences. Algorithms $\mathcal{LU}$ and $\mathcal{NU}$ are analyzed in the framework of stochastic approximation theory; algorithm $\mathcal{NLU}$ exhibits mixed time-scale behavior and biased perturbations, and its analysis requires a different approach that is developed in the paper.
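
To make the consensus + innovations structure concrete, here is a minimal sketch for a scalar parameter with linear observations. It is an illustration only, not the paper's $\mathcal{LU}$ algorithm as specified: the observation model `z_i = h_i * theta + noise`, the weight sequences in `run`, the three-sensor line graph, and all function names are assumptions, and communication noise is not modeled.

```python
import random


def consensus_innovations_step(states, neighbors, gains, observations, alpha, beta):
    """One synchronous update of every sensor's estimate.

    consensus:  pulls x_i toward its neighbors' states (a weighted average
                of x_i and the neighbor states when beta * degree <= 1)
    innovation: corrects x_i using the sensor's own current observation
    """
    new_states = []
    for i, x_i in enumerate(states):
        consensus = sum(x_i - states[j] for j in neighbors[i])
        innovation = gains[i] * (observations[i] - gains[i] * x_i)
        new_states.append(x_i - beta * consensus + alpha * innovation)
    return new_states


def run(theta=3.0, steps=2000, seed=0):
    """Simulate three sensors on a line graph; the middle sensor observes nothing."""
    rng = random.Random(seed)
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    gains = [1.0, 0.0, 1.0]  # hypothetical observation gains h_i
    states = [0.0, 0.0, 0.0]
    for t in range(steps):
        alpha = 1.0 / (t + 1)        # decaying innovation weight (assumed form)
        beta = 0.3 / (t + 1) ** 0.6  # decaying consensus weight (assumed form)
        observations = [h * theta + rng.gauss(0.0, 0.5) for h in gains]
        states = consensus_innovations_step(
            states, neighbors, gains, observations, alpha, beta
        )
    return states
```

In this sketch the middle sensor has zero gain and can only learn about the parameter through the consensus term, which is why the two steps are combined rather than run separately.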

168 citations

Network Information

Related Journals (5)

arXiv: Artificial Intelligence

13.6K papers, 186.5K citations

89% related

arXiv: Learning

45K papers, 837.1K citations

1.5K papers, 158.2K citations

3.5K papers, 213.4K citations

80% related

arXiv: Optimization and Control

21.7K papers, 187.1K citations

80% related

Performance

Metrics

Papers: 1,573

Citations: 12,691

No. of papers from the Journal in previous years
Year	Papers
2021	249
2020	246
2019	258
2018	177
2017	113
2016	79