scispace - formally typeset
Search or ask a question
Journal

arXiv: Multiagent Systems 

About: arXiv: Multiagent Systems is an academic journal. The journal publishes majorly in the area(s): Reinforcement learning & Multi-agent system. Over the lifetime, 1573 publications have been published receiving 9327 citations.


Papers
More filters
Posted Content
TL;DR: In this paper, a mean field Q-learning and mean field Actor-Critic algorithms are proposed to solve the Ising model via model-free reinforcement learning methods. But the authors admit that the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics change according to the collective patterns of individual policies.
Abstract: Existing multi-agent reinforcement learning methods are limited typically to a small number of agents. When the agent number increases largely, the learning becomes intractable due to the curse of the dimensionality and the exponential growth of agent interactions. In this paper, we present \emph{Mean Field Reinforcement Learning} where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result to solve the Ising model via model-free reinforcement learning methods.

331 citations

Posted Content
TL;DR: In this paper, the authors introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemma but also require agents to learn policies that implement their strategic intentions.
Abstract: Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.

293 citations

Posted Content
TL;DR: This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.
Abstract: The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

208 citations

Posted Content
TL;DR: In this article, a natural axiom for committee voting, called justified representation (JR), was proposed, which requires that if a large enough group of voters exhibits agreement by supporting the same candidate, then at least one voter in this group has an approved candidate in the winning committee.
Abstract: We consider approval-based committee voting, i.e. the setting where each voter approves a subset of candidates, and these votes are then used to select a fixed-size set of winners (committee). We propose a natural axiom for this setting, which we call justified representation (JR). This axiom requires that if a large enough group of voters exhibits agreement by supporting the same candidate, then at least one voter in this group has an approved candidate in the winning committee. We show that for every list of ballots it is possible to select a committee that provides JR. However, it turns out that several prominent approval-based voting rules may fail to output such a committee. In particular, while Proportional Approval Voting (PAV) always outputs a committee that provides JR, Reweighted Approval Voting (RAV), a tractable approximation to PAV, does not have this property. We then introduce a stronger version of the JR axiom, which we call extended justified representation (EJR), and show that PAV satisfies EJR, while other rules we consider do not; indeed, EJR can be used to characterize PAV within the class of weighted PAV rules. We also consider several other questions related to JR and EJR, including the relationship between JR/EJR and core stability, and the complexity of the associated algorithmic problems.

185 citations

Posted ContentDOI
TL;DR: This paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness, and provides convergence rate guarantees in distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy intersensor communication.
Abstract: The paper studies distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy inter-sensor communication. It introduces \emph{separably estimable} observation models that generalize the observability condition in linear centralized estimation to nonlinear distributed estimation. It studies two distributed estimation algorithms in separably estimable models, the $\mathcal{NU}$ (with its linear counterpart $\mathcal{LU}$) and the $\mathcal{NLU}$. Their update rule combines a \emph{consensus} step (where each sensor updates the state by weight averaging it with its neighbors' states) and an \emph{innovation} step (where each sensor processes its local current observation.) This makes the three algorithms of the \textit{consensus + innovations} type, very different from traditional consensus. The paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value,) efficiency, and asymptotic unbiasedness. For $\mathcal{LU}$ and $\mathcal{NU}$, it proves asymptotic normality and provides convergence rate guarantees. The three algorithms are characterized by appropriately chosen decaying weight sequences. Algorithms $\mathcal{LU}$ and $\mathcal{NU}$ are analyzed in the framework of stochastic approximation theory; algorithm $\mathcal{NLU}$ exhibits mixed time-scale behavior and biased perturbations, and its analysis requires a different approach that is developed in the paper.

168 citations

Network Information
Related Journals (5)
arXiv: Artificial Intelligence
13.6K papers, 186.5K citations
89% related
arXiv: Learning
45K papers, 837.1K citations
84% related
Journal of Artificial Intelligence Research
1.5K papers, 158.2K citations
84% related
Transportation Research Part C-emerging Technologies
3.5K papers, 213.4K citations
80% related
arXiv: Optimization and Control
21.7K papers, 187.1K citations
80% related
Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
2021249
2020246
2019258
2018177
2017113
201679