Showing papers in "arXiv: Multiagent Systems in 2020"

Open Access

Posted Content•

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

01 Nov 2020-arXiv: Multiagent Systems

TL;DR: This work provides a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective and is intended to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

Abstract: Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

103 citations

Posted Content•

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving.

19 Oct 2020-arXiv: Multiagent Systems

TL;DR: The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.

Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at this https URL.

98 citations

Posted Content•

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

Yaodong Yang, Jianye Hao, Ben Liao, Kun Shao, Guangyong Chen, Wulong Liu, Hongyao Tang

09 Jun 2020-arXiv: Multiagent Systems

TL;DR: This work theoretically derives a general formula of $Q_{tot}$ in terms of $Q^{i}$, on which a multi-head attention formation approximating $Q_{tot}$ is built, yielding both a refined representation of $Q_{tot}$ with an agent-level attention mechanism and a tractable maximization algorithm of decentralized policies; extensive experiments show that the method outperforms state-of-the-art MARL methods on the StarCraft benchmark.

Abstract: In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, e.g., VDN imposing an additive formation and QMIX adopting a monotonic assumption using an implicit mixing method. However, most of the previous efforts impose certain assumptions between $Q_{tot}$ and $Q^{i}$ and lack theoretical groundings. Besides, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula of $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention formation to approximate $Q_{tot}$, resulting in not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm of decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted with valuable insights.
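
As a rough illustration of the value-decomposition idea named in the abstract above, the sketch below contrasts an additive (VDN-style) mixer with a toy monotonic mixer and checks that, under either one, each agent greedily maximizing its own $Q^{i}$ recovers the joint action that maximizes $Q_{tot}$. This is not the Qatten method itself. The function names, the Q-value table, and the fixed linear mixer with non-negative weights are all assumptions made for illustration; QMIX learns its mixing network rather than using fixed weights.

```python
from itertools import product


def vdn_mix(agent_qs):
    """Additive (VDN-style) mixing: Q_tot is the plain sum of the chosen Q^i."""
    return sum(agent_qs)


def monotonic_mix(agent_qs, weights, bias=0.0):
    """Toy monotonic mixing: a fixed linear mixer with non-negative weights.

    Assumption: this stands in for QMIX's learned mixing network; only the
    monotonicity property (Q_tot never decreases when a Q^i increases) is kept.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep Q_tot monotonic")
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


def decentralized_greedy(per_agent_q):
    """Each agent independently picks the action index maximizing its own Q^i."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def joint_greedy(per_agent_q, mix):
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = product(*(range(len(q)) for q in per_agent_q))
    return max(
        joint_actions,
        key=lambda joint: mix([q[a] for q, a in zip(per_agent_q, joint)]),
    )


# Hypothetical per-agent Q-values: three agents with 3, 2 and 3 actions.
per_agent_q = [[0.1, 0.9, 0.3], [0.5, 0.2], [0.0, 0.4, 0.8]]
weights = [0.5, 2.0, 1.0]  # hypothetical non-negative mixing weights

local_choice = decentralized_greedy(per_agent_q)
vdn_choice = joint_greedy(per_agent_q, vdn_mix)
mono_choice = joint_greedy(per_agent_q, lambda qs: monotonic_mix(qs, weights))
```

The brute-force search grows exponentially with the number of agents, whereas the decentralized choice costs one argmax per agent; that gap is why a tractable maximization over decentralized policies matters.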

89 citations

Posted Content•

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

Tonghan Wang¹, Heng Dong¹, Victor Lesser², Chongjie Zhang¹•Institutions (2)

Tsinghua University¹, University of Massachusetts Amherst²

18 Mar 2020-arXiv: Multiagent Systems

TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.

Abstract: The role concept provides a useful tool to design and understand complex multi-agent systems, which allows agents with a similar role to share similar behaviors. However, existing role-based methods use prior domain knowledge and predefine role structures and behaviors. In contrast, multi-agent reinforcement learning (MARL) provides flexibility and adaptability, but less efficiency in complex tasks. In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA). In this framework, roles are emergent, and agents with similar roles tend to share their learning and to be specialized on certain sub-tasks. To this end, we construct a stochastic role embedding space by introducing two novel regularizers and conditioning individual policies on roles. Experiments show that our method can learn specialized, dynamic, and identifiable roles, which help our method push forward the state of the art on the StarCraft II micromanagement benchmark. Demonstrative videos are available at this https URL.

71 citations

Journal Article•DOI•

Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning

Harald Bayerlein¹, Mirco Theile², Marco Caccamo², David Gesbert¹•Institutions (2)

Institut Eurécom¹, Technische Universität München²

23 Oct 2020-arXiv: Multiagent Systems

TL;DR: This work formulates the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints, and solves the problem through a deep reinforcement learning (DRL) approach.

Abstract: Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, number, position and data amount of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.

61 citations

Posted Content•

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Filippos Christianos¹, Lukas Schäfer¹, Stefano V. Albrecht¹•Institutions (1)

University of Edinburgh¹

12 Jun 2020-arXiv: Multiagent Systems

TL;DR: This work proposes a general method for efficient exploration by sharing experience amongst agents by applying experience sharing in an actor-critic framework and finds that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns.

Abstract: Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.

53 citations

Posted Content•

A Separation-Based Methodology to Consensus Tracking of Switched High-Order Nonlinear Multi-Agent Systems

Maolong Lv, Wenwu Yu, Jinde Cao, Simone Baldi

10 Jun 2020-arXiv: Multiagent Systems

TL;DR: In this paper, the authors investigated a reduced-complexity adaptive methodology to consensus tracking for a team of uncertain high-order nonlinear systems with switched (possibly asynchronous) dynamics, where the control gain of each virtual control law does not have to be incorporated in the next virtual control law iteratively, thus leading to a simpler expression of the control laws.

Abstract: This work investigates a reduced-complexity adaptive methodology to consensus tracking for a team of uncertain high-order nonlinear systems with switched (possibly asynchronous) dynamics. It is well known that high-order nonlinear systems are intrinsically challenging as feedback linearization and backstepping methods successfully developed for low-order systems fail to work. At the same time, even the adding-one power-integrator methodology, well explored for the single-agent high-order case, presents some complexity issues and is unsuited for distributed control. At the core of the proposed distributed methodology is a newly proposed definition for separable functions: this definition allows the formulation of a separation-based lemma to handle the high-order terms with reduced complexity in the control design. Complexity is reduced in a twofold sense: the control gain of each virtual control law does not have to be incorporated in the next virtual control law iteratively, thus leading to a simpler expression of the control laws; the order of the virtual control gains increases only proportionally (rather than exponentially) with the order of the systems, dramatically reducing high-gain issues.

37 citations

Journal Article•DOI•

An Overview on Optimal Flocking

Logan E. Beaver¹, Andreas A. Malikopoulos¹•Institutions (1)

University of Delaware¹

29 Sep 2020-arXiv: Multiagent Systems

TL;DR: An overview of the literature focusing on optimization approaches to achieve flocking behavior that provide strong safety guarantees is presented, and several approaches aimed at minimizing flocking communication and computational requirements in real systems via neighbor filtering and event-driven planning are presented.

Abstract: The study of robotic flocking has received considerable attention in the past twenty years. As we begin to deploy flocking control algorithms on physical multi-agent and swarm systems, there is an increasing necessity for rigorous promises on safety and performance. In this paper, we present an overview of the literature focusing on optimization approaches to achieve flocking behavior that provide strong safety guarantees. We separate the literature into cluster and line flocking, and categorize cluster flocking with respect to the system objective, which may be realized by a reactive, or planning, control algorithm. We also present several approaches aimed at minimizing flocking communication and computational requirements in real systems via neighbor filtering and event-driven planning. We conclude the overview with our perspective on the outlook and future research direction of optimal flocking algorithms.

37 citations

Posted Content•

Optimizing Multi-UAV Deployment in 3D Space to Minimize Task Completion Time in UAV-Enabled Mobile Edge Computing Systems

Sujunjie Sun¹, Guopeng Zhang¹, Haibo Mei², Kezhi Wang³, Kun Yang⁴•Institutions (4)

China University of Mining and Technology¹, University of Electronic Science and Technology of China², Northumbria University³, University of Essex⁴

24 Oct 2020-arXiv: Multiagent Systems

TL;DR: The simulation results show that the joint optimization of the horizontal and the vertical position of a group of UAVs can achieve better performance than the traditional algorithms.

Abstract: In Unmanned Aerial Vehicle (UAV)-enabled mobile edge computing (MEC) systems, UAVs can carry edge servers to help ground user equipment (UEs) offload their computing tasks to the UAVs for execution. This paper aims to minimize the total time required for the UAVs to complete the offloaded tasks, while optimizing the three-dimensional (3D) deployment of UAVs, including their flying height and horizontal positions. Although the formulated optimization is a mixed-integer nonlinear programming problem, we convert it to a convex problem and develop a successive convex approximation (SCA) based algorithm to effectively solve it. The simulation results show that the joint optimization of the horizontal and the vertical position of a group of UAVs can achieve better performance than the traditional algorithms.

29 citations

Journal Article•DOI•

Decentralized Multi-Agent Pursuit using Deep Reinforcement Learning

Cristino de Souza¹, Rhys Newbury, Akansel Cosgun, Pedro Castillo, Boris Vidolov, Dana Kulic•Institutions (1)

University of Technology of Compiègne¹

16 Oct 2020-arXiv: Multiagent Systems

TL;DR: This work uses shared experience to train a policy for a given number of pursuers, executed independently by each agent at run-time, for pursuing an omnidirectional target with multiple, homogeneous agents that are subject to unicycle kinematic constraints.

Abstract: Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning for pursuing an omni-directional target with multiple, homogeneous agents that are subject to unicycle kinematic constraints. We use shared experience to train a policy for a given number of pursuers that is executed independently by each agent at run-time. The training benefits from curriculum learning, a sweeping-angle ordering to locally represent neighboring agents and encouraging good formations with reward structure that combines individual and group rewards. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach, with non-holonomic agents, performs on par with classical algorithms with omni-directional agents, and outperforms their non-holonomic adaptations. The learned policy is successfully transferred to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.
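
For readers unfamiliar with the "unicycle kinematic constraints" mentioned in the abstract above, the sketch below shows the standard unicycle model: an agent can only move along its current heading and must turn to change direction, unlike an omni-directional target that can move in any direction at once. The function name, the explicit Euler integration step, and the heading wrap-around are assumptions made for illustration and are not taken from the paper.

```python
import math


def unicycle_step(x, y, heading, speed, turn_rate, dt):
    """Advance a unicycle-model agent by one explicit Euler step.

    The agent moves only along its heading (non-holonomic constraint):
        x' = speed * cos(heading), y' = speed * sin(heading), heading' = turn_rate
    """
    new_x = x + speed * math.cos(heading) * dt
    new_y = y + speed * math.sin(heading) * dt
    # Wrap the heading into [-pi, pi) so it stays bounded over long rollouts.
    new_heading = (heading + turn_rate * dt + math.pi) % (2 * math.pi) - math.pi
    return new_x, new_y, new_heading


# Hypothetical rollout: drive straight along +x, then along +y.
straight = unicycle_step(0.0, 0.0, 0.0, speed=1.0, turn_rate=0.0, dt=1.0)
upward = unicycle_step(0.0, 0.0, math.pi / 2, speed=2.0, turn_rate=0.0, dt=0.5)
```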

27 citations

Posted Content•

Opinion Diffusion and Campaigning on Society Graphs.

Piotr Faliszewski¹, Rica Gonen², Martin Koutecký³, Nimrod Talmon⁴•Institutions (4)

AGH University of Science and Technology¹, Open University of Israel², Charles University in Prague³, Ben-Gurion University of the Negev⁴

01 Oct 2020-arXiv: Multiagent Systems

TL;DR: In this paper, the authors study the effects of campaigning, where the society is partitioned into voter clusters and a diffusion process propagates opinions in a network connecting the clusters, and show that computing the cheapest campaign for rigging a given election can usually be done efficiently, even with arbitrarily many voters.

Abstract: We study the effects of campaigning, where the society is partitioned into voter clusters and a diffusion process propagates opinions in a network connecting the clusters. Our model is very powerful and can incorporate many campaigning actions, various partitions of the society into clusters, and very general diffusion processes. Perhaps surprisingly, we show that computing the cheapest campaign for rigging a given election can usually be done efficiently, even with arbitrarily-many voters. Moreover, we report on certain computational simulations.

Posted Content•DOI•

Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Cyrus Neary¹, Zhe Xu², Bo Wu, Ufuk Topcu¹•Institutions (2)

University of Texas at Austin¹, Arizona State University²

03 Jul 2020-arXiv: Multiagent Systems

TL;DR: The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents, and provides a natural approach to decentralized learning.

Abstract: In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task. The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds for the global value function corresponding to the team task. Experimental results in three discrete settings exemplify the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy two orders of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent q-learning approaches.
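
The abstract above describes a reward machine as a Mealy machine that serves as a structured representation of a reward function. A minimal sketch of that idea follows: a finite-state machine whose transitions are triggered by high-level events and emit a reward as output. The class name, the event labels, and the two-step task are hypothetical, and the sketch does not attempt the paper's decomposition into per-agent sub-tasks.

```python
class RewardMachine:
    """A Mealy machine: each (state, event) pair maps to (next_state, reward)."""

    def __init__(self, initial_state, transitions, terminal_states=()):
        self.initial_state = initial_state
        self.transitions = dict(transitions)
        self.terminal_states = frozenset(terminal_states)
        self.state = initial_state

    def reset(self):
        """Return the machine to its initial state at the start of an episode."""
        self.state = self.initial_state

    def step(self, event):
        """Consume one event, update the state, and return the emitted reward.

        Assumption: events with no matching transition leave the state
        unchanged and yield zero reward.
        """
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def is_done(self):
        return self.state in self.terminal_states


# Hypothetical team task: a button must be pressed before the goal counts.
team_rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "button_pressed"): ("u1", 0.0),
        ("u1", "goal_reached"): ("u2", 1.0),
    },
    terminal_states={"u2"},
)

# Reaching the goal too early earns nothing; the correct order earns 1.0.
rewards = [team_rm.step(e) for e in ["goal_reached", "button_pressed", "goal_reached"]]
```

Because the machine state records which part of the task is already complete, a learner can condition on it instead of on the full event history.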

Posted Content•

Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment

Ivan Stelmakh¹, Nihar B. Shah¹, Aarti Singh¹•Institutions (1)

Carnegie Mellon University¹

08 Oct 2020-arXiv: Multiagent Systems

TL;DR: This paper designs a principled test for detecting strategic behaviour, designs an experiment that elicits strategic behaviour from subjects and releases a dataset of patterns of strategic behaviour that may be of independent interest, and proves that the test has strong false alarm guarantees.

Abstract: We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions. When a peer-assessment task is competitive (e.g., when students are graded on a curve), agents may be incentivized to misreport evaluations in order to improve their own final standing. Our focus is on designing methods for detection of such manipulations. Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering. In this paper, we investigate a statistical framework for this problem and design a principled test for detecting strategic behaviour. We prove that our test has strong false alarm guarantees and evaluate its detection ability in practical settings. For this, we design and execute an experiment that elicits strategic behaviour from subjects and release a dataset of patterns of strategic behaviour that may be of independent interest. We then use the collected data to conduct a series of real and semi-synthetic evaluations that demonstrate a strong detection power of our test.

Posted Content•

A Review of Platforms for the Development of Agent Systems.

Constantin-Valentin Pal, Florin Leon, Marcin Paprzycki, Maria Ganzha

17 Jul 2020-arXiv: Multiagent Systems

TL;DR: This work details the main characteristics of the included agent platforms, together with links to specific projects where they have been used and classifies the agent platforms as general purpose ones, free or commercial, and specialized ones, which can be used for particular types of applications.

Abstract: Agent-based computing is an active field of research with the goal of building autonomous software or hardware entities. This task is often facilitated by the use of dedicated, specialized frameworks. For almost thirty years, many such agent platforms have been developed. Meanwhile, some of them have been abandoned, others continue their development and new platforms are released. This paper presents an up-to-date review of the existing agent platforms and also a historical perspective of this domain. It aims to serve as a reference point for people interested in developing agent systems. This work details the main characteristics of the included agent platforms, together with links to specific projects where they have been used. It distinguishes between the active platforms and those no longer under development or with unclear status. It also classifies the agent platforms as general purpose ones, free or commercial, and specialized ones, which can be used for particular types of applications.

Journal Article•DOI•

Federated Multi-Agent Actor-Critic Learning for Age Sensitive Mobile Edge Computing

Zheqi Zhu¹, Shuo Wan¹, Pingyi Fan¹, Khaled Ben Letaief²•Institutions (2)

Tsinghua University¹, Hong Kong University of Science and Technology²

28 Dec 2020-arXiv: Multiagent Systems

TL;DR: In this paper, a heterogeneous multi-agent actor critic (H-MAAC) is proposed as a paradigm for joint collaboration in the investigated MEC systems, where edge devices and center controller learn the interactive strategies through their own observations.

Abstract: As an emerging technique, mobile edge computing (MEC) introduces a new processing scheme for various distributed communication-computing systems such as industrial Internet of Things (IoT), vehicular communication, smart city, etc. In this work, we mainly focus on the timeliness of the MEC systems where the freshness of the data and computation tasks is significant. Firstly, we formulate a class of age-sensitive MEC models and define the average age of information (AoI) minimization problems of interest. Then, a novel policy-based multi-agent deep reinforcement learning (RL) framework, called heterogeneous multi-agent actor critic (H-MAAC), is proposed as a paradigm for joint collaboration in the investigated MEC systems, where edge devices and center controller learn the interactive strategies through their own observations. To improve the system performance, we develop the corresponding online algorithm by introducing an edge federated learning mode into the multi-agent cooperation whose advantages on learning convergence can be guaranteed theoretically. To the best of our knowledge, it is the first joint MEC collaboration algorithm that combines the edge federated mode with the multi-agent actor-critic reinforcement learning. Furthermore, we evaluate the proposed approach and compare it with classical RL based methods. As a result, the proposed framework not only outperforms the baseline on average system age, but also promotes the stability of the training process. Besides, the simulation results provide some innovative perspectives for the system design under the edge federated collaboration.

Journal Article•DOI•

Coordination of Autonomous Vehicles: Taxonomy and Survey

Stefano Mariani¹, Giacomo Cabri¹, Franco Zambonelli¹•Institutions (1)

University of Modena and Reggio Emilia¹

08 Jan 2020-arXiv: Multiagent Systems

TL;DR: The general problems associated with coordination of autonomous vehicles are introduced by identifying and framing the key classes of coordination problems and the different approaches that can be adopted to deal with such problems are overviewed.

Abstract: In the near future, our streets will be populated by myriads of autonomous self-driving vehicles to serve our diverse mobility needs. This will raise the need to coordinate their movements in order to properly handle both access to shared resources (e.g., intersections and parking slots) and the execution of mobility tasks (e.g., platooning and ramp merging). In this paper, we first introduce the general issues associated with the coordination of autonomous vehicles, by identifying and framing the key classes of coordination problems. Next, we overview the different approaches that can be adopted to manage such coordination problems, by classifying them in terms of the degree of autonomy in decision making that is left to autonomous vehicles during coordination. Finally, we overview some further peculiar challenges that research will have to address before autonomously coordinated vehicles can safely hit our streets.

Posted Content•

Multi Type Mean Field Reinforcement Learning

Sriram Subramanian¹, Pascal Poupart, Matthew D. Taylor, Nidhi Hegde•Institutions (1)

University of Waterloo¹

06 Feb 2020-arXiv: Multiagent Systems

TL;DR: New algorithms for each type of game are introduced and shown to outperform, in the MAgent framework, both state-of-the-art algorithms that assume all agents belong to the same type and other baseline algorithms.

Abstract: Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field games, which is that all agents in the environment are playing almost similar strategies and have the same goal. We conduct experiments on three different testbeds for the field of many-agent reinforcement learning, based on the standard MAgent framework. We consider two different kinds of mean field games: a) Games where agents belong to predefined types that are known a priori and b) Games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state of the art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.
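
To make the "virtual mean agent" in the abstract above concrete, the sketch below summarizes the neighbors of an agent by the average of their one-hot encoded actions, first as a single mean and then as one mean per type. Averaging one-hot actions is one common way to build such a summary and is assumed here rather than taken from the abstract. The function names, the integer type labels, and the all-zero vector for a type with no neighbors are likewise illustrative assumptions.

```python
def mean_action(neighbor_actions, n_actions):
    """Average of one-hot encoded discrete actions (empirical action frequencies)."""
    if not neighbor_actions:
        # Assumption: with no neighbors, fall back to an all-zero summary.
        return [0.0] * n_actions
    counts = [0] * n_actions
    for action in neighbor_actions:
        counts[action] += 1
    return [count / len(neighbor_actions) for count in counts]


def mean_action_by_type(neighbor_actions, neighbor_types, n_actions, n_types):
    """One mean action per type, so differently behaving groups are not blended."""
    grouped = [[] for _ in range(n_types)]
    for action, agent_type in zip(neighbor_actions, neighbor_types):
        grouped[agent_type].append(action)
    return [mean_action(group, n_actions) for group in grouped]


# Hypothetical neighborhood: four neighbors, three discrete actions, two types.
actions = [0, 1, 1, 2]
types = [0, 0, 1, 1]

single_mean = mean_action(actions, n_actions=3)
typed_means = mean_action_by_type(actions, types, n_actions=3, n_types=2)
```

In this toy neighborhood the single mean hides that type 0 never takes action 2 and type 1 never takes action 0, while the per-type means keep that distinction.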

Proceedings Article•DOI•

BARK: Open Behavior Benchmarking in Multi-Agent Environments

Julian Bernhard, Klemens Esterle, Patrick Hart, Tobias Kessler

05 Mar 2020-arXiv: Multiagent Systems

TL;DR: This work introduces BARK, an open-source behavior benchmarking environment designed to mitigate the shortcomings of current behavior models and shows that BARK provides a suitable framework for a systematic development of behavior models.

Abstract: Predicting and planning interactive behaviors in complex traffic situations presents a challenging task. Especially in scenarios involving multiple traffic participants that interact densely, autonomous vehicles still struggle to interpret situations and to eventually achieve their own mission goal. As driving tests are costly and challenging scenarios are hard to find and reproduce, simulation is widely used to develop, test, and benchmark behavior models. However, most simulations rely on datasets and simplistic behavior models for traffic participants and do not cover the full variety of real-world, interactive human behaviors. In this work, we introduce BARK, an open-source behavior benchmarking environment designed to mitigate the shortcomings stated above. In BARK, behavior models are (re-)used for planning, prediction, and simulation. A range of models is currently available, such as Monte-Carlo Tree Search and Reinforcement Learning-based behavior models. We use a public dataset and sampling-based scenario generation to show the inter-exchangeability of behavior models in BARK. We evaluate how well the models used cope with interactions and how robust they are towards exchanging behavior models. Our evaluation shows that BARK provides a suitable framework for a systematic development of behavior models.

Posted Content•

A Distributed Model-Free Ride-Sharing Approach for Joint Matching, Pricing, and Dispatching using Deep Reinforcement Learning

Marina Haliem¹, Ganapathy Mani¹, Vaneet Aggarwal¹, Bharat Bhargava¹•Institutions (1)

Purdue University¹

05 Oct 2020-arXiv: Multiagent Systems

TL;DR: A dynamic, demand aware, and pricing-based vehicle-passenger matching and route planning framework that dynamically generates optimal routes for each vehicle based on online demand, pricing associated with each ride, vehicle capacities and locations is presented.

Abstract: Significant development of ride-sharing services presents a plethora of opportunities to transform urban mobility by providing personalized and convenient transportation while ensuring efficiency of large-scale ride pooling. However, a core problem for such services is route planning for each driver to fulfill the dynamically arriving requests while satisfying given constraints. Current models are mostly limited to static routes with only two rides per vehicle (optimally) or three (with heuristics). In this paper, we present a dynamic, demand aware, and pricing-based vehicle-passenger matching and route planning framework that (1) dynamically generates optimal routes for each vehicle based on online demand, pricing associated with each ride, vehicle capacities and locations. This matching algorithm starts greedily and optimizes over time using an insertion operation, (2) involves drivers in the decision-making process by allowing them to propose a different price based on the expected reward for a particular ride as well as the destination locations for future rides, which is influenced by supply and demand computed by the Deep Q-network, (3) allows customers to accept or reject rides based on their set of preferences with respect to pricing and delay windows, vehicle type and carpooling preferences, and (4) based on demand prediction, our approach re-balances idle vehicles by dispatching them to the areas of anticipated high demand using deep Reinforcement Learning (RL). Our framework is validated using the New York City Taxi public dataset; however, we consider different vehicle types and designed customer utility functions to validate the setup and study different settings. Experimental results show the effectiveness of our approach in real-time and large scale settings.
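
The "insertion operation" mentioned in the abstract above is not specified there. As a rough illustration only, a generic cheapest-insertion step might look like the sketch below: it tries every position for a new pickup and drop-off in a vehicle's existing stop list and keeps the shortest resulting route. The function names, the Manhattan distance, and the omission of capacity, pricing, and delay constraints are all assumptions made for this sketch; it is not the authors' algorithm.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points (an assumed cost metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def route_length(start, stops):
    """Total travel distance from the vehicle's position through all stops in order."""
    total, current = 0, start
    for stop in stops:
        total += manhattan(current, stop)
        current = stop
    return total


def best_insertion(start, stops, pickup, dropoff):
    """Insert a pickup/drop-off pair into an existing route at minimum total length.

    The existing stop order is preserved and the pickup always precedes the
    drop-off. Capacity and time-window checks are deliberately left out.
    """
    best_route, best_cost = None, None
    for i in range(len(stops) + 1):          # position of the pickup
        for j in range(i, len(stops) + 1):   # position of the drop-off (not before the pickup)
            candidate = stops[:i] + [pickup] + stops[i:j] + [dropoff] + stops[j:]
            cost = route_length(start, candidate)
            if best_cost is None or cost < best_cost:
                best_route, best_cost = candidate, cost
    return best_route, best_cost
```

For a route with k existing stops this evaluates on the order of k² candidate routes, which is why insertion heuristics are usually applied to one new request at a time rather than re-optimizing the whole route.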

Posted Content•

Resilient Task Allocation in Heterogeneous Multi-Robot Systems

[...]

Siddharth Mayya¹, Diego S. D'Antonio², David Saldaña², Vijay Kumar¹•Institutions (2)

University of Pennsylvania¹, Lehigh University²

09 Sep 2020-arXiv: Multiagent Systems

TL;DR: This letter presents a resilient mechanism to allocate heterogeneous robots to tasks under difficult environmental conditions such as weather events or adversarial attacks, and relaxes the resource constraints corresponding to some tasks, thus exhibiting a graceful degradation of performance.

Abstract: For a multi-robot system equipped with heterogeneous capabilities, this paper presents a mechanism to allocate robots to tasks in a resilient manner when anomalous environmental conditions such as weather events or adversarial attacks affect the performance of robots within the tasks. Our primary objective is to ensure that each task is assigned the requisite level of resources, measured as the aggregated capabilities of the robots allocated to the task. By keeping track of task performance deviations under external perturbations, our framework quantifies the extent to which robot capabilities (e.g., visual sensing or aerial mobility) are affected by environmental conditions. This enables an optimization-based framework to flexibly reallocate robots to tasks based on the most degraded capabilities within each task. In the face of resource limitations and adverse environmental conditions, our algorithm minimally relaxes the resource constraints corresponding to some tasks, thus exhibiting a graceful degradation of performance. Simulated experiments in a multi-robot coverage and target tracking scenario demonstrate the efficacy of the proposed approach.

Posted Content•

Situating Agent-Based Modelling in Population Health Research

[...]

Eric Silverman¹, Umberto Gostoli¹, Stefano Picascia¹, Jonatan Almagor¹, Mark McCann¹, Richard J. Shaw¹, Claudio Angione²•Institutions (2)

University of Glasgow¹, Teesside University²

06 Feb 2020-arXiv: Multiagent Systems

TL;DR: For ABM to be most effective in the field it should be used as a means for answering questions normally inaccessible to the traditional epidemiological toolkit, and why simulations are essential to the study of complex systems theory.

Abstract: Today's most troublesome population health challenges are often driven by social and environmental determinants, which are difficult to model using traditional epidemiological methods. We agree with those who have argued for the wider adoption of agent-based modelling (ABM) in taking on these challenges. However, while ABM has been used occasionally in population health, we argue that for ABM to be most effective in the field it should be used as a means for answering questions normally inaccessible to the traditional epidemiological toolkit. In an effort to clearly illustrate the utility of ABM for population health research, and to clear up persistent misunderstandings regarding the method's conceptual underpinnings, we offer a detailed presentation of the core concepts of complex systems theory, and summarise why simulations are essential to the study of complex systems. We then examine the current state of the art in ABM for population health, and propose they are well-suited for the study of the 'wicked' problems in population health, and could make significant contributions to theory and intervention development in these areas.

Posted Content•

Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous

[...]

Rose E. Wang¹, J. Chase Kew², Dennis Lee², Tsang-Wei Edward Lee², Tingnan Zhang², Brian Ichter², Jie Tan², Aleksandra Faust²•Institutions (2)

Stanford University¹, Google²

15 Mar 2020-arXiv: Multiagent Systems

TL;DR: This work proposes hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous that removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.

Abstract: Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point to point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.

Proceedings Article•DOI•

Biased Opinion Dynamics: When the Devil Is in the Details.

[...]

Aris Anagnostopoulos¹, Luca Becchetti¹, Emilio Cruciani², Francesco Pasquale³, Sara Rizzo•Institutions (3)

Sapienza University of Rome¹, French Institute for Research in Computer Science and Automation², University of Rome Tor Vergata³

31 Aug 2020-arXiv: Multiagent Systems

TL;DR: It is believed that the model proposed is at the same time simple, rich, and modular, affording mathematical characterization of the interplay between bias, underlying opinion dynamics, and social structure in a unified setting.

Abstract: We investigate opinion dynamics in multi-agent networks when a bias toward one of two possible opinions exists; for example, reflecting a status quo vs a superior alternative. Starting with all agents sharing an initial opinion representing the status quo, the system evolves in steps. In each step, one agent selected uniformly at random adopts the superior opinion with some probability $\alpha$, and with probability $1 - \alpha$ it follows an underlying update rule to revise its opinion on the basis of those held by its neighbors. We analyze convergence of the resulting process under two well-known update rules, namely majority and voter. The framework we propose exhibits a rich structure, with a non-obvious interplay between topology and underlying update rule. For example, for the voter rule we show that the speed of convergence bears no significant dependence on the underlying topology, whereas the picture changes completely under the majority rule, where network density negatively affects convergence. We believe that the model we propose is at the same time simple, rich, and modular, affording mathematical characterization of the interplay between bias, underlying opinion dynamics, and social structure in a unified setting.
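
The update process described in the abstract above can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: the graph is given as a hypothetical adjacency list `neighbors`, the voter rule copies one uniformly random neighbor, the majority rule adopts the strict majority among neighbors, and ties under the majority rule keep the agent's current opinion (the tie-breaking choice is an assumption).

```python
import random


def simulate_biased_dynamics(neighbors, alpha, rule="voter", max_steps=1_000_000, seed=0):
    """Simulate biased opinion dynamics and return the step at which every agent
    holds the superior opinion, or None if max_steps is reached first.

    neighbors: list where neighbors[i] is the non-empty list of agent i's neighbors.
    alpha:     probability that the selected agent adopts the superior opinion outright.
    rule:      underlying update rule, "voter" or "majority".
    """
    rng = random.Random(seed)
    n = len(neighbors)
    opinion = [0] * n   # 0 = status quo, 1 = superior alternative
    ones = 0            # number of agents currently holding opinion 1
    for step in range(1, max_steps + 1):
        i = rng.randrange(n)  # one agent selected uniformly at random
        if rng.random() < alpha:
            new = 1  # bias: adopt the superior opinion
        elif rule == "voter":
            new = opinion[rng.choice(neighbors[i])]  # copy a random neighbor
        elif rule == "majority":
            votes = sum(opinion[j] for j in neighbors[i])
            if 2 * votes > len(neighbors[i]):
                new = 1
            elif 2 * votes < len(neighbors[i]):
                new = 0
            else:
                new = opinion[i]  # tie: keep current opinion (assumed)
        else:
            raise ValueError(f"unknown rule: {rule}")
        ones += new - opinion[i]
        opinion[i] = new
        if ones == n:
            return step
    return None
```

Running this on graphs of different density (for example a ring versus a complete graph) with the same `alpha` is one way to observe empirically the contrast between the voter and majority rules that the abstract describes.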

Posted Content•

Scalable Reinforcement Learning Policies for Multi-Agent Control.

[...]

Christopher D. Hsu¹, Heejin Jeong¹, George J. Pappas¹, Pratik Chaudhari¹•Institutions (1)

University of Pennsylvania¹

16 Nov 2020-arXiv: Multiagent Systems

TL;DR: A masking heuristic is developed that allows training on smaller problems with few pursuers-targets and execution on much larger problems and is discussed how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution.

Abstract: This paper develops a stochastic Multi-Agent Reinforcement Learning (MARL) method to learn control policies that can handle an arbitrary number of external agents; our policies can be executed for tasks consisting of 1000 pursuers and 1000 evaders. We model pursuers as agents with limited on-board sensing and formulate the problem as a decentralized, partially-observable Markov Decision Process. An attention mechanism is used to build a permutation and input-size invariant embedding of the observations for learning a stochastic policy and value function using techniques in entropy-regularized off-policy methods. Simulation experiments on a large number of problems show that our control policies are dramatically scalable and display cooperative behavior in spite of being executed in a decentralized fashion; our methods offer a simple solution to classical multi-agent problems using techniques in reinforcement learning.
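
To make the idea of a "permutation and input-size invariant embedding" concrete, the sketch below pools a variable-length list of observation vectors with a single attention query. It is a generic illustration, not the architecture from the paper: the function name, the single fixed `query` vector, and the weight matrices `w_key` and `w_value` are all assumptions, and a learned model would train these parameters rather than pass them in by hand.

```python
import math


def attention_pool(observations, w_key, w_value, query):
    """Pool N observation vectors into one fixed-size embedding.

    observations: list of N feature vectors (N >= 1, any N).
    w_key, w_value: weight matrices given as lists of rows.
    query: vector with the same dimension as the key projection.
    The output dimension depends only on w_value, not on N, and the result
    does not depend on the order of the observations.
    """
    def matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    keys = [matvec(w_key, x) for x in observations]
    values = [matvec(w_value, x) for x in observations]

    # Scaled dot-product scores between the query and each key.
    scale = math.sqrt(len(query))
    scores = [sum(k * q for k, q in zip(key, query)) / scale for key in keys]

    # Numerically stable softmax over the N observations.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Because the softmax weights and the weighted sum are both symmetric in the observations, shuffling the input list changes the result only by floating-point rounding, and adding or removing observations leaves the output size unchanged.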

Posted Content•

A Microscopic Epidemic Model and Pandemic Prediction Using Multi-Agent Reinforcement Learning.

[...]

Changliu Liu

27 Apr 2020-arXiv: Multiagent Systems

TL;DR: A microscopic approach to model epidemics, which can explicitly consider the consequences of individual's decisions on the spread of the disease, and shows that there are negative externalities in the sense that infected agents do not have enough incentives to protect others, which necessitates external interventions to regulate agents' behaviors.

Abstract: This paper introduces a microscopic approach to model epidemics, which can explicitly consider the consequences of individual's decisions on the spread of the disease. We first formulate a microscopic multi-agent epidemic model where every agent can choose its activity level that affects the spread of the disease. Then by minimizing agents' cost functions, we solve for the optimal decisions for individual agents in the framework of game theory and multi-agent reinforcement learning. Given the optimal decisions of all agents, we can make predictions about the spread of the disease. We show that there are negative externalities in the sense that infected agents do not have enough incentives to protect others, which then necessitates external interventions to regulate agents' behaviors. In the discussion section, future directions are pointed out to make the model more realistic.

Journal Article•DOI•

Resilient Distributed Diffusion in Networks with Adversaries.

[...]

Jiani Li¹, Waseem Abbas¹, Xenofon Koutsoukos¹•Institutions (1)

Vanderbilt University¹

23 Mar 2020-arXiv: Multiagent Systems

TL;DR: A resilient distributed diffusion algorithm is presented that is resilient to any data falsification attack in which the number of compromised agents in the local neighborhood of a normal agent is bounded and the proposed algorithm guarantees that all normal agents converge to their true target states if appropriate parameters are selected.

Abstract: In this paper, we study resilient distributed diffusion for multi-task estimation in the presence of adversaries where networked agents must estimate distinct but correlated states of interest by processing streaming data. We show that in general diffusion strategies are not resilient to malicious agents that do not adhere to the diffusion-based information processing rules. In particular, by exploiting the adaptive weights used for diffusing information, we develop time-dependent attack models that drive normal agents to converge to states selected by the attacker. We show that an attacker that has complete knowledge of the system can always drive its targeted agents to its desired estimates. Moreover, an attacker that does not have complete knowledge of the system including streaming data of targeted agents or the parameters they use in diffusion algorithms, can still be successful in deploying an attack by approximating the needed information. The attack models can be used for both stationary and non-stationary state estimation. In addition, we present and analyze a resilient distributed diffusion algorithm that is resilient to any data falsification attack in which the number of compromised agents in the local neighborhood of a normal agent is bounded. The proposed algorithm guarantees that all normal agents converge to their true target states if appropriate parameters are selected. We also analyze trade-off between the resilience of distributed diffusion and its performance in terms of steady-state mean-square-deviation (MSD) from the correct estimates. Finally, we evaluate the proposed attack models and resilient distributed diffusion algorithm using stationary and non-stationary multi-target localization.

Posted Content•

Cyber-Physical Mobility Lab An Open-Source Platform for Networked and Autonomous Vehicles.

[...]

Maximilian Kloock, Janis Maczijewski, Patrick Scheffe, Alexandru Kampmann, Armin Mokhtarian, Stefan Kowalewski, Bassam Alrifaee¹•Institutions (1)

RWTH Aachen University¹

21 Apr 2020-arXiv: Multiagent Systems

TL;DR: This paper introduces the Cyber-Physical Mobility Lab, an open-source development environment for networked and autonomous vehicles with focus on networked decision-making, trajectory planning, and control, and a four-layered architecture that enables the seamless use of the same software in simulations and in experiments without any further adaptions.

Abstract: This paper introduces our Cyber-Physical Mobility Lab (CPM Lab). It is an open-source development environment for networked and autonomous vehicles with focus on networked decision-making, trajectory planning, and control. The CPM Lab hosts 20 physical model-scale vehicles (µCars) which we can seamlessly extend by unlimited simulated vehicles. The code and construction plans are publicly available to enable rebuilding the CPM Lab. Our four-layered architecture enables the seamless use of the same software in simulations and in experiments without any further adaptions. A Data Distribution Service (DDS) based middleware allows adapting the number of vehicles during experiments in a seamless manner. The middleware is also responsible for synchronizing all entities following a logical execution time approach to achieve determinism and reproducibility of experiments. This approach makes the CPM Lab a unique platform for rapid functional prototyping of networked decision-making algorithms. The CPM Lab allows researchers as well as students from different disciplines to see their ideas developing into reality. We demonstrate its capabilities using two example experiments. We are working on remote access to the CPM Lab via a web interface.

Posted Content•

Analysing the combined health, social and economic impacts of the corovanvirus pandemic using agent-based social simulation

[...]

Frank Dignum¹, Virginia Dignum¹, Paul Davidsson², Amineh Ghorbani³, Mijke van der Hurk⁴, Maarten Jensen¹, Christian Kammler¹, Fabian Lorig², Luis Gustavo Ludescher¹, Alexander Melchior⁴, René Mellema¹, Cezara Pastrav¹, Loïs Vanhée⁵, Harko Verhagen⁶•Institutions (6)

Umeå University¹, Malmö University², Delft University of Technology³, Utrecht University⁴, University of Caen Lower Normandy⁵, Stockholm University⁶

23 Apr 2020-arXiv: Multiagent Systems

TL;DR: An agent-based social simulation tool, ASSOCC, is proposed that supports decision makers in understanding possible consequences of policy interventions by exploring the combined social, health and economic consequences of these interventions.

Abstract: During the COVID-19 crisis there have been many difficult decisions governments and other decision makers had to make. E.g. do we go for a total lock down or keep schools open? How many people and which people should be tested? Although there are many good models from e.g. epidemiologists on the spread of the virus under certain conditions, these models do not directly translate into the interventions that can be taken by government. Neither can these models contribute to understanding the economic and/or social consequences of the interventions. However, effective and sustainable solutions need to take into account this combination of factors. In this paper, we propose an agent-based social simulation tool, ASSOCC, that supports decision makers in understanding possible consequences of policy interventions by exploring the combined social, health and economic consequences of these interventions.

Proceedings Article•DOI•

Decentralized Game-Theoretic Control for Dynamic Task Allocation Problems for Multi-Agent Systems

[...]

Efstathios Bakolas¹, Yoonjae Lee¹•Institutions (1)

University of Texas at Austin¹

18 Sep 2020-arXiv: Multiagent Systems

TL;DR: A greedy solution approach in which the agents negotiate with each other to find a mutually agreeable task assignment profile based on evaluations of the task utilities that reflect their current states is proposed.

Abstract: We propose a decentralized game-theoretic framework for dynamic task allocation problems for multi-agent systems. In our problem formulation, the agents' utilities depend on both the rewards and the costs associated with the successful completion of the tasks assigned to them. The rewards reflect how likely it is for the agents to accomplish their assigned tasks whereas the costs reflect the effort needed to complete these tasks (this effort is determined by the solution of corresponding optimal control problems). The task allocation problem considered herein corresponds to a dynamic game whose solution depends on the states of the agents in contrast with classic static (or single-act) game formulations. We propose a greedy solution approach in which the agents negotiate with each other to find a mutually agreeable (or individually rational) task assignment profile based on evaluations of the task utilities that reflect their current states. We illustrate the main ideas of this work by means of extensive numerical simulations.

Posted Content•

Multi-Agent Interactions Modeling with Correlated Policies

[...]

Minghuan Liu¹, Ming Zhou², Weinan Zhang¹, Yuzheng Zhuang³, Jun Wang⁴, Wulong Liu³, Yong Yu¹•Institutions (4)

Shanghai Jiao Tong University¹, Microsoft², Huawei³, University College London⁴

04 Jan 2020-arXiv: Multiagent Systems

TL;DR: A Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution and outperforms state-of-the-art multi-agent imitation learning methods.

Abstract: In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at \url{this https URL}.
