scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Multiagent Systems in 2020"


Posted Content
TL;DR: This work provides a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective and expects this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.
Abstract: Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

103 citations


Posted Content
TL;DR: The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.
Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at this https URL.

98 citations


Posted Content
TL;DR: Extensive experiments demonstrate that the theoretically derive a general formula of Q_{tot} in terms of $Q^{i}$, based on which a multi-head attention formation to approximate $Q_{Tot}$ can naturally implement, resulting in not only a refined representation of $Tot$ with an agent-level attention mechanism, but also a tractable maximization algorithm of decentralized policies.
Abstract: In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value $Q_{tot}$ into individual Q-values $Q^{i}$ to guide individuals' behaviors, i.e. VDN imposing an additive formation and QMIX adopting a monotonic assumption using an implicit mixing method. However, most of the previous efforts impose certain assumptions between $Q_{tot}$ and $Q^{i}$ and lack theoretical groundings. Besides, they do not explicitly consider the agent-level impact of individuals to the whole system when transforming individual $Q^{i}$s into $Q_{tot}$. In this paper, we theoretically derive a general formula of $Q_{tot}$ in terms of $Q^{i}$, based on which we can naturally implement a multi-head attention formation to approximate $Q_{tot}$, resulting in not only a refined representation of $Q_{tot}$ with an agent-level attention mechanism, but also a tractable maximization algorithm of decentralized policies. Extensive experiments demonstrate that our method outperforms state-of-the-art MARL methods on the widely adopted StarCraft benchmark across different scenarios, and attention analysis is further conducted with valuable insights.

89 citations


Posted Content
TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.
Abstract: The role concept provides a useful tool to design and understand complex multi-agent systems, which allows agents with a similar role to share similar behaviors. However, existing role-based methods use prior domain knowledge and predefine role structures and behaviors. In contrast, multi-agent reinforcement learning (MARL) provides flexibility and adaptability, but less efficiency in complex tasks. In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA). In this framework, roles are emergent, and agents with similar roles tend to share their learning and to be specialized on certain sub-tasks. To this end, we construct a stochastic role embedding space by introducing two novel regularizers and conditioning individual policies on roles. Experiments show that our method can learn specialized, dynamic, and identifiable roles, which help our method push forward the state of the art on the StarCraft II micromanagement benchmark. Demonstrative videos are available at this https URL.

71 citations


Journal ArticleDOI
TL;DR: This work formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints, and solves the problem through a deep reinforcement learning (DRL) approach.
Abstract: Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, number, position and data amount of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.

61 citations


Posted Content
TL;DR: This work proposes a general method for efficient exploration by sharing experience amongst agents by applying experience sharing in an actor-critic framework and finds that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns.
Abstract: Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.

53 citations


Posted Content
TL;DR: In this paper, the authors investigated a reduced complexity adaptive methodology to consensus tracking for a team of uncertain high-order nonlinear systems with switched (possibly asynchronous) dynamics, where the control gain of each virtual control law does not have to be incorporated in the next virtual controller law iteratively, thus leading to a simpler expression of the control laws.
Abstract: This work investigates a reduced-complexity adaptive methodology to consensus tracking for a team of uncertain high-order nonlinear systems with switched (possibly asynchronous) dynamics. It is well known that high-order nonlinear systems are intrinsically challenging as feedback linearization and backstepping methods successfully developed for low-order systems fail to work. At the same time, even the adding-one power-integrator methodology, well explored for the single-agent high-order case, presents some complexity issues and is unsuited for distributed control. At the core of the proposed distributed methodology is a newly proposed definition for separable functions: this definition allows the formulation of a separation-based lemma to handle the high-order terms with reduced complexity in the control design. Complexity is reduced in a twofold sense: the control gain of each virtual control law does not have to be incorporated in the next virtual control law iteratively, thus leading to a simpler expression of the control laws; the order of the virtual control gains increases only proportionally (rather than exponentially) with the order of the systems, dramatically reducing high-gain issues.

37 citations


Journal ArticleDOI
TL;DR: An overview of the literature focusing on optimization approaches to achieve flocking behavior that provide strong safety guarantees is presented, and several approaches aimed at minimizing flocking communication and computational requirements in real systems via neighbor filtering and event-driven planning are presented.
Abstract: The study of robotic flocking has received considerable attention in the past twenty years. As we begin to deploy flocking control algorithms on physical multi-agent and swarm systems, there is an increasing necessity for rigorous promises on safety and performance. In this paper, we present an overview the literature focusing on optimization approaches to achieve flocking behavior that provide strong safety guarantees. We separate the literature into cluster and line flocking, and categorize cluster flocking with respect to the system objective, which may be realized by a reactive, or planning, control algorithm. We also present several approaches aimed at minimizing flocking communication and computational requirements in real systems via neighbor filtering and event-driven planning. We conclude the overview with our perspective on the outlook and future research direction of optimal flocking algorithms.

37 citations


Posted Content
TL;DR: The simulation results show that the joint optimization of the horizontal and the vertical position of a group of UAVs can achieve better performance than the traditional algorithms.
Abstract: In Unmanned Aerial Vehicle (UAV)-enabled mobile edge computing (MEC) systems, UAVs can carry edge servers to help ground user equipment (UEs) offloading their computing tasks to the UAVs for execution. This paper aims to minimize the total time required for the UAVs to complete the offloaded tasks, while optimizing the three-dimensional (3D) deployment of UAVs, including their flying height and horizontal positions. Although the formulated optimization is a mixed integer nonlinear programmming, we convert it to a convex problem and develop a successive convex approximation (SCA) based algorithm to effectively solve it. The simulation results show that the joint optimization of the horizontal and the vertical position of a group of UAVs can achieve better performance than the traditional algorithms.

29 citations


Journal ArticleDOI
TL;DR: This work uses shared experience to train a policy for a given number of pursuers, executed independently by each agent at run-time, for pursuing an omnidirectional target with multiple, homogeneous agents that are subject to unicycle kinematic constraints.
Abstract: Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning for pursuing an omni-directional target with multiple, homogeneous agents that are subject to unicycle kinematic constraints. We use shared experience to train a policy for a given number of pursuers that is executed independently by each agent at run-time. The training benefits from curriculum learning, a sweeping-angle ordering to locally represent neighboring agents and encouraging good formations with reward structure that combines individual and group rewards. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach, with non-holonomic agents, performs on par with classical algorithms with omni-directional agents, and outperforms their non-holonomic adaptations. The learned policy is successfully transferred to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.

27 citations


Posted Content
TL;DR: In this paper, the authors study the effects of campaigning, where the society is partitioned into voter clusters and a diffusion process propagates opinions in a network connecting the clusters, and show that computing the cheapest campaign for rigging a given election can usually be done efficiently, even with arbitrarily many voters.
Abstract: We study the effects of campaigning, where the society is partitioned into voter clusters and a diffusion process propagates opinions in a network connecting the clusters. Our model is very powerful and can incorporate many campaigning actions, various partitions of the society into clusters, and very general diffusion processes. Perhaps surprisingly, we show that computing the cheapest campaign for rigging a given election can usually be done efficiently, even with arbitrarily-many voters. Moreover, we report on certain computational simulations.

Posted ContentDOI
TL;DR: The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents, and provides a natural approach to decentralized learning.
Abstract: In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal. We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task. The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies and independencies, allowing the team-level task to be decomposed into sub-tasks for individual agents. We define such a notion of RM decomposition and present algorithmically verifiable conditions guaranteeing that distributed completion of the sub-tasks leads to team behavior accomplishing the original task. This framework for task decomposition provides a natural approach to decentralized learning: agents may learn to accomplish their sub-tasks while observing only their local state and abstracted representations of their teammates. We accordingly propose a decentralized q-learning algorithm. Furthermore, in the case of undiscounted rewards, we use local value functions to derive lower and upper bounds for the global value function corresponding to the team task. Experimental results in three discrete settings exemplify the effectiveness of the proposed RM decomposition approach, which converges to a successful team policy two orders of magnitude faster than a centralized learner and significantly outperforms hierarchical and independent q-learning approaches.

Posted Content
TL;DR: This paper designs a principled test for detecting strategic behaviour, designs an experiment that elicits strategic behaviour from subjects and releases a dataset of patterns of strategic behaviour that may be of independent interest, and proves that the test has strong false alarm guarantees.
Abstract: We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions. When a peer-assessment task is competitive (e.g., when students are graded on a curve), agents may be incentivized to misreport evaluations in order to improve their own final standing. Our focus is on designing methods for detection of such manipulations. Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering. In this paper, we investigate a statistical framework for this problem and design a principled test for detecting strategic behaviour. We prove that our test has strong false alarm guarantees and evaluate its detection ability in practical settings. For this, we design and execute an experiment that elicits strategic behaviour from subjects and release a dataset of patterns of strategic behaviour that may be of independent interest. We then use the collected data to conduct a series of real and semi-synthetic evaluations that demonstrate a strong detection power of our test.

Posted Content
TL;DR: This work details the main characteristics of the included agent platforms, together with links to specific projects where they have been used and classifies the agent platforms as general purpose ones, free or commercial, and specialized ones, which can be used for particular types of applications.
Abstract: Agent-based computing is an active field of research with the goal of building autonomous software of hardware entities. This task is often facilitated by the use of dedicated, specialized frameworks. For almost thirty years, many such agent platforms have been developed. Meanwhile, some of them have been abandoned, others continue their development and new platforms are released. This paper presents a up-to-date review of the existing agent platforms and also a historical perspective of this domain. It aims to serve as a reference point for people interested in developing agent systems. This work details the main characteristics of the included agent platforms, together with links to specific projects where they have been used. It distinguishes between the active platforms and those no longer under development or with unclear status. It also classifies the agent platforms as general purpose ones, free or commercial, and specialized ones, which can be used for particular types of applications.

Journal ArticleDOI
TL;DR: In this paper, a heterogeneous multi-agent actor critic (H-MAAC) is proposed as a paradigm for joint collaboration in the investigated MEC systems, where edge devices and center controller learn the interactive strategies through their own observations.
Abstract: As an emerging technique, mobile edge computing (MEC) introduces a new processing scheme for various distributed communication-computing systems such as industrial Internet of Things (IoT), vehicular communication, smart city, etc. In this work, we mainly focus on the timeliness of the MEC systems where the freshness of the data and computation tasks is significant. Firstly, we formulate a kind of age-sensitive MEC models and define the average age of information (AoI) minimization problems of interests. Then, a novel policy based multi-agent deep reinforcement learning (RL) framework, called heterogeneous multi-agent actor critic (H-MAAC), is proposed as a paradigm for joint collaboration in the investigated MEC systems, where edge devices and center controller learn the interactive strategies through their own observations. To improves the system performance, we develop the corresponding online algorithm by introducing an edge federated learning mode into the multi-agent cooperation whose advantages on learning convergence can be guaranteed theoretically. To the best of our knowledge, it's the first joint MEC collaboration algorithm that combines the edge federated mode with the multi-agent actor-critic reinforcement learning. Furthermore, we evaluate the proposed approach and compare it with classical RL based methods. As a result, the proposed framework not only outperforms the baseline on average system age, but also promotes the stability of training process. Besides, the simulation results provide some innovative perspectives for the system design under the edge federated collaboration.

Journal ArticleDOI
TL;DR: The general problems associated with coordination of autonomous vehicles are introduced by identifying and framing the key classes of coordination problems and the different approaches that can be adopted to deal with such problems are overviewed.
Abstract: In the near future, our streets will be populated by myriads of autonomous self-driving vehicles to serve our diverse mobility needs. This will raise the need to coordinate their movements in order to properly handle both access to shared resources (e.g., intersections and parking slots) and the execution of mobility tasks (e.g., platooning and ramp merging). In this paper, we firstly introduce the general issues associated to coordination of autonomous vehicles, by identifying and framing the key classes of coordination problems. Following, we overview the different approaches that can be adopted to manage such coordination problems, by classifying them in terms of the degree of autonomy in decision making that is left to autonomous vehicles during coordination. Finally, we overview some further peculiar challenges that research will have to address before autonomously coordinated vehicles can safely hit our streets.

Posted Content
TL;DR: New algorithms for each type of game are introduced and demonstrate their superior performance over state of the art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.
Abstract: Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field games, which is that all agents in the environment are playing almost similar strategies and have the same goal. We conduct experiments on three different testbeds for the field of many agent reinforcement learning, based on the standard MAgents framework. We consider two different kinds of mean field games: a) Games where agents belong to predefined types that are known a priori and b) Games where the type of each agent is unknown and therefore must be learned based on observations. We introduce new algorithms for each type of game and demonstrate their superior performance over state of the art algorithms that assume that all agents belong to the same type and other baseline algorithms in the MAgent framework.

Proceedings ArticleDOI
TL;DR: This work introduces BARK, an open-source behavior benchmarking environment designed to mitigate the shortcomings of current behavior models and shows that BARK provides a suitable framework for a systematic development of behavior models.
Abstract: Predicting and planning interactive behaviors in complex traffic situations presents a challenging task. Especially in scenarios involving multiple traffic participants that interact densely, autonomous vehicles still struggle to interpret situations and to eventually achieve their own mission goal. As driving tests are costly and challenging scenarios are hard to find and reproduce, simulation is widely used to develop, test, and benchmark behavior models. However, most simulations rely on datasets and simplistic behavior models for traffic participants and do not cover the full variety of real-world, interactive human behaviors. In this work, we introduce BARK, an open-source behavior benchmarking environment designed to mitigate the shortcomings stated above. In BARK, behavior models are (re-)used for planning, prediction, and simulation. A range of models is currently available, such as Monte-Carlo Tree Search and Reinforcement Learning-based behavior models. We use a public dataset and sampling-based scenario generation to show the inter-exchangeability of behavior models in BARK. We evaluate how well the models used cope with interactions and how robust they are towards exchanging behavior models. Our evaluation shows that BARK provides a suitable framework for a systematic development of behavior models.

Posted Content
TL;DR: A dynamic, demand aware, and pricing-based vehicle-passenger matching and route planning framework that dynamically generates optimal routes for each vehicle based on online demand, pricing associated with each ride, vehicle capacities and locations is presented.
Abstract: Significant development of ride-sharing services presents a plethora of opportunities to transform urban mobility by providing personalized and convenient transportation while ensuring efficiency of large-scale ride pooling. However, a core problem for such services is route planning for each driver to fulfill the dynamically arriving requests while satisfying given constraints. Current models are mostly limited to static routes with only two rides per vehicle (optimally) or three (with heuristics). In this paper, we present a dynamic, demand aware, and pricing-based vehicle-passenger matching and route planning framework that (1) dynamically generates optimal routes for each vehicle based on online demand, pricing associated with each ride, vehicle capacities and locations. This matching algorithm starts greedily and optimizes over time using an insertion operation, (2) involves drivers in the decision-making process by allowing them to propose a different price based on the expected reward for a particular ride as well as the destination locations for future rides, which is influenced by supply-and demand computed by the Deep Q-network, (3) allows customers to accept or reject rides based on their set of preferences with respect to pricing and delay windows, vehicle type and carpooling preferences, and (4) based on demand prediction, our approach re-balances idle vehicles by dispatching them to the areas of anticipated high demand using deep Reinforcement Learning (RL). Our framework is validated using the New York City Taxi public dataset; however, we consider different vehicle types and designed customer utility functions to validate the setup and study different settings. Experimental results show the effectiveness of our approach in real-time and large scale settings.

Posted Content
TL;DR: This letter presents a resilient mechanism to allocate heterogeneous robots to tasks under difficult environmental conditions such as weather events or adversarial attacks, and relaxes the resource constraints corresponding to some tasks, thus exhibiting a graceful degradation of performance.
Abstract: For a multi-robot system equipped with heterogeneous capabilities, this paper presents a mechanism to allocate robots to tasks in a resilient manner when anomalous environmental conditions such as weather events or adversarial attacks affect the performance of robots within the tasks. Our primary objective is to ensure that each task is assigned the requisite level of resources, measured as the aggregated capabilities of the robots allocated to the task. By keeping track of task performance deviations under external perturbations, our framework quantifies the extent to which robot capabilities (e.g., visual sensing or aerial mobility) are affected by environmental conditions. This enables an optimization-based framework to flexibly reallocate robots to tasks based on the most degraded capabilities within each task. In the face of resource limitations and adverse environmental conditions, our algorithm minimally relaxes the resource constraints corresponding to some tasks, thus exhibiting a graceful degradation of performance. Simulated experiments in a multi-robot coverage and target tracking scenario demonstrate the efficacy of the proposed approach.

Posted Content
TL;DR: For ABM to be most effective in the field it should be used as a means for answering questions normally inaccessible to the traditional epidemiological toolkit, and why simulations are essential to the study of complex systems theory.
Abstract: Today's most troublesome population health challenges are often driven by social and environmental determinants, which are difficult to model using traditional epidemiological methods. We agree with those who have argued for the wider adoption of agent-based modelling (ABM) in taking on these challenges. However, while ABM has been used occasionally in population health, we argue that for ABM to be most effective in the field it should be used as a means for answering questions normally inaccessible to the traditional epidemiological toolkit. In an effort to clearly illustrate the utility of ABM for population health research, and to clear up persistent misunderstandings regarding the method's conceptual underpinnings, we offer a detailed presentation of the core concepts of complex systems theory, and summarise why simulations are essential to the study of complex systems. We then examine the current state of the art in ABM for population health, and propose they are well-suited for the study of the `wicked' problems in population health, and could make significant contributions to theory and intervention development in these areas.

Posted Content
TL;DR: This work proposes hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous that removes the need for a centralized operator in multiagent systems by combining model- based RL and inference methods, enabling agents to dynamically align plans.
Abstract: Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point to point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.

Proceedings ArticleDOI
TL;DR: It is believed that the model proposed is at the same time simple, rich, and modular, affording mathematical characterization of the interplay between bias, underlying opinion dynamics, and social structure in a unified setting.
Abstract: We investigate opinion dynamics in multi-agent networks when a bias toward one of two possible opinions exists; for example, reflecting a status quo vs a superior alternative. Starting with all agents sharing an initial opinion representing the status quo, the system evolves in steps. In each step, one agent selected uniformly at random adopts the superior opinion with some probability $\alpha$, and with probability $1 - \alpha$ it follows an underlying update rule to revise its opinion on the basis of those held by its neighbors. We analyze convergence of the resulting process under two well-known update rules, namely majority and voter. The framework we propose exhibits a rich structure, with a non-obvious interplay between topology and underlying update rule. For example, for the voter rule we show that the speed of convergence bears no significant dependence on the underlying topology, whereas the picture changes completely under the majority rule, where network density negatively affects convergence. We believe that the model we propose is at the same time simple, rich, and modular, affording mathematical characterization of the interplay between bias, underlying opinion dynamics, and social structure in a unified setting.

Posted Content
TL;DR: A masking heuristic is developed that allows training on smaller problems with few pursuers-targets and execution on much larger problems and is discussed how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution.
Abstract: This paper develops a stochastic Multi-Agent Reinforcement Learning (MARL) method to learn control policies that can handle an arbitrary number of external agents; our policies can be executed for tasks consisting of 1000 pursuers and 1000 evaders. We model pursuers as agents with limited on-board sensing and formulate the problem as a decentralized, partially-observable Markov Decision Process. An attention mechanism is used to build a permutation and input-size invariant embedding of the observations for learning a stochastic policy and value function using techniques in entropy-regularized off-policy methods. Simulation experiments on a large number of problems show that our control policies are dramatically scalable and display cooperative behavior in spite of being executed in a decentralized fashion; our methods offer a simple solution to classical multi-agent problems using techniques in reinforcement learning.

Posted Content
TL;DR: A microscopic approach to model epidemics, which can explicitly consider the consequences of individual's decisions on the spread of the disease, and shows that there are negative externalities in the sense that infected agents do not have enough incentives to protect others, which necessitates external interventions to regulate agents' behaviors.
Abstract: This paper introduces a microscopic approach to model epidemics, which can explicitly consider the consequences of individual's decisions on the spread of the disease. We first formulate a microscopic multi-agent epidemic model where every agent can choose its activity level that affects the spread of the disease. Then by minimizing agents' cost functions, we solve for the optimal decisions for individual agents in the framework of game theory and multi-agent reinforcement learning. Given the optimal decisions of all agents, we can make predictions about the spread of the disease. We show that there are negative externalities in the sense that infected agents do not have enough incentives to protect others, which then necessitates external interventions to regulate agents' behaviors. In the discussion section, future directions are pointed out to make the model more realistic.

Journal ArticleDOI
TL;DR: A resilient distributed diffusion algorithm is presented that is resilient to any data falsification attack in which the number of compromised agents in the local neighborhood of a normal agent is bounded and the proposed algorithm guarantees that all normal agents converge to their true target states if appropriate parameters are selected.
Abstract: In this paper, we study resilient distributed diffusion for multi-task estimation in the presence of adversaries where networked agents must estimate distinct but correlated states of interest by processing streaming data. We show that in general diffusion strategies are not resilient to malicious agents that do not adhere to the diffusion-based information processing rules. In particular, by exploiting the adaptive weights used for diffusing information, we develop time-dependent attack models that drive normal agents to converge to states selected by the attacker. We show that an attacker that has complete knowledge of the system can always drive its targeted agents to its desired estimates. Moreover, an attacker that does not have complete knowledge of the system including streaming data of targeted agents or the parameters they use in diffusion algorithms, can still be successful in deploying an attack by approximating the needed information. The attack models can be used for both stationary and non-stationary state estimation.In addition, we present and analyze a resilient distributed diffusion algorithm that is resilient to any data falsification attack in which the number of compromised agents in the local neighborhood of a normal agent is bounded. The proposed algorithm guarantees that all normal agents converge to their true target states if appropriate parameters are selected. We also analyze trade-off between the resilience of distributed diffusion and its performance in terms of steady-state mean-square-deviation (MSD) from the correct estimates. Finally, we evaluate the proposed attack models and resilient distributed diffusion algorithm using stationary and non-stationary multi-target localization.

Posted Content
TL;DR: This paper introduces the Cyber-Physical Mobility Lab, an open-source development environment for networked and autonomous vehicles with focus on networked decision-making, trajectory planning, and control, and a four-layered architecture that enables the seamless use of the same software in simulations and in experiments without any further adaptions.
Abstract: This paper introduces our Cyber-Physical Mobility Lab (CPM Lab). It is an open-source development environment for networked and autonomous vehicles with focus on networked decision-making, trajectory planning, and control. The CPM Lab hosts 20 physical model-scale vehicles ({\mu}Cars) which we can seamlessly extend by unlimited simulated vehicles. The code and construction plans are publicly available to enable rebuilding the CPM Lab. Our four-layered architecture enables the seamless use of the same software in simulations and in experiments without any further adaptions. A Data Distribution Service (DDS) based middleware allows adapting the number of vehicles during experiments in a seamless manner. The middleware is also responsible for synchronizing all entities following a logical execution time approach to achieve determinism and reproducibility of experiments. This approach makes the CPM Lab a unique platform for rapid functional prototyping of networked decision-making algorithms. The CPM Lab allows researchers as well as students from different disciplines to see their ideas developing into reality. We demonstrate its capabilities using two example experiments. We are working on a remote access to the CPM Lab via a webinterface.

Posted Content
TL;DR: An agent-based social simulation tool, ASSOCC, is proposed that supports decision makers understand possible consequences of policy interventions, but exploring the combined social, health and economic consequences of these interventions.
Abstract: During the COVID-19 crisis there have been many difficult decisions governments and other decision makers had to make. E.g. do we go for a total lock down or keep schools open? How many people and which people should be tested? Although there are many good models from e.g. epidemiologists on the spread of the virus under certain conditions, these models do not directly translate into the interventions that can be taken by government. Neither can these models contribute to understand the economic and/or social consequences of the interventions. However, effective and sustainable solutions need to take into account this combination of factors. In this paper, we propose an agent-based social simulation tool, ASSOCC, that supports decision makers understand possible consequences of policy interventions, bu exploring the combined social, health and economic consequences of these interventions.

Proceedings ArticleDOI
TL;DR: A greedy solution approach in which the agents negotiate with each other to find a mutually agreeable task assignment profile based on evaluations of the task utilities that reflect their current states is proposed.
Abstract: We propose a decentralized game-theoretic framework for dynamic task allocation problems for multi-agent systems. In our problem formulation, the agents' utilities depend on both the rewards and the costs associated with the successful completion of the tasks assigned to them. The rewards reflect how likely is for the agents to accomplish their assigned tasks whereas the costs reflect the effort needed to complete these tasks (this effort is determined by the solution of corresponding optimal control problems). The task allocation problem considered herein corresponds to a dynamic game whose solution depends on the states of the agents in contrast with classic static (or single-act) game formulations. We propose a greedy solution approach in which the agents negotiate with each other to find a mutually agreeable (or individually rational) task assignment profile based on evaluations of the task utilities that reflect their current states. We illustrate the main ideas of this work by means of extensive numerical simulations.

Posted Content
TL;DR: A Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution and outperforms state-of-the-art multi-agent imitation learning methods.
Abstract: In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at \url{this https URL}.