scispace - formally typeset

Showing papers in "arXiv: Multiagent Systems in 2017"


Posted Content
TL;DR: In this paper, the authors introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.
Abstract: Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
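The "mixed incentive structure" of a matrix game social dilemma is commonly formalized as a set of inequalities on the four payoffs of a symmetric 2×2 game: R (mutual cooperation), P (mutual defection), S (the exploited cooperator) and T (the exploiting defector). The following is a minimal sketch assuming that standard formalization. The function name is illustrative and is not taken from the paper.

```python
def is_social_dilemma(R: float, P: float, S: float, T: float) -> bool:
    """Matrix-game social dilemma test on a symmetric 2x2 game (illustrative helper)."""
    prefers_mutual_cooperation = R > P  # mutual cooperation beats mutual defection
    avoids_exploitation = R > S         # mutual cooperation beats being exploited
    beats_alternation = 2 * R > T + S   # mutual cooperation beats exploiting / being exploited half the time each
    greed = T > R                       # temptation to exploit a cooperator
    fear = P > S                        # fear of being exploited by a defector
    return (prefers_mutual_cooperation and avoids_exploitation
            and beats_alternation and (greed or fear))


print(is_social_dilemma(R=3, P=1, S=0, T=5))  # Prisoner's Dilemma
print(is_social_dilemma(R=4, P=1, S=3, T=2))  # a "harmony" game: no greed, no fear
```

A stag-hunt payoff such as (R, P, S, T) = (4, 2, 0, 3) also passes the test through fear alone. This illustrates why the sequential versions introduced in the paper can exhibit different kinds of conflict.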

293 citations


Posted Content
TL;DR: This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.
Abstract: The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield the most merit, and point to promising avenues of future research.
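To make the difference between "ignore" and "forget" concrete, here is a minimal sketch. It is not an algorithm from the survey. The agent keeps an opponent model of action frequencies, either with exponential decay of old evidence (in the spirit of "forget") or with plain counts that treat the opponent as stationary (roughly "ignore"). The class name, the decay factor and the matching-pennies payoffs are all assumptions made for illustration.

```python
class DecayedOpponentModel:
    """Opponent action frequencies that forget old evidence geometrically."""

    def __init__(self, actions, decay=0.9):
        self.decay = decay  # hypothetical forgetting factor in (0, 1]; 1.0 never forgets
        self.counts = {a: 0.0 for a in actions}

    def observe(self, action):
        for a in self.counts:  # fade all earlier observations
            self.counts[a] *= self.decay
        self.counts[action] += 1.0

    def probabilities(self):
        total = sum(self.counts.values())
        if total == 0:
            return {a: 1.0 / len(self.counts) for a in self.counts}
        return {a: c / total for a, c in self.counts.items()}


def best_response(model, payoff):
    """Action maximizing expected payoff; payoff[(mine, theirs)] is my reward."""
    probs = model.probabilities()
    mine = sorted({m for m, _ in payoff})
    return max(mine, key=lambda m: sum(p * payoff[(m, o)] for o, p in probs.items()))


pennies = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
forgetful = DecayedOpponentModel("HT", decay=0.5)
stationary = DecayedOpponentModel("HT", decay=1.0)
for action in "H" * 10 + "T" * 3:  # the opponent switches strategy after ten rounds
    forgetful.observe(action)
    stationary.observe(action)
print(best_response(forgetful, pennies), best_response(stationary, pennies))  # T H
```

Once the opponent has switched, the forgetful model adapts within a few rounds. The stationary model still best-responds to the outdated majority action.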

208 citations


Posted Content
TL;DR: A structured overview on the state of knowledge of collaborative vehicle routing is given and three major streams of research are identified: (i) centralized collaborative planning, (ii) decentralized planning without auctions, and (iii) auction-based decentralized planning.
Abstract: In horizontal collaborations, carriers form coalitions in order to perform parts of their logistics operations jointly. By exchanging transportation requests among each other, they can operate more efficiently and in a more sustainable way. Collaborative vehicle routing has been extensively discussed in the literature. We identify three major streams of research: (i) centralized collaborative planning, (ii) decentralized planning without auctions, and (iii) auction-based decentralized planning. For each of them we give a structured overview on the state of knowledge and discuss future research directions.

155 citations


Posted Content
TL;DR: This paper investigates how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms, where the actors base their decisions only on locally sensed information and the critic is learned based on the true global state.
Abstract: In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.

89 citations


Posted Content
TL;DR: In this paper, Lenient-DQN (LDQN) is proposed to map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM.
Abstract: Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
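The leniency mechanism can be illustrated in tabular form. The following is a minimal sketch and not the LDQN implementation. Each state-action pair carries a temperature that decays on every visit. Leniency is assumed to take the form 1 − exp(−K·T(s, a)), and a negative temporal-difference update is ignored with that probability, which produces the optimism the abstract describes. The constants K, β, the initial temperature and all names are assumptions.

```python
import math
import random
from collections import defaultdict


class LenientQLearner:
    """Tabular lenient Q-learning sketch (illustrative; not the LDQN code)."""

    def __init__(self, alpha=0.1, gamma=0.95, k=2.0, beta=0.95, t0=1.0, seed=0):
        self.q = defaultdict(float)           # Q[(state, action)], initialised to 0
        self.temp = defaultdict(lambda: t0)   # temperature per (state, action)
        self.alpha, self.gamma, self.k, self.beta = alpha, gamma, k, beta
        self.rng = random.Random(seed)

    def leniency(self, s, a):
        # High temperature -> leniency near 1 -> most negative updates are skipped.
        return 1.0 - math.exp(-self.k * self.temp[(s, a)])

    def update(self, s, a, r, s_next, next_actions, terminal=False):
        """Apply one TD update; return True if the Q-value was changed."""
        if terminal:
            target = r
        else:
            target = r + self.gamma * max(self.q[(s_next, b)] for b in next_actions)
        delta = target - self.q[(s, a)]
        # Positive updates are always applied; negative ones only with prob. 1 - leniency.
        applied = delta > 0 or self.rng.random() > self.leniency(s, a)
        if applied:
            self.q[(s, a)] += self.alpha * delta
        self.temp[(s, a)] *= self.beta  # temperature decays with every visit
        return applied


learner = LenientQLearner()
learner.update("s0", "a", 1.0, None, [], terminal=True)
print(learner.q[("s0", "a")], learner.leniency("s0", "a"))
```

Early in learning, when temperatures are high, low rewards caused by teammates' exploration are mostly forgiven. As temperatures decay, the learner gradually behaves like an ordinary Q-learner.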

72 citations


Journal Article
TL;DR: The analysis first positions liquid democracy within the theory of binary aggregation, then focuses on two issues of the system: the occurrence of delegation cycles; and the effect of delegations on individual rationality when voting on logically interdependent propositions.
Abstract: The paper provides an analysis of the voting method known as delegable proxy voting, or liquid democracy. The analysis first positions liquid democracy within the theory of binary aggregation. It then focuses on two issues of the system: the occurrence of delegation cycles; and the effect of delegations on individual rationality when voting on logically interdependent propositions. It finally points to proposals on how the system may be modified in order to address the above issues.
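The issue of delegation cycles is easiest to see on a concrete delegation graph. The following is a minimal sketch under a simplifying assumption: each voter either votes directly or names exactly one proxy. The paper's formal setting is binary aggregation, which this sketch does not model. The code follows each delegation chain to the voter who finally casts the ballot (the "guru") and reports chains that run into a cycle, because nobody on such a chain actually votes.

```python
from collections import Counter


def resolve_delegations(delegates):
    """Map every voter to the 'guru' who finally casts the vote, or None on a cycle.

    delegates[v] is the voter v delegates to, or None if v votes directly.
    """
    gurus = {}
    for voter in delegates:
        seen, current = set(), voter
        while delegates.get(current) is not None:  # keep following the proxy chain
            if current in seen:                    # revisited a voter: delegation cycle
                current = None
                break
            seen.add(current)
            current = delegates[current]
        gurus[voter] = current
    return gurus


delegates = {"a": "b", "b": None, "c": "a", "d": "e", "e": "d", "f": "d"}
gurus = resolve_delegations(delegates)
weights = Counter(g for g in gurus.values() if g is not None)
print(gurus)    # a, b, c -> b; d, e, f -> None (cycle d <-> e)
print(weights)  # Counter({'b': 3})
```

In this example, voter f does not belong to the cycle but still loses their vote, because their chain feeds into it.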

45 citations


Posted Content
TL;DR: In this article, an actor-critic policy gradient algorithm is proposed to solve the problem of allocating impressions to sellers in e-commerce websites such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform.
Abstract: We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. We employ a general framework of reinforcement mechanism design, which uses deep reinforcement learning to design efficient algorithms, taking the strategic behaviour of the sellers into account. Specifically, we model the impression allocation problem as a Markov decision process, where the states encode the history of impressions, prices, transactions and generated revenue and the actions are the possible impression allocations in each round. To tackle the problem of continuity and high-dimensionality of states and actions, we adopt the ideas of the DDPG algorithm to design an actor-critic policy gradient algorithm which takes advantage of the problem domain in order to achieve convergence and stability. We evaluate our proposed algorithm, coined IA(GRU), by comparing it against DDPG, as well as several natural heuristics, under different rationality models for the sellers - we assume that sellers follow well-known no-regret type strategies which may vary in their degree of sophistication. We find that IA(GRU) outperforms all algorithms in terms of the total revenue.

37 citations


Posted Content
TL;DR: Experimental results show that the approximation algorithms often find near-optimal control strategies, indicating that election control through social influence is a salient threat to election integrity.
Abstract: Election control considers the problem of an adversary who attempts to tamper with a voting process, in order to either ensure that their favored candidate wins (constructive control) or another candidate loses (destructive control). As online social networks have become significant sources of information for potential voters, a new tool in an attacker's arsenal is to effect control by harnessing social influence, for example, by spreading fake news and other forms of misinformation through online social media. We consider the computational problem of election control via social influence, studying the conditions under which finding good adversarial strategies is computationally feasible. We consider two objectives for the adversary in both the constructive and destructive control settings: probability and margin of victory (POV and MOV, respectively). We present several strong negative results, showing, for example, that the problem of maximizing POV is inapproximable for any constant factor. On the other hand, we present approximation algorithms which provide somewhat weaker approximation guarantees, such as bicriteria approximations for the POV objective and constant-factor approximations for MOV. Finally, we present mixed integer programming formulations for these problems. Experimental results show that our approximation algorithms often find near-optimal control strategies, indicating that election control through social influence is a salient threat to election integrity.

31 citations


Posted Content
TL;DR: In this article, the authors consider open multi-agent systems where the interactions between agents lead to pairwise gossip averages and where agents either arrive or are replaced at random times, and describe the expected system behavior by showing that the evolution of scaled moments of the state can be characterized by a 2-dimensional linear dynamical system.
Abstract: We consider open multi-agent systems. Unlike the systems usually studied in the literature, here agents may join or leave while the process under study takes place. The composition and size of the system thus evolve with time. We focus on systems where the interactions between agents lead to pairwise gossip averages, and where agents either arrive or are replaced at random times. These events prevent any convergence of the system. Instead, we describe the expected system behavior by showing that the evolution of scaled moments of the state can be characterized by a 2-dimensional (possibly time-varying) linear dynamical system. We apply this technique to two cases: (i) systems with fixed size where leaving agents are immediately replaced, and (ii) systems where new agents keep arriving without ever leaving, and whose size thus grows without bound.
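The paper characterizes the expected moments analytically. As a complement, the following Monte Carlo sketch simulates case (i): at each step, either two random agents average their values, or a random agent is replaced by a newcomer. The newcomer distribution (standard normal), the replacement probability and all names are assumptions for illustration. The sketch tracks the empirical variance. Pure gossip never increases it, whereas replacements keep injecting disagreement, so the variance does not converge to zero.

```python
import random


def gossip_step(values, rng):
    """Pairwise gossip: two distinct random agents both adopt their average."""
    i, j = rng.sample(range(len(values)), 2)
    values[i] = values[j] = (values[i] + values[j]) / 2


def replacement_step(values, rng, draw):
    """Fixed-size open system: a random agent leaves and a newcomer takes its place."""
    values[rng.randrange(len(values))] = draw(rng)


def newcomer_value(rng):
    return rng.gauss(0.0, 1.0)  # hypothetical distribution of arriving agents' values


def simulate(n=20, steps=2000, p_replace=0.05, seed=1):
    rng = random.Random(seed)
    values = [newcomer_value(rng) for _ in range(n)]
    variances = []
    for _ in range(steps):
        if rng.random() < p_replace:
            replacement_step(values, rng, newcomer_value)
        else:
            gossip_step(values, rng)
        mean = sum(values) / n
        variances.append(sum((v - mean) ** 2 for v in values) / n)
    return values, variances


_, closed = simulate(p_replace=0.0)  # closed system: plain gossip
_, opened = simulate(p_replace=0.05)  # open system with random replacements
print(closed[-1], opened[-1])
```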

30 citations


Posted Content
TL;DR: There seems to be a need for an updated definition of the term "Organic Computing", of desired properties of technical, organic systems, and the objectives of the Organic Computing initiative.
Abstract: Organic Computing is an initiative in the field of systems engineering that proposed to make use of concepts such as self-adaptation and self-organisation to increase the robustness of technical systems. Based on the observation that traditional design and operation concepts reach their limits, transferring more autonomy to the systems themselves should result in a reduction of complexity for users, administrators, and developers. However, there seems to be a need for an updated definition of the term "Organic Computing", of desired properties of technical, organic systems, and the objectives of the Organic Computing initiative. With this article, we will address these points.

24 citations


Posted Content
TL;DR: This work proposes a proportional-fairness control strategy in which a subset of DERs decrease their own power output, sacrificing the individual revenue, and the DERs in the subset are dynamically selected based on the record of their control history.
Abstract: Residential microgrids (MGs) may host a large number of Distributed Energy Resources (DERs). The strategy that maximizes the revenue for each individual DER is the one in which the DER operates at capacity, injecting all available power into the grid. However, when the DER penetration is high and the consumption low, this strategy may lead to power surplus that causes voltage increase over recommended limits. In order to create incentives for the DER to operate below capacity, we propose a proportional-fairness control strategy in which (i) a subset of DERs decrease their own power output, sacrificing the individual revenue, and (ii) the DERs in the subset are dynamically selected based on the record of their control history. The trustworthy implementation of the scheme is carried out through a custom-designed blockchain mechanism that maintains a distributed database trusted by all DERs. In particular, the blockchain is used to stipulate and store a smart contract that enforces proportional fairness. The simulation results verify the potential of the proposed framework.
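The abstract says that the curtailed DERs are chosen from their control history but does not state the selection rule. The following is a hypothetical stand-in, not the paper's proportional-fairness rule, and it omits the blockchain and smart-contract layer entirely. In each round, the k DERs that have been curtailed least often so far reduce their output, with ties broken by identifier. Over many rounds, this spreads the revenue sacrifice evenly.

```python
def select_for_curtailment(history, k):
    """Pick the k DERs curtailed least often so far (ties broken by id). Hypothetical rule."""
    return sorted(history, key=lambda der: (history[der], der))[:k]


def run_rounds(der_ids, k, rounds):
    history = {d: 0 for d in der_ids}  # how many times each DER has reduced its output
    for _ in range(rounds):
        for der in select_for_curtailment(history, k):
            history[der] += 1
    return history


print(run_rounds(["a", "b", "c", "d"], k=2, rounds=6))  # every DER curtailed 3 times
```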

Posted Content
TL;DR: A simulation framework called "RAWSim-O" is developed and a real-world application of this framework is shown by integrating simple robot prototypes based on vacuum cleaning robots into a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS).
Abstract: This paper deals with a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS). In such systems, robots are sent to carry storage units, so-called "pods", from the inventory and bring them to human operators working at stations. At the stations, the items are picked according to customers' orders. There exist new decision problems in such systems, for example, the reallocation of pods after their visits to work stations or the selection of pods to fulfill orders. In order to analyze decision strategies for these decision problems and relations between them, we develop a simulation framework called "RAWSim-O" in this paper. Moreover, we show a real-world application of our simulation framework by integrating simple robot prototypes based on vacuum cleaning robots.

Posted Content
TL;DR: In this paper, a supervised coordination scheme that overrides control inputs from human drivers when they would result in an unsafe or blocked situation is presented, where safe overriding controls are chosen while ensuring they deviate minimally from those originally requested by the drivers.
Abstract: Before reaching full autonomy, vehicles will gradually be equipped with more and more advanced driver assistance systems (ADAS), effectively rendering them semi-autonomous. However, current ADAS technologies seem unable to handle complex traffic situations, notably when dealing with vehicles arriving from the sides, either at intersections or when merging on highways. The high rate of accidents in these settings proves that they constitute difficult driving situations. Moreover, intersections and merging lanes are often the source of significant traffic congestion and, sometimes, deadlocks. In this article, we propose a cooperative framework to safely coordinate semi-autonomous vehicles in such settings, removing the risk of collision or deadlocks while remaining compatible with human driving. More specifically, we present a supervised coordination scheme that overrides control inputs from human drivers when they would result in an unsafe or blocked situation. To avoid unnecessary intervention and remain compatible with human driving, overriding only occurs when collisions or deadlocks are imminent. In this case, safe overriding controls are chosen while ensuring they deviate minimally from those originally requested by the drivers. Simulation results based on a realistic physics simulator show that our approach is scalable to real-world scenarios, and computations can be performed in real-time on a standard computer for up to a dozen simultaneous vehicles.

Journal Article
TL;DR: In this paper, an agent-based model of cultural evolution was used to investigate the effect of social regulation on the novelty-generating effects of creativity and the novelty preserving effects of imitation.
Abstract: Although creativity is encouraged in the abstract it is often discouraged in educational and workplace settings. Using an agent-based model of cultural evolution, we investigated the idea that tempering the novelty-generating effects of creativity with the novelty-preserving effects of imitation is beneficial for society. In Experiment One we systematically introduced individual differences in creativity, and observed a trade-off between the ratio of creators to imitators, and how creative the creators were. Excess creativity was detrimental because creators invested in unproven ideas at the expense of propagating proven ones. Experiment Two tested the hypothesis that society as a whole benefits if individuals adjust how creative they are in accordance with their creative success. When effective creators created more, and ineffective creators created less (social regulation), the agents segregated into creators and imitators, and the mean fitness of outputs was temporarily higher. We hypothesized that the temporary nature of the effect was due to a ceiling on output fitness. In Experiment Three we made the space of possible outputs open-ended by giving agents the capacity to chain simple outputs into arbitrarily complex ones such that fitter outputs were always possible. With the capacity for chained outputs, the effect of social regulation could indeed be maintained indefinitely. The results are discussed in light of empirical data.

Posted Content
TL;DR: The connection between computational social choice (comsoc) and computational complexity is discussed and benefits to complexity that have arisen from its use in comsoc are highlighted.
Abstract: We discuss the connection between computational social choice (comsoc) and computational complexity. We stress the work so far on, and urge continued focus on, two less-recognized aspects of this connection. Firstly, this is very much a two-way street: Everyone knows complexity classification is used in comsoc, but we also highlight benefits to complexity that have arisen from its use in comsoc. Secondly, more subtle, less-known complexity tools often can be very productively used in comsoc.

Journal Article
TL;DR: In this paper, a game-theoretical autonomous decision-making framework is proposed to address a task allocation problem for a swarm of multiple agents, where cooperation of self-interested agents is considered, and a decentralized algorithm guarantees convergence of agents with social inhibition to a Nash stable partition within polynomial time.
Abstract: This paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation of self-interested agents, and show that our proposed decentralized algorithm guarantees convergence of agents with social inhibition to a Nash stable partition (i.e., social agreement) within polynomial time. The algorithm is simple and executable based on local interactions with neighbor agents under a strongly-connected communication network and even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound of suboptimality of the solution, and additionally show that a suboptimality of at least 50% can be guaranteed if social utilities are non-decreasing functions with respect to the number of co-working agents. The results of numerical experiments confirm that the proposed framework is scalable, quickly adaptable to dynamic environments, and robust even in a realistic situation.
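The notion of a Nash stable partition with social inhibition can be illustrated with a centralized sketch. The paper's algorithm is decentralized and runs on local communication, whereas this sketch assumes that every agent sees all task loads. Each task's reward is split equally among the agents working on it, so every additional co-worker lowers everyone's share. Agents switch tasks one at a time whenever joining another task strictly increases their share, and the process stops when no agent wants to move. With equal sharing, this is a congestion game, so these improvement steps terminate. The reward values and function names are assumptions.

```python
def individual_utility(reward, n_workers):
    # Equal sharing: each extra co-worker lowers everyone's share ("social inhibition").
    return reward / n_workers


def allocate(rewards, n_agents, max_sweeps=1000):
    """Sequential best responses until no agent wants to switch (Nash stable)."""
    assignment = [0] * n_agents  # everyone starts on task 0
    for _ in range(max_sweeps):
        changed = False
        for agent in range(n_agents):
            counts = [assignment.count(t) for t in range(len(rewards))]
            current = assignment[agent]
            best, best_u = current, individual_utility(rewards[current], counts[current])
            for t in range(len(rewards)):
                if t != current and individual_utility(rewards[t], counts[t] + 1) > best_u:
                    best, best_u = t, individual_utility(rewards[t], counts[t] + 1)
            if best != current:
                assignment[agent] = best
                changed = True
        if not changed:
            return assignment
    raise RuntimeError("no Nash stable partition found within max_sweeps")


def is_nash_stable(assignment, rewards):
    """True if no agent can strictly gain by unilaterally joining another task."""
    counts = [assignment.count(t) for t in range(len(rewards))]
    for current in assignment:
        stay = individual_utility(rewards[current], counts[current])
        if any(individual_utility(rewards[t], counts[t] + 1) > stay
               for t in range(len(rewards)) if t != current):
            return False
    return True


partition = allocate(rewards=[10, 4, 2], n_agents=6)
print(partition, is_nash_stable(partition, [10, 4, 2]))
```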

Journal Article
TL;DR: This article presents the consensus of a saturated second order multi-agent system with non-switching dynamics that can be represented by a directed graph that is affected by data processing and communication time-delays that are assumed to be asynchronous.
Abstract: This article presents the consensus of a saturated second order multi-agent system with non-switching dynamics that can be represented by a directed graph. The system is affected by data processing (input delay) and communication time-delays that are assumed to be asynchronous. The agents have saturation nonlinearities, each of which is approximated by separate linear and nonlinear elements. Nonlinear elements are represented by describing functions. Describing functions and stability of linear elements are used to estimate the existence of limit cycles in the system with multiple control laws. Stability analysis of the linear element is performed using Lyapunov-Krasovskii functions and frequency domain analysis. A comparison of pros and cons of both the analyses with respect to time-delay ranges, applicability and computation complexity is presented. Simulation and corresponding hardware implementation results are demonstrated to support theoretical results.
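For reference, the textbook describing function of an ideal saturation (slope k inside the limit δ, output clipped to ±kδ outside) with a sinusoidal input of amplitude A is N(A) = k for A ≤ δ and N(A) = (2k/π)[arcsin(δ/A) + (δ/A)·√(1 − (δ/A)²)] otherwise. The following sketch only evaluates this formula. The paper's multi-agent loop, delays and control laws are not modelled. In harmonic-balance analysis, a limit cycle is predicted where the linear part satisfies G(jω) = −1/N(A).

```python
import math


def saturation_df(amplitude, limit, slope=1.0):
    """Describing function N(A) of y = slope*x for |x| <= limit, else +-slope*limit."""
    if amplitude <= limit:
        return slope  # the sinusoid never saturates: purely linear gain
    ratio = limit / amplitude
    return (2 * slope / math.pi) * (math.asin(ratio) + ratio * math.sqrt(1 - ratio ** 2))


for A in (0.5, 1.0, 3.0, 10.0):
    print(A, round(saturation_df(A, limit=1.0), 4))  # gain falls as saturation deepens
```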

Posted Content
TL;DR: This two-part paper considers strategic topology switching for security in the second-order multi-agent system and proposes an attack detection algorithm based on the Luenberger observer, using the characterized detectability condition of zero-dynamics attack.
Abstract: This two-part paper considers strategic topology switching for security in the second-order multi-agent system. In Part II, we propose a strategy on switching topologies to detect zero-dynamics attack (ZDA), whose attack-starting time need not be the initial time. We first characterize the necessary and sufficient condition for detectability of ZDA, in terms of the network topologies to be switched to and the set of agents to be monitored. We then propose an attack detection algorithm based on the Luenberger observer, using the characterized detectability condition. Employing the strategy on switching times proposed in Part I and the strategy on switching topologies proposed here, a strategic topology-switching algorithm is derived. Its primary advantages are threefold: (i) in achieving consensus in the absence of attacks, the control protocol does not need velocity measurements and the algorithm has no constraint on the magnitudes of coupling weights; (ii) in tracking the system in the absence of attacks, the Luenberger observer has no constraint on the magnitudes of observer gains and the number of monitored agents, i.e., only one monitored agent's output is sufficient; (iii) in detecting ZDA, the algorithm allows the defender to have no knowledge of the attack-starting time and the number of misbehaving agents (i.e., agents under attack). Simulations are provided to verify the effectiveness of the strategic topology-switching algorithm.
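The following sketch shows the basic residual mechanism behind Luenberger-observer-based detection on the simplest possible plant: a single double integrator with position measurements. This is not the networked second-order system or the topology-switching algorithm from the paper. The gains l1 = 2 and l2 = 1/dt are chosen here so that both eigenvalues of (A − LC) are zero (a deadbeat observer), which makes the estimation error vanish after two steps. The additive measurement bias is a hypothetical attack.

```python
def simulate_observer(steps=12, dt=0.1, attack_start=None, bias=1.0):
    """Residuals y - C x_hat of a deadbeat Luenberger observer for a double integrator."""
    l1, l2 = 2.0, 1.0 / dt  # places both eigenvalues of (A - L C) at zero
    p, v = 1.0, 0.5         # true state: position, velocity
    ph, vh = 0.0, 0.0       # observer estimate, deliberately wrong at the start
    residuals = []
    for k in range(steps):
        y = p                                   # measured output C x = position
        if attack_start is not None and k >= attack_start:
            y += bias                           # hypothetical additive measurement attack
        r = y - ph                              # residual y - C x_hat
        residuals.append(r)
        ph, vh = ph + dt * vh + l1 * r, vh + l2 * r  # x_hat+ = A x_hat + L (y - C x_hat)
        p, v = p + dt * v, v                    # true dynamics x+ = A x (zero input)
    return residuals


clean = simulate_observer()
attacked = simulate_observer(attack_start=6)
print([round(r, 6) for r in clean])
print([round(r, 6) for r in attacked])
```

Here, a constant bias is only visible for two steps, after which the observer absorbs it as a state offset. An attack that is crafted to keep the residual at zero, such as the ZDA studied in the paper, cannot be seen at all by a fixed observer. This is roughly why the authors change the network topology to expose it.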

Posted Content
TL;DR: In this paper, the authors propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment, which can also transmit task-specific information, such as the shortest distance and direction to a desired target.
Abstract: Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
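A histogram-based neighbourhood encoding can be sketched as follows. Each agent counts how many neighbours fall into distance bins within its communication range. This turns a varying number of neighbours into a fixed-length vector that a policy network can consume. The range, the number of bins and the choice of distance (rather than, say, bearing) are assumptions. The paper's exact protocols may differ.

```python
import math


def neighbor_histogram(me, others, comm_range=2.0, n_bins=4):
    """Counts of neighbours per distance bin within comm_range (fixed-length encoding)."""
    counts = [0] * n_bins
    width = comm_range / n_bins
    for other in others:
        d = math.dist(me, other)
        if 0 < d <= comm_range:  # skip the agent itself and out-of-range agents
            counts[min(int(d / width), n_bins - 1)] += 1
    return counts


swarm = [(0.0, 0.0), (0.4, 0.0), (1.2, 0.0), (0.0, 1.9), (3.0, 0.0)]
print(neighbor_histogram((0.0, 0.0), swarm))  # [1, 0, 1, 1]
```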

Posted Content
TL;DR: It is shown that after a certain threshold percentage of AVs the differences between the AV-lane and no-AV-lane scenarios become negligible and the introduction of an AV lane is not beneficial in terms of average commute time.
Abstract: The introduction of autonomous vehicles (AVs) will have far-reaching effects on road traffic in cities and on highways. The implementation of automated highway system (AHS), possibly with a dedicated lane only for AVs, is believed to be a requirement to maximise the benefit from the advantages of AVs. We study the ramifications of an increasing percentage of AVs on the traffic system with and without the introduction of a dedicated AV lane on highways. We conduct an analytical evaluation of a simplified scenario and a macroscopic simulation of the city of Singapore under user equilibrium conditions with a realistic traffic demand. We present findings regarding average travel time, fuel consumption, throughput and road usage. Instead of only considering the highways, we also focus on the effects on the remaining road network. Our results show a reduction of average travel time and fuel consumption as a result of increasing the portion of AVs in the system. We show that the introduction of an AV lane is not beneficial in terms of average commute time. Examining the effects of the AV population only, however, the AV lane provides a considerable reduction of travel time (approx. 25%) at the price of delaying conventional vehicles (approx. 7%). Furthermore, a notable shift of travel demand away from the highways towards major and small roads is noticed in early stages of AV penetration of the system. Finally, our findings show that after a certain threshold percentage of AVs the differences between the AV-lane and no-AV-lane scenarios become negligible.

Posted Content
TL;DR: A multi-agent simulation model in which pedestrian positions are updated at discrete time intervals is proposed, which takes into account the major normal conditions of a simple pedestrian situated in a crowd such as preferences, realistic perception of environment, etc.
Abstract: Simulating pedestrian crowds in a way that reflects reality is a major challenge for researchers. Several crowd simulation models have been proposed, such as the cellular automata model, the agent-based model, the fluid dynamic model, etc. It is important to note that the agent-based model is able, more than other approaches, to provide a natural description of the system and thus to capture complex human behaviors. In this paper, we propose a multi-agent simulation model in which pedestrian positions are updated at discrete time intervals. It takes into account the major normal conditions of a simple pedestrian situated in a crowd, such as preferences, realistic perception of the environment, etc. Our objective is to simulate pedestrian crowds realistically, producing believable pedestrian behaviors. Typical pedestrian phenomena, including unidirectional and bidirectional movement in a corridor as well as the flow through a bottleneck, are simulated. The conducted simulations show that our model is able to produce realistic pedestrian behaviors. The obtained fundamental diagram and flow rate at the bottleneck agree very well with classic conclusions and empirical study results. It is hoped that the idea of this study may be helpful in promoting the modeling and simulation of pedestrian crowds in a simple way.

Posted Content
TL;DR: This work studies the problem of distributed maximum computation in an open multi-agent system, where agents can leave and arrive during the execution of the algorithm, and provides algorithms able to eventually compute the maximum of the values held by agents.
Abstract: We study the problem of distributed maximum computation in an open multi-agent system, where agents can leave and arrive during the execution of the algorithm. The main challenge comes from the possibility that the agent holding the largest value leaves the system, which changes the value to be computed. The algorithms must therefore be endowed with mechanisms that allow them to forget outdated information. The focus is on systems in which interactions are pairwise gossips between randomly selected agents. We consider situations where leaving agents can send a last message, and situations where they cannot. For both cases, we provide algorithms able to eventually compute the maximum of the values held by agents.
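To make "forgetting outdated information" concrete, here is a hypothetical age-based scheme. It is not one of the paper's algorithms and covers only the case where leaving agents send no last message. Each agent stores an estimate as (value, age). The owner of a value keeps it fresh at age 0, copies spread by pairwise gossip while keeping their age, every copy ages by one per round, and copies older than a time-to-live `ttl` are dropped. Once the holder of the maximum leaves, all copies of its value expire within about `ttl` rounds.

```python
import random


def merge(rec_a, rec_b):
    """Keep the larger value; between equal values keep the fresher (younger) record."""
    return max(rec_a, rec_b, key=lambda rec: (rec[0], -rec[1]))


def tick(record, own_value, ttl):
    """Age a record by one round; expire it if too old; never fall below own value."""
    value, age = record[0], record[1] + 1
    if age > ttl:
        value, age = own_value, 0  # forget outdated information
    if own_value >= value:
        value, age = own_value, 0  # an agent's own value is always fresh
    return (value, age)


def gossip_round(own, est, ttl, rng):
    """Every agent ages its record, then one random pair exchanges records."""
    for i in est:
        est[i] = tick(est[i], own[i], ttl)
    i, j = rng.sample(sorted(est), 2)
    est[i] = est[j] = merge(est[i], est[j])


own = {0: 3.0, 1: 9.0, 2: 5.0, 3: 1.0}
est = {i: (v, 0) for i, v in own.items()}
rng = random.Random(0)
for _ in range(200):
    gossip_round(own, est, ttl=5, rng=rng)
del own[1], est[1]  # the holder of the maximum leaves without a last message
for _ in range(7):
    gossip_round(own, est, ttl=5, rng=rng)
print(max(v for v, _ in est.values()))  # 5.0: the departed maximum has been forgotten
```

The choice of `ttl` trades off between two risks. If it is too small, a live maximum may expire before gossip refreshes it. If it is too large, a departed maximum lingers longer.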

Posted Content
TL;DR: Micro and macro level original analyses on data characterising pedestrian behaviour in presence of counter-flows and grouping suggest that the presence of dyads and their tendency to walk in a line-abreast formation influences the formation of lanes and, in turn, aggregated observables, such as overall specific flow.
Abstract: Although it is widely recognised that the presence of groups influences microscopic and aggregated pedestrian dynamics, a precise characterisation of the phenomenon still calls for evidence and insights. The present paper describes micro and macro level original analyses on data characterising pedestrian behaviour in presence of counter-flows and grouping, in particular dyads, acquired through controlled experiments. Results suggest that the presence of dyads and their tendency to walk in a line-abreast formation influences the formation of lanes and, in turn, aggregated observables, such as overall specific flow.

Posted Content
TL;DR: A technology is suggested that combines decentralized P2P BOINC general-purpose computing task distribution, a multi-agent communication protocol and smart-contract-based rewards, powered by the Ethereum blockchain, which can be used as a distributed P2P computing power market, protected from any central authority.
Abstract: Multi-agents systems communication is a technology, which provides a way for multiple interacting intelligent agents to communicate with each other and with environment. Multiple-agent systems are used to solve problems that are difficult for solving by individual agent. Multiple-agent communication technologies can be used for management and organization of computing fog and act as a global, distributed operating system. In present publication we suggest technology, which combines decentralized P2P BOINC general-purpose computing tasks distribution, multiple-agents communication protocol and smart-contract based rewards, powered by Ethereum blockchain. Such system can be used as distributed P2P computing power market, protected from any central authority. Such decentralized market can further be updated to system, which learns the most efficient way for software-hardware combinations usage and optimization. Once system learns to optimize software-hardware efficiency it can be updated to general-purpose distributed intelligence, which acts as combination of single-purpose AI.

Posted Content
TL;DR: LocDyn is a distributed convex relaxation method for dynamic localization that takes advantage of previous estimates at each measurement acquisition step; the algorithm converges at an optimal rate for first-order methods.
Abstract: How to self-localize large teams of underwater nodes using only noisy range measurements? How to do it in a distributed way, and incorporating dynamics into the problem? How to reject outliers and produce trustworthy position estimates? The stringent acoustic communication channel and the accuracy needs of our geophysical survey application demand faster and more accurate localization methods. We approach dynamic localization as a MAP estimation problem where the prior encodes dynamics, and we devise a convex relaxation method that takes advantage of previous estimates at each measurement acquisition step; the algorithm converges at an optimal rate for first-order methods. LocDyn is distributed: there is no fusion center responsible for processing acquired data, and the same simple computations are performed for each node. LocDyn is accurate: experiments attest to a smaller positioning error than that of a comparable Kalman filter. LocDyn is robust: it rejects outlier noise, whereas the compared methods break down in terms of positioning error.
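The flavour of MAP estimation with a dynamics prior can be shown with a deliberately simplified Python sketch that is not LocDyn. It is centralized, it localizes a single node against anchors with known positions, it models the dynamics prior as a quadratic penalty pulling toward the previous estimate (weight `lam`), and it relaxes each range constraint |x - a_k| = d_k to |x - a_k| <= d_k by penalizing the squared distance to the ball of radius d_k, which keeps the objective convex. The objective is minimized with Nesterov's accelerated gradient method, whose rate is optimal among first-order methods. Anchor positions, `lam`, and all names are hypothetical.

```python
import math


def grad(x, anchors, ranges, prev, lam):
    """Gradient of F(x) = sum_k 0.5*max(0, |x - a_k| - d_k)**2 + 0.5*lam*|x - prev|**2."""
    gx, gy = lam * (x[0] - prev[0]), lam * (x[1] - prev[1])
    for (ax, ay), d in zip(anchors, ranges):
        dx, dy = x[0] - ax, x[1] - ay
        r = math.hypot(dx, dy)
        if r > d:  # outside the ball of radius d: pulled back toward its boundary
            s = 1.0 - d / r
            gx += s * dx
            gy += s * dy
    return gx, gy


def localize(anchors, ranges, prev, lam=1.0, iters=500, start=None):
    """Nesterov's accelerated gradient on the lam-strongly convex objective F."""
    L = len(anchors) + lam        # each range term has a 1-Lipschitz gradient
    q = math.sqrt(L / lam)        # square root of the condition number bound
    beta = (q - 1.0) / (q + 1.0)  # constant momentum for strongly convex problems
    x = x_old = prev if start is None else start  # warm start from previous estimate
    for _ in range(iters):
        y = (x[0] + beta * (x[0] - x_old[0]), x[1] + beta * (x[1] - x_old[1]))
        g = grad(y, anchors, ranges, prev, lam)
        x_old, x = x, (y[0] - g[0] / L, y[1] - g[1] / L)
    return x


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.dist((3.0, 4.0), a) for a in anchors]  # noiseless ranges to (3, 4)
print(localize(anchors, ranges, prev=(3.0, 4.0), start=(0.0, 0.0)))
```

In a tracking loop, `prev` would be the estimate from the previous measurement step, which is how past estimates enter each new solve.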

Journal ArticleDOI
TL;DR: This article addresses a task allocation problem for a large-scale robotic swarm, the swarm distribution guidance problem, with a framework in which each agent requires only local consistency of information with neighboring agents rather than global consistency.
Abstract: This paper addresses a task allocation problem for a large-scale robotic swarm, namely the swarm distribution guidance problem. Unlike most existing frameworks for this problem, the proposed framework uses locally available information to generate its time-varying stochastic policies. As each agent requires only local consistency of information with neighbouring agents, rather than global consistency, the proposed framework offers various advantages, e.g., a shorter timescale for using new information and the potential to incorporate an asynchronous decision-making process. We perform a theoretical analysis of the properties of the proposed framework, which proves that the framework guarantees convergence to the desired density distribution even when using only local information, while maintaining the advantages of global-information-based approaches. The design requirements for these advantages are explicitly listed in this paper. This paper also provides specific examples of how to implement the framework. The results of numerical experiments confirm that the proposed framework is effective and performs comparably to the global-information-based framework.
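For contrast with the local-information framework, the global-information baseline family can be sketched in Python: agents in each bin jump to neighbouring bins according to a Markov matrix whose stationary distribution is the desired density. The Metropolis-Hastings construction used here is one standard way to obtain such a matrix; it is an assumption for illustration, it is time-invariant, and it is not the paper's time-varying local-information policy. The bin graph and target densities are hypothetical.

```python
def metropolis_matrix(adj, target):
    """Row-stochastic matrix P on the bin graph `adj` whose stationary
    distribution is `target` (Metropolis-Hastings construction)."""
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            q_ij = 1.0 / len(adj[i])  # propose a neighbouring bin uniformly
            q_ji = 1.0 / len(adj[j])
            P[i][j] = q_ij * min(1.0, (target[j] * q_ji) / (target[i] * q_ij))
        P[i][i] = 1.0 - sum(P[i])     # rejected proposals: stay in the current bin
    return P


def step(density, P):
    """Expected swarm density after every agent applies P once: x' = x P."""
    n = len(density)
    return [sum(density[i] * P[i][j] for i in range(n)) for j in range(n)]


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # four bins along a corridor
target = [0.1, 0.2, 0.3, 0.4]                  # desired density distribution
P = metropolis_matrix(adj, target)
x = [1.0, 0.0, 0.0, 0.0]                       # the whole swarm starts in bin 0
for _ in range(1000):
    x = step(x, P)
print([round(v, 6) for v in x])  # [0.1, 0.2, 0.3, 0.4]
```

Detailed balance between neighbouring bins makes `target` stationary, and the self-loops created by rejected proposals make the chain aperiodic, so the density converges from any starting distribution on a connected bin graph.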

Posted Content
TL;DR: A stochastic interaction model was created using a multivariate Gaussian mixture model to simulate the movements of pedestrians reacting to an oncoming vehicle when approaching unsignalized crossings, and to evaluate the passing strategies of automated vehicles.
Abstract: Interactions between vehicles and pedestrians have always been a major problem in traffic safety. Experienced human drivers are able to analyze the environment and choose driving strategies that will help them avoid crashes. What is not yet clear, however, is how automated vehicles will interact with pedestrians. This paper proposes a new method for evaluating the safety and feasibility of the driving strategy of automated vehicles when encountering unsignalized crossings. MobilEye sensors installed on buses in Ann Arbor, Michigan, collected data on 2,973 valid crossing events. A stochastic interaction model was then created using a multivariate Gaussian mixture model. This model allowed us to simulate the movements of pedestrians reacting to an oncoming vehicle when approaching unsignalized crossings, and to evaluate the passing strategies of automated vehicles. A simulation was then conducted to demonstrate the evaluation procedure.
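How such a model might be used to score a passing strategy can be sketched in Python. Everything below is hypothetical: the two features (pedestrian speed and distance to the curb when the vehicle is detected), the mixture parameters (invented, not fitted to the Ann Arbor data), and the conflict criterion (the pedestrian reaches the curb within a time window around the vehicle's arrival). Sampling a bivariate Gaussian component uses the Cholesky factor of its covariance.

```python
import math
import random

# Hypothetical two-component bivariate GMM over (speed [m/s], distance to curb [m]);
# these are NOT fitted values. Each entry: (weight, mean, covariance).
COMPONENTS = [
    (0.6, (1.3, 4.0), ((0.04, 0.05), (0.05, 1.00))),
    (0.4, (0.9, 8.0), ((0.02, 0.00), (0.00, 2.25))),
]


def cholesky2(cov):
    """Lower-triangular L with L @ L.T == cov for a 2x2 positive-definite matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return ((l11, 0.0), (l21, l22))


def sample(rng):
    """Draw one (speed, distance) pair: pick a component, then a correlated Gaussian."""
    weights = [w for w, _, _ in COMPONENTS]
    _, (m1, m2), cov = rng.choices(COMPONENTS, weights=weights)[0]
    (l11, _), (l21, l22) = cholesky2(cov)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return m1 + l11 * z1, m2 + l21 * z1 + l22 * z2


def conflict_rate(vehicle_arrival_s, window_s=1.5, n=10_000, seed=0):
    """Share of simulated pedestrians reaching the curb within +/- window_s
    of the vehicle's arrival (a hypothetical conflict criterion)."""
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(n):
        speed, dist = sample(rng)
        t_ped = max(dist, 0.0) / max(speed, 0.1)  # guard against rare non-physical draws
        conflicts += abs(t_ped - vehicle_arrival_s) <= window_s
    return conflicts / n


print(conflict_rate(3.0), conflict_rate(8.0))
```

Two strategies could then be compared by their conflict rates, e.g. a vehicle that keeps its speed and arrives after 3 s versus one that slows down and arrives after 8 s; a real evaluation would first fit the mixture to the observed crossing events.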

Posted Content
TL;DR: The authors show that deep reinforcement learning can be used, in place of standard non-cooperative game theory, to generate predictions about the tragedy of the commons in common-pool resource appropriation set in complex, real-time, video-game-like environments.
Abstract: Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria, a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible. Most of that work was based on laboratory experiments where participants make only a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, researchers have recently moved toward experiments in more complex, real-time, video-game-like environments. However, standard methods of non-cooperative game theory can no longer be used to generate predictions for this case. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.
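The role of spatial and temporal resource dynamics can be illustrated with a toy Python sketch of a renewable resource on a grid, without any learning agents. It assumes a hypothetical regrowth rule in which empty cells regrow faster when more apples are nearby and never regrow once their neighbourhood is empty, so over-harvesting can destroy the resource permanently. The probabilities, radius, and random harvesting model are illustrative, not taken from the paper.

```python
import random

# Hypothetical regrowth probability per step for an empty cell, indexed by the
# number of apples in its neighbourhood (counts of 5 or more use the last entry).
REGROWTH = [0.0, 0.01, 0.01, 0.05, 0.05, 0.1]
RADIUS = 2  # hypothetical neighbourhood radius (square window)


def nearby_apples(grid, r, c):
    """Number of apples within RADIUS of cell (r, c), excluding the cell itself."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[i][j]
        for i in range(max(0, r - RADIUS), min(rows, r + RADIUS + 1))
        for j in range(max(0, c - RADIUS), min(cols, c + RADIUS + 1))
        if (i, j) != (r, c)
    )


def regrow(grid, rng):
    """One synchronous regrowth step (1 = apple, 0 = empty); returns a new grid."""
    new = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if not cell:
                k = min(nearby_apples(grid, r, c), len(REGROWTH) - 1)
                if rng.random() < REGROWTH[k]:
                    new[r][c] = 1
    return new


def harvest(grid, rate, rng):
    """Agents collectively remove each apple independently with probability `rate`."""
    return [[0 if rng.random() < rate else cell for cell in row] for row in grid]


def run(steps, rate, size=10, seed=0):
    """Apples left after `steps` rounds of harvesting followed by regrowth."""
    rng = random.Random(seed)
    grid = [[1] * size for _ in range(size)]
    for _ in range(steps):
        grid = regrow(harvest(grid, rate, rng), rng)
    return sum(map(sum, grid))
```

For example, `run(100, 1.0)` harvests everything and returns 0, because a fully depleted grid cannot regrow, while `run(100, 0.0)` leaves all 100 apples in place; intermediate rates expose the trade-off between appropriation and sustainability.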

Posted Content
TL;DR: The authors present a framework in which a virtual overlay multi-agent system (VOMAS) is used to validate simulation models; the technique is designed to apply to all kinds of agent-based models.
Abstract: Agent-based models are very popular in a number of different areas. For example, they have been used in domains ranging from models of tumor growth, immune systems, and molecules to models of social networks, crowds, and computer and mobile self-organizing networks. One reason for their success is their intuitiveness and similarity to human cognition. However, despite this power of abstraction and their easy applicability to so many domains, agent-based models are hard to validate. In addition, building valid and credible simulations is not just a challenging task but also a crucial exercise to ensure that what we are modeling is, at some level of abstraction, a model of our conceptual system: the system that we have in mind. In this paper, we address this important area of validation of agent-based models by presenting a novel technique that has broad applicability and can be applied to all kinds of agent-based models. We present a framework in which a virtual overlay multi-agent system can be used to validate simulation models. Since agent-based models have typically been developed in parallel across multiple domains, our technique is designed as a single validation method that caters to all of them. Our technique, which allows for the validation of agent-based simulations, uses VOMAS: a Virtual Overlay Multi-agent System. This overlay multi-agent system can comprise various types of agents, which form an overlay on top of the agent-based simulation model that needs to be validated. Besides watching and logging, each of these agents has clearly defined constraints whose violations can be logged in real time. To demonstrate its effectiveness, we show its broad applicability in a wide variety of simulation models, ranging from social sciences to computer networks, in both spatial and non-spatial conceptual models.
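The overlay idea can be illustrated with a toy Python sketch that is not the VOMAS implementation: a trivial simulation of random walkers on a bounded line is watched by hypothetical overlay agents, each holding one constraint predicate and logging a timestamped entry whenever that constraint is violated during the run. A bug is injected at one tick to show the violation being caught.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Walker:
    """Toy simulation agent moving on the bounded line 0..size-1."""
    pos: int


@dataclass
class OverlayAgent:
    """Hypothetical overlay agent: watches the model and checks one constraint."""
    name: str
    constraint: Callable[[list], bool]
    log: list = field(default_factory=list)

    def observe(self, tick: int, walkers: list) -> None:
        # Record the violation at the tick it is observed, i.e. during the run.
        if not self.constraint(walkers):
            self.log.append((tick, f"{self.name} violated"))


def run(walkers, overlays, ticks, size, seed=0, bug_at=None):
    """Advance the simulation; overlay agents observe it after every tick."""
    rng = random.Random(seed)
    for t in range(ticks):
        for w in walkers:
            w.pos = min(size - 1, max(0, w.pos + rng.choice((-1, 1))))
        if t == bug_at:
            walkers[0].pos = size + 5  # injected modelling bug
        for overlay in overlays:
            overlay.observe(t, walkers)


in_bounds = OverlayAgent("in_bounds", lambda ws: all(0 <= w.pos < 10 for w in ws))
population = OverlayAgent("population", lambda ws: len(ws) == 5)
run([Walker(5) for _ in range(5)], [in_bounds, population], ticks=50, size=10, bug_at=20)
print(in_bounds.log)   # [(20, 'in_bounds violated')]
print(population.log)  # []
```

Because the overlay agents only read the model's state, they can be attached to an existing simulation without changing its logic, which matches the overlay framing in the abstract.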

Posted Content
TL;DR: The study of congestion problems is extended to a more realistic scenario, the Road Network Domain (RND), where the resources are no longer independent but are part of a network, so choosing one path also affects the load on other paths that share road segments with it.
Abstract: Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depend on some previously undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are the Beach Problem Domain and the Traffic Lane Domain. In both settings, the highest system utility is achieved by overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent but are part of a network (e.g., a road network), so choosing one path also affects the load on other paths that share road segments. We demonstrate the application of state-of-the-art MARL methods to this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and to show that the difference rewards approach better captures the dynamics of the environment and better informs the agents about them.
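The difference-rewards idea can be sketched in Python on the Beach Problem Domain. The sketch assumes the formulation commonly used in the difference-rewards literature, where a beach section attended by x agents yields x * exp(-x / c) for capacity c and the system utility G is the sum over sections. Each agent's difference reward is D_i = G(z) - G(z_{-i}), computed here by removing agent i, which is one common choice of counterfactual. Names and parameters are hypothetical.

```python
import math
from collections import Counter


def section_utility(x, capacity):
    """Local reward of a beach section attended by x agents."""
    return x * math.exp(-x / capacity)


def global_utility(choices, n_sections, capacity):
    """System utility G: sum of section utilities; choices[i] is agent i's section."""
    counts = Counter(choices)
    return sum(section_utility(counts[k], capacity) for k in range(n_sections))


def difference_reward(i, choices, n_sections, capacity):
    """D_i = G(z) - G(z_{-i}), with z_{-i} the joint action without agent i."""
    without_i = choices[:i] + choices[i + 1:]
    return (global_utility(choices, n_sections, capacity)
            - global_utility(without_i, n_sections, capacity))


choices = [0, 0, 1]  # agents 0 and 1 share section 0; agent 2 is alone on section 1
print([round(difference_reward(i, choices, 2, 4.0), 4) for i in range(3)])
```

Agent 2 receives its full marginal contribution, while each agent on the shared section receives a smaller reward reflecting the congestion it adds there. Replacing the removed agent with a fixed default action is another common counterfactual.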