scispace - formally typeset

Showing papers in "arXiv: Computer Science and Game Theory in 2020"


Journal Article
TL;DR: Numerical results from extensive simulations show that the proposed deep-learning-based approach provides effective battery charging control in multi-drone scenarios.
Abstract: State-of-the-art drone technologies have severe flight time limitations due to weight constraints, which inevitably lead to a relatively small amount of available energy. Therefore, frequent battery replacement or recharging is necessary in applications such as delivery, exploration, or support to the wireless infrastructure. Mobile charging stations (i.e., mobile stations with charging equipment) for outdoor ad-hoc battery charging are one feasible solution to this issue. However, the ability of these platforms to charge drones is limited in terms of the number of drones and the charging time. This paper designs an auction-based mechanism to control the charging schedule in a multi-drone setting. In this paper, charging time slots are auctioned, and their assignment is determined by a bidding process. The main challenge in developing this framework is the lack of prior knowledge of the distribution of the number of drones participating in the auction. Building on the optimal second-price auction, the proposed formulation relies on deep learning algorithms to learn this distribution online. Numerical results from extensive simulations show that the proposed deep-learning-based approach provides effective battery charging control in multi-drone scenarios.

83 citations
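
The abstract builds on the optimal second-price auction. As a point of reference only, here is a minimal Python sketch of a sealed-bid second-price auction with a reserve price for a single charging time slot. The function name, the reserve parameter, and the tie-breaking rule are illustrative assumptions, and the paper's deep-learning component (learning the distribution of participating drones online) is not modeled.

```python
from typing import Dict, Optional, Tuple

def second_price_auction(bids: Dict[str, float], reserve: float = 0.0) -> Optional[Tuple[str, float]]:
    """Sealed-bid second-price (Vickrey) auction with a reserve price.

    Returns (winner, payment), or None if no bid meets the reserve.
    """
    # Keep bids at or above the reserve; highest bid first, ties broken by bidder name.
    eligible = sorted(((bid, name) for name, bid in bids.items() if bid >= reserve),
                      key=lambda t: (-t[0], t[1]))
    if not eligible:
        return None
    winner = eligible[0][1]
    runner_up = eligible[1][0] if len(eligible) > 1 else reserve
    # The winner pays the larger of the second-highest eligible bid and the reserve.
    return winner, max(runner_up, reserve)
```

For example, with bids of 5, 3 and 4 and a reserve of 2, drone_a wins and pays 4.0; raising the reserve to 4.5 leaves drone_a as the only eligible bidder, paying 4.5. In Myerson's classical setting with i.i.d. regular valuations, the revenue-optimal auction is a second-price auction with a suitably chosen reserve, which is why the reserve is exposed as a parameter here.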


Posted Content
TL;DR: Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
Abstract: The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

58 citations


Journal Article
TL;DR: In this article, the authors study the strategic properties of Knapsack Voting and show that it is strategy-proof under a natural model of utility (a disutility given by the $\ell_1$ distance between the outcome and the true preference of the voter), and "partially" strategy-proof under general additive utilities.
Abstract: We address the question of aggregating the preferences of voters in the context of participatory budgeting. We scrutinize the voting method currently used in practice, underline its drawbacks, and introduce a novel scheme tailored to this setting, which we call "Knapsack Voting". We study its strategic properties - we show that it is strategy-proof under a natural model of utility (a dis-utility given by the $\ell_1$ distance between the outcome and the true preference of the voter), and "partially" strategy-proof under general additive utilities. We extend Knapsack Voting to more general settings with revenues, deficits or surpluses, and prove a similar strategy-proofness result. To further demonstrate the applicability of our scheme, we discuss its implementation on the digital voting platform that we have deployed in partnership with the local government bodies in many cities across the nation. From voting data thus collected, we present empirical evidence that Knapsack Voting works well in practice.

57 citations


Posted Content
TL;DR: This work designs a deterministic truthful allocation mechanism for submodular valuations with dichotomous marginals whose allocations are Lorenz dominating and therefore satisfy many desired fairness properties, such as being envy-free up to any item (EFX) and maximizing the Nash Social Welfare (NSW).
Abstract: We consider the problem of allocating a set of indivisible items to players with private preferences in an efficient and fair way. We focus on valuations that have dichotomous marginals, in which the added value of any item to a set is either 0 or 1, and aim to design truthful allocation mechanisms (without money) that maximize welfare and are fair. For the case that players have submodular valuations with dichotomous marginals, we design such a deterministic truthful allocation mechanism. The allocation output by our mechanism is Lorenz dominating, and consequently satisfies many desired fairness properties, such as being envy-free up to any item (EFX), and maximizing the Nash Social Welfare (NSW). We then show that our mechanism with random priorities is envy-free ex-ante, while having all the above properties ex-post. Furthermore, we present several impossibility results precluding similar results for the larger class of XOS valuations. To gauge the robustness of our positive results, we also study $\epsilon$-dichotomous valuations, in which the added value of any item to a set is either non-positive, or in the range $[1, 1 + \epsilon]$. We show several impossibility results in this setting, and also a positive result: for players that have additive $\epsilon$-dichotomous valuations with sufficiently small $\epsilon$, we design a randomized truthful mechanism with strong ex-post guarantees. For $\rho = \frac{1}{1 + \epsilon}$, the allocations that it produces generate at least a $\rho$-fraction of the maximum welfare, and enjoy $\rho$-approximations for various fairness properties, such as being envy-free up to one item (EF1), and giving each player at least her maximin share.

54 citations
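
The fairness notions named in this abstract can be checked mechanically. Below is a minimal Python sketch, assuming additive valuations (a special case of the submodular valuations studied in the paper), that tests an allocation for envy-freeness up to one item (EF1) and envy-freeness up to any item (EFX). It follows the convention that EFX only considers removing items the envious agent values positively; papers differ on this detail. The function names are hypothetical.

```python
from typing import Sequence, Set

Valuations = Sequence[Sequence[float]]  # valuations[i][g]: agent i's value for item g
Allocation = Sequence[Set[int]]         # allocation[i]: set of items given to agent i

def bundle_value(values: Sequence[float], bundle: Set[int]) -> float:
    """Additive value of a bundle for one agent."""
    return sum(values[g] for g in bundle)

def is_ef1(valuations: Valuations, allocation: Allocation) -> bool:
    """EF1: any envy disappears after removing the envied bundle's most valuable item."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue
            other = bundle_value(valuations[i], allocation[j])
            best = max(valuations[i][g] for g in allocation[j])
            if own < other - best:
                return False
    return True

def is_efx(valuations: Valuations, allocation: Allocation) -> bool:
    """EFX: any envy disappears after removing ANY item that agent i values positively."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations[i], allocation[i])
        for j in range(n):
            if i == j:
                continue
            liked = [valuations[i][g] for g in allocation[j] if valuations[i][g] > 0]
            # Removing the least-liked item is the hardest case for EFX.
            if liked and own < bundle_value(valuations[i], allocation[j]) - min(liked):
                return False
    return True
```

With two agents who both value three items at 2, 1 and 1, giving one agent the first two items and the other the last item is EF1 but not EFX, which shows that EFX is the stronger guarantee.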


Journal Article
TL;DR: It is argued that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results and that decision-making under uncertainty and imperfect information is a unique forte of established game-theoretic modelling.
Abstract: Once a viable vaccine for SARS-CoV-2 has been identified, vaccination uptake will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programs for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programs will likely be scarce in many countries. Vaccine hesitancy can also be expected from some sections of the general public. We emphasize that decision making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritisation and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.

51 citations


Posted Content
TL;DR: EIP-1559 is a proposal to make several tightly coupled additions to Ethereum's transaction fee mechanism, including variable-size blocks and a burned base fee that rises and falls with demand.
Abstract: EIP-1559 is a proposal to make several tightly coupled additions to Ethereum's transaction fee mechanism, including variable-size blocks and a burned base fee that rises and falls with demand. This report assesses the game-theoretic strengths and weaknesses of the proposal and explores some alternative designs.

42 citations
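
To make "a burned base fee that rises and falls with demand" concrete, the sketch below reproduces the base-fee update rule as it appears in the finalized EIP-1559 specification. That version postdates the 2020 draft this report analyzes, so details may differ from the version studied. The base fee moves by at most 1/8 per block toward a gas target equal to half the block gas limit. The function name is hypothetical; the constants follow the spec.

```python
# Constants as in the finalized EIP-1559 specification.
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8
ELASTICITY_MULTIPLIER = 2

def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_limit: int) -> int:
    """Base fee (in wei) for the next block, using integer arithmetic as the spec does."""
    gas_target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == gas_target:
        return parent_base_fee
    if parent_gas_used > gas_target:
        # Demand above target: raise the fee by up to 1/8, and by at least 1 wei.
        delta = (parent_base_fee * (parent_gas_used - gas_target)
                 // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
        return parent_base_fee + max(delta, 1)
    # Demand below target: lower the fee by up to 1/8.
    delta = (parent_base_fee * (gas_target - parent_gas_used)
             // gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta
```

Starting from 100 gwei with a 30,000,000 gas limit, a completely full parent block raises the next base fee to 112.5 gwei and an empty one lowers it to 87.5 gwei. Per the spec, the base-fee portion of each transaction fee is burned rather than paid to the block producer.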


Posted Content
TL;DR: This paper generalizes existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings, and investigates how adapting the reward of the game can give strong convergence guarantees in monotone games.
Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player IIGs.

40 citations
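
Follow the Regularized Leader with an entropic regularizer is the familiar multiplicative-weights (Hedge) update. The Python sketch below runs it in a normal-form zero-sum game (rock-paper-scissors), a much simpler setting than the sequential imperfect-information games in the paper, and without the reward-adaptation term the authors add. The step size, horizon, and non-uniform starting point are arbitrary illustrative choices. The per-round strategies keep moving rather than settling (the paper studies recurrence of such dynamics), while the time-averaged strategies approach the uniform Nash equilibrium, as measured by the duality gap.

```python
import math
from typing import List, Sequence, Tuple

# Row player's payoffs in rock-paper-scissors; the column player receives the negation.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def entropic_ftrl_step(cumulative: Sequence[float], eta: float) -> List[float]:
    """FTRL with entropy regularizer: play softmax(eta * cumulative payoffs)."""
    top = max(cumulative)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - top)) for s in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def run_ftrl(T: int = 20000, eta: float = 0.02) -> Tuple[List[float], List[float]]:
    """Simultaneous FTRL for both players; returns their time-averaged strategies."""
    n = len(A)
    cum_x = [20.0, 0.0, 0.0]  # non-uniform start so the dynamics are not trivially at equilibrium
    cum_y = [0.0, 20.0, 0.0]
    avg_x, avg_y = [0.0] * n, [0.0] * n
    for _ in range(T):
        x = entropic_ftrl_step(cum_x, eta)
        y = entropic_ftrl_step(cum_y, eta)
        for k in range(n):
            avg_x[k] += x[k] / T
            avg_y[k] += y[k] / T
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        for i in range(n):
            cum_x[i] += sum(A[i][j] * y[j] for j in range(n))
        for j in range(n):
            cum_y[j] -= sum(x[i] * A[i][j] for i in range(n))
    return avg_x, avg_y

def duality_gap(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    n = len(A)
    best_row = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(n))
    best_col = min(sum(x[i] * A[i][j] for i in range(n)) for j in range(n))
    return best_row - best_col
```

With the defaults, standard Hedge regret bounds guarantee a duality gap below about 0.03 for the averaged strategies.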


Posted Content
TL;DR: This work establishes maximum Nash welfare as the ultimate allocation rule in the realm of binary additive preferences and proves that fractional MNW -- known to be group strategyproof, envy-free, and Pareto optimal -- can be implemented as a distribution over deterministic MNW allocations, which are envy-free up to one good.
Abstract: We study fair allocation of indivisible goods among agents. Prior research focuses on additive agent preferences, which leads to an impossibility when seeking truthfulness, fairness, and efficiency. We show that when agents have binary additive preferences, a compelling rule -- maximum Nash welfare (MNW) -- provides all three guarantees. Specifically, we show that deterministic MNW with lexicographic tie-breaking is group strategyproof in addition to being envy-free up to one good and Pareto optimal. We also prove that fractional MNW -- known to be group strategyproof, envy-free, and Pareto optimal -- can be implemented as a distribution over deterministic MNW allocations, which are envy-free up to one good. Our work establishes maximum Nash welfare as the ultimate allocation rule in the realm of binary additive preferences.

40 citations
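
For tiny instances, maximum Nash welfare can be found by exhaustive search, which helps build intuition for the binary additive setting. The Python sketch below, assuming binary additive valuations given as 0/1 lists, first maximizes the number of agents with positive utility and then the product of those utilities (a common way to define MNW when some agent must get zero). It breaks remaining ties by enumeration order rather than by the paper's lexicographic rule, and it takes time exponential in the number of goods, so it is only a sanity-check tool. The function name is hypothetical.

```python
from itertools import product
from math import prod
from typing import List, Sequence

def mnw_allocation(valuations: Sequence[Sequence[int]]) -> List[List[int]]:
    """Brute-force maximum Nash welfare for additive (here binary) valuations."""
    n, m = len(valuations), len(valuations[0])
    best_key, best_assignment = None, None
    # assignment[g] is the agent who receives good g; enumerates all n**m allocations.
    for assignment in product(range(n), repeat=m):
        utils = [0] * n
        for g, i in enumerate(assignment):
            utils[i] += valuations[i][g]
        positive = [u for u in utils if u > 0]
        key = (len(positive), prod(positive))  # more agents served first, then the Nash product
        if best_key is None or key > best_key:
            best_key, best_assignment = key, assignment
    bundles: List[List[int]] = [[] for _ in range(n)]
    for g, i in enumerate(best_assignment):
        bundles[i].append(g)
    return bundles
```

For instance, if agent 0 likes all three goods and agent 1 likes only good 0, the search gives good 0 to agent 1 and goods 1 and 2 to agent 0.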


Posted Content
TL;DR: It is shown that determining the existence of an envy-free allocation is NP-complete even when agents have binary additive valuations, and a polynomial-time algorithm is provided for computing an allocation that satisfies envy-freeness up to one chore (EF1) under monotone valuations.
Abstract: We study the fair allocation of undesirable indivisible items, or chores. While the case of desirable indivisible items (or goods) is extensively studied, with many results known for different notions of fairness, less is known about the fair division of chores. We study the envy-free division of chores, and make three contributions. First, we show that determining the existence of an envy-free allocation is NP-complete, even in the simple case when agents have binary additive valuations. Second, we provide a polynomial-time algorithm for computing an allocation that satisfies envy-freeness up to one chore (EF1), correcting an existing proof in the literature. A straightforward modification of our algorithm can be used to compute an EF1 allocation for doubly monotone instances (wherein each agent can partition the set of items into objective goods and objective chores). Our third result applies to a mixed resources model consisting of indivisible items and a divisible, undesirable heterogeneous resource (i.e., a bad cake). We show that there always exists an allocation that satisfies envy-freeness for mixed resources (EFM) in this setting, complementing a recent result of Bei et al. (Art. Int. 2021) for indivisible goods and divisible cake.

39 citations


Posted Content
TL;DR: A novel lemma about matching voters to candidates is proved, which is referred to as the ranking-matching lemma, and a new randomized algorithm is introduced with improved distortion compared to known results, and improved lower bounds on the distortion of all deterministic and randomized algorithms are provided.
Abstract: We study the following metric distortion problem: there are two finite sets of points, $V$ and $C$, that lie in the same metric space, and our goal is to choose a point in $C$ whose total distance from the points in $V$ is as small as possible. However, rather than having access to the underlying distance metric, we only know, for each point in $V$, a ranking of its distances to the points in $C$. We propose algorithms that choose a point in $C$ using only these rankings as input and we provide bounds on their \emph{distortion} (worst-case approximation ratio). A prominent motivation for this problem comes from voting theory, where $V$ represents a set of voters, $C$ represents a set of candidates, and the rankings correspond to ordinal preferences of the voters. A major conjecture in this framework is that the optimal deterministic algorithm has distortion $3$. We resolve this conjecture by providing a polynomial-time algorithm that achieves distortion $3$, matching a known lower bound. We do so by proving a novel lemma about matching voters to candidates, which we refer to as the \emph{ranking-matching lemma}. This lemma induces a family of novel algorithms, which may be of independent interest, and we show that a special algorithm in this family achieves distortion $3$. We also provide more refined, parameterized, bounds using the notion of $\alpha$-decisiveness, which quantifies the extent to which a voter may prefer her top choice relative to all others. Finally, we introduce a new randomized algorithm with improved distortion compared to known results, and also provide improved lower bounds on the distortion of all deterministic and randomized algorithms.

33 citations
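
Distortion is a worst case over all metrics consistent with the voters' rankings, so evaluating a single metric only gives a lower bound for a given rule. The Python sketch below places voters and candidates on the real line (a simple metric space), derives each voter's ranking from distances, applies plurality (chosen purely for illustration; it is not the paper's algorithm), and reports the ratio of the winner's total distance to the best candidate's. All names are hypothetical.

```python
from typing import List, Sequence

def rank_candidates(voters: Sequence[float], candidates: Sequence[float]) -> List[List[int]]:
    """Each voter's ranking of candidate indices, nearest first (ties by index)."""
    return [sorted(range(len(candidates)), key=lambda c: (abs(v - candidates[c]), c))
            for v in voters]

def plurality(rankings: Sequence[Sequence[int]], num_candidates: int) -> int:
    """Uses only ordinal information: the candidate ranked first most often (ties to lower index)."""
    counts = [0] * num_candidates
    for ranking in rankings:
        counts[ranking[0]] += 1
    return max(range(num_candidates), key=lambda c: (counts[c], -c))

def social_cost(voters: Sequence[float], point: float) -> float:
    """Total distance from all voters to a point."""
    return sum(abs(v - point) for v in voters)

def ratio_on_instance(voters: Sequence[float], candidates: Sequence[float]) -> float:
    """Winner's social cost divided by the optimal candidate's (assumes the optimum is positive)."""
    winner = plurality(rank_candidates(voters, candidates), len(candidates))
    best = min(social_cost(voters, c) for c in candidates)
    return social_cost(voters, candidates[winner]) / best
```

With candidates at 0 and 1, three voters at 0.51 and two at 0, plurality elects the candidate at 1 even though the candidate at 0 has the lower total distance, giving a ratio of about 2.27 on this instance.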


Posted Content
TL;DR: This survey summarizes the current understanding of ABC rules from the viewpoint of computational social choice, with main focus on axiomatic analysis, algorithmic results, and relevant applications.
Abstract: Approval-based committee (ABC) rules are voting rules that output a fixed-size subset of candidates, a so-called committee. ABC rules select committees based on dichotomous preferences, i.e., a voter either approves or disapproves a candidate. This simple type of preferences makes ABC rules widely suitable for practical use. In this survey, we summarize the current understanding of ABC rules from the viewpoint of computational social choice. The main focus is on axiomatic analysis, algorithmic results, and relevant applications.
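
Two standard ABC rules, Approval Voting (AV) and Proportional Approval Voting (PAV), differ only in how they score a committee. The brute-force Python sketch below, with hypothetical function names, enumerates all committees of size k, so it is suitable only for small examples; PAV's harmonic weights reward committees that give approved members to more distinct voters.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Sequence, Set

Ballots = Sequence[Set[str]]  # each ballot is the set of candidates a voter approves

def av_score(ballots: Ballots, committee: FrozenSet[str]) -> float:
    """Approval Voting: total number of (voter, approved member) pairs."""
    return float(sum(len(b & committee) for b in ballots))

def pav_score(ballots: Ballots, committee: FrozenSet[str]) -> float:
    """PAV: a voter with r approved members contributes 1 + 1/2 + ... + 1/r."""
    return sum(sum(1.0 / t for t in range(1, len(b & committee) + 1)) for b in ballots)

def best_committee(ballots: Ballots, candidates: Iterable[str], k: int,
                   score: Callable[[Ballots, FrozenSet[str]], float]) -> FrozenSet[str]:
    """Brute force over all size-k committees; ties go to the first in sorted order."""
    committees = (frozenset(c) for c in combinations(sorted(candidates), k))
    return max(committees, key=lambda c: score(ballots, c))
```

With six voters approving {a, b} and four approving {c}, AV picks {a, b} for k = 2, whereas PAV picks a committee containing c (here {a, c}, by enumeration order).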

Posted Content
TL;DR: The position that building fair decision-making systems requires overcoming limitations which, it is argued, are inherent to each field is developed, and an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning is built.
Abstract: Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the groundwork towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.

Posted Content
TL;DR: P2SRO is introduced, the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games and is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.
Abstract: Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available at https://github.com/JBLanier/pipeline-psro.

Posted Content
TL;DR: It is shown that in the model-based and model-free cases (without knowledge of agent payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
Abstract: We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
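
The paper's dynamics extend classical fictitious play, in which each player best-responds to the empirical frequency of the opponent's past actions, to stochastic games with Q-function beliefs updated on a slower timescale. The Python sketch below shows only the classical, single-state version on matching pennies; the initial pseudo-counts and tie-breaking are illustrative assumptions, and the function name is hypothetical. Robinson's theorem guarantees that in two-player zero-sum matrix games the empirical frequencies converge to a Nash equilibrium, here (1/2, 1/2) for both players.

```python
from typing import List, Tuple

# Matching pennies: the row player wins (+1) on a match, the column player wins on a mismatch.
A = [[1, -1], [-1, 1]]

def fictitious_play(T: int = 20000) -> Tuple[float, float]:
    """Simultaneous fictitious play; returns each player's empirical frequency of action 0."""
    row_counts: List[int] = [1, 0]  # assumed initial pseudo-observations
    col_counts: List[int] = [0, 1]
    for _ in range(T):
        # Payoff of each own action against the opponent's empirical play (counts are proportional).
        row_payoff = [sum(A[i][j] * col_counts[j] for j in range(2)) for i in range(2)]
        col_payoff = [-sum(A[i][j] * row_counts[i] for i in range(2)) for j in range(2)]
        a = max(range(2), key=lambda i: row_payoff[i])  # ties go to action 0
        b = max(range(2), key=lambda j: col_payoff[j])
        row_counts[a] += 1
        col_counts[b] += 1
    return row_counts[0] / sum(row_counts), col_counts[0] / sum(col_counts)
```

Individual actions keep cycling with growing cycle lengths, but the empirical frequencies drift toward one half for both players.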

Posted Content
TL;DR: Ex-ante group fairness, which generalizes both envy-freeness and Pareto optimality, can be achieved in conjunction with two ex-post fairness properties that are incomparable but are both implied by EF1: proportionality up to one good (Prop1) and envy-freeness up to one good more-and-less.
Abstract: We study the problem of allocating indivisible goods among agents with additive valuations. When randomization is allowed, it is possible to achieve compelling notions of fairness such as envy-freeness, which states that no agent should prefer any other agent's allocation to her own. When allocations must be deterministic, achieving exact fairness is impossible but approximate notions such as envy-freeness up to one good can be guaranteed. Our goal in this work is to achieve both simultaneously, by constructing a randomized allocation that is exactly fair ex-ante and approximately fair ex-post. The key question we address is whether ex-ante envy-freeness can be achieved in combination with ex-post envy-freeness up to one good. We settle this positively by designing an efficient algorithm that achieves both properties simultaneously. If we additionally require economic efficiency, we obtain an impossibility result. However, we show that economic efficiency and ex-ante envy-freeness can be simultaneously achieved if we slightly relax our ex-post fairness guarantee. On our way, we characterize the well-known Maximum Nash Welfare allocation rule in terms of a recently introduced fairness guarantee that applies to groups of agents, not just individuals.

Posted Content
TL;DR: This work derives exact expected MSE values for problems in linear regression and mean estimation and uses these values to analyze the resulting game in the framework of hedonic game theory; it constructively shows that there always exists a stable partition of players into coalitions.
Abstract: Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly construct model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.
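
The variance-bias tradeoff described above can be written out for mean estimation in a simplified setting. Assume each player j's true mean θ_j is fixed (the paper instead takes expectations over heterogeneous parameters) and player j observes n_j independent samples with noise variance σ². Pooling everyone's data in uniform federation gives the sample-size-weighted mean, whose error for player i is σ²/N + (Σ_j n_j θ_j / N − θ_i)², with N = Σ_j n_j. The Python function names below are hypothetical, and this is not the paper's exact expected-MSE formula.

```python
from typing import Sequence

def local_mse(sigma2: float, n_i: int) -> float:
    """Error of player i's own sample mean: variance only, no bias."""
    return sigma2 / n_i

def uniform_federation_mse(i: int, thetas: Sequence[float],
                           counts: Sequence[int], sigma2: float) -> float:
    """Error for player i of the pooled (sample-size-weighted) mean over all players."""
    total = sum(counts)
    pooled_mean = sum(n * t for n, t in zip(counts, thetas)) / total
    variance = sigma2 / total                  # shrinks with more pooled data
    bias_sq = (pooled_mean - thetas[i]) ** 2   # grows with heterogeneity
    return variance + bias_sq
```

With two players holding 10 samples each and σ² = 1, federation halves player 0's error when θ_0 = θ_1, but when θ_1 − θ_0 = 1 the bias term pushes its error from 0.1 (local) to 0.3, so player 0 would rather stay alone.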

Posted Content
TL;DR: In this article, the authors proposed an evaluation problem where the inputs are controlled by strategic individuals who can modify their features at a cost, and the goal is to design an evaluation mechanism that maximizes the overall quality score, i.e., welfare, in the population, taking any strategic updating into account.
Abstract: Motivated by applications such as college admission and insurance rate determination, we propose an evaluation problem where the inputs are controlled by strategic individuals who can modify their features at a cost. A learner can only partially observe the features, and aims to classify individuals with respect to a quality score. The goal is to design an evaluation mechanism that maximizes the overall quality score, i.e., welfare, in the population, taking any strategic updating into account. We further study the algorithmic aspect of finding the welfare maximizing evaluation mechanism under two specific settings in our model. When scores are linear and mechanisms use linear scoring rules on the observable features, we show that the optimal evaluation mechanism is an appropriate projection of the quality score. When mechanisms must use linear thresholds, we design a polynomial time algorithm with a (1/4)-approximation guarantee when the underlying feature distribution is sufficiently smooth and admits an oracle for finding dense regions. We extend our results to settings where the prior distribution is unknown and must be learned from samples.

Posted Content
TL;DR: It is shown that symmetry is a crucial property: an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions, whereas the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.
Abstract: Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity---the sender---can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.

Posted Content
TL;DR: This paper resolves the price of two well-studied fairness notions for the allocation of indivisible goods: envy-freeness up to one good (EF1), and approximate maximin share (MMS).
Abstract: In the allocation of resources to a set of agents, how do fairness guarantees impact the social welfare? A quantitative measure of this impact is the price of fairness, which measures the worst-case loss of social welfare due to fairness constraints. While initially studied for divisible goods, recent work on the price of fairness also studies the setting of indivisible goods. In this paper, we resolve the price of two well-studied fairness notions for the allocation of indivisible goods: envy-freeness up to one good (EF1), and approximate maximin share (MMS). For both EF1 and 1/2-MMS guarantees, we show, via different techniques, that the price of fairness is $O(\sqrt{n})$, where $n$ is the number of agents. From previous work, it follows that our bounds are tight. Our bounds are obtained via efficient algorithms. For 1/2-MMS, our bound holds for additive valuations, whereas for EF1, our bound holds for the more general class of subadditive valuations. This resolves an open problem posed by Bei et al. (2019).

Posted Content
TL;DR: It is demonstrated that permutation-equivariant architectures can perfectly recover the permutation-equivariant optimal mechanism, which is not possible with the previous architecture, and that they also have better generalization properties.
Abstract: Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design. Theoretical approaches to the problem have hit some limits in the past decades and analytical solutions are known for only a few simple settings. Computational approaches to the problem through the use of LPs have their own set of limitations. Building on the success of deep learning, a new approach was recently proposed by Duetting et al. (2019) in which the auction is modeled by a feed-forward neural network and the design problem is framed as a learning problem. The neural architectures used in that work are general purpose and do not take advantage of any of the symmetries the problem could present, such as permutation equivariance. In this work, we consider auction design problems that have permutation-equivariant symmetry and construct a neural architecture that is capable of perfectly recovering the permutation-equivariant optimal mechanism, which we show is not possible with the previous architecture. We demonstrate that permutation-equivariant architectures are not only capable of recovering previous results, they also have better generalization properties.
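
Permutation equivariance here means that reordering bidders (or items) in the input reorders the network's output in the same way. One common way to build such a layer for a bidders-by-items matrix combines each entry with its row mean, column mean, and overall mean using shared weights. The pure-Python sketch below shows a single channel of such a layer; the weights are arbitrary placeholders, the function name is hypothetical, and it is not claimed to match the authors' exact architecture, which would stack layers with nonlinearities and learned parameters.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def equivariant_layer(x: Sequence[Sequence[float]],
                      weights: Tuple[float, float, float, float] = (1.0, 0.5, 0.25, 0.125),
                      bias: float = 0.0) -> Matrix:
    """One channel of a bidders-by-items layer built from entry, row, column and global means."""
    n, m = len(x), len(x[0])
    row_mean = [sum(row) / m for row in x]                             # per bidder
    col_mean = [sum(x[i][j] for i in range(n)) / n for j in range(m)]  # per item
    all_mean = sum(row_mean) / n
    w_entry, w_row, w_col, w_all = weights
    # The same four weights are shared by every entry, so no bidder or item is special.
    return [[w_entry * x[i][j] + w_row * row_mean[i] + w_col * col_mean[j]
             + w_all * all_mean + bias
             for j in range(m)] for i in range(n)]
```

Swapping two bidders' rows in the input swaps the corresponding rows of the output, and likewise for item columns.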

Posted Content
TL;DR: A unifying game-theoretic model that can model uncertainty over attacker types and the nuances of an MTD system and a Bayesian Strong Stackelberg Q-learning approach that can learn the optimal movement policy for BSMGs within a reasonable time are proposed.
Abstract: The field of cybersecurity has mostly been a cat-and-mouse game with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings when there is incomplete information about a rational adversary and yield sub-optimal movement strategies. Further, while there exists an array of work on learning defense policies in sequential settings for cyber-security, they are either unpopular due to scalability issues arising out of incomplete information or tend to ignore the strategic nature of the adversary simplifying the scenario to use single-agent reinforcement learning techniques. To address these concerns, we propose (1) a unifying game-theoretic model, called the Bayesian Stackelberg Markov Games (BSMGs), that can model uncertainty over attacker types and the nuances of an MTD system and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that can, via interaction, learn the optimal movement policy for BSMGs within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries even when prior information about rewards and transitions is absent.

Posted Content
TL;DR: Polynomial-time algorithms are developed for the fair and efficient allocation of indivisible goods among agents that have subadditive valuations over the goods, and the approximation guarantees are shown to be essentially tight for XOS and, hence, subadditive valuations.
Abstract: We develop polynomial-time algorithms for the fair and efficient allocation of indivisible goods among $n$ agents that have subadditive valuations over the goods. We first consider the Nash social welfare as our objective and design a polynomial-time algorithm that, in the value oracle model, finds an $8n$-approximation to the Nash optimal allocation. Subadditive valuations include XOS (fractionally subadditive) and submodular valuations as special cases. Our result, even for the special case of submodular valuations, improves upon the previously best known $O(n \log n)$-approximation ratio of Garg et al. (2020). More generally, we study maximization of $p$-mean welfare. The $p$-mean welfare is parameterized by an exponent term $p \in (-\infty, 1]$ and encompasses a range of welfare functions, such as social welfare ($p = 1$), Nash social welfare ($p \to 0$), and egalitarian welfare ($p \to -\infty$). We give an algorithm that, for subadditive valuations and any given $p \in (-\infty, 1]$, computes (in the value oracle model and in polynomial time) an allocation with $p$-mean welfare at least $\frac{1}{8n}$ times the optimal. Further, we show that our approximation guarantees are essentially tight for XOS and, hence, subadditive valuations. We adapt a result of Dobzinski et al. (2010) to show that, under XOS valuations, an $O \left(n^{1-\varepsilon} \right)$ approximation for the $p$-mean welfare for any $p \in (-\infty,1]$ (including the Nash social welfare) requires exponentially many value queries; here, $\varepsilon>0$ is any fixed constant.
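
The p-mean welfare family above can be computed directly. The Python sketch below, assuming nonnegative utilities, uses the normalized power mean ((1/n) Σ_i v_i^p)^(1/p), treats p = 0 as the geometric mean (the Nash social welfare limit) and p = −∞ as the minimum (egalitarian welfare); for p = 1 it returns average utility, i.e., social welfare divided by n. The function name is hypothetical.

```python
import math
from typing import Sequence

def p_mean_welfare(utilities: Sequence[float], p: float) -> float:
    """Normalized power mean of nonnegative utilities, for p in (-inf, 1]."""
    n = len(utilities)
    if p == float("-inf"):
        return min(utilities)  # egalitarian welfare
    if p <= 0 and any(u == 0 for u in utilities):
        return 0.0  # limit value when some agent gets nothing
    if p == 0:
        # Geometric mean, i.e., Nash social welfare.
        return math.exp(sum(math.log(u) for u in utilities) / n)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)
```

For utilities (1, 4), the values fall as p decreases: 2.5 at p = 1, 2 at p = 0, 1.6 at p = −1, and 1 at p = −∞. An allocation meeting the abstract's guarantee has p-mean welfare at least 1/(8n) times the optimum for the chosen p.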

Posted Content
TL;DR: In this paper, the authors consider the classic problem of fairly allocating indivisible goods among agents with additive valuation functions and explore the connection between two prominent fairness notions: maximum Nash welfare and envy-freeness up to any good (EFX).
Abstract: We consider the classic problem of fairly allocating indivisible goods among agents with additive valuation functions and explore the connection between two prominent fairness notions: maximum Nash welfare (MNW) and envy-freeness up to any good (EFX). We establish that an MNW allocation is always EFX as long as there are at most two possible values for the goods, whereas this implication is no longer true for three or more distinct values. As a notable consequence, this proves the existence of EFX allocations for these restricted valuation functions. While the efficient computation of an MNW allocation for two possible values remains an open problem, we present a novel algorithm for directly constructing EFX allocations in this setting. Finally, we study the question of whether an MNW allocation implies any EFX guarantee for general additive valuation functions under a natural new interpretation of approximate EFX allocations.
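
Both notions can be checked by brute force on tiny instances. The sketch below enumerates all $n^m$ allocations, picks a maximum Nash welfare allocation (first maximize the number of agents with positive value, then the product of their values), and tests EFX for additive valuations. The helper names are hypothetical, and the enumeration is exponential, so it is only a checking aid; as the abstract notes, efficient MNW computation for two values remains open.

```python
from itertools import product
from math import prod


def bundle_value(v, bundle):
    return sum(v[g] for g in bundle)


def max_nash_welfare(values):
    """Brute-force an MNW allocation; values[i][g] is agent i's additive value for good g."""
    n, m = len(values), len(values[0])
    best, best_key = None, None
    for owner in product(range(n), repeat=m):  # owner[g] = agent receiving good g
        bundles = [[g for g in range(m) if owner[g] == i] for i in range(n)]
        utils = [bundle_value(values[i], bundles[i]) for i in range(n)]
        positive = [u for u in utils if u > 0]
        key = (len(positive), prod(positive))
        if best_key is None or key > best_key:
            best, best_key = bundles, key
    return best


def is_efx(values, bundles):
    """EFX: no agent envies another bundle after removing any single good from it."""
    for i, own in enumerate(bundles):
        mine = bundle_value(values[i], own)
        for j, other in enumerate(bundles):
            if i == j:
                continue
            theirs = bundle_value(values[i], other)
            if any(mine < theirs - values[i][g] for g in other):
                return False
    return True
```

In a two-valued instance such as values in {1, 2}, the abstract's result predicts that the returned MNW allocation passes `is_efx`.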

Posted Content
TL;DR: This study proposes a game-theoretic framework to study the strategic behaviors of agents taking part in cross-chain atomic swaps implemented with HTLCs, and demonstrates that either agent might withdraw from the swap in an attempt to exploit price variations and maximize its own utility.
Abstract: To achieve interoperability between unconnected ledgers, hash time lock contracts (HTLCs) are commonly used for cross-chain asset exchange. The solution tolerates transaction failure, and can "make the best out of worst" by allowing transacting agents to at least keep their original assets in case of an abort. Nonetheless, as an undesired outcome, recurring transaction failures prompt a critical and analytical examination of the protocol. In this study, we propose a game-theoretic framework to study the strategic behaviors of agents taking part in cross-chain atomic swaps implemented with HTLCs. We study the success rate of the transaction as a function of the exchange rate of the swap, the token price and its volatility, among other variables. We demonstrate that in an attempt to maximize one's own utility as asset price changes, either agent might withdraw from the swap. An extension of our model confirms that collateral deposits can improve the transaction success rate, motivating further research towards collateralization without a trusted third party. A second model variation suggests that a swap is more likely to succeed when agents dynamically adjust the exchange rate in response to price fluctuations.
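
The mechanics behind "keeping the original assets in case of an abort" can be sketched with a toy in-memory HTLC: the receiver can claim by revealing the SHA-256 preimage before the timelock, and otherwise the sender can reclaim the funds afterwards. This is not any particular chain's contract language; the class name, its fields, and the integer clock standing in for block height are assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class HTLC:
    """Toy hash time lock contract; `now` is an abstract clock such as block height."""
    sender: str
    receiver: str
    amount: float
    hashlock: bytes             # SHA-256 digest of the initiator's secret
    timelock: int               # from this time on, the sender may reclaim the funds
    paid_to: Optional[str] = None

    def claim(self, preimage: bytes, now: int) -> bool:
        """Receiver takes the funds by revealing the secret before the timelock."""
        if self.paid_to is not None or now >= self.timelock:
            return False
        if hashlib.sha256(preimage).digest() != self.hashlock:
            return False
        self.paid_to = self.receiver
        return True

    def refund(self, now: int) -> bool:
        """Sender reclaims unclaimed funds once the timelock has expired."""
        if self.paid_to is not None or now < self.timelock:
            return False
        self.paid_to = self.sender
        return True
```

In a swap, the initiator locks on chain A with the longer timelock and the counterparty locks on chain B under the same hash with a shorter one. Claiming on B reveals the secret, which the counterparty then reuses on A. If either agent walks away before that, the timelocks let both refund, which is the abort outcome whose frequency the paper analyzes.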

Posted Content
TL;DR: It is proved that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality.
Abstract: Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive and negative (and potentially unbounded) effects on system performance.
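
A QRE can be made concrete with the logit form, in which each player mixes over actions with probabilities proportional to $\exp(u/T)$, so the temperature $T$ plays the role of the exploration parameter. Assuming smooth Q-learning's rest points take this logit form (the abstract does not give the exact parametrization), the following minimal sketch finds a logit QRE of a two-player game by damped fixed-point iteration. The names `logit_qre`, `qre_residual`, and `damping` are hypothetical.

```python
import math


def logit_response(payoffs, temperature):
    """Softmax (Boltzmann) response: probability proportional to exp(u / T)."""
    top = max(payoffs)  # shift for numerical stability
    weights = [math.exp((u - top) / temperature) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def _responses(A, B, x, y, temperature):
    n, m = len(A), len(A[0])
    ux = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # row payoffs
    uy = [sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]  # column payoffs
    return logit_response(ux, temperature), logit_response(uy, temperature)


def logit_qre(A, B, temperature, iters=5000, damping=0.5):
    """Damped fixed-point iteration for a logit QRE; A, B are row/column payoffs."""
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(iters):
        bx, by = _responses(A, B, x, y, temperature)
        x = [(1 - damping) * a + damping * b for a, b in zip(x, bx)]
        y = [(1 - damping) * a + damping * b for a, b in zip(y, by)]
    return x, y


def qre_residual(A, B, x, y, temperature):
    """Largest deviation of (x, y) from their own logit responses."""
    bx, by = _responses(A, B, x, y, temperature)
    return max(max(abs(a - b) for a, b in zip(x, bx)), max(abs(a - b) for a, b in zip(y, by)))
```

For the coordination game with A = B = [[2, 0], [0, 1]], the damped map is a contraction at T = 1, so the iteration converges. At very high T the QRE approaches uniform play. At low T the game has several QREs, and plain damped iteration is not guaranteed to reach each of them; this multiplicity is what the abstract's bifurcation analysis concerns.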

Posted Content
TL;DR: Trust-based strategies can outcompete strategies that are always conditional, such as Tit-for-Tat, when the opportunity cost of verification is non-negligible, and people are expected to use them more frequently in interactions with intelligent agents.
Abstract: The actions of intelligent agents, such as chatbots, recommender systems, and virtual assistants, are typically not fully transparent to the user. Consequently, using such an agent involves the user exposing themselves to the risk that the agent may act in a way opposed to the user's goals. It is often argued that people use trust as a cognitive shortcut to reduce the complexity of such interactions. Here we formalise this by using the methods of evolutionary game theory to study the viability of trust-based strategies in repeated games. These are reciprocal strategies that cooperate as long as the other player is observed to be cooperating. Unlike classic reciprocal strategies, once mutual cooperation has been observed for a threshold number of rounds they stop checking their co-player's behaviour every round, and instead only check with some probability. By doing so, they reduce the opportunity cost of verifying whether the action of their co-player was actually cooperative. We demonstrate that these trust-based strategies can outcompete strategies that are always conditional, such as Tit-for-Tat, when the opportunity cost is non-negligible. We argue that this cost is likely to be greater when the interaction is between people and intelligent agents, because of the reduced transparency of the agent. Consequently, we expect people to use trust-based strategies more frequently in interactions with intelligent agents. Our results provide new, important insights into the design of mechanisms for facilitating interactions between humans and intelligent agents, where trust is an essential factor.
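
The saving from checking less often can be seen with simple arithmetic. The sketch below computes the expected per-round payoff against an unconditional cooperator, assuming each round of mutual cooperation pays `reward` and each check of the co-player costs `cost`. These parameter names and the fixed-horizon setup are hypothetical simplifications, not the paper's evolutionary model. Tit-for-Tat corresponds to `check_prob = 1`.

```python
def average_payoff_vs_cooperator(rounds, reward, cost, threshold, check_prob):
    """Expected per-round payoff of a trust-based player facing an unconditional cooperator.

    The player checks every round until `threshold` rounds of mutual
    cooperation have been observed, then checks only with probability
    `check_prob`. Each check costs `cost`. Tit-for-Tat is check_prob = 1.
    """
    checked_rounds = min(threshold, rounds)
    trusting_rounds = rounds - checked_rounds
    total = checked_rounds * (reward - cost) + trusting_rounds * (reward - check_prob * cost)
    return total / rounds
```

With `reward = 3`, `cost = 0.5`, 100 rounds, and a threshold of 10, Tit-for-Tat averages 2.5 per round, while checking with probability 0.1 after the threshold averages about 2.905. This toy calculation ignores the other side of the trade-off, namely a co-player who defects while unchecked, which the evolutionary analysis has to weigh.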

Posted Content
TL;DR: This paper proposes a multi-agent variant of the classical multi-armed bandit problem, in which there are N agents and K arms and pulling an arm generates a (possibly different) stochastic reward for each agent, and uses the Nash social welfare as the notion of fairness.
Abstract: We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are N agents and K arms, and pulling an arm generates a (possibly different) stochastic reward to each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm", as each agent may perceive a different arm as best for her. Instead, we seek to learn a fair distribution over arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms, and show that they achieve sublinear regret, now measured in terms of the Nash social welfare.
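
The fairness objective is a Nash social welfare over a distribution of arms: each agent's expected reward under the distribution, multiplied across agents. The following sketch assumes the mean rewards are known, which is the benchmark that regret is measured against; in the bandit setting they must be estimated. The function names and the two-arm grid search are hypothetical illustrations, not the paper's algorithms.

```python
from math import prod


def nash_social_welfare(dist, means):
    """Product over agents of expected reward; means[i][k] is agent i's mean for arm k."""
    return prod(sum(p * mu for p, mu in zip(dist, agent_means)) for agent_means in means)


def best_two_arm_distribution(means, steps=100):
    """Grid search for the NSW-maximizing probability of arm 0 when K = 2."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda q: nash_social_welfare([q, 1.0 - q], means))
```

When two agents each value a different arm, any single arm gives one of them nothing and zero Nash welfare, so the fair distribution mixes the arms evenly instead of picking a "best arm".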

Posted Content
TL;DR: Predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games except two of the poker games and Liar's Dice, sometimes by two or more orders of magnitude.
Abstract: Blackwell approachability is a framework for reasoning about repeated games with vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the next payoff vector is given, and the decision maker tries to achieve better performance based on the accuracy of that estimator. In order to derive algorithms that achieve predictive Blackwell approachability, we start by showing a powerful connection between four well-known algorithms. Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization. In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework). We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game. By applying the predictive variants of FTRL or OMD to this connection, we obtain predictive Blackwell approachability algorithms, as well as predictive variants of RM and RM+. In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games and Liar's Dice, sometimes by two or more orders of magnitude.
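
For reference, RM+ plays each action in proportion to the positive part of its clipped cumulative regret. The predictive variant adds a prediction of the next regret vector before taking the positive part. The following minimal sketch runs both in self-play on a single zero-sum matrix game with simultaneous updates. It uses the most recent instantaneous regret as the prediction, which is one common choice; the paper's experiments instead run these regret minimizers inside CFR on extensive-form games. The function names and the example matrix are hypothetical.

```python
def _positive_part_strategy(v):
    """Normalize the positive part of v; fall back to uniform if it is all zero."""
    pos = [max(a, 0.0) for a in v]
    total = sum(pos)
    return [a / total for a in pos] if total > 0 else [1.0 / len(v)] * len(v)


def solve_matrix_game(A, iters=10_000, predictive=False):
    """Self-play RM+ (or predictive RM+). Row player maximizes x^T A y, column minimizes."""
    n, m = len(A), len(A[0])
    Qx, Qy = [0.0] * n, [0.0] * m          # clipped cumulative regrets
    mx, my = [0.0] * n, [0.0] * m          # predictions: last instantaneous regrets
    x_sum, y_sum = [0.0] * n, [0.0] * m
    for _ in range(iters):
        x = _positive_part_strategy([q + (p if predictive else 0.0) for q, p in zip(Qx, mx)])
        y = _positive_part_strategy([q + (p if predictive else 0.0) for q, p in zip(Qy, my)])
        Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        xA = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        value = sum(x[i] * Ay[i] for i in range(n))
        rx = [u - value for u in Ay]        # row player's instantaneous regrets
        ry = [value - c for c in xA]        # column player's (minimizer's) regrets
        Qx = [max(q + r, 0.0) for q, r in zip(Qx, rx)]  # RM+ clips at zero
        Qy = [max(q + r, 0.0) for q, r in zip(Qy, ry)]
        mx, my = rx, ry
        x_sum = [s + a for s, a in zip(x_sum, x)]
        y_sum = [s + b for s, b in zip(y_sum, y)]
    return [s / iters for s in x_sum], [s / iters for s in y_sum]


def saddle_point_gap(A, x, y):
    """Exploitability of the average strategies; zero exactly at a Nash equilibrium."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    return best_row - best_col
```

The saddle-point gap equals the sum of both players' average regrets, so any regret minimizer drives it to zero. The abstract's empirical claim concerns how much faster the predictive variant does so in practice.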

Posted Content
TL;DR: This paper presents a polynomial-time algorithm that computes an ex-ante envy-free lottery over envy-free up to one item (EF1) deterministic allocations, and answers a question raised by Freeman, Shah, and Vaish: whether the outcome of the probabilistic serial rule can be implemented by ex-post EF1 allocations.
Abstract: We present a polynomial-time algorithm that computes an ex-ante envy-free lottery over envy-free up to one item (EF1) deterministic allocations. It has the following advantages over a recently proposed algorithm: it does not rely on the linear programming machinery including separation oracles; it is SD-efficient (both ex-ante and ex-post); and the ex-ante outcome is equivalent to the outcome returned by the well-known probabilistic serial rule. As a result, we answer a question raised by Freeman, Shah, and Vaish (2020) whether the outcome of the probabilistic serial rule can be implemented by ex-post EF1 allocations. In the light of a couple of impossibility results that we prove, our algorithm can be viewed as satisfying a maximal set of properties. Under binary utilities, our algorithm is also ex-ante group-strategyproof and ex-ante Pareto optimal. Finally, we also show that checking whether a given random allocation can be implemented by a lottery over EF1 and Pareto optimal allocations is NP-hard.
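
The ex-ante outcome referred to here is the probabilistic serial rule. Every agent "eats" her most preferred remaining item at unit speed, and the fraction of each item she eats is her probability of receiving it. The sketch below computes only this fractional assignment for strict ordinal preferences. It does not perform the paper's decomposition into a lottery over EF1 deterministic allocations, and the function name is hypothetical.

```python
from collections import Counter


def probabilistic_serial(prefs, num_items):
    """Simultaneous-eating (probabilistic serial) rule.

    prefs[i] lists item indices from agent i's most to least preferred.
    Each item has supply 1. Returns shares[i][g], the probability that
    agent i receives item g.
    """
    n = len(prefs)
    remaining = [1.0] * num_items
    shares = [[0.0] * num_items for _ in range(n)]
    eps = 1e-12
    while True:
        # Each agent targets her favorite item that still has supply.
        targets = {}
        for i in range(n):
            for g in prefs[i]:
                if remaining[g] > eps:
                    targets[i] = g
                    break
        if not targets:
            return shares
        eaters = Counter(targets.values())
        # Advance time until the first item currently being eaten runs out.
        dt = min(remaining[g] / k for g, k in eaters.items())
        for i, g in targets.items():
            shares[i][g] += dt
        for g, k in eaters.items():
            remaining[g] = max(0.0, remaining[g] - k * dt)
```

With three agents whose rankings are (0, 1, 2), (0, 2, 1), and (1, 0, 2), the first two agents split item 0, and agent 2 ends up with three quarters of item 1.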

Posted Content
TL;DR: This work uses recent results in theoretical auction design to introduce a time-independent Lagrangian, which circumvents the need for an expensive hyper-parameter search, and provides a principled metric to compare the performance of two auctions.
Abstract: Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design. While theoretical approaches to the problem have hit some limits, a recent research direction initiated by Duetting et al. (2019) consists in building neural network architectures to find optimal auctions. We propose two conceptual deviations from their approach which result in enhanced performance. First, we use recent results in theoretical auction design (Rubinstein and Weinberg, 2018) to introduce a time-independent Lagrangian. This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work). Second, the optimization procedure in previous work uses an inner maximization loop to compute optimal misreports. We amortize this process through the introduction of an additional neural network. We demonstrate the effectiveness of our approach by learning competitive or strictly improved auctions compared to prior work. Both results together further imply a novel formulation of Auction Design as a two-player game with stationary utility functions.
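
The inner maximization mentioned in the abstract searches, for each bidder, for the misreport that most increases her utility. In the approach of Duetting et al. (2019), this gain (ex-post regret), averaged over sampled value profiles, measures how far a learned auction is from incentive compatible. The following sketch replaces the gradient-based inner loop with a grid search over misreports on two hand-written single-item auctions. The function names and the lowest-index tie-breaking are assumptions for illustration.

```python
def _winner(bids):
    # Python's max returns the first maximal index, so ties go to the lowest index.
    return max(range(len(bids)), key=lambda j: bids[j])


def second_price(bids):
    n, w = len(bids), _winner(bids)
    alloc, pay = [0.0] * n, [0.0] * n
    alloc[w] = 1.0
    pay[w] = max(bids[j] for j in range(n) if j != w)  # pay the highest competing bid
    return alloc, pay


def first_price(bids):
    n, w = len(bids), _winner(bids)
    alloc, pay = [0.0] * n, [0.0] * n
    alloc[w] = 1.0
    pay[w] = bids[w]  # pay your own bid
    return alloc, pay


def ex_post_regret(mechanism, values, i, grid):
    """Best utility gain bidder i can get by misreporting, others bidding truthfully."""
    def utility(bid_i):
        bids = list(values)
        bids[i] = bid_i
        alloc, pay = mechanism(bids)
        return alloc[i] * values[i] - pay[i]

    truthful = utility(values[i])
    return max(0.0, max(utility(b) for b in grid) - truthful)
```

For values (0.8, 0.5), the second-price auction leaves bidder 0 no profitable misreport, while under first price she gains about 0.3 by shading her bid to 0.5. A learned auction is pushed toward zero regret at every sampled profile.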