Showing papers in "arXiv: Computer Science and Game Theory in 2020"

Journal Article•DOI•

Auction-based Charging Scheduling with Deep Learning Framework for Multi-Drone Networks

MyungJae Shin¹, Joongheon Kim¹, Marco Levorato²•Institutions (2)

Chung-Ang University¹, University of California, Irvine²

09 Jan 2020-arXiv: Computer Science and Game Theory

TL;DR: Numerical results from extensive simulations show that the proposed deep-learning-based approach provides effective battery charging control in multi-drone scenarios.

Abstract: State-of-the-art drone technologies have severe flight time limitations due to weight constraints, which inevitably lead to a relatively small amount of available energy. Therefore, frequent battery replacement or recharging is necessary in applications such as delivery, exploration, or support to the wireless infrastructure. Mobile charging stations (i.e., mobile stations with charging equipment) for outdoor ad-hoc battery charging are one feasible solution to this issue. However, the ability of these platforms to charge the drones is limited in terms of the number of drones and the charging time. This paper designs an auction-based mechanism to control the charging schedule in multi-drone settings. Charging time slots are auctioned, and their assignment is determined by a bidding process. The main challenge in developing this framework is the lack of prior knowledge on the distribution of the number of drones participating in the auction. Building on the optimal second-price auction, the proposed formulation relies on deep learning algorithms to learn this distribution online. Numerical results from extensive simulations show that the proposed deep learning-based approach provides effective battery charging control in multi-drone scenarios.

83 citations
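
The slot auction described in this abstract can be illustrated with a plain sealed-bid second-price (Vickrey) auction for a single charging time slot. This is only a minimal sketch: the drone names, bid values, and reserve price are hypothetical, and the learned component of the paper (estimating the distribution of participating drones with deep learning) is not modeled here.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for one charging slot, or None if no bid meets the reserve."""
    # Keep only bids at or above the (hypothetical) reserve price.
    eligible = {drone: bid for drone, bid in bids.items() if bid >= reserve}
    if not eligible:
        return None
    # Highest bid first; ties are broken by drone name for determinism.
    ranked = sorted(eligible.items(), key=lambda item: (-item[1], item[0]))
    winner = ranked[0][0]
    # The winner pays the second-highest eligible bid, or the reserve if alone.
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


result = second_price_auction(
    {"drone_a": 5.0, "drone_b": 8.0, "drone_c": 6.5}, reserve=2.0
)
print(result)
```

Because the winner's payment does not depend on its own bid, bidding one's true value for the slot is a dominant strategy in this single-slot setting.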

Posted Content•

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Noam Brown¹, Anton Bakhtin¹, Adam Lerer¹, Qucheng Gong¹•Institutions (1)

Facebook¹

27 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

Abstract: The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

58 citations

Journal Article•DOI•

Knapsack Voting for Participatory Budgeting

Ashish Goel¹, Anilesh K. Krishnaswamy¹, Sukolsak Sakshuwong¹, Tanja Aitamurto¹•Institutions (1)

Stanford University¹

15 Sep 2020-arXiv: Computer Science and Game Theory

TL;DR: In this article, the authors study the strategic properties of Knapsack Voting and show that it is strategy-proof under a natural model of utility (a dis-utility given by the distance between the outcome and the true preference of the voter), and partially strategyproof under general additive utilities.

Abstract: We address the question of aggregating the preferences of voters in the context of participatory budgeting. We scrutinize the voting method currently used in practice, underline its drawbacks, and introduce a novel scheme tailored to this setting, which we call "Knapsack Voting". We study its strategic properties - we show that it is strategy-proof under a natural model of utility (a dis-utility given by the $\ell_1$ distance between the outcome and the true preference of the voter), and "partially" strategy-proof under general additive utilities. We extend Knapsack Voting to more general settings with revenues, deficits or surpluses, and prove a similar strategy-proofness result. To further demonstrate the applicability of our scheme, we discuss its implementation on the digital voting platform that we have deployed in partnership with the local government bodies in many cities across the nation. From voting data thus collected, we present empirical evidence that Knapsack Voting works well in practice.

57 citations
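
As a rough illustration of the idea, the sketch below lets each voter submit a set of projects that fits within the budget, counts votes per project, and funds projects in descending vote order. The aggregation rule (greedy by vote count, skipping projects that no longer fit) is an assumption made for this example and is not spelled out in the abstract; the project names, costs, and ballots are hypothetical.

```python
from typing import Dict, List, Set


def knapsack_vote(
    costs: Dict[str, int], budget: int, ballots: List[Set[str]]
) -> List[str]:
    """Aggregate budget-feasible ballots and fund projects greedily by vote count."""
    # Each ballot must itself be a feasible "knapsack" under the budget.
    for ballot in ballots:
        if sum(costs[p] for p in ballot) > budget:
            raise ValueError("ballot exceeds the budget")
    votes = {project: 0 for project in costs}
    for ballot in ballots:
        for project in ballot:
            votes[project] += 1
    funded: List[str] = []
    remaining = budget
    # Most-voted projects first; ties broken alphabetically (an assumption).
    for project in sorted(costs, key=lambda p: (-votes[p], p)):
        if costs[project] <= remaining:
            funded.append(project)
            remaining -= costs[project]
    return funded


costs = {"park": 60, "library": 50, "bike_lane": 40}
ballots = [{"park", "bike_lane"}, {"library", "bike_lane"}, {"park"}]
print(knapsack_vote(costs, 100, ballots))
```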

Posted Content•

Fair and Truthful Mechanisms for Dichotomous Valuations

Moshe Babaioff¹, Tomer Ezra², Uriel Feige³•Institutions (3)

Microsoft¹, Tel Aviv University², Weizmann Institute of Science³

25 Feb 2020-arXiv: Computer Science and Game Theory

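TL;DR: This work designs a deterministic truthful mechanism for submodular valuations with dichotomous marginals whose Lorenz-dominating allocations satisfy many desired fairness properties, such as being envy-free up to any item (EFX) and maximizing the Nash Social Welfare (NSW), and a randomized truthful mechanism with strong ex-post guarantees for additive $\epsilon$-dichotomous valuations.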

Abstract: We consider the problem of allocating a set of indivisible items to players with private preferences in an efficient and fair way. We focus on valuations that have dichotomous marginals, in which the added value of any item to a set is either 0 or 1, and aim to design truthful allocation mechanisms (without money) that maximize welfare and are fair. For the case that players have submodular valuations with dichotomous marginals, we design such a deterministic truthful allocation mechanism. The allocation output by our mechanism is Lorenz dominating, and consequently satisfies many desired fairness properties, such as being envy-free up to any item (EFX), and maximizing the Nash Social Welfare (NSW). We then show that our mechanism with random priorities is envy-free ex-ante, while having all the above properties ex-post. Furthermore, we present several impossibility results precluding similar results for the larger class of XOS valuations. To gauge the robustness of our positive results, we also study $\epsilon$-dichotomous valuations, in which the added value of any item to a set is either non-positive, or in the range $[1, 1 + \epsilon]$. We show several impossibility results in this setting, and also a positive result: for players that have additive $\epsilon$-dichotomous valuations with sufficiently small $\epsilon$, we design a randomized truthful mechanism with strong ex-post guarantees. For $\rho = \frac{1}{1 + \epsilon}$, the allocations that it produces generate at least a $\rho$-fraction of the maximum welfare, and enjoy $\rho$-approximations for various fairness properties, such as being envy-free up to one item (EF1), and giving each player at least her maximin share.

54 citations

Journal Article•DOI•

Optimal governance and implementation of vaccination programs to contain the COVID-19 pandemic

Mahendra Piraveenan, Shailendra Sawleshwarkar, Michael Walsh, Iryna Zablotska, Samit Bhattacharyya, Habib Hassan Farooqui, Tarun Bhatnagar, Anup Karan, Manoj V Murhekar, Sanjay Zodpey, K. S. Mallikarjuna Rao, Philippa Pattison, Albert Y. Zomaya, Matjaž Perc

12 Nov 2020-arXiv: Computer Science and Game Theory

TL;DR: It is argued that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results and that decision-making under uncertainty and imperfect information is a unique forte of established game-theoretic modelling.

Abstract: Once a viable vaccine for SARS-CoV-2 has been identified, vaccination uptake will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programs for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programs will likely be scarce in many countries. Vaccine hesitancy can also be expected from some sections of the general public. We emphasize that decision making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritisation and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.

51 citations

Posted Content•

Transaction Fee Mechanism Design for the Ethereum Blockchain: An Economic Analysis of EIP-1559

Tim Roughgarden¹•Institutions (1)

Columbia University¹

01 Dec 2020-arXiv: Computer Science and Game Theory

TL;DR: EIP-1559 is a proposal to make several tightly coupled additions to Ethereum's transaction fee mechanism, including variable-size blocks and a burned base fee that rises and falls with demand.

Abstract: EIP-1559 is a proposal to make several tightly coupled additions to Ethereum's transaction fee mechanism, including variable-size blocks and a burned base fee that rises and falls with demand. This report assesses the game-theoretic strengths and weaknesses of the proposal and explores some alternative designs.

42 citations
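
The "base fee that rises and falls with demand" can be sketched as a per-block update rule. The abstract gives no formula; the integer arithmetic and the change denominator of 8 below follow the published EIP-1559 specification as recalled here and should be treated as assumptions, as should the example gas numbers.

```python
# Assumed constant from the EIP-1559 specification: the base fee moves by at
# most 1/8 (12.5%) per block.
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8


def next_base_fee(base_fee: int, gas_used: int, gas_target: int) -> int:
    """Return the next block's base fee given the parent block's gas usage."""
    if gas_used == gas_target:
        return base_fee
    if gas_used > gas_target:
        # Demand above target: raise the fee (by at least 1 unit).
        delta = base_fee * (gas_used - gas_target) // gas_target
        delta = max(delta // BASE_FEE_MAX_CHANGE_DENOMINATOR, 1)
        return base_fee + delta
    # Demand below target: lower the fee.
    delta = base_fee * (gas_target - gas_used) // gas_target
    return base_fee - delta // BASE_FEE_MAX_CHANGE_DENOMINATOR


target = 15_000_000  # hypothetical gas target
print(next_base_fee(100, 30_000_000, target))  # full block
print(next_base_fee(100, 0, target))           # empty block
```

In this sketch a completely full block (twice the target) raises the fee by one eighth, and an empty block lowers it by the same proportion.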

Posted Content•

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

Julien Perolat, Rémi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro A. Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot¹, Karl Tuyls•Institutions (1)

Google¹

19 Feb 2020-arXiv: Computer Science and Game Theory

TL;DR: This paper generalizes existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings, and investigates how adapting the reward of the game can give strong convergence guarantees in monotone games.

Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player Imperfect Information Games (IIG).

40 citations

Posted Content•

Fair Division with Binary Valuations: One Rule to Rule Them All

Daniel Halpern¹, Ariel D. Procaccia², Alexandros Psomas³, Nisarg Shah¹•Institutions (3)

University of Toronto¹, Harvard University², University of California, Berkeley³

12 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: This work establishes maximum Nash welfare (MNW) as the ultimate allocation rule in the realm of binary additive preferences and proves that fractional MNW, known to be group strategyproof, envy-free, and Pareto optimal, can be implemented as a distribution over deterministic MNW allocations, which are envy-free up to one good.

Abstract: We study fair allocation of indivisible goods among agents. Prior research focuses on additive agent preferences, which leads to an impossibility when seeking truthfulness, fairness, and efficiency. We show that when agents have binary additive preferences, a compelling rule -- maximum Nash welfare (MNW) -- provides all three guarantees. Specifically, we show that deterministic MNW with lexicographic tie-breaking is group strategyproof in addition to being envy-free up to one good and Pareto optimal. We also prove that fractional MNW -- known to be group strategyproof, envy-free, and Pareto optimal -- can be implemented as a distribution over deterministic MNW allocations, which are envy-free up to one good. Our work establishes maximum Nash welfare as the ultimate allocation rule in the realm of binary additive preferences.

40 citations
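
For intuition, the maximum Nash welfare rule under binary additive valuations can be computed by brute force on a tiny instance and then checked for envy-freeness up to one good (EF1). This is a sketch only: it enumerates every allocation (exponential time), breaks ties arbitrarily rather than with the lexicographic rule the abstract refers to, and uses a hypothetical valuation matrix.

```python
from itertools import product
from math import prod
from typing import List

Valuations = List[List[int]]  # v[agent][good] is 0 or 1


def utility(v: Valuations, agent: int, bundle: List[int]) -> int:
    return sum(v[agent][g] for g in bundle)


def mnw_allocation(v: Valuations) -> List[List[int]]:
    """Brute-force MNW: maximize the number of agents with positive utility,
    then the product of those positive utilities."""
    n, m = len(v), len(v[0])
    best_key, best = None, None
    for owners in product(range(n), repeat=m):
        bundles = [[g for g in range(m) if owners[g] == i] for i in range(n)]
        positive = [u for u in (utility(v, i, bundles[i]) for i in range(n)) if u > 0]
        key = (len(positive), prod(positive))
        if best_key is None or key > best_key:
            best_key, best = key, bundles
    return best


def is_ef1(v: Valuations, bundles: List[List[int]]) -> bool:
    """Check that any envy disappears after removing one good from the envied bundle."""
    n = len(v)
    for i in range(n):
        own = utility(v, i, bundles[i])
        for j in range(n):
            if i == j or not bundles[j]:
                continue
            reduced = utility(v, i, bundles[j]) - max(v[i][g] for g in bundles[j])
            if own < reduced:
                return False
    return True


v = [[1, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1]]  # hypothetical instance
allocation = mnw_allocation(v)
print(allocation, is_ef1(v, allocation))
```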

Posted Content•

On Approximate Envy-Freeness for Indivisible Chores and Mixed Resources

Umang Bhaskar¹, A. R. Sricharan², Rohit Vaish¹•Institutions (2)

Tata Institute of Fundamental Research¹, Chennai Mathematical Institute²

12 Dec 2020-arXiv: Computer Science and Game Theory

TL;DR: It is shown that determining the existence of an envy-free allocation is NP-complete even when agents have binary additive valuations, and a polynomial-time algorithm is provided for computing an allocation that satisfies envy-freeness up to one chore (EF1) under monotone valuations.

Abstract: We study the fair allocation of undesirable indivisible items, or chores. While the case of desirable indivisible items (or goods) is extensively studied, with many results known for different notions of fairness, less is known about the fair division of chores. We study the envy-free division of chores, and make three contributions. First, we show that determining the existence of an envy-free allocation is NP-complete, even in the simple case when agents have binary additive valuations. Second, we provide a polynomial-time algorithm for computing an allocation that satisfies envy-freeness up to one chore (EF1), correcting an existing proof in the literature. A straightforward modification of our algorithm can be used to compute an EF1 allocation for doubly monotone instances (wherein each agent can partition the set of items into objective goods and objective chores). Our third result applies to a mixed resources model consisting of indivisible items and a divisible, undesirable heterogeneous resource (i.e., a bad cake). We show that there always exists an allocation that satisfies envy-freeness for mixed resources (EFM) in this setting, complementing a recent result of Bei et al. (Art. Int. 2021) for indivisible goods and divisible cake.

39 citations

Posted Content•

Resolving the Optimal Metric Distortion Conjecture

Vasilis Gkatzelis¹, Daniel Halpern², Nisarg Shah²•Institutions (2)

Drexel University¹, University of Toronto²

16 Apr 2020-arXiv: Computer Science and Game Theory

TL;DR: A novel lemma about matching voters to candidates is proved, which is referred to as the ranking-matching lemma, and a new randomized algorithm is introduced with improved distortion compared to known results, and improved lower bounds on the distortion of all deterministic and randomized algorithms are provided.

Abstract: We study the following metric distortion problem: there are two finite sets of points, $V$ and $C$, that lie in the same metric space, and our goal is to choose a point in $C$ whose total distance from the points in $V$ is as small as possible. However, rather than having access to the underlying distance metric, we only know, for each point in $V$, a ranking of its distances to the points in $C$. We propose algorithms that choose a point in $C$ using only these rankings as input and we provide bounds on their \emph{distortion} (worst-case approximation ratio). A prominent motivation for this problem comes from voting theory, where $V$ represents a set of voters, $C$ represents a set of candidates, and the rankings correspond to ordinal preferences of the voters. A major conjecture in this framework is that the optimal deterministic algorithm has distortion $3$. We resolve this conjecture by providing a polynomial-time algorithm that achieves distortion $3$, matching a known lower bound. We do so by proving a novel lemma about matching voters to candidates, which we refer to as the \emph{ranking-matching lemma}. This lemma induces a family of novel algorithms, which may be of independent interest, and we show that a special algorithm in this family achieves distortion $3$. We also provide more refined, parameterized, bounds using the notion of $\alpha$-decisiveness, which quantifies the extent to which a voter may prefer her top choice relative to all others. Finally, we introduce a new randomized algorithm with improved distortion compared to known results, and also provide improved lower bounds on the distortion of all deterministic and randomized algorithms.

33 citations
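
The notion of distortion can be made concrete on a single instance: a rule sees only the voters' rankings, and its chosen candidate's total distance is compared with the best achievable total distance. The sketch below uses plurality as a simple stand-in rule (not the algorithm from the paper) on a hypothetical instance where voters and candidates are points on a line. The ratio on one instance is only a lower bound on the rule's worst-case distortion.

```python
from typing import Dict, List


def rankings(voters: List[float], candidates: Dict[str, float]) -> List[List[str]]:
    """Each voter ranks candidates by increasing distance (the ordinal input)."""
    return [sorted(candidates, key=lambda c: abs(v - candidates[c])) for v in voters]


def plurality_winner(ranks: List[List[str]]) -> str:
    """Pick the candidate ranked first most often; ties broken by name."""
    counts: Dict[str, int] = {}
    for r in ranks:
        counts[r[0]] = counts.get(r[0], 0) + 1
    return min(counts, key=lambda c: (-counts[c], c))


def social_cost(voters: List[float], position: float) -> float:
    return sum(abs(v - position) for v in voters)


voters = [0.45, 0.45, 0.45, 1.0, 1.0]   # hypothetical voter locations
candidates = {"A": 0.0, "B": 1.0}       # hypothetical candidate locations

winner = plurality_winner(rankings(voters, candidates))
optimal = min(social_cost(voters, pos) for pos in candidates.values())
ratio = social_cost(voters, candidates[winner]) / optimal
print(winner, ratio)
```

Here the rule elects A on first-place votes even though B has roughly half the total distance, so the ratio is about 2.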

Posted Content•

Approval-Based Committee Voting: Axioms, Algorithms, and Applications

Martin Lackner, Piotr Skowron

03 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: This survey summarizes the current understanding of ABC rules from the viewpoint of computational social choice, with main focus on axiomatic analysis, algorithmic results, and relevant applications.

Abstract: Approval-based committee (ABC) rules are voting rules that output a fixed-size subset of candidates, a so-called committee. ABC rules select committees based on dichotomous preferences, i.e., a voter either approves or disapproves a candidate. This simple type of preferences makes ABC rules widely suitable for practical use. In this survey, we summarize the current understanding of ABC rules from the viewpoint of computational social choice. The main focus is on axiomatic analysis, algorithmic results, and relevant applications.
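
Two well-known ABC rules give a feel for how such rules differ: multiwinner Approval Voting (AV), which maximizes the total number of approvals, and Proportional Approval Voting (PAV), which gives a voter with $k$ approved committee members a score of $1 + 1/2 + \dots + 1/k$. The abstract names neither rule; they are chosen here as examples, the PAV search is brute force over all committees, and the ballots are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Set, Tuple


def av_score(committee: Tuple[str, ...], ballots: List[Set[str]]) -> Fraction:
    """Approval Voting: total approvals received by committee members."""
    return Fraction(sum(len(ballot & set(committee)) for ballot in ballots))


def pav_score(committee: Tuple[str, ...], ballots: List[Set[str]]) -> Fraction:
    """PAV: each voter contributes the harmonic number of approved members."""
    total = Fraction(0)
    for ballot in ballots:
        k = len(ballot & set(committee))
        total += sum(Fraction(1, i) for i in range(1, k + 1))
    return total


def best_committee(candidates, size, ballots, score) -> Tuple[str, ...]:
    """Return the first maximum-score committee in lexicographic order."""
    return max(combinations(sorted(candidates), size), key=lambda c: score(c, ballots))


ballots = [{"a", "b"}] * 3 + [{"c"}] * 2   # hypothetical approval ballots
print(best_committee("abc", 2, ballots, av_score))
print(best_committee("abc", 2, ballots, pav_score))
```

On these ballots AV elects the two candidates backed by the majority bloc, while PAV gives one seat to the minority's candidate.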

Posted Content•

Bridging Machine Learning and Mechanism Design towards Algorithmic Fairness

Jessie Finocchiaro¹, Roland Maio², Faidra Monachou³, Gourab K Patro⁴, Manish Raghavan⁵, Ana-Andreea Stoica², Stratis Tsirtsis⁶•Institutions (6)

University of Colorado Boulder¹, Columbia University², Stanford University³, Indian Institute of Technology Kharagpur⁴, Cornell University⁵, Max Planck Society⁶

12 Oct 2020-arXiv: Computer Science and Game Theory

TL;DR: The position that building fair decision-making systems requires overcoming limitations which, it is argued, are inherent to each field is developed, and an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning is built.

Abstract: Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance; one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the ground work towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.

Posted Content•

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Stephen McAleer¹, John B. Lanier¹, Roy Fox¹, Pierre Baldi¹•Institutions (1)

University of California, Irvine¹

15 Jun 2020-arXiv: Computer Science and Game Theory

TL;DR: P2SRO is introduced, the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games and is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.

Abstract: Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available at https://github.com/JBLanier/pipeline-psro

Posted Content•

Fictitious play in zero-sum stochastic games

Muhammed O. Sayin¹, Francesca Parise¹, Asuman Ozdaglar²•Institutions (2)

Massachusetts Institute of Technology¹, Cornell University²

08 Oct 2020-arXiv: Computer Science and Game Theory

TL;DR: It is shown that in the model-based and model-free cases (without knowledge of agent payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.

Abstract: We present fictitious play dynamics for stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves players forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show that, in both the model-based and model-free cases (the latter without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
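
The belief-and-best-response loop is easiest to see in the classical stateless special case: fictitious play in a zero-sum matrix game. The sketch below is that textbook version only, with no states, no Q-function beliefs, and no two-timescale updates, so it does not implement the dynamics of the paper. The payoff matrix (matching pennies) and the round count are illustrative choices.

```python
from typing import List, Tuple


def fictitious_play(
    payoff: List[List[float]], rounds: int
) -> Tuple[List[float], List[float]]:
    """Classical fictitious play; payoff[i][j] is the row player's payoff
    and the column player receives its negative."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Empirical action counts act as each player's belief about the opponent.
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary initial actions
    for _ in range(rounds):
        # Row player's payoff for each action against the column counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Row player's payoff for each column action against the row counts.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=lambda i: row_values[i])
        best_col = min(range(n_cols), key=lambda j: col_values[j])  # minimizer
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = rounds + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, 20000)
print(row_freq, col_freq)
```

In matching pennies the empirical frequencies of both players approach the uniform mixed equilibrium, even though the actions played in each round keep cycling.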

Posted Content•

Best of Both Worlds: Ex-Ante and Ex-Post Fairness in Resource Allocation

Rupert Freeman¹, Nisarg Shah², Rohit Vaish³•Institutions (3)

Microsoft¹, University of Toronto², Rensselaer Polytechnic Institute³

28 May 2020-arXiv: Computer Science and Game Theory

TL;DR: This work achieves ex-ante group fairness, which generalizes both envy-freeness and Pareto optimality, in conjunction with two ex-post fairness properties that are incomparable but are both implied by EF1: proportionality up to one good (Prop1) and envy-freeness up to one good more-and-less.

Abstract: We study the problem of allocating indivisible goods among agents with additive valuations. When randomization is allowed, it is possible to achieve compelling notions of fairness such as envy-freeness, which states that no agent should prefer any other agent's allocation to her own. When allocations must be deterministic, achieving exact fairness is impossible but approximate notions such as envy-freeness up to one good can be guaranteed. Our goal in this work is to achieve both simultaneously, by constructing a randomized allocation that is exactly fair ex-ante and approximately fair ex-post. The key question we address is whether ex-ante envy-freeness can be achieved in combination with ex-post envy-freeness up to one good. We settle this positively by designing an efficient algorithm that achieves both properties simultaneously. If we additionally require economic efficiency, we obtain an impossibility result. However, we show that economic efficiency and ex-ante envy-freeness can be simultaneously achieved if we slightly relax our ex-post fairness guarantee. On our way, we characterize the well-known Maximum Nash Welfare allocation rule in terms of a recently introduced fairness guarantee that applies to groups of agents, not just individuals.

Posted Content•

Model-sharing Games: Analyzing Federated Learning Under Voluntary Participation

Kate Donahue¹, Jon Kleinberg¹•Institutions (1)

Cornell University¹

02 Oct 2020-arXiv: Computer Science and Game Theory

TL;DR: This work derives exact expected MSE values for problems in linear regression and mean estimation and uses these values to analyze the resulting game in the framework of hedonic game theory; it constructively shows that there always exists a stable partition of players into coalitions.

Abstract: Federated learning is a setting where agents, each with access to their own data source, combine models from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fundamental question: should they choose the global model or their local model? We show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: there are heterogeneous players with different model parameters governing their data distribution and different amounts of data they have noisily drawn from their own distribution. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. They have a choice of fitting a model based solely on their own data, or combining their learned parameters with those of some subset of the other players. Combining models reduces the variance component of their error through access to more data, but increases the bias because of the heterogeneity of distributions. Here, we derive exact expected MSE values for problems in linear regression and mean estimation. We then analyze the resulting game in the framework of hedonic game theory; we study how players might divide into coalitions, where each set of players within a coalition jointly construct model(s). We analyze three methods of federation, modeling differing degrees of customization. In uniform federation, the agents collectively produce a single model. In coarse-grained federation, each agent can weight the global model together with their local model. In fine-grained federation, each agent can flexibly combine models from all other agents in the federation. For each method, we analyze the stable partitions of players into coalitions.

Posted Content•

Maximizing Welfare with Incentive-Aware Evaluation Mechanisms

Nika Haghtalab¹, Nicole Immorlica¹, Brendan Lucier², Jack Z. Wang²•Institutions (2)

Cornell University¹, Microsoft²

03 Nov 2020-arXiv: Computer Science and Game Theory

TL;DR: In this article, the authors proposed an evaluation problem where the inputs are controlled by strategic individuals who can modify their features at a cost, and the goal is to design an evaluation mechanism that maximizes the overall quality score, i.e., welfare, in the population, taking any strategic updating into account.

Abstract: Motivated by applications such as college admission and insurance rate determination, we propose an evaluation problem where the inputs are controlled by strategic individuals who can modify their features at a cost. A learner can only partially observe the features, and aims to classify individuals with respect to a quality score. The goal is to design an evaluation mechanism that maximizes the overall quality score, i.e., welfare, in the population, taking any strategic updating into account. We further study the algorithmic aspect of finding the welfare maximizing evaluation mechanism under two specific settings in our model. When scores are linear and mechanisms use linear scoring rules on the observable features, we show that the optimal evaluation mechanism is an appropriate projection of the quality score. When mechanisms must use linear thresholds, we design a polynomial time algorithm with a (1/4)-approximation guarantee when the underlying feature distribution is sufficiently smooth and admits an oracle for finding dense regions. We extend our results to settings where the prior distribution is unknown and must be learned from samples.

Posted Content•

Signaling in Bayesian Network Congestion Games: the Subtle Power of Symmetry

Matteo Castiglioni¹, Andrea Celli¹, Alberto Marchesi¹, Nicola Gatti¹•Institutions (1)

Polytechnic University of Milan¹

12 Feb 2020-arXiv: Computer Science and Game Theory

TL;DR: It is shown that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions, whereas the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings; symmetry is thus a crucial property for its solution.

Abstract: Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity---the sender---can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.

Posted Content•

Optimal Bounds on the Price of Fairness for Indivisible Goods

Siddharth Barman¹, Umang Bhaskar², Nisarg Shah³•Institutions (3)

Indian Institute of Science¹, Tata Institute of Fundamental Research², University of Toronto³

13 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: This paper resolves the price of two well-studied fairness notions for the allocation of indivisible goods: envy-freeness up to one good (EF1), and approximate maximin share (MMS).

Abstract: In the allocation of resources to a set of agents, how do fairness guarantees impact the social welfare? A quantitative measure of this impact is the price of fairness, which measures the worst-case loss of social welfare due to fairness constraints. While initially studied for divisible goods, recent work on the price of fairness also studies the setting of indivisible goods. In this paper, we resolve the price of two well-studied fairness notions for the allocation of indivisible goods: envy-freeness up to one good (EF1), and approximate maximin share (MMS). For both EF1 and 1/2-MMS guarantees, we show, via different techniques, that the price of fairness is $O(\sqrt{n})$, where $n$ is the number of agents. From previous work, it follows that our bounds are tight. Our bounds are obtained via efficient algorithms. For 1/2-MMS, our bound holds for additive valuations, whereas for EF1, our bound holds for the more general class of subadditive valuations. This resolves an open problem posed by Bei et al. (2019).

Posted Content•

A Permutation-Equivariant Neural Network Architecture For Auction Design

Jad Rahme¹, Samy Jelassi², Joan Bruna³, S. Matthew Weinberg¹•Institutions (3)

Princeton University¹, New York University², Courant Institute of Mathematical Sciences³

02 Mar 2020-arXiv: Computer Science and Game Theory

TL;DR: A neural architecture is constructed that can perfectly recover the permutation-equivariant optimal mechanism, which is not possible with the previous architecture; permutation-equivariant architectures are shown to recover previous results and to generalize better.

Abstract: Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design. Theoretical approaches to the problem have hit some limits in the past decades and analytical solutions are known for only a few simple settings. Computational approaches to the problem through the use of LPs have their own set of limitations. Building on the success of deep learning, a new approach was recently proposed by Duetting et al. (2019) in which the auction is modeled by a feed-forward neural network and the design problem is framed as a learning problem. The neural architectures used in that work are general purpose and do not take advantage of any of the symmetries the problem could present, such as permutation equivariance. In this work, we consider auction design problems that have permutation-equivariant symmetry and construct a neural architecture that is capable of perfectly recovering the permutation-equivariant optimal mechanism, which we show is not possible with the previous architecture. We demonstrate that permutation-equivariant architectures are not only capable of recovering previous results, they also have better generalization properties.

Posted Content•

Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense

Sailik Sengupta, Subbarao Kambhampati¹•Institutions (1)

Arizona State University¹

20 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: A unifying game-theoretic model that can model uncertainty over attacker types and the nuances of an MTD system and a Bayesian Strong Stackelberg Q-learning approach that can learn the optimal movement policy for BSMGs within a reasonable time are proposed.

Abstract: The field of cybersecurity has mostly been a cat-and-mouse game with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings when there is incomplete information about a rational adversary and yield sub-optimal movement strategies. Further, while there exists an array of work on learning defense policies in sequential settings for cyber-security, they are either unpopular due to scalability issues arising out of incomplete information or tend to ignore the strategic nature of the adversary simplifying the scenario to use single-agent reinforcement learning techniques. To address these concerns, we propose (1) a unifying game-theoretic model, called the Bayesian Stackelberg Markov Games (BSMGs), that can model uncertainty over attacker types and the nuances of an MTD system and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that can, via interaction, learn the optimal movement policy for BSMGs within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries even when prior information about rewards and transitions is absent.

Posted Content•

Tight Approximation Algorithms for p-Mean Welfare Under Subadditive Valuations

Siddharth Barman¹, Umang Bhaskar², Anand Krishna¹, Ranjani G. Sundaram³•Institutions (3)

Indian Institute of Science¹, Tata Institute of Fundamental Research², Chennai Mathematical Institute³

15 May 2020-arXiv: Computer Science and Game Theory

TL;DR: Polynomial-time algorithms are developed for the fair and efficient allocation of indivisible goods among agents that have subadditive valuations over the goods, and the approximation guarantees are shown to be essentially tight for XOS and, hence, subadditive valuations.

Abstract: We develop polynomial-time algorithms for the fair and efficient allocation of indivisible goods among $n$ agents that have subadditive valuations over the goods. We first consider the Nash social welfare as our objective and design a polynomial-time algorithm that, in the value oracle model, finds an $8n$-approximation to the Nash optimal allocation. Subadditive valuations include XOS (fractionally subadditive) and submodular valuations as special cases. Our result, even for the special case of submodular valuations, improves upon the previously best known $O(n \log n)$-approximation ratio of Garg et al. (2020). More generally, we study maximization of $p$-mean welfare. The $p$-mean welfare is parameterized by an exponent term $p \in (-\infty, 1]$ and encompasses a range of welfare functions, such as social welfare ($p = 1$), Nash social welfare ($p \to 0$), and egalitarian welfare ($p \to -\infty$). We give an algorithm that, for subadditive valuations and any given $p \in (-\infty, 1]$, computes (in the value oracle model and in polynomial time) an allocation with $p$-mean welfare at least $\frac{1}{8n}$ times the optimal. Further, we show that our approximation guarantees are essentially tight for XOS and, hence, subadditive valuations. We adapt a result of Dobzinski et al. (2010) to show that, under XOS valuations, an $O \left(n^{1-\varepsilon} \right)$ approximation for the $p$-mean welfare for any $p \in (-\infty,1]$ (including the Nash social welfare) requires exponentially many value queries; here, $\varepsilon>0$ is any fixed constant.
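
The $p$-mean welfare objective described in this abstract can be illustrated with a short sketch. It assumes the standard generalized-mean form $\left(\frac{1}{n}\sum_i v_i^p\right)^{1/p}$, so that $p = 1$ gives the average social welfare; the function name `p_mean_welfare` and the handling of zero values are illustrative assumptions, not taken from the paper, and the sketch only evaluates the objective rather than implementing the approximation algorithm.

```python
import math

def p_mean_welfare(values, p):
    """Generalized p-mean of the agents' values for p <= 1 (hypothetical helper)."""
    n = len(values)
    if p == -math.inf:
        # Limit p -> -infinity: egalitarian welfare (the worst-off agent).
        return min(values)
    if p <= 0 and any(v == 0 for v in values):
        # Convention assumed here: a zero value drives the mean to zero for p <= 0.
        return 0.0
    if p == 0:
        # Limit p -> 0: geometric mean, i.e. the Nash social welfare.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)
```

For two agents with values 1 and 4, this gives the arithmetic mean at $p = 1$, the geometric mean at $p = 0$, and the minimum as $p \to -\infty$.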

Posted Content•

Maximum Nash Welfare and Other Stories About EFX

Georgios Amanatidis¹,², Georgios Birmpas³, Aris Filos-Ratsikas⁴, Alexandros Hollender⁵, Alexandros A. Voudouris¹•Institutions (5)

University of Essex¹, University of Amsterdam², Sapienza University of Rome³, University of Liverpool⁴, University of Oxford⁵

27 Jan 2020-arXiv: Computer Science and Game Theory

TL;DR: In this paper, the authors consider the classic problem of fairly allocating indivisible goods among agents with additive valuation functions and explore the connection between two prominent fairness notions: maximum Nash welfare and envy-freeness up to any good (EFX).

Abstract: We consider the classic problem of fairly allocating indivisible goods among agents with additive valuation functions and explore the connection between two prominent fairness notions: maximum Nash welfare (MNW) and envy-freeness up to any good (EFX). We establish that an MNW allocation is always EFX as long as there are at most two possible values for the goods, whereas this implication is no longer true for three or more distinct values. As a notable consequence, this proves the existence of EFX allocations for these restricted valuation functions. While the efficient computation of an MNW allocation for two possible values remains an open problem, we present a novel algorithm for directly constructing EFX allocations in this setting. Finally, we study the question of whether an MNW allocation implies any EFX guarantee for general additive valuation functions under a natural new interpretation of approximate EFX allocations.
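
To make the EFX notion concrete, here is a minimal sketch of an EFX check for additive valuations. It assumes the commonly used definition in which agent $i$ must not envy agent $j$'s bundle after the removal of any single good from it; the function name `is_efx` and the data layout are assumptions for illustration, and the sketch does not construct EFX allocations.

```python
def is_efx(valuations, allocation):
    """Check envy-freeness up to any good (EFX) under additive valuations.

    valuations[i][g] is agent i's value for good g;
    allocation[i] is the list of goods held by agent i.
    """
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue
            bundle = [valuations[i][g] for g in allocation[j]]
            # With additive values, removing the good that i values least
            # is the binding case: if it holds, every other removal holds too.
            if own < sum(bundle) - min(bundle):
                return False
    return True
```

The first allocation used in the test below is EFX for two agents with values in {1, 2}; the second is not, because agent 0 still envies agent 1 after the least-valued good is removed.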

Posted Content•

A Game-Theoretic Analysis of Cross-Chain Atomic Swaps with HTLCs

Jiahua Xu¹, Damien Ackerer, Alevtina Dubovitskaya²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, Lucerne University of Applied Sciences and Arts²

23 Nov 2020-arXiv: Computer Science and Game Theory

TL;DR: This study proposes a game-theoretic framework to study the strategic behaviors of agents taking part in cross-chain atomic swaps implemented with HTLCs, and demonstrates that either agent might withdraw from the swap in an attempt to maximize its own utility as asset prices change.

Abstract: To achieve interoperability between unconnected ledgers, hash time lock contracts (HTLCs) are commonly used for cross-chain asset exchange. The solution tolerates transaction failure, and can "make the best out of worst" by allowing transacting agents to at least keep their original assets in case of an abort. Nonetheless, as an undesired outcome, reoccurring transaction failures prompt a critical and analytical examination of the protocol. In this study, we propose a game-theoretic framework to study the strategic behaviors of agents taking part in cross-chain atomic swaps implemented with HTLCs. We study the success rate of the transaction as a function of the exchange rate of the swap, the token price and its volatility, among other variables. We demonstrate that in an attempt to maximize one's own utility as asset price changes, either agent might withdraw from the swap. An extension of our model confirms that collateral deposits can improve the transaction success rate, motivating further research towards collateralization without a trusted third party. A second model variation suggests that a swap is more likely to succeed when agents dynamically adjust the exchange rate in response to price fluctuations.

Posted Content•

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Stefanos Leonardos¹, Georgios Piliouras¹•Institutions (1)

Singapore University of Technology and Design¹

05 Dec 2020-arXiv: Computer Science and Game Theory

TL;DR: It is proved that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality.

Abstract: Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive as well as negative (and potentially unbounded) effects on system performance.

Posted Content•

When to (or not to) trust intelligent machines: Insights from an evolutionary game theory analysis of trust in repeated games

Anh Han, Cedric Perret, Simon T. Powers¹•Institutions (1)

Edinburgh Napier University¹

22 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: These trust-based strategies can outcompete strategies that are always conditional, such as Tit-for-Tat, when the opportunity cost is non-negligible, and are expected to be used more frequently in interactions with intelligent agents.

Abstract: The actions of intelligent agents, such as chatbots, recommender systems, and virtual assistants are typically not fully transparent to the user. Consequently, using such an agent involves the user exposing themselves to the risk that the agent may act in a way opposed to the user's goals. It is often argued that people use trust as a cognitive shortcut to reduce the complexity of such interactions. Here we formalise this by using the methods of evolutionary game theory to study the viability of trust-based strategies in repeated games. These are reciprocal strategies that cooperate as long as the other player is observed to be cooperating. Unlike classic reciprocal strategies, once mutual cooperation has been observed for a threshold number of rounds they stop checking their co-player's behaviour every round, and instead only check with some probability. By doing so, they reduce the opportunity cost of verifying whether the action of their co-player was actually cooperative. We demonstrate that these trust-based strategies can outcompete strategies that are always conditional, such as Tit-for-Tat, when the opportunity cost is non-negligible. We argue that this cost is likely to be greater when the interaction is between people and intelligent agents, because of the reduced transparency of the agent. Consequently, we expect people to use trust-based strategies more frequently in interactions with intelligent agents. Our results provide new, important insights into the design of mechanisms for facilitating interactions between humans and intelligent agents, where trust is an essential factor.

Posted Content•

Fair Algorithms for Multi-Agent Multi-Armed Bandits

Safwan Hossain¹, Evi Micha¹, Nisarg Shah¹•Institutions (1)

University of Toronto¹

13 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: A multi-agent variant of the classical multi-armed bandit problem is proposed, in which there are N agents and K arms and pulling an arm generates a (possibly different) stochastic reward for each agent; the Nash social welfare serves as the notion of fairness.

Abstract: We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are N agents and K arms, and pulling an arm generates a (possibly different) stochastic reward to each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm", as each agent may perceive a different arm as best for her. Instead, we seek to learn a fair distribution over arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms, and show that they achieve sublinear regret, now measured in terms of the Nash social welfare.
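
The fairness objective can be sketched as follows, under the assumption that the Nash social welfare of a distribution over arms is the product, over agents, of each agent's expected reward. The function name `nash_social_welfare` and the matrix layout `mu[i][j]` (mean reward of arm `j` for agent `i`) are hypothetical; the sketch assumes the means are known and does not implement any of the bandit algorithms.

```python
def nash_social_welfare(p, mu):
    """Product over agents of the expected reward under arm distribution p.

    p[j] is the probability of pulling arm j;
    mu[i][j] is the (assumed known) mean reward of arm j for agent i.
    """
    welfare = 1.0
    for rewards in mu:
        welfare *= sum(pj * r for pj, r in zip(p, rewards))
    return welfare
```

With two agents who each value a different arm, either pure arm leaves one agent with nothing and yields welfare 0, while the uniform distribution yields 0.25, which illustrates why the target is a distribution over arms rather than a single best arm.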

Posted Content•

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

Gabriele Farina¹, Christian Kroer², Tuomas Sandholm¹•Institutions (2)

Carnegie Mellon University¹, Columbia University²

28 Jul 2020-arXiv: Computer Science and Game Theory

TL;DR: Predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games and Liar's Dice, sometimes by two or more orders of magnitude.

Abstract: Blackwell approachability is a framework for reasoning about repeated games with vector-valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the next payoff vector is given, and the decision maker tries to achieve better performance based on the accuracy of that estimator. In order to derive algorithms that achieve predictive Blackwell approachability, we start by showing a powerful connection between four well-known algorithms. Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization. In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework). We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game. By applying the predictive variants of FTRL or OMD to this connection, we obtain predictive Blackwell approachability algorithms, as well as predictive variants of RM and RM+. In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games and Liar's Dice, sometimes by two or more orders of magnitude.
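
For readers unfamiliar with the non-predictive baselines named here, the sketch below shows the textbook regret matching strategy rule and the RM+ regret update for a single decision point. It is not the predictive variant introduced in the paper; the function names `rm_strategy` and `rm_plus_update` are illustrative assumptions.

```python
def rm_strategy(regrets):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total == 0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [r / total for r in positive]

def rm_plus_update(regrets, utilities, strategy):
    """RM+ update: add instantaneous regrets, then clip the sums at zero."""
    expected = sum(s * u for s, u in zip(strategy, utilities))
    return [max(r + u - expected, 0.0) for r, u in zip(regrets, utilities)]
```

Plain regret matching uses the same instantaneous regrets but keeps negative cumulative values; the clipping in `rm_plus_update` is the difference between the two.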

Posted Content•

Simultaneously Achieving Ex-ante and Ex-post Fairness

Haris Aziz¹•Institutions (1)

University of New South Wales¹

06 Apr 2020-arXiv: Computer Science and Game Theory

TL;DR: A polynomial-time algorithm is presented that computes an ex-ante envy-free lottery over envy-free up to one item (EF1) deterministic allocations, answering a question raised by Freeman, Shah, and Vaish of whether the outcome of the probabilistic serial rule can be implemented by ex-post EF1 allocations.

Abstract: We present a polynomial-time algorithm that computes an ex-ante envy-free lottery over envy-free up to one item (EF1) deterministic allocations. It has the following advantages over a recently proposed algorithm: it does not rely on the linear programming machinery including separation oracles; it is SD-efficient (both ex-ante and ex-post); and the ex-ante outcome is equivalent to the outcome returned by the well-known probabilistic serial rule. As a result, we answer a question raised by Freeman, Shah, and Vaish (2020) whether the outcome of the probabilistic serial rule can be implemented by ex-post EF1 allocations. In the light of a couple of impossibility results that we prove, our algorithm can be viewed as satisfying a maximal set of properties. Under binary utilities, our algorithm is also ex-ante group-strategyproof and ex-ante Pareto optimal. Finally, we also show that checking whether a given random allocation can be implemented by a lottery over EF1 and Pareto optimal allocations is NP-hard.
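
The probabilistic serial rule mentioned in this abstract is usually described as a simultaneous eating procedure: all agents consume their favourite remaining good at the same rate until everything is eaten. The sketch below implements only that classical rule, assuming strict and complete preference orders; the function name `probabilistic_serial` is hypothetical, and the sketch does not implement the paper's decomposition into EF1 allocations.

```python
from fractions import Fraction

def probabilistic_serial(preferences, num_goods):
    """Simultaneous eating. preferences[i] lists all goods, most preferred first.

    Returns shares[i][g], the fraction of good g consumed by agent i.
    """
    n = len(preferences)
    remaining = [Fraction(1)] * num_goods
    shares = [[Fraction(0)] * num_goods for _ in range(n)]
    while any(r > 0 for r in remaining):
        # Each agent targets its favourite good that is not yet exhausted.
        target = [next(g for g in prefs if remaining[g] > 0) for prefs in preferences]
        eaters = {}
        for g in target:
            eaters[g] = eaters.get(g, 0) + 1
        # Advance time until the first targeted good runs out.
        step = min(remaining[g] / k for g, k in eaters.items())
        for i, g in enumerate(target):
            shares[i][g] += step
            remaining[g] -= step
    return shares
```

Exact fractions are used so that a good is detected as exhausted without floating-point tolerance; each loop iteration exhausts at least one good, so the loop runs at most `num_goods` times.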

Posted Content•

Auction learning as a two-player game

Jad Rahme¹, Samy Jelassi¹, S. Matthew Weinberg¹•Institutions (1)

Princeton University¹

10 Jun 2020-arXiv: Computer Science and Game Theory

TL;DR: This work uses recent results in theoretical auction design to introduce a time-independent Lagrangian, which circumvents the need for an expensive hyper-parameter search, and provides a principled metric to compare the performance of two auctions.

Abstract: Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design. While theoretical approaches to the problem have hit some limits, a recent research direction initiated by Duetting et al. (2019) consists in building neural network architectures to find optimal auctions. We propose two conceptual deviations from their approach which result in enhanced performance. First, we use recent results in theoretical auction design (Rubinstein and Weinberg, 2018) to introduce a time-independent Lagrangian. This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work). Second, the optimization procedure in previous work uses an inner maximization loop to compute optimal misreports. We amortize this process through the introduction of an additional neural network. We demonstrate the effectiveness of our approach by learning competitive or strictly improved auctions compared to prior work. Both results together further imply a novel formulation of Auction Design as a two-player game with stationary utility functions.
