
Showing papers on "Stochastic game published in 2016"


Posted Content
TL;DR: This paper identifies a restriction which is sufficient to ensure consistency between the two approaches and confirms that it holds in many economic models.
Abstract: Two-moment decision models are consistent with expected utility maximization only if the choice set or the agent's preferences are restricted. This paper identifies a restriction which is sufficient to ensure this consistency and confirms that it holds in many economic models. The implications for economic analysis are then derived. Two different approaches to representing an agent's preferences over strategies yielding random payoffs are in wide use. Under the mean-standard deviation (MS) approach, the agent is assumed to rank the alternatives according to the value of some function defined over the first two moments of the random payoff, while the expected utility (EU) criterion assumes that the expected value of some utility function defined over payoffs is used instead. The fact that there are these two competing approaches has generated a considerable literature. Some authors are concerned with the advantages and disadvantages of each, while others deal with conditions under which the potentially different approaches would yield the same results, or at least approximately so. Space does not ...
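As a hedged illustration of the consistency issue (background only; the abstract does not say which restriction this particular paper identifies), the classic sufficient condition is that all feasible payoffs belong to a common location-scale family, in which case expected utility rankings reduce to mean-standard deviation rankings:

\[
X \;=\; \mu_X + \sigma_X Z,\ \ Z \sim F
\quad\Longrightarrow\quad
\mathbb{E}[u(X)] \;=\; \int u(\mu_X + \sigma_X z)\, dF(z) \;=:\; V(\mu_X, \sigma_X),
\]

so the EU criterion and the MS criterion induce the same ranking over the restricted choice set.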

595 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that identifying the payoffs to post-secondary education is difficult because individuals choose between several unordered alternatives, and they find that the choice of field of study is potentially as important as the decision to enroll in college.
Abstract: Why do individuals choose different types of post-secondary education, and what are the labor market consequences of those choices? We show that answering these questions is difficult because individuals choose between several unordered alternatives. Even with a valid instrument for every type of education, instrumental variables estimation of the payoffs require information about individuals' ranking of education types or strong additional assumptions, like constant effects or restrictive preferences. These identification results motivate and guide our empirical analysis of the choice of and payoff to field of study. Our context is Norway's post-secondary education system where a centralized admission process covers almost all universities and colleges. This process creates credible instruments from discontinuities which effectively randomize applicants near unpredictable admission cutoffs into different fields of study. At the same time, it provides us with strategy-proof measures of individuals' ranking of fields. Taken together, this allows us to estimate the payoffs to different fields while correcting for selection bias and keeping the next-best alternatives as measured at the time of application fixed. We find that different fields have widely different payoffs, even after accounting for institutional differences and quality of peer groups. For many fields the payoffs rival the college wage premiums, suggesting the choice of field is potentially as important as the decision to enroll in college. The estimated payoffs are consistent with individuals choosing fields in which they have comparative advantage. We also test and reject assumptions of constant effects or restrictive preferences, suggesting that information on next-best alternatives is essential to identify payoffs to field of study.

396 citations


Proceedings ArticleDOI
21 Jul 2016
TL;DR: Two simplified forms of the stochastic game of miners participating in the bitcoin protocol are considered, in which the miners have complete information; when the computational power of a miner is large, he deviates from the expected behavior and other Nash equilibria arise.
Abstract: We study the strategic considerations of miners participating in the bitcoin's protocol. We formulate and study the stochastic game that underlies these strategic considerations. The miners collectively build a tree of blocks, and they are paid when they create a node (mine a block) which will end up in the path of the tree that is adopted by all. Since the miners can hide newly mined nodes, they play a game with incomplete information. Here we consider two simplified forms of this game in which the miners have complete information. In the simplest game the miners release every mined block immediately, but are strategic on which blocks to mine. In the second more complicated game, when a block is mined it is announced immediately, but it may not be released so that other miners cannot continue mining from it. A miner not only decides which blocks to mine, but also when to release blocks to other miners. In both games, we show that when the computational power of each miner is relatively small, their best response matches the expected behavior of the bitcoin designer. However, when the computational power of a miner is large, he deviates from the expected behavior, and other Nash equilibria arise.

204 citations


Journal ArticleDOI
TL;DR: Simulation results demonstrate the superior performance of the proposed DR method in increasing the utility companies' profit and customers' payoff, as well as in reducing the peak-to-average ratio in the aggregate load demand.
Abstract: In smart grid, customers have access to the electricity consumption and the price data via smart meters; thus, they are able to participate in the demand response (DR) programs. In this paper, we address the interaction among multiple utility companies and multiple customers in smart grid by modeling the DR problem as two noncooperative games: the supplier and customer side games. In the first game, supply function bidding mechanism is employed to model the utility companies’ profit maximization problem. In the proposed mechanism, the utility companies submit their bids to the data center, where the electricity price is computed and is sent to the customers. In the second game, the price anticipating customers determine optimal shiftable load profile to maximize their daily payoff. The existence and uniqueness of the Nash equilibrium in the mentioned games are studied and a computationally tractable distributed algorithm is designed to determine the equilibrium. Simulation results demonstrate the superior performance of the proposed DR method in increasing the utility companies’ profit and customers’ payoff, as well as in reducing the peak-to-average ratio in the aggregate load demand. Finally, the algorithm performance is compared with a DR method in the literature to demonstrate the similarities and differences.

190 citations


Journal ArticleDOI
TL;DR: In this paper, the authors characterize the set of outcomes that can arise in Bayes Nash equilibria if players observe the given information structure but may also observe additional signals, and identify a partial order on many player information structures (individual sufficiency) under which more information shrinks the set.
Abstract: A game of incomplete information can be decomposed into a basic game and an information structure. The basic game defines the set of actions, the set of payoff states the payoff functions and the common prior over the payoff states. The information structure refers to the signals that the players receive in the game. We characterize the set of outcomes that can arise in Bayes Nash equilibrium if players observe the given information structure but may also observe additional signals. The characterization corresponds to the set of (a version of) incomplete information correlated equilibria which we dub Bayes correlated equilibria. We identify a partial order on many player information structures (individual sufficiency) under which more information shrinks the set of Bayes correlated equilibria. This order captures the role of information in imposing (incentive) constraints on behavior.
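For readers unfamiliar with the solution concept, the following is a standard statement of the obedience condition defining a Bayes correlated equilibrium (written in our own notation, not quoted from the paper): a decision rule $\sigma$ mapping type profiles $t$ and payoff states $\theta$ to distributions over action profiles is a Bayes correlated equilibrium if, for every player $i$, type $t_i$, recommended action $a_i$, and deviation $a_i'$,

\[
\sum_{\theta,\, t_{-i},\, a_{-i}} \psi(\theta)\, \pi(t \mid \theta)\, \sigma(a \mid t, \theta)\,
\big[ u_i\big((a_i, a_{-i}), \theta\big) - u_i\big((a_i', a_{-i}), \theta\big) \big] \;\ge\; 0,
\]

where $\psi$ is the common prior over payoff states and $\pi$ is the information structure.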

184 citations


Journal ArticleDOI
TL;DR: A novel approach for joint power control and user scheduling is proposed for optimizing energy efficiency (EE) in ultra dense small cell networks (UDNs); it yields an equilibrium control policy per SBS that maximizes the network utility while ensuring users' quality-of-service.
Abstract: In this paper, a novel approach for joint power control and user scheduling is proposed for optimizing energy efficiency (EE), in terms of bits per unit energy, in ultra dense small cell networks (UDNs). Due to severe coupling in interference, this problem is formulated as a dynamic stochastic game (DSG) between small cell base stations (SBSs). This game enables capturing the dynamics of both the queues and channel states of the system. To solve this game, assuming a large homogeneous UDN deployment, the problem is cast as a mean-field game (MFG) in which the MFG equilibrium is analyzed with the aid of low-complexity tractable partial differential equations. Exploiting the stochastic nature of the problem, user scheduling is formulated as a stochastic optimization problem and solved using the drift plus penalty (DPP) approach in the framework of Lyapunov optimization. Remarkably, it is shown that by weaving notions from Lyapunov optimization and mean-field theory, the proposed solution yields an equilibrium control policy per SBS, which maximizes the network utility while ensuring users’ quality-of-service. Simulation results show that the proposed approach achieves up to 70.7% gains in EE and 99.5% reductions in the network’s outage probabilities compared to a baseline model, which focuses on improving EE while attempting to satisfy the users’ instantaneous quality-of-service requirements.
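A minimal drift-plus-penalty sketch in Python, illustrating the generic Lyapunov trade-off the abstract invokes; the rate model, arrival process, trade-off weight V, and power grid below are illustrative assumptions and not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 50.0                                  # Lyapunov trade-off weight: larger V favors the energy penalty
T, n_users = 10_000, 4
Q = np.zeros(n_users)                     # per-user queue backlogs
power_levels = np.linspace(0.0, 1.0, 11)  # candidate transmit powers (illustrative)
power_used = 0.0

for t in range(T):
    arrivals = rng.poisson(0.8, n_users)          # stochastic packet arrivals
    h = rng.exponential(1.0, n_users)             # per-user channel gains for this slot
    best_p, best_obj, best_mu = 0.0, np.inf, np.zeros(n_users)
    for p in power_levels:
        mu = np.log2(1.0 + 10.0 * p * h)          # toy rate model
        obj = V * p - np.dot(Q, mu)               # drift-plus-penalty objective for this slot
        if obj < best_obj:
            best_p, best_obj, best_mu = p, obj, mu
    Q = np.maximum(Q - best_mu, 0.0) + arrivals   # queue dynamics
    power_used += best_p

print("average transmit power:", power_used / T)
print("final queue backlog per user:", Q.mean())
```

Increasing V pushes the controller toward lower transmit power at the cost of larger queue backlogs, which is the generic energy-efficiency versus quality-of-service trade-off the abstract describes.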

168 citations


Journal ArticleDOI
TL;DR: A unified approach to analyzing and understanding the coupled evolution of strategies and the environment is proposed; it identifies an oscillatory tragedy of the commons in which the system cycles between depleted and replete environments and between cooperation and defection behavior states, and it finds that incentivizing cooperation when others defect in the depleted state is necessary to avert the tragedy of the commons.
Abstract: A tragedy of the commons occurs when individuals take actions to maximize their payoffs even as their combined payoff is less than the global maximum had the players coordinated. The originating example is that of overgrazing of common pasture lands. In game-theoretic treatments of this example, there is rarely consideration of how individual behavior subsequently modifies the commons and associated payoffs. Here, we generalize evolutionary game theory by proposing a class of replicator dynamics with feedback-evolving games in which environment-dependent payoffs and strategies coevolve. We initially apply our formulation to a system in which the payoffs favor unilateral defection and cooperation, given replete and depleted environments, respectively. Using this approach, we identify and characterize a class of dynamics: an oscillatory tragedy of the commons in which the system cycles between deplete and replete environmental states and cooperation and defection behavior states. We generalize the approach to consider outcomes given all possible rational choices of individual behavior in the depleted state when defection is favored in the replete state. In so doing, we find that incentivizing cooperation when others defect in the depleted state is necessary to avert the tragedy of the commons. In closing, we propose directions for the study of control and influence in games in which individual actions exert a substantive effect on the environmental state.
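A minimal sketch of coupled strategy-environment dynamics in the spirit of the abstract; the payoff matrices, the linear interpolation in the environmental state, and the feedback law below are illustrative assumptions rather than the paper's exact equations.

```python
import numpy as np

# Payoff matrices (rows: focal C/D, cols: opponent C/D) -- illustrative choices
A_replete  = np.array([[3.0, 0.0], [5.0, 1.0]])   # defection favored when the environment is replete
A_depleted = np.array([[5.0, 1.0], [3.0, 0.0]])   # cooperation favored when the environment is depleted

def derivatives(x, n, eps=0.1, theta=2.0):
    A = n * A_replete + (1.0 - n) * A_depleted        # environment-dependent payoffs
    f_C = A[0, 0] * x + A[0, 1] * (1.0 - x)           # payoff to cooperators
    f_D = A[1, 0] * x + A[1, 1] * (1.0 - x)           # payoff to defectors
    dx = x * (1.0 - x) * (f_C - f_D)                  # replicator dynamics for the cooperator fraction
    dn = eps * n * (1.0 - n) * (theta * x - (1.0 - x))  # cooperators replenish, defectors degrade
    return dx, dn

# Simple Euler integration of the coupled system
x, n, dt = 0.6, 0.5, 0.01
trajectory = []
for _ in range(200_000):
    dx, dn = derivatives(x, n)
    x, n = x + dt * dx, n + dt * dn
    trajectory.append((x, n))

tail = np.array(trajectory[-50_000:])
print("x range over the last part of the run:", tail[:, 0].min(), tail[:, 0].max())
print("n range over the last part of the run:", tail[:, 1].min(), tail[:, 1].max())
```

For suitable parameters (in particular, slow environmental feedback through a small eps), the trajectory keeps cycling between a replete environment in which defection spreads and a depleted environment in which cooperation recovers, the oscillatory pattern described above.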

155 citations


Posted Content
TL;DR: In this paper, the authors study the strategic considerations of miners participating in the bitcoin protocol and show that when the computational power of each miner is relatively small, their best response matches the expected behavior of the bitcoin designer.
Abstract: We study the strategic considerations of miners participating in the bitcoin's protocol. We formulate and study the stochastic game that underlies these strategic considerations. The miners collectively build a tree of blocks, and they are paid when they create a node (mine a block) which will end up in the path of the tree that is adopted by all. Since the miners can hide newly mined nodes, they play a game with incomplete information. Here we consider two simplified forms of this game in which the miners have complete information. In the simplest game the miners release every mined block immediately, but are strategic on which blocks to mine. In the second more complicated game, when a block is mined it is announced immediately, but it may not be released so that other miners cannot continue mining from it. A miner not only decides which blocks to mine, but also when to release blocks to other miners. In both games, we show that when the computational power of each miner is relatively small, their best response matches the expected behavior of the bitcoin designer. However, when the computational power of a miner is large, he deviates from the expected behavior, and other Nash equilibria arise.

142 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate a class of reinforcement learning dynamics where players adjust their strategies based on their actions' cumulative payoffs over time: specifically, by playing mixed strategies that maximize their expected cumulative payoff minus a regularization term.
Abstract: We investigate a class of reinforcement learning dynamics where players adjust their strategies based on their actions' cumulative payoffs over time-specifically, by playing mixed strategies that maximize their expected cumulative payoff minus a regularization term. A widely studied example is exponential reinforcement learning, a process induced by an entropic regularization term which leads mixed strategies to evolve according to the replicator dynamics. However, in contrast to the class of regularization functions used to define smooth best responses in models of stochastic fictitious play, the functions used in this paper need not be infinitely steep at the boundary of the simplex; in fact, dropping this requirement gives rise to an important dichotomy between steep and nonsteep cases. In this general framework, we extend several properties of exponential learning, including the elimination of dominated strategies, the asymptotic stability of strict Nash equilibria, and the convergence of time-averaged trajectories in zero-sum games with an interior Nash equilibrium.
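A minimal sketch of the entropic special case mentioned above (exponential reinforcement learning: score each action by its cumulative payoff and play the resulting logit choice) in a zero-sum game with an interior equilibrium; the game (matching pennies), the temperature parameter, and the random initialization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Matching pennies: zero-sum, with a unique interior Nash equilibrium at (1/2, 1/2)
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])     # row player's payoff matrix; the column player receives -A

def logit(u, eta):
    """Entropic regularization: the mixed strategy maximizing <x, u> - (1/eta) * sum_i x_i log x_i."""
    z = np.exp(eta * (u - u.max()))
    return z / z.sum()

eta = 0.1
U_row = rng.normal(0.0, 0.1, 2)  # small random initial scores to break symmetry
U_col = rng.normal(0.0, 0.1, 2)
avg_row = np.zeros(2)

T = 50_000
for t in range(T):
    x = logit(U_row, eta)                 # exponential reinforcement learning
    y = logit(U_col, eta)
    U_row += A @ y                        # cumulative payoff of each pure action against the opponent's mix
    U_col += -A.T @ x
    avg_row += x

print("time-averaged row strategy:", avg_row / T)   # stays close to the interior equilibrium (0.5, 0.5)
```

The realized trajectory cycles around the equilibrium, but its time average remains close to (0.5, 0.5), in line with the convergence of time-averaged trajectories stated in the abstract.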

90 citations


Journal ArticleDOI
TL;DR: The presented results support preceding research that highlights the favorable role of heterogeneity regardless of its origin, and they also emphasize the importance of the population structure in amplifying facilitators of cooperation.
Abstract: Evolutionary games on networks traditionally involve the same game at each interaction. Here we depart from this assumption by considering mixed games, where the game played at each interaction is drawn uniformly at random from a set of two different games. While in well-mixed populations the random mixture of the two games is always equivalent to the average single game, in structured populations this is not always the case. We show that the outcome is, in fact, strongly dependent on the distance of separation of the two games in the parameter space. Effectively, this distance introduces payoff heterogeneity, and the average game is returned only if the heterogeneity is small. For higher levels of heterogeneity the distance to the average game grows, which often involves the promotion of cooperation. The presented results support preceding research that highlights the favorable role of heterogeneity regardless of its origin, and they also emphasize the importance of the population structure in amplifying facilitators of cooperation.

86 citations


Proceedings ArticleDOI
19 Jun 2016
TL;DR: In this paper, the authors consider the problem of a sender who can commit to revealing a noisy signal regarding the realization of the payoffs of various actions, and who seeks to maximize her own payoff in expectation given that the receiver rationally acts to maximize his own payoff.
Abstract: Persuasion, defined as the act of exploiting an informational advantage in order to effect the decisions of others, is ubiquitous. Indeed, persuasive communication has been estimated to account for almost a third of all economic activity in the US. This paper examines persuasion through a computational lens, focusing on what is perhaps the most basic and fundamental model in this space: the celebrated Bayesian persuasion model of Kamenica and Gentzkow. Here there are two players, a sender and a receiver. The receiver must take one of a number of actions with a-priori unknown payoff, and the sender has access to additional information regarding the payoffs of the various actions for both players. The sender can commit to revealing a noisy signal regarding the realization of the payoffs of various actions, and would like to do so as to maximize her own payoff in expectation assuming that the receiver rationally acts to maximize his own payoff. When the payoffs of various actions follow a joint distribution (the common prior), the sender's problem is nontrivial, and its computational complexity depends on the representation of this prior. We examine the sender's optimization task in three of the most natural input models for this problem, and essentially pin down its computational complexity in each. When the payoff distributions of the different actions are i.i.d. and given explicitly, we exhibit a polynomial-time (exact) algorithmic solution, and a ``simple'' (1-1/e)-approximation algorithm. Our optimal scheme for the i.i.d. setting involves an analogy to auction theory, and makes use of Border's characterization of the space of reduced-forms for single-item auctions. When action payoffs are independent but non-identical with marginal distributions given explicitly, we show that it is #P-hard to compute the optimal expected sender utility. In doing so, we rule out a generalized Border's theorem, as defined by Gopalan et al, for this setting. Finally, we consider a general (possibly correlated) joint distribution of action payoffs presented by a black box sampling oracle, and exhibit a fully polynomial-time approximation scheme (FPTAS) with a bi-criteria guarantee. Our FPTAS is based on Monte-Carlo sampling, and its analysis relies on the principle of deferred decisions. Moreover, we show that this result is the best possible in the black-box model for information-theoretic reasons.
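For orientation, when the prior over payoff states is finite and given explicitly, the sender's problem can be written as a linear program over direct, obedient action recommendations (a standard formulation in our own notation; the paper's contributions concern the i.i.d., independent, and black-box input models rather than this explicit LP):

\[
\max_{\pi(\cdot \mid \theta)\,\ge\,0}\ \sum_{\theta} \mu_0(\theta) \sum_{a} \pi(a \mid \theta)\, u_s(\theta, a)
\quad \text{s.t.} \quad
\sum_{\theta} \mu_0(\theta)\, \pi(a \mid \theta)\, \big[ u_r(\theta, a) - u_r(\theta, a') \big] \,\ge\, 0 \ \ \forall\, a, a',
\qquad \sum_{a} \pi(a \mid \theta) = 1 \ \ \forall\, \theta,
\]

where $\mu_0$ is the common prior, $u_s$ and $u_r$ are the sender's and receiver's payoffs, and each signal is interpreted as an action recommendation the receiver finds it optimal to obey.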

Journal ArticleDOI
TL;DR: This work identifies and addresses a spectrum of questions pertaining to belief and truth in hypothesised types, formulates three basic ways to incorporate evidence into posterior beliefs, and shows when the resulting beliefs are correct and when they may fail to be correct.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: It is proved that there exists a constant ε > 0 such that, assuming the Exponential Time Hypothesis for PPAD, computing an ε-approximate Nash equilibrium in a two-player (n × n) game requires quasi-polynomial time, n^(log^(1-o(1)) n).
Abstract: We prove that there exists a constant ε > 0 such that, assuming the Exponential Time Hypothesis for PPAD, computing an ε-approximate Nash equilibrium in a two-player (n × n) game requires quasi-polynomial time, n^(log^(1-o(1)) n). This matches (up to the o(1) term) the algorithm of Lipton, Markakis, and Mehta [54]. Our proof relies on a variety of techniques from the study of probabilistically checkable proofs (PCP); this is the first time that such ideas are used for a reduction between problems inside PPAD. En route, we also prove new hardness results for computing Nash equilibria in games with many players. In particular, we show that computing an ε-approximate Nash equilibrium in a game with n players requires 2^(Ω(n)) oracle queries to the payoff tensors. This resolves an open problem posed by Hart and Nisan [43], Babichenko [13], and Chen et al. [28]. In fact, our results for n-player games are stronger: they hold with respect to the (ε, δ)-WeakNash relaxation recently introduced by Babichenko et al. [15].
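To see why the quasi-polynomial bound is the natural target, recall the matching upper bound of Lipton, Markakis, and Mehta cited above: every two-player game has an ε-approximate Nash equilibrium in which both players mix uniformly over multisets of $k = O(\varepsilon^{-2} \log n)$ pure strategies, so exhaustive search over such supports runs in time roughly

\[
\binom{n + k - 1}{k}^{2} \;\le\; n^{O(\varepsilon^{-2} \log n)},
\]

which the hardness result shows is essentially optimal under the Exponential Time Hypothesis for PPAD.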

Posted ContentDOI
25 Aug 2016
TL;DR: This paper focuses on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets, and introduces the notion of variational stability.
Abstract: This paper examines the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via "dual averaging", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of variational stability, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method's convergence speed.
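A minimal sketch of dual averaging with noisy payoff gradients and an entropic mirror map, in the spirit of the abstract; the game (a symmetric prisoner's dilemma, whose strict equilibrium is a simple example of a stable point), the noise model, and the step-size schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Symmetric prisoner's dilemma; (Defect, Defect) is a strict Nash equilibrium
A = np.array([[3.0, 0.0],   # actions: 0 = Cooperate, 1 = Defect
              [5.0, 1.0]])

def mirror(y):
    """Entropic mirror map: send accumulated (dual) scores back to the simplex."""
    z = np.exp(y - y.max())
    return z / z.sum()

y1, y2 = np.zeros(2), np.zeros(2)   # dual aggregates of the two players
sigma = 0.5                         # std. dev. of the zero-mean noise on payoff gradients

for t in range(1, 50_001):
    x1, x2 = mirror(y1), mirror(y2)
    v1 = A @ x2 + rng.normal(0.0, sigma, 2)   # noisy payoff gradient of player 1
    v2 = A @ x1 + rng.normal(0.0, sigma, 2)   # noisy payoff gradient of player 2 (symmetric game)
    gamma = 1.0 / np.sqrt(t)                  # vanishing step sizes
    y1 += gamma * v1
    y2 += gamma * v2

print("player 1 mix:", mirror(y1))   # concentrates on Defect, the strict (variationally stable) equilibrium
print("player 2 mix:", mirror(y2))
```

Despite the noisy feedback, both players' mixed strategies concentrate on the strict equilibrium, illustrating the "stable equilibria are attracting" message of the abstract in a deliberately simple case.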

Journal ArticleDOI
TL;DR: A novel distributed power control paradigm is proposed for dense small cell networks co-existing with a traditional macrocellular network; the problem is cast as a mean field game for a highly dense network and solved by a finite difference algorithm based on the Lax-Friedrichs scheme and Lagrange relaxation.
Abstract: A novel distributed power control paradigm is proposed for dense small cell networks co-existing with a traditional macrocellular network. The power control problem is first modeled as a stochastic game and the existence of the Nash Equilibrium is proven. Then, we extend the formulated stochastic game to a mean field game (MFG) considering a highly dense network. An MFG is a special type of differential game which is ideal for modeling the interactions among a large number of entities. We analyze the performance of two different cost functions for the mean field game formulation. Both of these cost functions are designed using stochastic geometry analysis in such a way that the cost functions are valid for the MFG setting. A finite difference algorithm is then developed based on the Lax-Friedrichs scheme and Lagrange relaxation to solve the corresponding MFG. Each small cell base station can independently execute the proposed algorithm offline, i.e., prior to data transmission. The output of the algorithm shows how each small cell base station should adjust its transmit power in order to minimize the cost over a predefined period of time. Moreover, sufficient conditions for the uniqueness of the mean field equilibrium for a generic cost function are also given. The effectiveness of the proposed algorithm is demonstrated via numerical results.

Journal ArticleDOI
TL;DR: In this article, the authors find that the vast majority of decisions (96%) constitute myopic best responses, but deviations continue to occur with probabilities that are sensitive to their costs.

Journal ArticleDOI
TL;DR: This work proves the existence of autocratic strategies that unilaterally enforce linear relationships on the payoffs for repeated games, and introduces a broader class of autocratic strategies by extending zero-determinant strategies to iterated games with more general action spaces.
Abstract: The recent discovery of zero-determinant strategies for the iterated prisoner’s dilemma sparked a surge of interest in the surprising fact that a player can exert unilateral control over iterated interactions. These remarkable strategies, however, are known to exist only in games in which players choose between two alternative actions such as “cooperate” and “defect.” Here we introduce a broader class of autocratic strategies by extending zero-determinant strategies to iterated games with more general action spaces. We use the continuous donation game as an example, which represents an instance of the prisoner’s dilemma that intuitively extends to a continuous range of cooperation levels. Surprisingly, despite the fact that the opponent has infinitely many donation levels from which to choose, a player can devise an autocratic strategy to enforce a linear relationship between his or her payoff and that of the opponent even when restricting his or her actions to merely two discrete levels of cooperation. In particular, a player can use such a strategy to extort an unfair share of the payoffs from the opponent. Therefore, although the action space of the continuous donation game dwarfs that of the classic prisoner’s dilemma, players can still devise relatively simple autocratic and, in particular, extortionate strategies.
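A minimal numerical check of the discrete ancestor of these strategies: a Press-Dyson extortionate zero-determinant strategy in the standard iterated prisoner's dilemma, which enforces a linear relation between the two players' scores against any opponent. The continuous donation-game construction of the paper is not reproduced here, and the opponent's memory-one strategy below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

R, S, T_, P = 3.0, 0.0, 5.0, 1.0     # standard IPD payoffs
chi = 3.0                            # extortion factor: enforces s_X - P = chi * (s_Y - P)

# Press-Dyson extortionate memory-one strategy: prob. of cooperating after (CC, CD, DC, DD)
phi = 1.0 / 26.0
p = np.array([1 - phi * (chi - 1) * (R - P),
              1 + phi * (S - chi * T_ + (chi - 1) * P),
              phi * (T_ - chi * S + (chi - 1) * P),
              0.0])                   # = (11/13, 1/2, 7/26, 0)
q = np.array([0.7, 0.4, 0.6, 0.2])    # arbitrary memory-one opponent

payoff_X = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T_, ('D', 'D'): P}
payoff_Y = {('C', 'C'): R, ('C', 'D'): T_, ('D', 'C'): S, ('D', 'D'): P}
idx = {('C', 'C'): 0, ('C', 'D'): 1, ('D', 'C'): 2, ('D', 'D'): 3}

state = ('C', 'C')
sx = sy = 0.0
N = 1_000_000
for _ in range(N):
    a = 'C' if rng.random() < p[idx[state]] else 'D'
    # the opponent conditions on the same last outcome, seen from her own perspective
    b = 'C' if rng.random() < q[idx[(state[1], state[0])]] else 'D'
    sx += payoff_X[(a, b)]
    sy += payoff_Y[(a, b)]
    state = (a, b)

sx, sy = sx / N, sy / N
print("X's surplus  s_X - P:        ", sx - P)
print("chi * (Y's surplus s_Y - P): ", chi * (sy - P))   # the two values should approximately coincide
```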

Journal ArticleDOI
TL;DR: In this paper, an analytical model to study the evolution towards equilibrium in spatial games with "memory-aware" agents, i.e., agents that accumulate their payoff over time, was introduced.
Abstract: We introduce an analytical model to study the evolution towards equilibrium in spatial games, with ‘memory-aware’ agents, i.e., agents that accumulate their payoff over time. In particular, we focus our attention on the spatial Prisoner’s Dilemma, as it constitutes an emblematic example of a game whose Nash equilibrium is defection. Previous investigations showed that, under opportune conditions, it is possible to reach, in the evolutionary Prisoner’s Dilemma, an equilibrium of cooperation. Notably, it seems that mechanisms like motion may lead a population to become cooperative. In the proposed model, we map agents to particles of a gas so that, on varying the system temperature, they randomly move. In doing so, we are able to identify a relation between the temperature and the final equilibrium of the population, explaining how it is possible to break the classical Nash equilibrium in the spatial Prisoner’s Dilemma when considering agents able to increase their payoff over time. Moreover, we introduce a formalism to study order-disorder phase transitions in these dynamics. As a result, we highlight that the proposed model allows us to explain analytically how a population whose interactions are based on the Prisoner’s Dilemma can reach an equilibrium far from the expected one, also opening the way to defining a direct link between evolutionary game theory and statistical physics.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the notion of optimal payoff as that maximizing the terminal position for a chosen preference functional and investigate the relationship between both concepts, optimal and efficient payoffs, as well as the behavior of the efficient payoff under different market dynamics.
Abstract: In 1988, Dybvig introduced the payoff distribution pricing model (PDPM) as an alternative to the capital asset pricing model (CAPM). Under this new paradigm, agents' preferences depend on the probability distribution of the payoff, and for the same distribution agents prefer the payoff that requires less investment. In this context he gave the notion of an efficient payoff. Both approaches run parallel to the theory of choice of von Neumann-Morgenstern (1947), known as Expected Utility Theory, and to later axiomatic alternatives. In this paper we consider the notion of optimal payoff as that maximizing the terminal position for a chosen preference functional, and we investigate the relationship between both concepts, optimal and efficient payoffs, as well as the behavior of the efficient payoffs under different market dynamics. We also show that path-dependent options can be efficient in some simple models.
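As background on the efficiency notion discussed above (a standard result in the cost-efficiency literature, stated here in our own notation rather than quoted from the paper): if the state-price density $\xi$ has a continuous distribution $F_\xi$, the cheapest payoff with a prescribed distribution $F$ is anti-monotonic in $\xi$,

\[
X^{*} \;=\; F^{-1}\!\big(1 - F_{\xi}(\xi)\big),
\qquad
\mathbb{E}\big[\xi X^{*}\big] \;=\; \min\big\{\, \mathbb{E}[\xi X] \;:\; X \sim F \,\big\},
\]

so a payoff is efficient in Dybvig's sense precisely when it is a non-increasing function of the state-price density.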

Journal ArticleDOI
TL;DR: In this paper, the authors developed an analytical framework to investigate the effects of the consumer's inequity aversion on a firm's optimal pricing and quality decisions, and they highlighted several findings.
Abstract: Consumers with inequity aversion experience some psychological disutility when buying products at unfair prices. Empirical evidence and behavioral research have suggested that consumers may perceive a firm’s price as unfair when its profit margin is too high relative to consumers’ surplus. The authors develop an analytical framework to investigate the effects of the consumer’s inequity aversion on a firm’s optimal pricing and quality decisions. They highlight several findings. First, because of the consumer’s uncertainty about the firm’s cost, the firm’s optimal quality may be nonmonotone with respect to the degree of the consumer’s inequity aversion. Second, stronger inequity aversion makes an inefficient firm worse off but may benefit an efficient firm. Third, stronger inequity aversion by the consumer can actually lower the consumer’s monetary payoff (economic surplus) because the firm may reduce its quality to a greater extent than it reduces its price. Finally, as the expected cost efficiency...

Journal ArticleDOI
TL;DR: In this article, a joint power control and user scheduling approach is proposed for optimizing energy efficiency in ultra-dense small cell networks (UDNs) in terms of bits per unit energy, where the problem is formulated as a dynamic stochastic game between small cell base stations (SBSs).
Abstract: In this paper, a novel approach for joint power control and user scheduling is proposed for optimizing energy efficiency (EE), in terms of bits per unit energy, in ultra dense small cell networks (UDNs). Due to severe coupling in interference, this problem is formulated as a dynamic stochastic game (DSG) between small cell base stations (SBSs). This game enables capturing the dynamics of both the queues and channel states of the system. To solve this game, assuming a large homogeneous UDN deployment, the problem is cast as a mean-field game (MFG) in which the MFG equilibrium is analyzed with the aid of low-complexity tractable partial differential equations. Exploiting the stochastic nature of the problem, user scheduling is formulated as a stochastic optimization problem and solved using the drift plus penalty (DPP) approach in the framework of Lyapunov optimization. Remarkably, it is shown that by weaving notions from Lyapunov optimization and mean-field theory, the proposed solution yields an equilibrium control policy per SBS which maximizes the network utility while ensuring users' quality-of-service. Simulation results show that the proposed approach achieves up to 70.7% gains in EE and 99.5% reductions in the network's outage probabilities compared to a baseline model which focuses on improving EE while attempting to satisfy the users' instantaneous quality-of-service requirements.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a multi-agent Bayesian persuasion problem where an informed sender tries to persuade a group of receivers to adopt a certain product, and the payoff to the sender is a function of the subset of adopters.
Abstract: We consider a multi-agent Bayesian persuasion problem where an informed sender tries to persuade a group of receivers to adopt a certain product. The sender is allowed to commit to a signalling policy where she sends a private signal to every receiver. The payoff to the sender is a function of the subset of adopters. We characterize an optimal signaling policy and the maximal revenue to the sender for three different types of payoff functions: supermodular, symmetric submodular, and a supermajority function. Moreover, using tools from cooperative game theory we provide a necessary and sufficient condition under which a public signaling policy is optimal.

Journal ArticleDOI
TL;DR: This paper investigates the resource allocation issue for intercell scenarios where a D2D link is located in the overlapping area of two neighboring cells and develops a repeated game model that significantly enhances the system performance, including sum rate and sum rate gain.
Abstract: Device-to-device (D2D) communication is a recently emerged disruptive technology for enhancing the performance of current cellular systems. To successfully implement D2D communications underlaying cellular networks, resource allocation to D2D links is a critical issue, which is far from trivial due to the mutual interference between D2D users and cellular users. Most of the existing resource allocation research for D2D communications has primarily focused on the intracell scenario while leaving the intercell settings not considered. In this paper, we investigate the resource allocation issue for intercell scenarios where a D2D link is located in the overlapping area of two neighboring cells. Furthermore, we present three intercell D2D scenarios regarding the resource allocation problem. To address the problem, we develop a repeated game model under these scenarios. Distinct from existing works, we characterize the communication infrastructure, namely, base stations, as players competing for resource allocation quotas to serve D2D demand, and we define the utility of each player as the payoff from both cellular and D2D communications using radio resources. We also propose a resource allocation algorithm and protocol based on the Nash equilibrium derivations. Numerical results indicate that the developed model not only significantly enhances the system performance, including sum rate and sum rate gain, but also sheds light on resource configurations for intercell D2D scenarios.

Posted ContentDOI
04 Nov 2016-bioRxiv
TL;DR: This work generalizes evolutionary game theory by proposing a class of replicator dynamics with feedback-evolving games in which environment-dependent payoffs and strategies coevolve and finds that incentivizing cooperation when others defect in the depleted state is necessary to avert the tragedy of the commons.
Abstract: A tragedy of the commons occurs when individuals take actions to maximize their payoffs even as their combined payoff is less than the global maximum had the players coordinated. The originating example is that of over-grazing of common pasture lands. In game-theoretic treatments of this example, there is rarely consideration of how individual behavior subsequently modifies the commons and associated payoffs. Here, we generalize evolutionary game theory by proposing a class of replicator dynamics with feedback-evolving games in which environment-dependent payoffs and strategies coevolve. We apply our formulation to a system in which the payoffs favor unilateral defection and cooperation, given replete and depleted environments, respectively. Using this approach, we identify a new class of dynamics: an oscillatory tragedy of the commons in which the system cycles between deplete and replete environmental states and cooperation and defection behavior states. We generalize the approach to consider outcomes given all possible rational choices of individual behavior in the depleted state when defection is favored in the replete state. In so doing, we find that incentivizing cooperation when others defect in the depleted state is necessary to avert the tragedy of the commons. In closing, we propose new directions for the study of control and influence in games in which individual actions exert a substantive effect on the environmental state.

Journal ArticleDOI
TL;DR: Recording eye movements in symmetric games, including dominance-solvable games like prisoner's dilemma and asymmetric coordination games like stag hunt and hawk–dove, the authors found longer-duration choices with more fixations when payoff differences were more finely balanced, an emerging bias to gaze more at the payoffs for the action ultimately chosen, and that a simple count of transitions between payoffs was strongly associated with the final choice.
Abstract: In risky and other multiattribute choices, the process of choosing is well described by random walk or drift diffusion models in which evidence is accumulated over time to threshold. In strategic choices, level-k and cognitive hierarchy models have been offered as accounts of the choice process, in which people simulate the choice processes of their opponents or partners. We recorded the eye movements in 2 × 2 symmetric games including dominance-solvable games like prisoner's dilemma and asymmetric coordination games like stag hunt and hawk–dove. The evidence was most consistent with the accumulation of payoff differences over time: we found longer duration choices with more fixations when payoff differences were more finely balanced, an emerging bias to gaze more at the payoffs for the action ultimately chosen, and that a simple count of transitions between payoffs—whether or not the comparison is strategically informative—was strongly associated with the final choice. The accumulator models do account for these strategic choice process measures, but the level-k and cognitive hierarchy models do not.
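A minimal sketch of the kind of accumulator (drift diffusion) process the abstract refers to, with evidence driven by a payoff difference; the drift scaling, noise level, and threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def ddm_choice(payoff_diff, noise=1.0, threshold=2.0, dt=0.01, max_steps=100_000):
    """Accumulate noisy evidence proportional to the payoff difference until a bound is hit.

    Returns (choice, decision_time): choice is +1 (upper bound) or -1 (lower bound).
    """
    z, t = 0.0, 0
    while abs(z) < threshold and t < max_steps:
        z += payoff_diff * dt + noise * np.sqrt(dt) * rng.normal()
        t += 1
    return (1 if z >= threshold else -1), t * dt

# Finely balanced payoffs (small difference) should yield longer decision times
times_balanced = [ddm_choice(0.1)[1] for _ in range(1000)]
times_clear    = [ddm_choice(1.0)[1] for _ in range(1000)]
print("mean decision time, finely balanced payoffs:", np.mean(times_balanced))
print("mean decision time, clearly different payoffs:", np.mean(times_clear))
```

Smaller payoff differences produce longer mean decision times, matching the longer-duration, more-fixation choices reported above for finely balanced payoffs.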

Posted Content
TL;DR: In this paper, it is shown that, assuming the Exponential Time Hypothesis for PPAD, computing an approximate Nash equilibrium in a two-player game requires quasi-polynomial time, and that in games with n players it requires exponentially many oracle queries to the payoff tensors.
Abstract: We prove that there exists a constant $\epsilon>0$ such that, assuming the Exponential Time Hypothesis for PPAD, computing an $\epsilon$-approximate Nash equilibrium in a two-player (n x n) game requires quasi-polynomial time, $n^{\log^{1-o(1)} n}$. This matches (up to the o(1) term) the algorithm of Lipton, Markakis, and Mehta [LMM03]. Our proof relies on a variety of techniques from the study of probabilistically checkable proofs (PCP); this is the first time that such ideas are used for a reduction between problems inside PPAD. En route, we also prove new hardness results for computing Nash equilibria in games with many players. In particular, we show that computing an $\epsilon$-approximate Nash equilibrium in a game with n players requires $2^{\Omega(n)}$ oracle queries to the payoff tensors. This resolves an open problem posed by Hart and Nisan [HN13], Babichenko [Bab14], and Chen et al. [CCT15]. In fact, our results for n-player games are stronger: they hold with respect to the $(\epsilon,\delta)$-WeakNash relaxation recently introduced by Babichenko et al. [BPR16].

Journal ArticleDOI
TL;DR: This article derives a weak version of a dynamic programming principle (DPP) for the corresponding value function and provides an alternative characterization of the value function as a solution of a partial differential equation (PDE) in the sense of discontinuous viscosity solutions, along with boundary conditions both in Dirichlet and viscosity senses.

Journal ArticleDOI
TL;DR: This study sheds new light on the relations between the microscopic dynamics of the public goods game and its macroscopic behavior, strengthening the link between the field of evolutionary game theory and statistical physics.
Abstract: In this work we aim to analyze the role of noise in the spatial public goods game, one of the most famous games in evolutionary game theory. The dynamics of this game is affected by a number of parameters and processes, namely the topology of interactions among the agents, the synergy factor, and the strategy revision phase. The latter is a process that allows agents to change their strategy. Notably, rational agents tend to imitate richer neighbors, in order to increase the probability of maximizing their payoff. By implementing a stochastic revision process, it is possible to control the level of noise in the system, so that even irrational updates may occur. In particular, in this work we study the effect of noise on the macroscopic behavior of a finite structured population playing the public goods game. We consider both the case of a homogeneous population, where the noise in the system is controlled by tuning a parameter representing the level of stochasticity in the strategy revision phase, and a heterogeneous population composed of a variable proportion of rational and irrational agents. In both cases numerical investigations show that the public goods game has a very rich behavior which strongly depends on the amount of noise in the system and on the value of the synergy factor. To conclude, our study sheds new light on the relations between the microscopic dynamics of the public goods game and its macroscopic behavior, strengthening the link between the field of evolutionary game theory and statistical physics.
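A minimal sketch of a lattice public goods game with a stochastic (Fermi-type) strategy revision rule, where the parameter K plays the role of the noise level discussed above; the group definition, synergy factor, and update protocol are illustrative assumptions and need not match the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(4)

L = 50                 # lattice side: L*L agents with von Neumann neighborhoods (periodic boundaries)
r = 3.8                # synergy factor
K = 0.5                # noise level in the strategy revision rule
strategy = rng.integers(0, 2, size=(L, L))   # 1 = cooperator, 0 = defector

def group_payoff(s, i, j):
    """Payoff of agent (i, j) from the single group centered on it (simplified group structure)."""
    members = [(i, j), ((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]
    n_coop = sum(s[m] for m in members)
    share = r * n_coop / len(members)
    return share - s[i, j]          # cooperators pay a unit contribution

for step in range(100 * L * L):
    i, j = rng.integers(0, L, 2)                               # focal agent
    if rng.random() < 0.5:                                     # pick a random lattice neighbor
        ni, nj = (i + rng.choice([-1, 1])) % L, j
    else:
        ni, nj = i, (j + rng.choice([-1, 1])) % L
    pi_f, pi_n = group_payoff(strategy, i, j), group_payoff(strategy, ni, nj)
    # Fermi rule: imitate the neighbor with probability increasing in the payoff gap,
    # with K controlling how often "irrational" (payoff-decreasing) imitation occurs
    if rng.random() < 1.0 / (1.0 + np.exp((pi_f - pi_n) / K)):
        strategy[i, j] = strategy[ni, nj]

print("final cooperator fraction:", strategy.mean())
```

Sweeping K (and the synergy factor r) in a sketch like this reproduces, qualitatively, the dependence of the macroscopic cooperation level on the amount of noise that the abstract describes.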

Journal ArticleDOI
TL;DR: This paper analyzes subjects' eye movements while playing a series of two-person, 3 × 3 one-shot games in normal form and finds correlations between eye movements and choices; however, when the Cognitive Hierarchy model is applied to the data, only some of the subjects present both information search patterns and choices compatible with a specific cognitive level.
Abstract: Previous experimental research suggests that individuals apply rules of thumb to a simplified mental model of the “real” decision problem. We claim that this simplification is obtained either by neglecting the other players’ incentives and beliefs or by taking them into consideration only for a subset of game outcomes. We analyze subjects’ eye movements while playing a series of two-person, 3 × 3 one-shot games in normal form. Games within each class differ by a set of descriptive features (i.e., features that can be changed without altering the game equilibrium properties). Data show that subjects on average perform partial or non-strategic analysis of the payoff matrix, often ignoring the opponent's payoffs and rarely performing the necessary steps to detect dominance. Our analysis of eye movements supports the hypothesis that subjects use simple decision rules such as “choose the strategy with the highest average payoff” or “choose the strategy leading to an attractive and symmetric outcome” without (optimally) incorporating knowledge about the opponent’s behavior. Lookup patterns turned out to be feature and game invariant, heterogeneous across subjects, but stable within subjects. Using a cluster analysis, we find correlations between eye movements and choices; however, applying the Cognitive Hierarchy model to our data, we show that only some of the subjects present both information search patterns and choices compatible with a specific cognitive level. We also find a series of correlations between strategic behavior and individual characteristics like risk attitude, short-term memory capacity, and mathematical and logical abilities.

Journal ArticleDOI
TL;DR: Using a simple game-theoretic model, it is shown that the protector can lose if he does not take into account the possibility that the adversary can play a game other than the one the protector has in mind.
Abstract: The security community has witnessed a significant increase in the number of different types of security threats. This situation calls for the design of new techniques that can be incorporated into security protocols to meet these challenges successfully. An important tool for developing new security protocols as well as estimating their effectiveness is game theory. This game theory framework usually involves two players or agents: 1) a protector and 2) an adversary, and two patterns of agent behavior are considered: 1) selfish behavior, where each of the agents wants to maximize his payoff; and 2) leader and follower behavior, where one agent (the leader) expects that the other agent (the follower) will respond to the leader’s strategy. Such an approach assumes that the agents agree on which strategy to apply in advance. In this paper, this strong assumption is relaxed. Namely, the following question is considered: what happens if it is unknown a priori what pattern of behavior the adversary is going to use, or, in other words, it is not known what game he intends to play? Using a simple game-theoretic model, it is shown that the protector can lose if he does not take into account the possibility that the adversary can play a game other than the one the protector has in mind. Further considered is a repeated game in which the protector can learn about the presence of an adversary, and the behavior of belief probabilities is analyzed in this setting.