
Showing papers on "Stochastic game published in 2009"


Journal ArticleDOI
04 Sep 2009-Science
TL;DR: It is shown that reward is as effective as punishment for maintaining public cooperation and leads to higher total earnings, and that human cooperation in such repeated settings is best supported by positive interactions with others.
Abstract: The public goods game is the classic laboratory paradigm for studying collective action problems. Each participant chooses how much to contribute to a common pool that returns benefits to all participants equally. The ideal outcome occurs if everybody contributes the maximum amount, but the self-interested strategy is not to contribute anything. Most previous studies have found punishment to be more effective than reward for maintaining cooperation in public goods games. The typical design of these studies, however, represses future consequences for today’s actions. In an experimental setting, we compare public goods games followed by punishment, reward, or both in the setting of truly repeated games, in which player identities persist from round to round. We show that reward is as effective as punishment for maintaining public cooperation and leads to higher total earnings. Moreover, when both options are available, reward leads to increased contributions and payoff, whereas punishment has no effect on contributions and leads to lower payoff. We conclude that reward outperforms punishment in repeated public goods games and that human cooperation in such repeated settings is best supported by positive interactions with others.
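For concreteness, here is a minimal sketch of the stage payoffs in such a public goods game with targeted reward and punishment. The endowment, multiplier, and the cost and impact of targeting are illustrative assumptions, not the parameters used in the experiment.

```python
# Minimal sketch of one round of a public goods game with targeted reward and
# punishment. All parameter values below are illustrative assumptions.

def public_goods_round(contributions, endowment=20, multiplier=1.6):
    """Each player keeps what they did not contribute and receives an equal
    share of the multiplied common pool."""
    n = len(contributions)
    pool_share = multiplier * sum(contributions) / n
    return [endowment - c + pool_share for c in contributions]

def apply_targeting(payoffs, actions, cost=1, impact=3):
    """actions[i][j] in {-1, 0, +1}: player i punishes (-1), ignores (0), or
    rewards (+1) player j, paying `cost` to change j's payoff by `impact`."""
    payoffs = list(payoffs)
    for i, row in enumerate(actions):
        for j, a in enumerate(row):
            if i != j and a != 0:
                payoffs[i] -= cost
                payoffs[j] += impact if a > 0 else -impact
    return payoffs

contributions = [20, 10, 0, 15]
stage = public_goods_round(contributions)
actions = [[0, 0, -1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(apply_targeting(stage, actions))
```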

659 citations


Journal ArticleDOI
TL;DR: A variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms is considered, providing the first analysis of the expected regret for such algorithms.

590 citations


Journal ArticleDOI
Minyi Huang
TL;DR: To overcome the dimensionality difficulty and obtain decentralized strategies, the so-called Nash certainty equivalence methodology is applied and the control synthesis is preceded by a state space augmentation via a set of aggregate quantities giving the mean field approximation.
Abstract: We consider linear-quadratic-Gaussian (LQG) games with a major player and a large number of minor players. The major player has a significant influence on others. The minor players individually have negligible impact, but they collectively contribute mean field coupling terms in the individual dynamics and costs. To overcome the dimensionality difficulty and obtain decentralized strategies, the so-called Nash certainty equivalence methodology is applied. The control synthesis is preceded by a state space augmentation via a set of aggregate quantities giving the mean field approximation. Subsequently, within the population limit the LQG game is decomposed into a family of limiting two-player games as each is locally seen by a representative minor player. Next, when solving these limiting two-player games, we impose certain interaction consistency conditions such that the aggregate quantities initially assumed coincide with the ones replicated by the closed loop of a large number of minor players. This procedure leads to a set of decentralized strategies for the original LQG game, which is an $\varepsilon$-Nash equilibrium.

319 citations


Journal ArticleDOI
TL;DR: This work constructs a general mathematical approach for studying any evolutionary game in set structured populations and derives precise conditions for cooperators to be selected over defectors in the evolution of cooperation.
Abstract: Evolutionary dynamics are strongly affected by population structure. The outcome of an evolutionary process in a well-mixed population can be very different from that in a structured population. We introduce a powerful method to study dynamical population structure: evolutionary set theory. The individuals of a population are distributed over sets. Individuals interact with others who are in the same set. Any 2 individuals can have several sets in common. Some sets can be empty, whereas others have many members. Interactions occur in terms of an evolutionary game. The payoff of the game is interpreted as fitness. Both the strategy and the set memberships change under evolutionary updating. Therefore, the population structure itself is a consequence of evolutionary dynamics. We construct a general mathematical approach for studying any evolutionary game in set structured populations. As a particular example, we study the evolution of cooperation and derive precise conditions for cooperators to be selected over defectors.

241 citations


Journal ArticleDOI
TL;DR: This work introduces three different payoff-based processes for increasingly general scenarios and proves that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability.
Abstract: We consider repeated multiplayer games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes in which, at any stage, players know only their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
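As a rough illustration of what "payoff-based" means here, the following sketch implements a generic adjustment rule in which each player observes only its own noisy payoff, keeps a running payoff estimate per own action, and plays epsilon-greedy against those estimates. This is not one of the paper's three processes; the 2x2 coordination game and all parameters are toy assumptions.

```python
import random
from collections import defaultdict

# Generic payoff-based adjustment sketch: players never see others' actions or
# the payoff functions, only their own noisy realized payoffs.

def payoffs(profile):
    # Two-player coordination game: both players earn 1 when their actions match.
    return [1.0, 1.0] if profile[0] == profile[1] else [0.0, 0.0]

def run(n_actions=2, stages=5000, explore=0.05, noise=0.1, seed=0):
    rng = random.Random(seed)
    est = [defaultdict(float) for _ in range(2)]   # estimated payoff per own action
    cnt = [defaultdict(int) for _ in range(2)]     # times each own action was played
    for _ in range(stages):
        played = []
        for i in range(2):
            if rng.random() < explore or not est[i]:
                played.append(rng.randrange(n_actions))        # experiment
            else:
                played.append(max(est[i], key=est[i].get))     # exploit estimate
        obs = [p + rng.gauss(0, noise) for p in payoffs(played)]
        for i, a in enumerate(played):                         # running average update
            cnt[i][a] += 1
            est[i][a] += (obs[i] - est[i][a]) / cnt[i][a]
    return [max(est[i], key=est[i].get) for i in range(2)]

print(run())   # with luck, a matching action profile such as [1, 1]
```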

235 citations


Journal ArticleDOI
TL;DR: This work introduces a learning rule in which behavior is conditional on whether a player experiences an error of the first or second type, which implements Nash equilibrium behavior in any game with generic payoffs and at least one pure Nash equilibrium.

227 citations


Journal ArticleDOI
TL;DR: This work models cooperation in wireless networks through a game theoretical algorithm derived from a novel concept from coalitional game theory that enables the users to self-organize into independent disjoint coalitions; the resulting clustered network structure is characterized through novel stability notions.
Abstract: Cooperation in wireless networks allows single antenna devices to improve their performance by forming virtual multiple antenna systems. However, performing a distributed and fair cooperation constitutes a major challenge. In this work, we model cooperation in wireless networks through a game theoretical algorithm derived from a novel concept from coalitional game theory. A simple and distributed merge-and-split algorithm is constructed to form coalition groups among single antenna devices and to allow them to maximize their utilities in terms of rate while accounting for the cost of cooperation in terms of power. The proposed algorithm enables the users to self-organize into independent disjoint coalitions, and the resulting clustered network structure is characterized through novel stability notions. In addition, we prove the convergence of the algorithm and we investigate how the network structure changes when different fairness criteria are chosen for apportioning the coalition worth among its members. Simulation results show that the proposed algorithm can improve the individual user's payoff by up to 40.42% as well as efficiently cope with the mobility of the distributed users.

226 citations


Proceedings ArticleDOI
19 Apr 2009
TL;DR: It is shown that maximizing the number of supported connections is NP-hard, even when there is no background noise; this contrasts with the problem of determining whether a given set of connections is feasible, which can be solved via linear programming.
Abstract: In this paper we consider the problem of maximizing the number of supported connections in arbitrary wireless networks where a transmission is supported if and only if the signal-to-interference-plus-noise ratio at the receiver is greater than some threshold. The aim is to choose transmission powers for each connection so as to maximize the number of connections for which this threshold is met. We believe that analyzing this problem is important both in its own right and also because it arises as a subproblem in many other areas of wireless networking. We study both the complexity of the problem and also present some game theoretic results regarding capacity that is achieved by completely distributed algorithms. We also feel that this problem is intriguing since it involves both continuous aspects (i.e. choosing the transmission powers) as well as discrete aspects (i.e. which connections should be supported). Our results are: • We show that maximizing the number of supported connections is NP-hard, even when there is no background noise. This is in contrast to the problem of determining whether or not a given set of connections is feasible since that problem can be solved via linear programming. • We present a number of approximation algorithms for the problem. All of these approximation algorithms run in polynomial time and have an approximation ratio that is independent of the number of connections. • We examine a completely distributed algorithm and analyze it as a game in which a connection receives a positive payoff if it is successful and a negative payoff if it is unsuccessful while transmitting with nonzero power. We show that in this game there is not necessarily a pure Nash equilibrium but if such an equilibrium does exist the corresponding price of anarchy is independent of the number of connections. We also show that a mixed Nash equilibrium corresponds to a probabilistic transmission strategy and in this case such an equilibrium always exists and has a price of anarchy that is independent of the number of connections. This work was supported by NSF contract CCF-0728980 and was performed while the second author was visiting Bell Labs in Summer, 2008.
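The feasibility subproblem mentioned above (checking whether a fixed set of connections can all meet their SINR threshold) reduces to a linear program. The following is a minimal sketch; the gain matrix, threshold, noise level and power cap are chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# A set of connections is feasible iff there exist powers p in [0, p_max] with
# SINR_i = G[i, i] p_i / (noise + sum_{j != i} G[i, j] p_j) >= beta for all i.
# Each constraint is linear in p, so feasibility is a linear program.

def feasible(G, beta, noise, p_max):
    n = G.shape[0]
    A, b = [], []
    for i in range(n):
        # -G[i,i] p_i + beta * sum_{j != i} G[i,j] p_j <= -beta * noise
        row = np.where(np.arange(n) == i, -G[i, i], beta * G[i])
        A.append(row)
        b.append(-beta * noise)
    res = linprog(c=np.ones(n), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0.0, p_max)] * n, method="highs")
    return res.success, (res.x if res.success else None)

G = np.array([[1.0, 0.1, 0.05],
              [0.2, 0.8, 0.1],
              [0.05, 0.15, 0.9]])   # G[i, j]: gain from transmitter j to receiver i
ok, powers = feasible(G, beta=2.0, noise=0.01, p_max=1.0)
print(ok, powers)
```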

220 citations


Journal ArticleDOI
TL;DR: It is shown that the cognitive hierarchy model is a special case of Truncated QRE, and significant evidence is found of payoff-responsive stochastic choice, and of heterogeneity and downward-looking beliefs in some games.

192 citations


Journal ArticleDOI
TL;DR: The authors used a modified dictator game to investigate the relationship between response times and social preferences and found that faster subjects more often chose the option with the highest payoff for themselves, while within-subject analysis reveals that payoff-maximizing choices are reached more quickly than choices expressing social preferences.

174 citations


Journal ArticleDOI
TL;DR: This paper proposes a method to measure strategic uncertainty by eliciting certainty equivalents, analogous to measuring risk attitudes in lotteries, and applies it in experiments on a class of one-shot coordination games.
Abstract: This paper proposes a method to measure strategic uncertainty by eliciting certainty equivalents analogous to measuring risk attitudes in lotteries. We apply this method by conducting experiments on a class of one-shot coordination games with strategic complementarities and choices between simple lotteries and sure payoff alternatives, both framed in a similar way. Despite the multiplicity of equilibria in the coordination games, aggregate behaviour is fairly predictable. The pure or mixed Nash equilibria cannot describe subjects’ behaviour. We present two global games with private information about monetary payoffs and about risk aversion. While previous literature treats the parameters of a global game as given, we estimate them and show that both models describe observed behaviour well. The global-game selection for vanishing noise of private signals offers a good recommendation for actual players, given the observed distribution of actions. We also deduce subjective beliefs and compare them with objective probabilities.

Journal ArticleDOI
TL;DR: In this article, a model for evolutionary game dynamics in a growing, network-structured population is discussed, where new players can either make connections to random preexisting players or preferentially attach to those that have been successful in the past.
Abstract: We discuss a model for evolutionary game dynamics in a growing, network-structured population. In our model, new players can either make connections to random preexisting players or preferentially attach to those that have been successful in the past. The latter depends on the dynamics of strategies in the game, which we implement following the so-called Fermi rule such that the limits of weak and strong strategy selection can be explored. Our framework allows us to address general evolutionary games. With only two parameters describing the preferential attachment and the intensity of selection, we describe a wide range of network structures and evolutionary scenarios. Our results show that even for moderate payoff preferential attachment, overrepresented hubs arise. Interestingly, we find that while the networks are growing, high levels of cooperation are attained, but the same network structure does not promote cooperation as a static network. Therefore, the mechanism of payoff preferential attachment is different from those usually invoked to explain the promotion of cooperation in static, already-grown networks.
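For reference, the Fermi imitation rule mentioned above can be written in a few lines; the payoff values and selection intensity below are placeholders, not parameters from the paper.

```python
import math, random

# Fermi imitation rule: player i adopts the strategy of a neighbor j with a
# probability that depends on the payoff difference and on the intensity of
# selection beta (weak selection: beta -> 0, strong selection: beta large).

def fermi_adoption_probability(payoff_i, payoff_j, beta):
    """Probability that the focal player i copies neighbor j's strategy."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_j - payoff_i)))

def imitation_step(strategy_i, strategy_j, payoff_i, payoff_j, beta, rng=random):
    if rng.random() < fermi_adoption_probability(payoff_i, payoff_j, beta):
        return strategy_j          # i switches to j's strategy
    return strategy_i              # i keeps its own strategy

print(fermi_adoption_probability(1.0, 2.0, beta=0.1))   # ~0.52 under weak selection
print(fermi_adoption_probability(1.0, 2.0, beta=10.0))  # ~1.0 under strong selection
```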

Journal ArticleDOI
26 Aug 2009-PLOS ONE
TL;DR: A novel two-player economic game in which monetary allocations are made with a “trembling hand” (that is, intentions and outcomes are sometimes mismatched) provides new insight into the psychological mechanisms underlying social preferences for fairness and retribution.
Abstract: How do people respond to others' accidental behaviors? Reward and punishment for an accident might depend on the actor's intentions, or instead on the unintended outcomes she brings about. Yet, existing paradigms in experimental economics do not include the possibility of accidental monetary allocations. We explore the balance of outcomes and intentions in a two-player economic game where monetary allocations are made with a “trembling hand”: that is, intentions and outcomes are sometimes mismatched. Player 1 allocates $10 between herself and Player 2 by rolling one of three dice. One die has a high probability of a selfish outcome, another has a high probability of a fair outcome, and the third has a high probability of a generous outcome. Based on Player 1's choice of die, Player 2 can infer her intentions. However, any of the three dice can yield any of the three possible outcomes. Player 2 is given the opportunity to respond to Player 1's allocation by adding to or subtracting from Player 1's payoff. We find that Player 2's responses are influenced substantially by the accidental outcome of Player 1's roll of the die. Comparison to control conditions suggests that in contexts where the allocation is at least partially under the control of Player 1, Player 2 will hold Player 1 accountable for unintentional negative outcomes. In addition, Player 2's responses are influenced by Player 1's intention. However, Player 2 tends to modulate his responses substantially more for selfish intentions than for generous intentions. This novel economic game provides new insight into the psychological mechanisms underlying social preferences for fairness and retribution.

Journal ArticleDOI
TL;DR: This work explores stochastic evolutionary dynamics under weak selection, but for any mutation rate, and finds one condition that holds for low mutation rates and another condition that holds for high mutation rates.

Journal ArticleDOI
TL;DR: In this paper, an incomplete information game model is proposed to study the competitive behavior among individual generating companies (GENCOs), in which each GENCO is modeled as an agent and each agent makes strategic generation capacity expansion decisions based on its incomplete information on other GENCOs.
Abstract: To study the competitive behavior among individual generating companies (GENCOs), an incomplete information game model is proposed in this paper in which each GENCO is modeled as an agent. Each agent makes strategic generation capacity expansion decisions based on its incomplete information on other GENCOs. The formulation of this game model leads to a bi-level optimization problem. The upper level of this problem is the GENCOs' own decision on optimal planning strategies and energy/reserve bidding strategies. The lower-level problem is the ISO's market clearing problem that minimizes the cost to supply the load, which yields price signals for GENCOs to calculate their own payoffs. A co-evolutionary algorithm combined with pattern search is proposed to optimize the search for the Nash equilibrium of the competition game with incomplete information. The Nash equilibrium is obtained if all GENCOs reach their maximum expected payoff assuming the planning strategies of the other GENCOs remain unchanged. The physical withholding of capacity is considered in the energy market and the Herfindahl-Hirschman index is utilized to measure the market concentration. The competitive behaviors are analyzed in three policy scenarios based on different market rules for reserve procurement and compensation.

Journal ArticleDOI
TL;DR: The simulation results show that by deploying the proposed best-response learning algorithm, the wireless users can significantly improve their own bidding strategies and, hence, their performance in terms of both the application quality and the incurred cost for the used resources.
Abstract: In this paper, we model the various users in a wireless network (e.g., cognitive radio network) as a collection of selfish autonomous agents that strategically interact to acquire dynamically available spectrum opportunities. Our main focus is on developing solutions for wireless users to successfully compete with each other for the limited and time-varying spectrum opportunities, given experienced dynamics in the wireless network. To analyze the interactions among users given the environment disturbance, we propose a stochastic game framework for modeling how the competition among users for spectrum opportunities evolves over time. At each stage of the stochastic game, a central spectrum moderator (CSM) auctions the available resources, and the users strategically bid for the required resources. The joint bid actions affect the resource allocation and, hence, the rewards and future strategies of all users. Based on the observed resource allocations and corresponding rewards, we propose a best-response learning algorithm that can be deployed by wireless users to improve their bidding policy at each stage. The simulation results show that by deploying the proposed best-response learning algorithm, the wireless users can significantly improve their own bidding strategies and, hence, their performance in terms of both the application quality and the incurred cost for the used resources.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: This paper first derives the structure of optimal 2-hop forwarding policies, then studies interactions that may occur in the presence of several competing classes of mobiles and formulates this as a cost-coupled stochastic game.
Abstract: In this paper, we study optimal stochastic control issues in delay-tolerant networks. We first derive the structure of optimal 2-hop forwarding policies. In order to be implemented, such policies require the knowledge of some system parameters such as the number of mobiles or the rate of contacts between mobiles, but these could be unknown at system design time or may change over time. To address this problem, we design adaptive policies combining estimation and control that achieve optimal performance in spite of the lack of information. We then study interactions that may occur in the presence of several competing classes of mobiles and formulate this as a cost-coupled stochastic game. We show that this game has a unique Nash equilibrium such that each class adopts the optimal forwarding policy determined for the single-class problem.

Journal ArticleDOI
TL;DR: A thorough overview is given of the properties of this evolutionary process in spatial Prisoner's Dilemma and Stag Hunt games where both the strategy distribution and the players' individual noise level can evolve to reach higher individual payoff.
Abstract: We studied spatial Prisoner's Dilemma and Stag Hunt games where both the strategy distribution and the players' individual noise level could evolve to reach higher individual payoff. Players are located on the sites of different two-dimensional lattices and gain their payoff from games with their neighbors by choosing unconditional cooperation or defection. The way of strategy adoption can be characterized by a single temperature-like parameter K describing how strongly adoptions depend on the payoff difference. If we start the system from a random strategy distribution with many different player-specific K parameters, the simultaneous evolution of strategies and K parameters drives the system to a final stationary state where only one K value remains. In the coexistence phase of cooperator and defector strategies the surviving K parameter is in good agreement with the noise level that ensures the highest cooperation level if a uniform K is assumed for all players. In this paper we give a thorough overview of the properties of this evolutionary process.

Journal ArticleDOI
TL;DR: It is shown analytically that these mean exit times depend on the payoff matrix of the game in an amazingly simple way under weak selection, i.e. strong stochasticity.
Abstract: In evolutionary game dynamics, reproductive success increases with the performance in an evolutionary game. If strategy A performs better than strategy B, strategy A will spread in the population. Under stochastic dynamics, a single mutant will sooner or later take over the entire population or go extinct. We analyze the mean exit times (or average fixation times) associated with this process. We show analytically that these times depend on the payoff matrix of the game in an amazingly simple way under weak selection, i.e. strong stochasticity: the payoff difference is a linear function of the number of A individuals i, of the form u i + v. The unconditional mean exit time depends only on the constant term v. Given that a single A mutant takes over the population, the corresponding conditional mean exit time depends only on the density-dependent term u. We demonstrate this finding for two commonly applied microscopic evolutionary processes.
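To make the claimed linearity concrete, consider a generic 2x2 game with payoff entries a (A against A), b (A against B), c (B against A) and d (B against B) in a well-mixed population of size N without self-interaction; the notation here is ours, not necessarily the paper's. A standard calculation gives:

```latex
\begin{align*}
\pi_A(i) &= \frac{a(i-1) + b(N-i)}{N-1}, \qquad
\pi_B(i) = \frac{c\,i + d(N-i-1)}{N-1},\\
\pi_A(i) - \pi_B(i) &= u\,i + v,
\qquad u = \frac{a - b - c + d}{N-1},
\qquad v = \frac{bN - a - d(N-1)}{N-1}.
\end{align*}
```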

Journal ArticleDOI
TL;DR: An efficient online algorithm is presented that ensures that the agent's average performance loss vanishes over time, provided that the environment is oblivious to the agents' actions.
Abstract: We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well---in hindsight---as every stationary policy. This generalizes the classical no-regret result for repeated games. Specifically, we present an efficient online algorithm---in the spirit of reinforcement learning---that ensures that the agent's average performance loss vanishes over time, provided that the environment is oblivious to the agent's actions. Moreover, it is possible to modify the basic algorithm to cope with instances where reward observations are limited to the agent's trajectory. We present further modifications that reduce the computational cost by using function approximation and that track the optimal policy through infrequent changes.

Proceedings ArticleDOI
13 May 2009
TL;DR: In this paper, a stochastic game theoretic approach to security and intrusion detection in communication and computer networks is proposed, in which an Attacker and a Defender play a non-cooperative zero-sum or nonzero-sum stochastic game over a network of nodes whose security assets and vulnerabilities are correlated.
Abstract: This paper studies a stochastic game theoretic approach to security and intrusion detection in communication and computer networks. Specifically, an Attacker and a Defender take part in a two-player game over a network of nodes whose security assets and vulnerabilities are correlated. Such a network can be modeled using weighted directed graphs with the edges representing the influence among the nodes. The game can be formulated as a non-cooperative zero-sum or nonzero-sum stochastic game. However, due to correlation among the nodes, if some nodes are compromised, the effective security assets and vulnerabilities of the remaining ones will not stay the same in general, which leads to complex system dynamics. We examine existence, uniqueness, and structure of the solution and also provide numerical examples to illustrate our model.

Proceedings ArticleDOI
11 Aug 2009
TL;DR: A family of games on which the discrete strategy improvement algorithm for solving parity games due to Voege and Jurdzinski requires exponentially many strategy iterations is outlined, answering in the negative the long-standing question whether this algorithm runs in polynomial time.
Abstract: This paper presents a new lower bound for the discrete strategy improvement algorithm for solving parity games due to Voege and Jurdzinski. First, we informally show which structures are difficult to solve for the algorithm. Second, we outline a family of games on which the algorithm requires exponentially many strategy iterations, answering in the negative the long-standing question whether this algorithm runs in polynomial time. Additionally we note that the same family of games can be used to prove a similar result w.r.t. the strategy improvement variant by Schewe as well as the strategy iteration for solving discounted payoff games due to Puri.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a game of strategic experimentation with two-armed bandits where the risky arm distributes lump-sum payoffs according to a Poisson process, and construct asymmetric equilibria in which players have symmetric continuation values at sufficiently optimistic beliefs.
Abstract: We study a game of strategic experimentation with two-armed bandits where the risky arm distributes lump-sum payoffs according to a Poisson process. Its intensity is either high or low, and unknown to the players. We consider Markov perfect equilibria with beliefs as the state variable. As the belief process is piecewise deterministic, payoff functions solve differential-difference equations. There is no equilibrium where all players use cut-off strategies, and all equilibria exhibit an 'encouragement effect' relative to the single-agent optimum. We construct asymmetric equilibria in which players have symmetric continuation values at sufficiently optimistic beliefs yet take turns playing the risky arm before all experimentation stops. Owing to the encouragement effect, these equilibria Pareto dominate the unique symmetric one for sufficiently frequent turns. Rewarding the last experimenter with a higher continuation value increases the range of beliefs where players experiment, but may reduce average payoffs at more optimistic beliefs. Some equilibria exhibit an 'anticipation effect': as beliefs become more pessimistic, the continuation value of a single experimenter increases over some range because a lower belief means a shorter wait until another player takes over.

Proceedings ArticleDOI
13 May 2009
TL;DR: In this article, a Markov Decision Evolutionary Game with N players is introduced, in which each individual in a large population interacts with other randomly selected players, and the states and actions of each player in an interaction together determine the instantaneous payoff for all involved players.
Abstract: We introduce Markov Decision Evolutionary Games with N players, in which each individual in a large population interacts with other randomly selected players. The states and actions of each player in an interaction together determine the instantaneous payoff for all involved players. They also determine the transition probabilities to move to the next state. Each individual wishes to maximize the total expected discounted payoff over an infinite horizon. We provide a rigorous derivation of the asymptotic behavior of this system as the size of the population grows to infinity. We show that under any Markov strategy, the random process consisting of one specific player and the remaining population converges weakly to a jump process driven by the solution of a system of differential equations. We characterize the solutions to the team and to the game problems at the limit of infinite population and use these to construct almost optimal strategies for the case of a finite, but large, number of players. We show that the large-population asymptotics of the microscopic model is equivalent to a (macroscopic) Markov decision evolutionary game in which a local interaction is described by a single player against a population profile. We illustrate our model by deriving the equations for a dynamic evolutionary Hawk and Dove game with an energy level.
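For orientation, the classical state-free Hawk-Dove game that underlies the final example reads as follows; the paper's model additionally couples these payoffs to an individual energy state, which is omitted in this sketch (V is the value of the resource, C > V the cost of a fight, x the fraction of Hawks).

```latex
% Classical Hawk--Dove payoffs and replicator dynamics (state-free sketch).
\begin{align*}
\pi_H(x) &= x\,\frac{V-C}{2} + (1-x)\,V, \qquad
\pi_D(x) = (1-x)\,\frac{V}{2},\\
\dot{x} &= x(1-x)\bigl(\pi_H(x) - \pi_D(x)\bigr)
         = x(1-x)\,\frac{V - xC}{2},
\end{align*}
% with the mixed rest point x* = V/C.
```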

Journal ArticleDOI
TL;DR: It is demonstrated that the evolution of prisoner's dilemma games can exhibit a repetitive succession of oscillatory and stationary states upon changing a single payoff value, which highlights the remarkable sensitivity of cyclical interactions to the parameters that define the strength of dominance.
Abstract: Evolutionary prisoner's dilemma games are studied with players located on a square lattice and on a random regular graph, both defining four neighbors for each player. The players follow one of the three strategies: tit-for-tat, unconditional cooperation, and defection. The simplified payoff matrix is characterized by two parameters: the temptation b to choose defection and the cost c of inspection reducing the income of tit-for-tat. The strategy imitation from one of the neighbors is controlled by pairwise comparison at a fixed level of noise. Using Monte Carlo simulations and the extended versions of pair approximation we have evaluated the b-c phase diagrams indicating a plethora of phase transitions between stationary coexistence, absorbing, and oscillatory states, including continuous and discontinuous phase transitions. For reasonable costs, the tit-for-tat strategy prevents extinction of cooperators across the whole span of b determining the prisoner's dilemma game, irrespective of the connectivity structure. We also demonstrate that the system can exhibit a repetitive succession of oscillatory and stationary states upon changing a single payoff value, which highlights the remarkable sensitivity of cyclical interactions to the parameters that define the strength of dominance.

Journal ArticleDOI
TL;DR: The inhomogeneous activity in strategy transfer yields a relevant increase in the density of cooperators within a range of the portion of influential players, and the noise dependence of this phenomenon is discussed by evaluating phase diagrams.
Abstract: We study a spatial two-strategy (cooperation and defection) prisoner's dilemma game with two types (A and B) of players located on the sites of a square lattice. The evolution of strategy distribution is governed by iterated strategy adoption from a randomly selected neighbor with a probability depending on the payoff difference and also on the type of the neighbor. The strategy adoption probability is reduced by a prefactor (w < 1) from the players of type B. We consider the competition between two opposite effects when increasing the number of neighbors (k = 4, 8, and 24). Within a range of the portion of influential players (type A) the inhomogeneous activity in strategy transfer yields a relevant increase (dependent on w) in the density of cooperators. The noise dependence of this phenomenon is also discussed by evaluating phase diagrams.

Journal ArticleDOI
TL;DR: This work shows that the multi-level Monte Carlo method can be rigorously justified for non-globally Lipschitz payoffs, considering digital, lookback and barrier options, which requires non-standard strong convergence analysis of the Euler–Maruyama method.
Abstract: Giles (Oper. Res. 56:607-617, 2008) introduced a multi-level Monte Carlo method for approximating the expected value of a function of a stochastic differential equation solution. A key application is to compute the expected payoff of a financial option. This new method improves on the computational complexity of standard Monte Carlo. Giles analysed globally Lipschitz payoffs, but also found good performance in practice for non-globally Lipschitz cases. In this work, we show that the multi-level Monte Carlo method can be rigorously justified for non-globally Lipschitz payoffs. In particular, we consider digital, lookback and barrier options. This requires non-standard strong convergence analysis of the Euler-Maruyama method.
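A minimal sketch of the multi-level estimator for a scalar geometric Brownian motion with a digital payoff (one of the non-Lipschitz cases considered) is given below. The Euler-Maruyama coupling of fine and coarse paths follows the standard construction, while the model parameters and the fixed per-level sample sizes are illustrative choices rather than the paper's optimal allocation.

```python
import numpy as np

# Multi-level Monte Carlo sketch for E[payoff(S_T)] under dS = mu*S dt + sigma*S dW,
# discretized with Euler-Maruyama; fine and coarse paths on each level share the
# same Brownian increments so that their payoff difference has small variance.

rng = np.random.default_rng(0)
mu, sigma, S0, T, K = 0.05, 0.2, 1.0, 1.0, 1.0
payoff = lambda s: (s > K).astype(float)          # digital option, not Lipschitz

def level_estimate(level, n_samples, m=2):
    """Mean of payoff(fine path) - payoff(coarse path) on one level."""
    n_fine = m ** level
    dt = T / n_fine
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n_fine))
    s_fine = np.full(n_samples, S0)
    for k in range(n_fine):
        s_fine = s_fine + mu * s_fine * dt + sigma * s_fine * dw[:, k]
    if level == 0:
        return payoff(s_fine).mean()
    s_coarse = np.full(n_samples, S0)
    dt_c = T / (n_fine // m)
    for k in range(n_fine // m):
        dw_c = dw[:, m * k : m * (k + 1)].sum(axis=1)   # coupled coarse increment
        s_coarse = s_coarse + mu * s_coarse * dt_c + sigma * s_coarse * dw_c
    return (payoff(s_fine) - payoff(s_coarse)).mean()

L = 5
samples = [200_000 // (2 ** level) for level in range(L + 1)]   # crude decay with level
estimate = sum(level_estimate(l, n) for l, n in zip(range(L + 1), samples))
print("MLMC estimate of E[payoff]:", estimate)
```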

Proceedings ArticleDOI
31 May 2009
TL;DR: Oblivious PTASs are shown to exist for games whose equilibria have small probability values and for games with sparse payoff matrices, while any oblivious PTAS for anonymous games with two strategies and three player types must have 1/ε^α in the exponent of its running time for some α ≥ 1/3; a non-oblivious PTAS with running time poly(n) · (1/ε)^O(log²(1/ε)) is devised for any bounded number of player types.
Abstract: If a class of games is known to have a Nash equilibrium with probability values that are either zero or Ω(1) -- and thus with support of bounded size -- then obviously this equilibrium can be found exhaustively in polynomial time. Somewhat surprisingly, we show that there is a PTAS for the class of games whose equilibria are guaranteed to have small -- O(1/n) -- values, and therefore large -- Ω(n) -- supports. We also point out that there is a PTAS for games with sparse payoff matrices, a family for which the exact problem is known to be PPAD-complete [Chen, Deng, Teng 2006]. Both algorithms are of a special kind that we call oblivious: The algorithm just samples a fixed distribution on pairs of mixed strategies, and the game is only used to determine whether the sampled strategies comprise an ε-Nash equilibrium; the answer is "yes" with inverse polynomial probability (in the second case, the algorithm is actually deterministic). These results bring about the question: Is there an oblivious PTAS for finding a Nash equilibrium in general games? We answer this question in the negative; our lower bound comes close to the quasi-polynomial upper bound of [Lipton, Markakis, Mehta 2003]. Another recent PTAS for anonymous games [Daskalakis, Papadimitriou 2007 and 2008, Daskalakis 2008] is also oblivious in a weaker sense appropriate for this class of games (it samples from a fixed distribution on unordered collections of mixed strategies), but its running time is exponential in 1/ε. We prove that any oblivious PTAS for anonymous games with two strategies and three player types must have 1/ε^α in the exponent of the running time for some α ≥ 1/3, rendering the algorithm in [Daskalakis 2008] (which works with any bounded number of player types) essentially optimal within oblivious algorithms. In contrast, we devise a poly(n) · (1/ε)^O(log²(1/ε)) non-oblivious PTAS for anonymous games with two strategies and any bounded number of player types. The key idea of our algorithm is to search not over unordered sets of mixed strategies, but over a carefully crafted set of collections of the first O(log(1/ε)) moments of the distribution of the number of players playing strategy 1 at equilibrium. The algorithm works because of a probabilistic result of more general interest that we prove: the total variation distance between two sums of independent indicator random variables decreases exponentially with the number of moments of the two sums that are equal, independent of the number of indicators.

Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper studies two-player security games which can be viewed as sequences of nonzero-sum matrix games played by an Attacker and a Defender and discusses both the classical FP and the stochastic FP, where for the latter the payoff function of each player includes an entropy term to randomize its own strategy, which could be interpreted as a way of concealing its true strategy.
Abstract: We study two-player security games which can be viewed as sequences of nonzero-sum matrix games played by an Attacker and a Defender. At each stage of the game iterations, the players make imperfect observations of each other's previous actions. The underlying decision process can be viewed as a fictitious play (FP) game, but what differentiates this class from the standard one is that the communication channels that carry action information from one player to the other, or the sensor systems, are error prone. Two possible scenarios are addressed in the paper: (i) if the error probabilities associated with the sensor systems are known to the players, then our analysis provides guidelines for each player to reach a Nash equilibrium (NE), which is related to the NE of the underlying static game; (ii) if the error probabilities are not known to the players, then we study the effect of observation errors on the convergence to the NE and the final outcome of the game. We discuss both the classical FP and the stochastic FP, where for the latter the payoff function of each player includes an entropy term to randomize its own strategy, which can be interpreted as a way of concealing its true strategy.
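As a rough illustration of fictitious play with error-prone observations, the following sketch lets each player best-respond to the empirical frequency of its noisy observations of the opponent's actions. The 2x2 payoff matrices and the error rate are toy assumptions, not the security game analyzed in the paper.

```python
import numpy as np

# Fictitious play with imperfect observations: each player tracks the empirical
# frequency of the *observed* opponent actions, where every observation is
# replaced by a uniformly random action with probability `obs_error`.

A = np.array([[3.0, 0.0],     # row player's (Attacker's) payoffs
              [1.0, 2.0]])
B = np.array([[1.0, 3.0],     # column player's (Defender's) payoffs
              [2.0, 0.0]])

def run_fp(stages=5000, obs_error=0.1, seed=0):
    rng = np.random.default_rng(seed)
    counts = [np.ones(2), np.ones(2)]      # smoothed counts of observed actions
    for _ in range(stages):
        belief_about_col = counts[0] / counts[0].sum()
        belief_about_row = counts[1] / counts[1].sum()
        a_row = int(np.argmax(A @ belief_about_col))   # row player's best response
        a_col = int(np.argmax(belief_about_row @ B))   # column player's best response
        # Each player observes the other's action through an error-prone channel.
        obs_of_col = a_col if rng.random() > obs_error else int(rng.integers(2))
        obs_of_row = a_row if rng.random() > obs_error else int(rng.integers(2))
        counts[0][obs_of_col] += 1
        counts[1][obs_of_row] += 1
    return counts[0] / counts[0].sum(), counts[1] / counts[1].sum()

print(run_fp())   # empirical (observed) frequencies of column and row actions
```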

Journal ArticleDOI
01 Jul 2009-EPL
TL;DR: The authors study co-evolutionary Prisoner's Dilemma games where each player can imitate both the strategy and the imitation rule of a randomly chosen neighbor with a probability dependent on the payoff difference; each player's income is collected from games with its neighbors.
Abstract: We study co-evolutionary Prisoner's Dilemma games where each player can imitate both the strategy and the imitation rule of a randomly chosen neighbor with a probability dependent on the payoff difference, where the player's income is collected from games with the neighbors. The players, located on the sites of a two-dimensional lattice, follow unconditional cooperation or defection and use an individual strategy adoption rule described by a parameter. If the system is started from a random initial state, then the present co-evolutionary rule drives the system towards a state where only one evolutionary rule remains alive, even when cooperative and defective behaviors coexist. The final rule is related to the optimum providing the highest level of cooperation and is affected by the topology of the connectivity structure.