
Showing papers on "Stochastic game" published in 2011


01 Jan 2011
TL;DR: This paper provides sufficient conditions for a (possibly) discontinuous normal-form game to possess a pure-strategy trembling-hand perfect equilibrium, after showing that compactness, continuity, and quasiconcavity of a game are too weak on their own to warrant one.
Abstract: We provide sufficient conditions for a (possibly) discontinuous normal-form game to possess a pure-strategy trembling-hand perfect equilibrium. We first show that compactness, continuity, and quasiconcavity of a game are too weak to warrant the existence of a pure-strategy perfect equilibrium. We then identify two classes of games for which the existence of a pure-strategy perfect equilibrium can be established: (1) the class of compact, metric, concave games satisfying upper semicontinuity of the sum of payoffs and a strengthening of payoff security; and (2) the class of compact, metric games satisfying upper semicontinuity of the sum of payoffs, strengthenings of payoff security and quasiconcavity, and a notion of local concavity and boundedness of payoff differences on certain subdomains of a player's payoff function. Various economic games illustrate our results.

340 citations


Journal Article
TL;DR: The efficiency of institutionalized punishment is studied by evaluating the stationary states in the spatial public goods game comprising unconditional defectors, cooperators, and cooperating pool punishers as the three competing strategies.
Abstract: The efficiency of institutionalized punishment is studied by evaluating the stationary states in the spatial public goods game comprising unconditional defectors, cooperators, and cooperating pool punishers as the three competing strategies. Fines and costs of pool punishment are considered as the two main parameters determining the stationary distributions of strategies on the square lattice. Each player collects a payoff from five five-person public goods games, and the evolution of strategies is subsequently governed by imitation based on pairwise comparisons at a low level of noise. The impact of pool punishment on the evolution of cooperation in structured populations is significantly different from that reported previously for peer punishment. Representative phase diagrams reveal remarkably rich behavior, depending also on the value of the synergy factor that characterizes the efficiency of investments paid into the common pool. Besides traditional single- and two-strategy stationary states, a rock-paper-scissors type of cyclic dominance can emerge in strikingly different ways.

317 citations


Journal Article
TL;DR: The proposed stationary policy in the anti-jamming game is shown to achieve much better performance than the policy obtained from myopic learning, which only maximizes each stage's payoff, and a random defense strategy, since it successfully accommodates the environment dynamics and the strategic behavior of the cognitive attackers.
Abstract: Various spectrum management schemes have been proposed in recent years to improve the spectrum utilization in cognitive radio networks. However, few of them have considered the existence of cognitive attackers who can adapt their attacking strategy to the time-varying spectrum environment and the secondary users' strategy. In this paper, we investigate the security mechanism when secondary users are facing the jamming attack, and propose a stochastic game framework for anti-jamming defense. At each stage of the game, secondary users observe the spectrum availability, the channel quality, and the attackers' strategy from the status of jammed channels. According to this observation, they will decide how many channels they should reserve for transmitting control and data messages and how to switch between the different channels. Using the minimax-Q learning, secondary users can gradually learn the optimal policy, which maximizes the expected sum of discounted payoffs defined as the spectrum-efficient throughput. The proposed stationary policy in the anti-jamming game is shown to achieve much better performance than the policy obtained from myopic learning, which only maximizes each stage's payoff, and a random defense strategy, since it successfully accommodates the environment dynamics and the strategic behavior of the cognitive attackers.
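The minimax-Q update mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes only two actions per side so that the stage game can be solved in closed form (a general version would solve a linear program at each step), and the names `solve_2x2`, `minimax_q_update`, `alpha`, and `gamma` are hypothetical.

```python
def solve_2x2(M):
    """Value and maximizer's probability of action 0 in a 2x2 zero-sum matrix game.

    M[a][o] is the payoff to the maximizer for own action a and opponent action o.
    """
    row_mins = [min(M[0]), min(M[1])]
    col_maxs = [max(M[0][0], M[1][0]), max(M[0][1], M[1][1])]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin >= minimax:
        # Saddle point: a pure strategy is optimal.
        best_row = row_mins.index(maximin)
        return maximin, (1.0 if best_row == 0 else 0.0)
    # No saddle point: closed-form mixed solution for 2x2 games.
    denom = M[0][0] - M[0][1] - M[1][0] + M[1][1]
    p0 = (M[1][1] - M[1][0]) / denom
    value = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) / denom
    return value, p0


def minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma):
    """One minimax-Q step: move Q[s][a][o] toward r + gamma * V(s_next),

    where V(s_next) is the minimax value of the stage game Q[s_next].
    """
    v_next, _ = solve_2x2(Q[s_next])
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * v_next)


# Hypothetical single-state example (assumed numbers, not from the paper).
Q = {0: [[0.0, 0.0], [0.0, 0.0]]}
minimax_q_update(Q, s=0, a=0, o=0, r=1.0, s_next=0, alpha=0.5, gamma=0.9)
print(Q[0])
```

The learner's policy in each state is the mixed strategy returned by the stage-game solver, which is what distinguishes this from a myopic rule that only looks at the current stage payoff.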

310 citations


Journal Article
TL;DR: These results demonstrate that economic game experiments run on MTurk are comparable to those run in laboratory settings, even when using very low stakes.
Abstract: Online labor markets such as Amazon Mechanical Turk (MTurk) offer an unprecedented opportunity to run economic game experiments quickly and inexpensively. Using MTurk, we recruited 756 subjects and examined their behavior in four canonical economic games, with two payoff conditions each: a stakes condition, in which subjects' earnings were based on the outcome of the game (maximum earnings of $1); and a no-stakes condition, in which subjects' earnings were unaffected by the outcome of the game. Our results demonstrate that economic game experiments run on MTurk are comparable to those run in laboratory settings, even when using very low stakes.

284 citations


Journal Article
TL;DR: This article investigates the Game-theoretic Rough Set (GTRS) model and its capability of analyzing a major decision problem evident in existing probabilistic rough set models, and formulates a learning method using the GTRS model that repeatedly analyzes payoff tables created from approximation measures and modified conditional risk strategies to calculate parameter values.
Abstract: This article investigates the Game-theoretic Rough Set (GTRS) model and its capability of analyzing a major decision problem evident in existing probabilistic rough set models. A major challenge in the application of probabilistic rough set models is their inability to formulate a method of decreasing the size of the boundary region through further explorations of the data. To decrease the size of this region, objects must be moved to either the positive or negative regions. Game theory allows a solution to this decision problem by having the regions compete or cooperate with each other in order to find which is best fit to be selected for the move. There are two approaches discussed in this article. First, the region parameters that define the minimum conditional probabilities for region inclusion can either compete or cooperate in order to increase their size. The second approach is formulated by having classification approximation measures compete against each other. We formulate a learning method using the GTRS model that repeatedly analyzes payoff tables created from approximation measures and modified conditional risk strategies to calculate parameter values.

205 citations


Journal Article
05 Apr 2011
TL;DR: A class of convex Nash games where strategy sets are coupled across agents through a common constraint and payoff functions are linked via a scaled congestion cost metric is considered, showing that the equilibrium is locally unique both in the primal space as well as in the larger primal-dual space.
Abstract: We consider a class of convex Nash games where strategy sets are coupled across agents through a common constraint and payoff functions are linked via a scaled congestion cost metric. A solution to a related variational inequality problem provides a set of Nash equilibria characterized by common Lagrange multipliers for shared constraints. While this variational problem may be characterized by a non-monotone map, it is shown to admit solutions, even in the absence of restrictive compactness assumptions on strategy sets. Additionally, we show that the equilibrium is locally unique both in the primal space as well as in the larger primal-dual space. The existence statements can be generalized to accommodate a piecewise-smooth congestion metric while affine restrictions, surprisingly, lead to both existence and global uniqueness guarantees. In the second part of the technical note, we discuss distributed computation of such equilibria in monotone regimes via a distributed iterative Tikhonov regularization (ITR) scheme. Application to a class of networked rate allocation games suggests that the ITR schemes perform better than their two-timescale counterparts.

179 citations


Journal Article
TL;DR: A decision-theoretic framework in which agents are learning about market behavior and that provides microfoundations for models of adaptive learning is presented, and it is shown that the equilibrium stock price is then determined by investors' expectations of the price and dividend in the next period, rather than by expectations of the discounted sum of dividends.

152 citations


Journal Article
TL;DR: In this paper, the authors study an infinite horizon game in which pairs of players connected in a network are randomly matched to bargain and show that all equilibria are payoff equivalent.
Abstract: We study an infinite horizon game in which pairs of players connected in a network are randomly matched to bargain. Players who reach agreement are replaced by new players at the same positions in the network. We show that all equilibria are payoff equivalent. The payoffs and the set of agreement links converge as players become patient. Several new concepts (mutually estranged sets, partners, and shortage ratios) provide insights into the relative strengths of the positions in the network. We develop a procedure to determine the limit equilibrium payoffs for all players. Characterizations of equitable and nondiscriminatory networks are also obtained. (JEL C78, D85)
Competitive equilibrium theory assumes large and anonymous markets in which every buyer can trade with every seller. Underlying these assumptions are standard goods and services that may be traded at low transaction costs by agents who are not in specific relationships with one another. However, in many markets goods and services are heterogeneous (e.g., cars, apartments) or need to be tailored to particular needs (e.g., manufacturing inputs, technical support). Furthermore, trading opportunities may depend on transportation costs, social relationships, information, advertising, trust, technological compatibility, joint business opportunities, free trade agreements, etc. In such cases it is natural to model the market using a network, where only pairs of connected agents may engage in exchange. New theories are needed to explore the influence of the network structure on market outcomes. Many questions arise: How does an agent's position in the network determine his bargaining power and the local prices he faces? Who trades with whom and on what terms? When are prices uniform in the network? One possible conjecture is that an agent's bargaining power is determined by his (relative) number of connections in the network. However, this simple theory is implausible.
Consider the network of four sellers (located at the top nodes) and nine buyers (located at the bottom nodes) depicted in Figure 1. Assume that each seller supplies one unit of a homogeneous indivisible good, each buyer demands one unit of the good, and all buyers have identical values for the good. The buyer located in the middle has the largest number of links in the network, as he is connected to each

134 citations


Journal Article
TL;DR: In this paper, the authors test game-theoretic predictions using field data from the Swedish lowest unique positive integer (LUPI) game, in which players pick positive integers and whoever chose the lowest unique number wins a fixed prize.
Abstract: Game theory is usually difficult to test precisely in the field because predictions typically depend sensitively on features that are not controlled or observed. We conduct one such test using field data from the Swedish lowest unique positive integer (LUPI) game. In the LUPI game, players pick positive integers and whoever chose the lowest unique number wins a fixed prize. Theoretical equilibrium predictions are derived assuming Poisson-distributed uncertainty about the number of players, and tested using both field and laboratory data. The field and lab data show similar patterns. Despite various deviations from equilibrium, there is a surprising degree of convergence toward equilibrium. Some of the deviations from equilibrium can be rationalized by a cognitive hierarchy model.
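The winner rule of the LUPI game described above can be sketched in a few lines. This only illustrates the rule as stated in the abstract; the function name `lupi_winner` and the sample picks are hypothetical, and the sketch assumes nobody wins when no number is unique.

```python
from collections import Counter


def lupi_winner(picks):
    """Return the index of the player who chose the lowest unique number.

    picks[i] is the positive integer chosen by player i.
    Returns None when no number was chosen by exactly one player.
    """
    counts = Counter(picks)
    unique_numbers = [n for n, c in counts.items() if c == 1]
    if not unique_numbers:
        return None
    return picks.index(min(unique_numbers))


# Hypothetical round: 1 and 3 are each picked twice, so 2 is the lowest unique number.
print(lupi_winner([1, 1, 2, 3, 3, 5]))
```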

116 citations


Journal Article
07 Jul 2011-PLOS ONE
TL;DR: An alternative way of understanding the evolution of cooperative behavior and its ubiquitous presence in nature is presented, and it is suggested that swarming could be an important phenomenon by means of which cooperation can be sustained even under highly unfavorable conditions.
Abstract: We study the evolution of cooperation among selfish individuals in the stochastic strategy spatial prisoner's dilemma game. We equip players with the particle swarm optimization technique, and find that it may lead to highly cooperative states even if the temptations to defect are strong. The concept of particle swarm optimization was originally introduced within a simple model of social dynamics that can describe the formation of a swarm, i.e., analogous to a swarm of bees searching for a food source. Essentially, particle swarm optimization foresees changes in the velocity profile of each player, such that the best locations are targeted and eventually occupied. In our case, each player keeps track of the highest payoff attained within a local topological neighborhood and its individual highest payoff. Thus, players make use of their own memory that keeps score of the most profitable strategy in previous actions, as well as use of the knowledge gained by the swarm as a whole, to find the best available strategy for themselves and the society. Following extensive simulations of this setup, we find a significant increase in the level of cooperation for a wide range of parameters, and also a full resolution of the prisoner's dilemma. We also demonstrate extreme efficiency of the optimization algorithm when dealing with environments that strongly favor the proliferation of defection, which in turn suggests that swarming could be an important phenomenon by means of which cooperation can be sustained even under highly unfavorable conditions. We thus present an alternative way of understanding the evolution of cooperative behavior and its ubiquitous presence in nature, and we hope that this study will be inspirational for future efforts aimed in this direction.

113 citations


Proceedings Article
16 Jul 2011
TL;DR: This paper provides new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium, together with the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games.
Abstract: Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries' bounded rationality in computing optimal strategies. To that end, this paper provides three key contributions: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games; and (iii) new techniques for generating representative payoff structures for behavioral experiments in generic classes of games. Our results with human subjects show that our new techniques outperform the leading contender for modeling human behavior in security games.

Journal Article
TL;DR: This work derives a general result that holds for any number of strategies, for a large class of population structures under weak selection, and shows that for certain parameter values both repetition and space are needed to promote evolution of cooperation.
Abstract: Many specific models have been proposed to study evolutionary game dynamics in structured populations, but most analytical results so far describe the competition of only two strategies. Here we derive a general result that holds for any number of strategies, for a large class of population structures under weak selection. We show that for the purpose of strategy selection any evolutionary process can be characterized by two key parameters that are coefficients in a linear inequality containing the payoff values. These structural coefficients, σ1 and σ2, depend on the particular process that is being studied, but not on the number of strategies, n, or the payoff matrix. For calculating these structural coefficients one has to investigate games with three strategies, but more are not needed. Therefore, n = 3 is the general case. Our main result has a geometric interpretation: Strategy selection is determined by the sum of two terms, the first one describing competition on the edges of the simplex and the second one in the center. Our formula includes all known weak selection criteria of evolutionary games as special cases. As a specific example we calculate games on sets and explore the synergistic interaction between direct reciprocity and spatial selection. We show that for certain parameter values both repetition and space are needed to promote evolution of cooperation.

Journal Article
TL;DR: The centipede game, in which backward induction has fared especially poorly, is a two-player, finite-move game in which the subjects alternate choosing whether to end the game or to pass to the other player.
Abstract: It is difficult to overstate the profound impact that game theory has had on the economic approach and on the sciences more generally. For that reason, understanding how closely the assumptions that underpin game theoretic analysis conform to actual human decision making is a question of first-order importance to economists. In this spirit, backward induction represents one of the most basic concepts in game theory. Backward induction played a prominent role in Reinhard Selten’s (1965) development of perfect equilibrium, and it has helped to shape the modern refinement literature. Although backward induction is a cornerstone of game theory, existing empirical evidence suggests that economic agents engage in backward induction less frequently than theorists might hope. Backward induction has fared especially poorly in the centipede game, which was introduced by Robert W. Rosenthal (1981) and has since been extensively analyzed (Ken Binmore 1987; Robert J. Aumann 1988; Philip J. Reny 1988; David M. Kreps 1990; Geir B. Asheim and Martin Dufwenberg 2003). The original centipede game is a two-player, finite-move game in which the subjects alternate choosing whether to end the game or to pass to the other player. The subject’s payoff to ending the game at a particular node is greater than the payoff he receives if the other player ends the game at the next node, but less than the payoff earned if the other player elects not to end the game. The player making the final choice gets paid more from stopping than from passing, and thus would be expected to stop. If the opponent will stop at the last node, then, conditional on reaching the penultimate node, the player maximizes his earnings by stopping at that node. Following this logic further, backward induction leads to the unique subgame perfect equilibrium: the game is stopped at the first node. As pointed out in prior research
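The backward-induction argument for the centipede game can be sketched as below. The payoff numbers are hypothetical and chosen only to satisfy the structure described in the abstract (stopping beats being stopped on at the next node, but is worse than having the opponent pass); ties are broken in favor of stopping, which is an assumption of this sketch.

```python
def solve_centipede(stop_payoffs, final_payoffs):
    """Backward induction in a two-player centipede game.

    stop_payoffs[t] = (payoff to player 0, payoff to player 1) if the game is
    stopped at node t; player t % 2 moves at node t.
    final_payoffs is the outcome if every node is passed.
    Returns the action chosen at each node and the resulting payoffs.
    """
    outcome = final_payoffs  # continuation outcome after the last node
    plan = []
    for t in reversed(range(len(stop_payoffs))):
        mover = t % 2
        if stop_payoffs[t][mover] >= outcome[mover]:
            outcome = stop_payoffs[t]
            plan.append("stop")
        else:
            plan.append("pass")
    plan.reverse()
    return plan, outcome


# Hypothetical four-node game in which the pot doubles at every node.
stops = [(4, 1), (2, 8), (16, 4), (8, 32)]
plan, outcome = solve_centipede(stops, final_payoffs=(64, 16))
print(plan, outcome)
```

Working from the last node backward, each mover prefers stopping to the continuation, so the predicted play stops at the first node even though both players would earn more by passing several times.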

Posted Content
TL;DR: It is shown that the large population asymptotic of the microscopic model is equivalent to a (macroscopic) Markov decision evolutionary game in which a local interaction is described by a single player against a population profile.
Abstract: We introduce Mean Field Markov games with $N$ players, in which each individual in a large population interacts with other randomly selected players. The states and actions of each player in an interaction together determine the instantaneous payoff for all involved players. They also determine the transition probabilities to move to the next state. Each individual wishes to maximize the total expected discounted payoff over an infinite horizon. We provide a rigorous derivation of the asymptotic behavior of this system as the size of the population grows to infinity. Under an indistinguishability-per-type assumption, we show that under any Markov strategy, the random process consisting of one specific player and the remaining population converges weakly to a jump process driven by the solution of a system of differential equations. We characterize the solutions to the team and to the game problems at the limit of infinite population and use these to construct near-optimal strategies for the case of a finite, but large, number of players. We show that the large population asymptotic of the microscopic model is equivalent to a (macroscopic) mean field stochastic game in which a local interaction is described by a single player against a population profile (the mean field limit). We illustrate our model by deriving the equations for a dynamic evolutionary Hawk and Dove game with energy level.

Journal Article
TL;DR: The price of anarchy (POA) in the strategic-form game is characterized under an “Effective-investment” model and a “Bad-traffic” model, and insight is given on how the POA depends on individual players' cost functions and their mutual influence.
Abstract: We study a network security game where strategic players choose their investments in security. Since a player's investment can reduce the propagation of computer viruses, a key feature of the game is the positive externality exerted by the investment. With selfish players, unfortunately, the overall network security can be far from optimum. The contributions of this paper are as follows. 1) We first characterize the price of anarchy (POA) in the strategic-form game under an “Effective-investment” model and a “Bad-traffic” model, and give insight on how the POA depends on individual players' cost functions and their mutual influence. We also introduce the concept of “weighted POA” to bound the region of payoff vectors. 2) In a repeated game, players have more incentive to cooperate for their long term interests. We consider the socially best outcome that can be supported by the repeated game, as compared to the social optimum. 3) Next, we compare the benefits of improving security technology and improving incentives, and show that improving technology alone may not offset the price of anarchy. 4) Finally, we characterize the performance of correlated equilibrium (CE). Although the paper focuses on network security, many results are generally applicable to games with positive externalities.

Proceedings Article
23 Jan 2011
TL;DR: This work proves a generalization of von Neumann's minmax theorem to the class of separable multiplayer zero-sum games, and shows that finding a pure Nash equilibrium in coordination-only polymatrix games is PLS-complete; hence, computing a mixed Nash equilibrium is in PLS ∩ PPAD.
Abstract: We prove a generalization of von Neumann's minmax theorem to the class of separable multiplayer zero-sum games, introduced in [Bregman and Fokin 1998]. These games are polymatrix---that is, graphical games in which every edge is a two-player game between its endpoints---in which every outcome has zero total sum of players' payoffs. Our generalization of the minmax theorem implies convexity of equilibria, polynomial-time tractability, and convergence of no-regret learning algorithms to Nash equilibria. Given that Nash equilibria in 3-player zero-sum games are already PPAD-complete, this class of games, i.e. with pairwise separable utility functions, defines essentially the broadest class of multi-player constant-sum games to which we can hope to push tractability results. Our result is obtained by establishing a certain game-class collapse, showing that separable constant-sum games are payoff equivalent to pairwise constant-sum polymatrix games---polymatrix games in which all edges are constant-sum games, and invoking a recent result of [Daskalakis, Papadimitriou 2009] for these games. We also explore generalizations to classes of non-constant-sum multi-player games. A natural candidate is polymatrix games with strictly competitive games on their edges. In the two player setting, such games are minmax solvable and recent work has shown that they are merely affine transformations of zero-sum games [Adler, Daskalakis, Papadimitriou 2009]. Surprisingly, we show that a polymatrix game comprising strictly competitive games on its edges is PPAD-complete to solve, proving a striking difference in the complexity of networks of zero-sum and strictly competitive games. Finally, we look at the role of coordination in networked interactions, studying the complexity of polymatrix games with a mixture of coordination and zero-sum games. We show that finding a pure Nash equilibrium in coordination-only polymatrix games is PLS-complete; hence, computing a mixed Nash equilibrium is in PLS ∩ PPAD, but it remains open whether the problem is in P. If, on the other hand, coordination and zero-sum games are combined, we show that the problem becomes PPAD-complete, establishing that coordination and zero-sum games achieve the full generality of PPAD.

Proceedings Article
02 May 2011
TL;DR: It is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified, and it is demonstrated empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
Abstract: Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
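As a rough illustration of potential-based reward shaping, the sketch below uses the standard form of the shaping term, F(s, s') = gamma * Phi(s') - Phi(s), which the abstract does not spell out. The potential function `phi`, the line-world states, and the goal position are hypothetical; in a multi-agent setting each agent would add such a term to its own reward.

```python
def shaping_term(phi, s, s_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_reward(r, phi, s, s_next, gamma):
    """Environment reward plus the potential-based shaping term."""
    return r + shaping_term(phi, s, s_next, gamma)


# Hypothetical potential: negative distance to a goal cell on a line.
GOAL = 5


def phi(s):
    return -abs(GOAL - s)


gamma = 0.9
trajectory = [0, 1, 2, 3, 4, 5]

# The discounted sum of shaping terms telescopes to
# gamma**T * phi(s_T) - phi(s_0), independent of the path taken in between.
total = sum(
    gamma**t * shaping_term(phi, s, s_next, gamma)
    for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
)
expected = gamma ** (len(trajectory) - 1) * phi(trajectory[-1]) - phi(trajectory[0])
print(total, expected)
```

The telescoping sum is the intuition behind the invariance results: shaping changes every return from a given start state by the same path-independent amount, so it can change how an agent explores without changing which policies are equilibria.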

Journal Article
TL;DR: The authors explored the extent to which altruism, as measured by giving in a dictator game (DG), accounts for play in a noisy version of the repeated prisoner's dilemma and found that DG giving is correlated with cooperation in the repeated game when no cooperative equilibria exist, but not when cooperation is an equilibrium.
Abstract: We explore the extent to which altruism, as measured by giving in a dictator game (DG), accounts for play in a noisy version of the repeated prisoner’s dilemma. We find that DG giving is correlated with cooperation in the repeated game when no cooperative equilibria exist, but not when cooperation is an equilibrium. Furthermore, none of the commonly observed strategies are better explained by inequity aversion or efficiency concerns than money maximization. Various survey questions provide additional evidence for the relative unimportance of social preferences. We conclude that cooperation in repeated games is primarily motivated by long-term payoff maximization and that even though some subjects may have other goals, this does not seem to be the key determinant of how play varies with the parameters of the repeated game. In particular, altruism does not seem to be a major source of the observed diversity of play.

Journal Article
TL;DR: In this article, a new model aimed at predicting behavior in games involving a randomized allocation procedure is presented, which is designed to capture the relative importance and interaction between procedural justice (defined crudely in terms of the difference between one's expected payoff and the average expected payoff in the group) and distributive justice (the difference between own and average actual payoffs).
Abstract: This article presents a new model aimed at predicting behavior in games involving a randomized allocation procedure. It is designed to capture the relative importance and interaction between procedural justice (defined crudely in terms of the difference between one’s expected payoff and average expected payoff in the group) and distributive justice (difference between own and average actual payoffs). The model is applied to experimental games, including “randomized” variations of simple sequential bargaining games, and delivers qualitatively correct predictions. In view of the model, redistribution of income can be seen as a substitute for vertical social mobility. This contributes to the explanation of greater demand for redistribution in European countries vis-a-vis the United States. I conclude with suggestions for further verification of the model and possible extensions.

Proceedings Article
02 May 2011
TL;DR: This work introduces a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions, and develops several techniques for finding approximate solutions.
Abstract: Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure protection. The game models for these applications are constructed using expert analysis and historical data to estimate the values of key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over these parameters using continuous distributions (such as uniform intervals or Gaussians). However, existing solution algorithms are limited to considering a small, finite number of possible attacker types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions. We then develop several techniques for finding approximate solutions for this class of games, and show empirically that our methods offer dramatic improvements over the current state of the art, providing new ways to improve the robustness of security game models.

Journal Article
TL;DR: It is shown that reputation effects do not last forever in such games if buyers can observe all past signals, and a finite rating system is constructed that increases payoffs of almost all buyers, while decreasing the seller's payoff.

Journal Article
TL;DR: This work introduces a simple switch that allows a player to either keep its original payoff or use the average payoff of all its neighbors, and shows that, in general, taking into account the environment promotes cooperation.

Journal Article
TL;DR: Compared to traditional models that only allow for individual target hardening, the results show that the proposed game-theoretic model could significantly increase the defender's payoff, especially when the unit cost of defense is high.
Abstract: We propose a novel class of game-theoretic models for the optimal assignment of defensive resources in a game between a defender and an attacker. Compared to the other game-theoretic models in the literature of defense allocation problems, the novelty of our model is that we allow the defender to assign her continuous-level defensive resources to any subset (or arbitrary layers) of targets due to functional similarity or geographical proximity. We develop methods to solve for equilibrium, and illustrate our model using numerical examples. Compared to traditional models that only allow for individual target hardening, our results show that our model could significantly increase the defender's payoff, especially when the unit cost of defense is high.

Proceedings Article
Aleksandrs Slivkins
21 Dec 2011
TL;DR: This work considers similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem, and presents algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance.
Abstract: In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of contextual bandits, a natural extension of the basic MAB problem where before each round an algorithm is given the context--a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on web pages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a similarity distance between the context-arm pairs which bounds from above the difference between the respective expected payoffs. Prior work on contextual bandits with similarity uses "uniform" partitions of the similarity space, so that each context-arm pair is approximated by the closest pair in the partition. Algorithms based on "uniform" partitions disregard the structure of the payoffs and the context arrivals, which is potentially wasteful. We present algorithms that are based on adaptive partitions, and take advantage of "benign" payoffs and context arrivals without sacrificing the worst-case performance. The central idea is to maintain a finer partition in high-payoff regions of the similarity space and in popular regions of the context space. Our results apply to several other settings, e.g., MAB with constrained temporal change (Slivkins and Upfal, 2008) and sleeping bandits (Kleinberg et al., 2008a).
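The similarity assumption described in this abstract can be written as a Lipschitz-type condition. The symbols below ($\mu$ for the expected payoff, $\mathcal{D}$ for the similarity distance, $x$ for a context, $y$ for an arm) are notation chosen here for illustration and are not taken from the abstract.

```latex
\[
  \bigl|\mu(x, y) - \mu(x', y')\bigr| \;\le\; \mathcal{D}\bigl((x, y), (x', y')\bigr)
  \quad \text{for all context-arm pairs } (x, y),\ (x', y').
\]
```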

Journal ArticleDOI
Deng-Feng Li
TL;DR: It is proven that two players have the identical interval-type value of the interval-valued matrix game and that the linear programming models and method proposed in this paper extend those of the classical matrix games.
Abstract: Matrix game theory is concerned with how two players make decisions when they are faced with known exact payoffs. The aim of this paper is to develop a simple and effective linear programming method for solving matrix games in which the payoffs are expressed with intervals. Because the payoffs of the matrix game are intervals, the value of the matrix game is an interval as well. Based on the definition of the value for matrix games, the value of the matrix game may be regarded as a function of values in the payoff intervals, which is proven to be non-decreasing. A pair of auxiliary linear programming models is formulated to obtain the upper bound and the lower bound of the value of the interval-valued matrix game by using the upper bounds and the lower bounds of the payoff intervals, respectively. By the duality theorem of linear programming, it is proven that two players have the identical interval-type value of the interval-valued matrix game. Also it is proven that the linear programming models and method proposed in this paper extend those of the classical matrix games. The linear programming method proposed in this paper is demonstrated with a real investment decision example and compared with other similar methods to show its validity, applicability and superiority.
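As a rough illustration of the bounding idea in this abstract, the sketch below computes the interval value of a 2x2 interval-valued matrix game. It is not the paper's pair of linear programming models: for brevity it swaps in the textbook closed-form value of a 2x2 zero-sum game, and it relies on the monotonicity property stated above, so the lower (upper) bound of the value comes from the matrix of lower (upper) payoff bounds. The function names and the example payoffs are hypothetical.

```python
from fractions import Fraction


def value_2x2(matrix):
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    a, b, c, d = (Fraction(x) for row in matrix for x in row)
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for the column player
    if maximin == minimax:
        # A saddle point exists, so the game is solved in pure strategies.
        return maximin
    # No saddle point: closed-form value of the mixed-strategy solution.
    return (a * d - b * c) / (a + d - b - c)


def interval_value(interval_matrix):
    """Interval value of a 2x2 game whose entries are (low, high) payoff pairs.

    Assumes, as the abstract states, that the game value is non-decreasing
    in every payoff entry.
    """
    lower = [[lo for lo, _ in row] for row in interval_matrix]
    upper = [[hi for _, hi in row] for row in interval_matrix]
    return value_2x2(lower), value_2x2(upper)


# Hypothetical interval payoffs, purely for illustration.
game = [[(1, 2), (3, 4)],
        [(4, 5), (2, 3)]]
interval = interval_value(game)
```

For games larger than 2x2 the closed form no longer applies, and each bound would be obtained from a linear program, as in the paper.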

Journal ArticleDOI
TL;DR: In this paper, the authors present an algorithm to compute the set of perfect public equilibrium payoffs as the discount factor tends to 1 for stochastic games with observable states and public (but not necessarily perfect) monitoring when the limiting set of (long-run players') equilibrium payoffs is independent of the initial state.
Abstract: We present an algorithm to compute the set of perfect public equilibrium payoffs as the discount factor tends to 1 for stochastic games with observable states and public (but not necessarily perfect) monitoring when the limiting set of (long-run players') equilibrium payoffs is independent of the initial state. This is the case, for instance, if the Markov chain induced by any Markov strategy profile is irreducible. We then provide conditions under which a folk theorem obtains: if in each state the joint distribution over the public signal and next period's state satisfies some rank condition, every feasible payoff vector above the minmax payoff is sustained by a perfect public equilibrium with low discounting.

Journal ArticleDOI
TL;DR: A distributed algorithm for computing an NE of a special multi-leader-follower game wherein the nonconvexity is due to the followers' equilibrium conditions in the leaders' optimization problems is presented and a matrix-theoretic condition for the convergence of the algorithm is provided.
Abstract: This paper develops an optimization-based theory for the existence and uniqueness of equilibria of a noncooperative game wherein the selfish players' optimization problems are nonconvex and there are side constraints and an associated price clearance to be satisfied by the equilibria. A new concept of equilibrium for such a nonconvex game, which we term a “quasi-Nash equilibrium” (QNE), is introduced as a solution of the variational inequality (VI) obtained by aggregating the first-order optimality conditions of the players' problems while retaining the convex constraints (if any) in the defining set of the VI. Under a second-order sufficiency condition from nonlinear programming, a QNE becomes a local Nash equilibrium of the game. Uniqueness of a QNE is established using a degree-theoretic proof. Under a key boundedness property of the Karush-Kuhn-Tucker multipliers of the nonconvex constraints and the positive definiteness of the Hessians of the players' Lagrangian functions, we establish the single-valuedness of the players' best-response maps, from which the existence of a Nash equilibrium (NE) of the nonconvex game follows. We also present a distributed algorithm for computing an NE of such a game and provide a matrix-theoretic condition for the convergence of the algorithm. An application is presented that pertains to a special multi-leader-follower game wherein the nonconvexity is due to the followers' equilibrium conditions in the leaders' optimization problems. Another application to a cognitive radio paradigm in a signal processing game is described in detail in [G. Scutari and J.S. Pang, IEEE Trans. Inform. Theory, submitted; J.S. Pang and G. Scutari, Joint IEEE Trans. Signal Process, submitted].

Journal ArticleDOI
TL;DR: The definition of individual fitness is reconsidered so that it integrates the environment with the traditional individual payoffs, and it is found that considering the environment, i.e., integrating neighborhoods in the evaluation of fitness, promotes cooperation.
Abstract: A fundamental question of human society is the evolution of cooperation. Many previous studies explored this question by setting a spatial background, where players obtain their payoffs by playing games with their nearest neighbors. Another undoubted fact is that the environment plays an important role in individual development. Inspired by these phenomena, we reconsider the definition of individual fitness, which integrates the environment, denoted by the average payoff of all individual neighbors, with the traditional individual payoffs by introducing a selection parameter u. Tuning u equal to zero returns the traditional version, while increasing u strengthens the influence of the environment. We find that considering the environment, i.e., integrating neighborhoods in the evaluation of fitness, promotes cooperation. If we enhance the value of u, the invasion of defection can be resisted better. We also provide quantitative explanations and complete phase diagrams presenting the influence of the environment on the evolution of cooperation. Finally, the universality of this mechanism is tested for different neighborhood sizes, different topology structures and different game models. Our work may shed light on the emergence and persistence of cooperation in our life.
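The fitness definition described in this abstract can be sketched as follows. The abstract does not give the exact formula, so the convex combination used here, (1 - u) times the player's own payoff plus u times the neighbors' average payoff, is an assumption chosen because it reduces to the traditional payoff at u = 0. The function name and the example data are hypothetical.

```python
def environment_fitness(agent, payoffs, neighbors, u):
    """Fitness mixing an agent's own payoff with its neighbors' average payoff.

    Assumed form (not confirmed by the abstract):
        fitness = (1 - u) * own_payoff + u * mean(neighbor_payoffs)
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u is assumed to lie in [0, 1]")
    nbrs = neighbors[agent]
    # The "environment" is the average payoff of the agent's neighbors.
    environment = sum(payoffs[n] for n in nbrs) / len(nbrs)
    return (1.0 - u) * payoffs[agent] + u * environment


# Hypothetical payoffs and neighborhood for a single focal agent (agent 0).
payoffs = {0: 4.0, 1: 1.0, 2: 3.0}
neighbors = {0: [1, 2]}

traditional = environment_fitness(0, payoffs, neighbors, u=0.0)  # own payoff only
mixed = environment_fitness(0, payoffs, neighbors, u=0.5)        # equal weighting
```

Under this assumed form, raising u pulls a player's fitness toward the average payoff of its neighbors.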

Journal ArticleDOI
TL;DR: This paper introduces stochastic games with imperfect public signals, and provides a sufficient condition for the folk theorem when the game is irreducible, thus generalizing the folk theorems of Dutta (1995) and Fudenberg, Levine, and Maskin (1994).

Proceedings ArticleDOI
02 May 2011
TL;DR: The simulation results show that the evolution of cooperation in networked systems is quite nuanced and depends on the combination of network type, update rules and the initial fraction of cooperating agents.
Abstract: We study the phenomenon of evolution of cooperation in a society of self-interested agents using repeated games in graphs. A repeated game in a graph is a multiple round game, where, in each round, an agent gains payoff by playing a game with its neighbors and updates its action (state) by using the actions and/or payoffs of its neighbors. The interaction model between the agents is a two-player, two-action (cooperate and defect) Prisoner's Dilemma (PD) game (a prototypical model for interaction between self-interested agents). The conventional wisdom is that the presence of network structure enhances cooperation and current models use multiagent simulation to show evolution of cooperation. However, these results are based on particular combinations of interaction game, network model and state update rules (e.g., PD game on a grid with imitate your best neighbor rule leads to evolution of cooperation). The state of the art lacks a comprehensive picture of the dependence of the emergence of cooperation on model parameters like network topology, interaction game, state update rules and initial fraction of cooperators. We perform a thorough study of the phenomenon of evolution of cooperation using (a) a set of popular categories of networks, namely, grid, random networks, scale-free networks, and small-world networks and (b) a set of cognitively motivated update rules. Our simulation results show that the evolution of cooperation in networked systems is quite nuanced and depends on the combination of network type, update rules and the initial fraction of cooperating agents. We also provide an analysis to support our simulation results.
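A repeated game in a graph, as described in this abstract, can be sketched in a few lines. The example below is a minimal illustration rather than the paper's simulation setup: the payoff values, the four-node ring, the synchronous update and the tie-breaking rule (keep the current action unless a neighbor did strictly better) are all assumptions made here.

```python
# Row player's payoff in a Prisoner's Dilemma; illustrative values with T > R > P > S.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def round_payoffs(graph, actions):
    """Each agent plays the PD once with every neighbor and sums the payoffs."""
    return {
        agent: sum(PAYOFF[(actions[agent], actions[nb])] for nb in nbrs)
        for agent, nbrs in graph.items()
    }


def imitate_best_neighbor(graph, actions):
    """Synchronous update: copy the action of the best-scoring agent nearby.

    An agent compares itself with its neighbors and switches only if some
    neighbor earned strictly more (assumed tie-breaking rule).
    """
    payoffs = round_payoffs(graph, actions)
    new_actions = {}
    for agent, nbrs in graph.items():
        best = agent
        for nb in nbrs:
            if payoffs[nb] > payoffs[best]:
                best = nb
        new_actions[agent] = actions[best]
    return new_actions


# Hypothetical four-agent ring with a single initial defector.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = {0: "D", 1: "C", 2: "C", 3: "C"}
after_one = imitate_best_neighbor(ring, start)
after_two = imitate_best_neighbor(ring, after_one)
```

In this tiny example the single defector takes over the ring within two rounds. As the abstract stresses, outcomes depend on the network type, the update rule and the initial fraction of cooperators, so nothing general should be read into this one run.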