
Showing papers on "Stochastic game" published in 2015


Proceedings Article
21 Feb 2015
TL;DR: In this article, the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits was studied and a UCB-like algorithm for solving the problem was presented.
Abstract: A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove $O(KL(1/\Delta)\log n)$ and $O(\sqrt{KLn\log n})$ upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.

134 citations
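A minimal sketch of a UCB-style semi-bandit learner, assuming the only constraint is "choose K of the L items" (so the optimization step reduces to sorting items by their optimistic index) and using a confidence radius of sqrt(1.5 log t / T_e). The constant, the function name `comb_ucb1_topk` and the Bernoulli item means in the usage lines are illustrative assumptions, not the paper's exact algorithm or experiments.

```python
import math
import random


def comb_ucb1_topk(sample_weights, L, K, n, seed=0):
    """UCB-style learner for a semi-bandit whose only constraint is 'pick K of L items'.

    sample_weights(rng) returns the realized weights of all L items for one round;
    the learner only uses the weights of the items it chose (semi-bandit feedback).
    """
    rng = random.Random(seed)
    counts = [0] * L      # number of times each item has been observed
    means = [0.0] * L     # empirical mean weight of each item
    total_payoff = 0.0
    for t in range(1, n + 1):
        # Optimistic index per item; never-observed items get +inf so they are tried first.
        ucb = [
            math.inf if counts[e] == 0
            else means[e] + math.sqrt(1.5 * math.log(t) / counts[e])
            for e in range(L)
        ]
        # Under a cardinality constraint, the feasible set with the largest summed
        # index is simply the K items with the largest indices.
        chosen = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]
        w = sample_weights(rng)
        for e in chosen:
            counts[e] += 1
            means[e] += (w[e] - means[e]) / counts[e]
        total_payoff += sum(w[e] for e in chosen)
    return means, counts, total_payoff


probs = [0.9, 0.8, 0.2, 0.1, 0.1]   # hypothetical Bernoulli item means


def draw(rng):
    return [1.0 if rng.random() < q else 0.0 for q in probs]


means, counts, payoff = comb_ucb1_topk(draw, L=5, K=2, n=2000)
```

Over time the observation counts concentrate on the two best items, while the suboptimal items are sampled only a logarithmic number of times.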


Journal ArticleDOI
TL;DR: Within the class of memory-one strategies for the iterated Prisoner's Dilemma, partner strategies, competitive strategies and zero-determinant strategies are characterized, where a player using a competitive strategy never obtains less than the co-player.

129 citations
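To make the memory-one setting concrete, the sketch below computes the long-run average payoffs of two memory-one strategies by iterating the 4-state Markov chain over the outcomes (CC, CD, DC, DD) and averaging over rounds. It assumes the conventional payoffs R=3, S=0, T=5, P=1; the function and constant names are hypothetical, and it does not reproduce the paper's characterization. Always-defect (ALLD) illustrates the competitive property: in every round it earns T or P while its co-player earns S or P.

```python
def long_run_payoffs(p, q, p0=1.0, q0=1.0, rounds=20000, R=3, S=0, T=5, P=1):
    """Time-averaged payoffs (X, Y) of two memory-one strategies in the iterated PD.

    p, q: cooperation probabilities after (CC, CD, DC, DD), each written from the
    owner's own point of view (own move first). p0, q0: first-round cooperation.
    """
    # Re-express Y's strategy in X's state order: X's CD is Y's DC and vice versa.
    qx = (q[0], q[2], q[1], q[3])
    dist = [p0 * q0, p0 * (1 - q0), (1 - p0) * q0, (1 - p0) * (1 - q0)]
    avg = [0.0] * 4
    for _ in range(rounds):
        avg = [a + d / rounds for a, d in zip(avg, dist)]
        nxt = [0.0] * 4
        for s, mass in enumerate(dist):
            a, b = p[s], qx[s]   # cooperation probabilities of X and Y in state s
            nxt[0] += mass * a * b
            nxt[1] += mass * a * (1 - b)
            nxt[2] += mass * (1 - a) * b
            nxt[3] += mass * (1 - a) * (1 - b)
        dist = nxt
    pay_x = sum(w * v for w, v in zip(avg, (R, S, T, P)))
    pay_y = sum(w * v for w, v in zip(avg, (R, T, S, P)))
    return pay_x, pay_y


TFT = (1, 0, 1, 0)    # cooperate iff the co-player cooperated last round
ALLD = (0, 0, 0, 0)   # always defect
```

Averaging the state distribution over rounds (rather than taking its limit) also handles deterministic strategy pairs whose chains cycle.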


Journal ArticleDOI
TL;DR: This contribution breaks with the tradition of restricting stochastic evolutionary game dynamics to populations of constant size and introduces a theoretical framework to investigate relevant and natural changes arising in populations that vary in size according to fitness—a feature common to many real biological systems.
Abstract: Frequency-dependent selection and demographic fluctuations play important roles in evolutionary and ecological processes. Under frequency-dependent selection, the average fitness of the population may increase or decrease based on interactions between individuals within the population. This should be reflected in fluctuations of the population size even in constant environments. Here, we propose a stochastic model that naturally combines these two evolutionary ingredients by assuming frequency-dependent competition between different types in an individual-based model. In contrast to previous game theoretic models, the carrying capacity of the population, and thus the population size, is determined by pairwise competition of individuals mediated by evolutionary games and demographic stochasticity. In the limit of infinite population size, the averaged stochastic dynamics is captured by deterministic competitive Lotka–Volterra equations. In small populations, demographic stochasticity may instead lead to the extinction of the entire population. Because the population size is driven by fitness in evolutionary games, a population of cooperators is less prone to go extinct than a population of defectors, whereas in the usual systems of fixed size the population would thrive regardless of its average payoff.

106 citations


Journal ArticleDOI
TL;DR: In this paper, population-level persistent cyclic motions were observed in a laboratory experiment of the discrete-time iterated RPS game under a random pairwise-matching protocol, and were explained by a microscopic model of win-lose-tie conditional response.
Abstract: How humans make decisions in non-cooperative strategic interactions is a big question. For the fundamental Rock-Paper-Scissors (RPS) model game system, classic Nash equilibrium (NE) theory predicts that players randomize their action choices completely to avoid being exploited, while evolutionary game theory of bounded rationality in general predicts persistent cyclic motions, especially in finite populations. However, as empirical studies have been relatively sparse, it is still a controversial issue as to which theoretical framework is more appropriate to describe decision-making of human subjects. Here we observe population-level persistent cyclic motions in a laboratory experiment of the discrete-time iterated RPS game under the traditional random pairwise-matching protocol. This collective behavior contradicts NE theory but is quantitatively explained, without any adjustable parameter, by a microscopic model of win-lose-tie conditional response. Theoretical calculations suggest that if all players adopt the same optimized conditional response strategy, their accumulated payoff will be much higher than the reference value of the NE mixed strategy. Our work demonstrates the feasibility of understanding human competition behaviors from the angle of non-equilibrium statistical physics.

105 citations


Journal ArticleDOI
TL;DR: This paper considers zero-determinant strategies in the iterated public goods game, a representative multi-player game where in each round each player will choose whether or not to put his tokens into a public pot, and the tokens in this pot are multiplied by a factor larger than one and then evenly divided among all players.
Abstract: Recently, Press and Dyson have proposed a new class of probabilistic and conditional strategies for the two-player iterated Prisoner’s Dilemma, so-called zero-determinant strategies. A player adopting zero-determinant strategies is able to pin the expected payoff of the opponents or to enforce a linear relationship between his own payoff and the opponents’ payoff, in a unilateral way. This paper considers zero-determinant strategies in the iterated public goods game, a representative multi-player game where in each round each player will choose whether or not to put his tokens into a public pot, and the tokens in this pot are multiplied by a factor larger than one and then evenly divided among all players. The analytical and numerical results exhibit a similar yet different scenario to the case of two-player games: (i) with a small number of players or a small multiplication factor, a player is able to unilaterally pin the expected total payoff of all other players; (ii) a player is able to set the ratio between his payoff and the total payoff of all other players, but this ratio is limited by an upper bound if the multiplication factor exceeds a threshold that depends on the number of players.

96 citations
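The payoff arithmetic of one round of the public goods game can be sketched as follows, assuming each player holds one token and either invests all of it or keeps it, with multiplication factor r; the function names are hypothetical. `others_total` is the total payoff of all other players, the quantity the abstract says a zero-determinant player can pin in small groups or for a small multiplication factor.

```python
def public_goods_payoffs(contributes, r, endowment=1.0):
    """One round: each player invests its whole endowment (True) or keeps it (False)."""
    n = len(contributes)
    pot = sum(endowment for c in contributes if c)
    share = r * pot / n   # the multiplied pot is divided evenly among all n players
    return [(0.0 if c else endowment) + share for c in contributes]


def others_total(payoffs, i):
    """Total payoff of every player except player i."""
    return sum(payoffs) - payoffs[i]


pay = public_goods_payoffs([True, True, True, False], r=2.0)
```

With r smaller than the number of players, each token invested returns only r/n to its owner, so withholding it raises one's own payoff; this is the social dilemma behind the game.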


Journal ArticleDOI
TL;DR: This work proposes a new no-regret learning algorithm that achieves average regret that scales as O(1/√T) with the number T of rounds when used against an adversary, and represents an almost-quadratic improvement over the best previously known strongly-uncoupled dynamics.

92 citations
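As a point of reference for the O(1/√T) adversarial rate, the sketch below runs the standard Hedge (exponential weights) learner, not the algorithm proposed in the paper, on an arbitrary sequence of losses in [0, 1]. With step size η = sqrt(8 ln n / T) its regret against the best fixed action is at most sqrt(T ln n / 2) for any loss sequence, so average regret decays as O(sqrt(log n / T)). The seeded random losses are only a stand-in for an adversary, and the function name is hypothetical.

```python
import math
import random


def hedge_regret(loss_rows, eta):
    """Hedge on loss vectors in [0, 1]: total expected loss minus best fixed action's loss."""
    n = len(loss_rows[0])
    weights = [1.0] * n
    alg_loss = 0.0
    cumulative = [0.0] * n
    for losses in loss_rows:
        total = sum(weights)
        probs = [w / total for w in weights]
        alg_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)
        top = max(weights)
        weights = [w / top for w in weights]   # rescale to avoid underflow
    return alg_loss - min(cumulative)


T, n = 1000, 3
rng = random.Random(1)
rows = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)
regret = hedge_regret(rows, eta)
```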


Journal ArticleDOI
TL;DR: It is found that individual behaviour is best explained by a learning rule that is trying to maximize personal income, and that conditional cooperation disappears when the consequences of cooperation are made clearer.
Abstract: Economic games such as the public goods game are increasingly being used to measure social behaviours in humans and non-human primates. The results of such games have been used to argue that people are pro-social, and that humans are uniquely altruistic, willingly sacrificing their own welfare in order to benefit others. However, an alternative explanation for the empirical observations is that individuals are mistaken, but learn, during the game, how to improve their personal payoff. We test between these competing hypotheses, by comparing the explanatory power of different behavioural rules, in public goods games, where individuals are given different amounts of information. We find: (i) that individual behaviour is best explained by a learning rule that is trying to maximize personal income; (ii) that conditional cooperation disappears when the consequences of cooperation are made clearer; and (iii) that social preferences, if they exist, are more anti-social than pro-social.

81 citations


Journal ArticleDOI
01 Mar 2015-Energy
TL;DR: Simulation results demonstrate the superior performance of the proposed mechanism in reducing the peak load and increasing the suppliers' profit and the customers' payoff.

76 citations


Journal ArticleDOI
TL;DR: The results are consistent with a tentative interpretation of game theory as explaining evolved behavior, with the additional hypothesis that chimpanzees may retain or practice a specialized capacity to adjust strategy choice during competition to perform at least as well as, or better than, humans have.
Abstract: The capacity for strategic thinking about the payoff-relevant actions of conspecifics is not well understood across species. We use game theory to make predictions about choices and temporal dynamics in three abstract competitive situations with chimpanzee participants. Frequencies of chimpanzee choices are extremely close to equilibrium (accurate-guessing) predictions, and shift as payoffs change, just as equilibrium theory predicts. The chimpanzee choices are also closer to the equilibrium prediction, and more responsive to past history and payoff changes, than two samples of human choices from experiments in which humans were also initially uninformed about opponent payoffs and could not communicate verbally. The results are consistent with a tentative interpretation of game theory as explaining evolved behavior, with the additional hypothesis that chimpanzees may retain or practice a specialized capacity to adjust strategy choice during competition to perform at least as well as, or better than, humans have.

72 citations


Journal ArticleDOI
Jianchao Zheng, Yueming Cai, Ning Lu, Yuhua Xu, Xuemin Shen
TL;DR: A fully distributed and online algorithm based on stochastic learning for the interference-mitigation channel selection, which is proved to converge to the NE of the formulated game.
Abstract: In this paper, we investigate the problem of channel selection for interference mitigation in opportunistic spectrum access networks using a stochastic game-theoretic approach. The studied network is distributed and dynamic, where each user only has its individual information, and no information exchange is available among users. Moreover, each user is considered to be dynamically active due to its specific data service requirement. Specifically, a user randomly becomes active and then competes for the wireless channel to transmit for a random duration. To capture such dynamic interactions among users, a dynamic interference graph is defined, and based on this, the interference mitigation problem is formulated as a graphical stochastic game. It is proved to be an exact potential game, in which the existence of the Nash equilibrium (NE) is guaranteed. Then, the performance bounds of the NE are theoretically analyzed. Furthermore, we design a fully distributed and online algorithm based on stochastic learning for the interference-mitigation channel selection, which is proved to converge to the NE of the formulated game. Finally, we conduct simulations to validate the effectiveness of the proposed algorithm for interference mitigation and throughput improvement in the distributed and dynamic environment.

69 citations
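The exact-potential property is what guarantees that distributed improvement steps settle at a Nash equilibrium. A simplified sketch, assuming a static interference graph with all users permanently active and plain best-response updates (not the paper's stochastic learning algorithm): each user's cost is the number of neighbours sharing its channel, and the number of conflicting edges serves as the exact potential. Function names are hypothetical.

```python
def best_response_channels(adj, n_channels, start, max_rounds=100):
    """Asynchronous best-response dynamics for interference-minimizing channel selection.

    adj[v]: neighbours of user v in the interference graph; start[v]: initial channel.
    A user's cost change when it switches equals the change in the number of
    conflicting edges, so every strict improvement lowers that potential.
    """
    channel = list(start)
    for _ in range(max_rounds):
        changed = False
        for v in range(len(channel)):
            cost = [sum(1 for w in adj[v] if channel[w] == c) for c in range(n_channels)]
            best = min(range(n_channels), key=lambda c: cost[c])
            if cost[best] < cost[channel[v]]:   # move only on strict improvement
                channel[v] = best
                changed = True
        if not changed:   # no user can improve: a pure Nash equilibrium
            break
    return channel


def conflicts(adj, channel):
    """Number of interference-graph edges whose endpoints share a channel."""
    return sum(1 for v in adj for w in adj[v] if v < w and channel[v] == channel[w])


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
result = best_response_channels(triangle, 3, [0, 0, 0])
```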


Journal ArticleDOI
TL;DR: In this paper, an improved spatial prisoner's dilemma game model is presented that simultaneously considers individual diversity and increasing neighborhood size on two interdependent lattices; an optimal density of influential players exists at which cooperation is best promoted, and cooperation can be further facilitated through utility coupling.

Journal ArticleDOI
TL;DR: This paper proposes a general model of ZD strategies for noisy repeated games, finds that ZD strategies have high robustness against errors, and derives the pinning strategy under noise, by which the ZD strategy player coercively sets the opponent's expected payoff to his desired level, although his payoff control ability declines as the noise strength increases.
Abstract: Repeated game theory has been one of the most prevailing tools for understanding long-running relationships, which are the foundation in building human society. Recent works have revealed a new set of "zero-determinant" (ZD) strategies, which is an important advance in repeated games. A ZD strategy player can exert unilateral control on two players' payoffs. In particular, he can deterministically set the opponent's payoff or enforce an unfair linear relationship between the players' payoffs, thereby always seizing an advantageous share of payoffs. One of the limitations of the original ZD strategy, however, is that it does not capture the notion of robustness when the game is subjected to stochastic errors. In this paper, we propose a general model of ZD strategies for noisy repeated games and find that ZD strategies have high robustness against errors. We further derive the pinning strategy under noise, by which the ZD strategy player coercively sets the opponent's expected payoff to his desired level, although his payoff control ability declines as the noise strength increases. Due to the uncertainty caused by noise, the ZD strategy player cannot ensure that his payoff is permanently higher than the opponent's, which implies that dominant extortions do not exist even under low noise. However, we show that the ZD strategy player can still establish a novel kind of extortion, named contingent extortion, where any increase of his own payoff always exceeds that of the opponent's by a fixed percentage, and the conditions under which contingent extortions can be realized become more stringent as the noise becomes stronger.
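For the noiseless case, the pinning (equalizer) strategy can be written down directly from Press and Dyson's construction: choosing X's memory-one strategy so that $\tilde{p} = (p_1 - 1, p_2 - 1, p_3, p_4) = \beta S_Y + \gamma \mathbf{1}$ fixes Y's long-run payoff at $-\gamma/\beta$, whatever Y does. The sketch assumes R=3, S=0, T=5, P=1 and the hypothetical values target=2, β=-0.1; it does not model the execution errors studied in the paper, and the helper names are hypothetical.

```python
def equalizer(target, beta, R=3, S=0, T=5, P=1):
    """Memory-one strategy (CC, CD, DC, DD) for X that pins Y's long-run payoff at target."""
    gamma = -target * beta
    # p~ = beta * S_Y + gamma * 1, where S_Y = (R, T, S, P) in X's state order.
    p_tilde = [beta * v + gamma for v in (R, T, S, P)]
    p = [p_tilde[0] + 1, p_tilde[1] + 1, p_tilde[2], p_tilde[3]]
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("these (target, beta) do not give valid probabilities")
    return p


def stationary_payoffs(p, q, rounds=20000, R=3, S=0, T=5, P=1):
    """Long-run average payoffs (X, Y) starting from mutual cooperation;
    q is Y's strategy written from Y's own point of view."""
    qx = (q[0], q[2], q[1], q[3])   # Y's strategy in X's state order
    dist, avg = [1.0, 0.0, 0.0, 0.0], [0.0] * 4
    for _ in range(rounds):
        avg = [a + d / rounds for a, d in zip(avg, dist)]
        nxt = [0.0] * 4
        for s, mass in enumerate(dist):
            a, b = p[s], qx[s]
            nxt[0] += mass * a * b
            nxt[1] += mass * a * (1 - b)
            nxt[2] += mass * (1 - a) * b
            nxt[3] += mass * (1 - a) * (1 - b)
        dist = nxt
    return (sum(w * v for w, v in zip(avg, (R, S, T, P))),
            sum(w * v for w, v in zip(avg, (R, T, S, P))))


p = equalizer(target=2.0, beta=-0.1)   # approximately (0.9, 0.7, 0.2, 0.1)
```

Checking the opponent's payoff against several different strategies for Y is a quick way to see the unilateral control: it stays at the target each time, up to the averaging error.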

Journal ArticleDOI
TL;DR: This paper conducted a laboratory experiment with a constant-sum sender-receiver game and a sequential game of matching pennies with the same payoff structure to investigate the impact of individuals' first- and second-order beliefs on truth-telling.
Abstract: We conduct a laboratory experiment with a constant-sum sender–receiver game and a sequential game of matching pennies with the same payoff structure to investigate the impact of individuals’ first- and second-order beliefs on truth-telling. While first-movers in matching pennies choose an action at random, senders in the sender–receiver game tell the truth more often than they lie. Since second-order beliefs are uncorrelated with actions in both games, excessive truth-telling is unlikely to be driven by guilt aversion or preferences for truth-telling that are based on second-order beliefs; preferences for truth-telling per-se, on the other hand, cannot be rejected.

Journal ArticleDOI
TL;DR: Stability of steady states, Nash equilibria and the relationship of the proposed model to the standard replicator equation are discussed and the dynamical behavior of the model over different graphs is investigated by means of extended simulations.
Abstract: A new mathematical formulation of evolutionary game dynamics on networked populations is proposed. The model extends the standard replicator equation to a finite set of players organized on an arbitrary network of connections (graph). Classical results of multipopulation evolutionary game theory are used in combination with graph theory to obtain the mathematical model. Specifically, the players, located at the vertices of the graph, are interpreted as subpopulations of a multipopulation dynamical game. The members of each subpopulation are replicators, engaged at each time instant in 2-player games with the members of other connected subpopulations. This idea allows us to write an extended equation describing the game dynamics of a finite set of players connected by a graph. The obtained equation requires no assumption on either the game payoff matrices or the graph topology. Stability of steady states, Nash equilibria and the relationship of the proposed model to the standard replicator equation are discussed. The dynamical behavior of the model over different graphs is also investigated by means of extended simulations.
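A minimal sketch of replicator dynamics on a graph, under the assumption that each vertex holds a mixed strategy, the fitness of a pure strategy at a vertex is its payoff summed over the 2-player games with the neighbouring subpopulations, and every edge uses one shared payoff matrix (the paper's equation allows more general payoff matrices). The Euler step, the triangle graph and the Prisoner's Dilemma matrix are illustrative assumptions, and the function name is hypothetical.

```python
def networked_replicator(adj, B, x0, dt=0.01, steps=2000):
    """Euler integration of a replicator equation on a graph.

    adj[v]: neighbours of vertex v; B[s][t]: payoff of strategy s against t;
    x0[v]: initial mixed strategy of the subpopulation at vertex v.
    """
    m = len(B)
    x = [list(row) for row in x0]
    for _ in range(steps):
        new = []
        for v, xv in enumerate(x):
            # Payoff of each pure strategy at v, summed over the games with neighbours.
            f = [sum(B[s][t] * x[w][t] for w in adj[v] for t in range(m)) for s in range(m)]
            mean = sum(xv[s] * f[s] for s in range(m))
            new.append([xv[s] + dt * xv[s] * (f[s] - mean) for s in range(m)])
        x = new   # synchronous update: every vertex used the previous state
    return x


pd = [[3, 0], [5, 1]]   # strategies: 0 = cooperate, 1 = defect
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
final = networked_replicator(triangle, pd, [[0.6, 0.4], [0.9, 0.1], [0.5, 0.5]])
```

Because defection strictly dominates in this matrix, cooperation dies out at every vertex, while each vertex's strategy stays on the probability simplex.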

Journal ArticleDOI
TL;DR: In this paper, the authors derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling, which is equivalent to players keeping an exponentially discounted aggregate of their ongoing payoffs and then using a smooth best response to pick an action based on these performance scores.
Abstract: Starting from a heuristic learning scheme for N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling. These penalty-regulated dynamics are equivalent to players keeping an exponentially discounted aggregate of their ongoing payoffs and then using a smooth best response to pick an action based on these performance scores. Owing to this inherent duality, the proposed dynamics satisfy a variant of the folk theorem of evolutionary game theory and they converge to (arbitrarily precise) approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality further to design a discrete-time, payoff-based learning algorithm which retains these convergence properties and only requires players to observe their in-game payoffs: moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.
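The duality described above can be sketched in discrete time, assuming each player observes the full vector of expected payoffs of its actions (the paper's algorithm needs only realized in-game payoffs, which this sketch does not attempt). Scores follow y ← y + γ(u(x) - T·y), an exponentially discounted aggregate of payoffs with discount rate T (`discount` in the code), and actions are chosen with the logit smooth best response. The parameter values, the coordination game and the function names are hypothetical.

```python
import math


def logit(y):
    """Smooth best response: softmax of the scores (shifted for numerical stability)."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]


def discounted_logit_learning(payoff, n_actions, steps=2000, step=0.1, discount=0.1):
    """payoff(k, x) returns the expected payoff of each action of player k
    against the mixed profile x (one list of probabilities per player)."""
    y = [[0.0] * m for m in n_actions]   # performance scores, one per action
    for _ in range(steps):
        x = [logit(yk) for yk in y]
        u = [payoff(k, x) for k in range(len(y))]   # all players update simultaneously
        y = [[s + step * (v - discount * s) for s, v in zip(yk, uk)]
             for yk, uk in zip(y, u)]
    return [logit(yk) for yk in y]


coord = [[2, 0], [0, 1]]   # hypothetical 2x2 coordination (hence potential) game


def coord_payoff(k, x):
    other = x[1 - k]
    return [sum(coord[a][b] * other[b] for b in range(2)) for a in range(2)]


final = discounted_logit_learning(coord_payoff, [2, 2])
```

A smaller discount rate lets scores grow larger, so the logit choice sharpens toward a pure equilibrium; this mirrors the "arbitrarily precise" approximations mentioned in the abstract.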

Journal ArticleDOI
TL;DR: The question is asked whether individuals in uninvadable population states will appear to be maximizing conventional goal functions (with population‐structure coefficients exogenous to the individual's behavior), when what is really being maximized is invasion fitness at the genetic level.
Abstract: A long-standing question in biology and economics is whether individual organisms evolve to behave as if they were striving to maximize some goal function. We here formalize this "as if" question in a patch-structured population in which individuals obtain material payoffs from (perhaps very complex multimove) social interactions. These material payoffs determine personal fitness and, ultimately, invasion fitness. We ask whether individuals in uninvadable population states will appear to be maximizing conventional goal functions (with population-structure coefficients exogenous to the individual's behavior), when what is really being maximized is invasion fitness at the genetic level. We reach two broad conclusions. First, no simple and general individual-centered goal function emerges from the analysis. This stems from the fact that invasion fitness is a gene-centered multigenerational measure of evolutionary success. Second, when selection is weak, all multigenerational effects of selection can be summarized in a neutral type-distribution quantifying identity-by-descent between individuals within patches. Individuals then behave as if they were striving to maximize a weighted sum of material payoffs (own and others). At an uninvadable state it is as if individuals would freely choose their actions and play a Nash equilibrium of a game with a goal function that combines self-interest (own material payoff), group interest (group material payoff if everyone does the same), and local rivalry (material payoff differences).

Journal ArticleDOI
TL;DR: It is shown that both the qualitative and quantitative termination problems are undecidable for multi-exit RMDPs but decidable for 1-RMDPs and 1-RSSGs, and that, even for 1-RMDPs, more general model-checking problems with respect to linear-time temporal properties are undecidable even for a fixed property.
Abstract: We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), which are classes of (finitely presented) countable-state MDPs and zero-sum turn-based (perfect information) stochastic games. They extend standard finite-state MDPs and stochastic games with a recursion feature. We study the decidability and computational complexity of these games under termination objectives for the two players: one player's goal is to maximize the probability of termination at a given exit, while the other player's goal is to minimize this probability. In the quantitative termination problems, given an RMDP (or RSSG) and probability p, we wish to decide whether the value of such a termination game is at least p (or at most p); in the qualitative termination problem we wish to decide whether the value is 1. The important 1-exit subclasses of these models, 1-RMDPs and 1-RSSGs, correspond in a precise sense to controlled and game versions of classic stochastic models, including multitype Branching Processes and Stochastic Context-Free Grammars, where the objective of the players is to maximize or minimize the probability of termination (extinction). We provide a number of upper and lower bounds for qualitative and quantitative termination problems for RMDPs and RSSGs. We show both problems are undecidable for multi-exit RMDPs, but are decidable for 1-RMDPs and 1-RSSGs. Specifically, the quantitative termination problem is decidable in PSPACE for both 1-RMDPs and 1-RSSGs, and is at least as hard as the square root sum problem, a well-known open problem in numerical computation. We show that the qualitative termination problem for 1-RMDPs (i.e., a controlled version of branching processes) can be solved in polynomial time both for maximizing and minimizing 1-RMDPs. 
The qualitative problem for 1-RSSGs is in NP ∩ coNP, and is at least as hard as the quantitative termination problem for Condon's finite-state simple stochastic games, whose complexity remains a well known open problem. Finally, we show that even for 1-RMDPs, more general (qualitative and quantitative) model-checking problems with respect to linear-time temporal properties are undecidable even for a fixed property.
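The termination probabilities of 1-exit models are least fixed points of monotone polynomial systems. The sketch below approximates them by plain fixed-point iteration from 0, first on a one-type branching process (terminate with probability 1/3, otherwise split in two) and then on a hypothetical maximizing variant whose controller chooses between two offspring laws, giving x = max(P1(x), P2(x)). This is only an illustration: iteration can converge very slowly near critical cases, and it is not the PSPACE or polynomial-time procedure from the paper. The function name is hypothetical.

```python
def kleene_lfp(F, dim, max_iter=100000, tol=1e-12):
    """Approximate the least fixed point of a monotone map F on [0, 1]^dim.

    Starting from the zero vector, the iterates increase monotonically toward
    the least fixed point; stop when successive iterates are within tol.
    """
    x = [0.0] * dim
    for _ in range(max_iter):
        nxt = F(x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# x = 1/3 + (2/3) x^2 has roots 1/2 and 1; the termination probability is the least one.
uncontrolled = kleene_lfp(lambda x: [1 / 3 + (2 / 3) * x[0] ** 2], 1)

# Maximizer chooses between two offspring laws; here the second dominates, least root 2/3.
controlled = kleene_lfp(
    lambda x: [max(1 / 3 + (2 / 3) * x[0] ** 2, 0.4 + 0.6 * x[0] ** 2)], 1)
```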

Proceedings ArticleDOI
04 May 2015
TL;DR: It is established that ON-SGSP consistently outperforms NashQ and FFQ algorithms on a single state non-generic game as well as on a synthetic two-player game setup with 810,000 states.
Abstract: We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from [9] to a general N-player game setting. Next, we break down the optimization problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game and for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions. Using these conditions, we develop two provably convergent algorithms. The first algorithm - OFF-SGSP - is centralized and model-based, i.e., it assumes complete information of the game. The second algorithm - ON-SGSP - is an online model-free algorithm. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single state non-generic game [12] as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms NashQ [16] and FFQ [21] algorithms.

Journal ArticleDOI
TL;DR: In this paper, the authors generalized the General Lotto game and the Colonel Blotto game to allow for battlefield valuations that are heterogeneous across battlefields and asymmetric across players, and for the players to have asymmetric resource constraints.
Abstract: In this paper, we generalize the General Lotto game (budget constraints satisfied in expectation) and the Colonel Blotto game (budget constraints hold with probability one) to allow for battlefield valuations that are heterogeneous across battlefields and asymmetric across players, and for the players to have asymmetric resource constraints. We completely characterize Nash equilibrium in the generalized version of the General Lotto game and then show how this characterization can be applied to identify equilibria in the Colonel Blotto version of the game. In both games, we find that there exist sets of non-pathological parameter configurations of positive Lebesgue measure with multiple payoff nonequivalent equilibria.
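The distinction between the two budget rules, together with battlefield valuations that differ across battlefields and between players, can be illustrated with a small payoff sketch. It assumes that the player allocating strictly more resources wins a battlefield and that a tie gives each player half of its own valuation; that tie rule and the function names are assumptions, not taken from the paper.

```python
def battlefield_payoffs(x, y, vx, vy):
    """Payoffs for one pair of pure allocations x, y across battlefields.

    vx[j], vy[j]: value of battlefield j to each player (heterogeneous and asymmetric).
    Assumed rule: strictly more resources wins; a tie gives each player half its valuation.
    """
    px = py = 0.0
    for xj, yj, a, b in zip(x, y, vx, vy):
        if xj > yj:
            px += a
        elif yj > xj:
            py += b
        else:
            px += a / 2
            py += b / 2
    return px, py


def satisfies_budget(mixed, budget, rule):
    """mixed: list of (probability, allocation) pairs describing a randomized allocation."""
    if rule == "blotto":   # budget must hold with probability one
        return all(sum(alloc) <= budget for prob, alloc in mixed if prob > 0)
    if rule == "lotto":    # budget need only hold in expectation
        return sum(prob * sum(alloc) for prob, alloc in mixed) <= budget
    raise ValueError("rule must be 'blotto' or 'lotto'")
```

For example, a randomized allocation that uses 4 units half the time and 0 otherwise is admissible under a budget of 2 in the General Lotto game but not in the Colonel Blotto game.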

Journal ArticleDOI
25 Jun 2015-Games
TL;DR: All memory-one good strategies for the non-symmetric version of the Prisoner’s Dilemma are described; the existence of such strategies follows from the so-called Folk Theorem for supergames.
Abstract: For the iterated Prisoner’s Dilemma there exist good strategies which solve the problem when we restrict attention to the long term average payoff. When used by both players, these assure the cooperative payoff for each of them. Neither player can benefit by moving unilaterally to any other strategy, i.e., these provide Nash equilibria. In addition, if a player uses instead an alternative which decreases the opponent’s payoff below the cooperative level, then his own payoff is decreased as well. Thus, if we limit attention to the long term payoff, these strategies effectively stabilize cooperative behavior. The existence of such strategies follows from the so-called Folk Theorem for supergames, and the proof constructs an explicit memory-one example, which has been labeled Grim. Here we describe all the memory-one good strategies for the non-symmetric version of the Prisoner’s Dilemma. This is the natural object of study when the payoffs are in units of the separate players’ utilities. We discuss the special advantages and problems associated with some specific good strategies.

Journal ArticleDOI
TL;DR: In this article, the authors propose a system theory approach to the modeling of onset and evolution of criminality in a territory, which aims at capturing the complexity features of social systems and is related to the fact that individuals have the ability to develop specific heterogeneously distributed strategies, which depend also on those expressed by others.
Abstract: This paper proposes a systems theory approach to the modeling of onset and evolution of criminality in a territory. This approach aims at capturing the complexity features of social systems. Complexity is related to the fact that individuals have the ability to develop specific heterogeneously distributed strategies, which depend also on those expressed by the other individuals. The modeling is developed by methods of generalized kinetic theory where interactions and decisional processes are modeled by theoretical tools of stochastic game theory.

Posted Content
10 Sep 2015
TL;DR: In this paper, an analytical model was introduced to study the evolution towards equilibrium in spatial games with 'memory-aware' agents, i.e., agents that accumulate their payoff over time.
Abstract: We introduce an analytical model to study the evolution towards equilibrium in spatial games, with 'memory-aware' agents, i.e., agents that accumulate their payoff over time. In particular, we focus our attention on the spatial Prisoner's Dilemma, as it constitutes an emblematic example of a game whose Nash equilibrium is defection. Previous investigations showed that, under suitable conditions, it is possible to reach, in the evolutionary Prisoner's Dilemma, an equilibrium of cooperation. Notably, it seems that mechanisms like motion may lead a population to become cooperative. In the proposed model, we map agents to particles of a gas so that, on varying the system temperature, they randomly move. In doing so, we are able to identify a relation between the temperature and the final equilibrium of the population, explaining how it is possible to break the classical Nash equilibrium in the spatial Prisoner's Dilemma when considering agents able to increase their payoff over time. Moreover, we introduce a formalism to study order-disorder phase transitions in these dynamics. As a result, we highlight that the proposed model allows one to explain analytically how a population, whose interactions are based on the Prisoner's Dilemma, can reach an equilibrium far from the expected one, and it also opens the way to define a direct link between evolutionary game theory and statistical physics.

Journal ArticleDOI
TL;DR: The historical context and the impact of Shapley’s contribution to stochastic games, which were the first general dynamic model of a game to be defined, are summarized.
Abstract: In 1953, Lloyd Shapley contributed his paper “Stochastic games” to PNAS. In this paper, he defined the model of stochastic games, which were the first general dynamic model of a game to be defined, and proved that it admits a stationary equilibrium. In this Perspective, we summarize the historical context and the impact of Shapley’s contribution.

Journal ArticleDOI
TL;DR: It is shown that, in contrast to multi-dimensional mean-payoff games, which are known to be coNP-complete, multi-dimensional total-payoff games are undecidable, and conservative approximations of these objectives are introduced.
Abstract: We consider two-player games played on weighted directed graphs with mean-payoff and total-payoff objectives, two classical quantitative objectives. While for single-dimensional games the complexity and memory bounds for both objectives coincide, we show that in contrast to multi-dimensional mean-payoff games that are known to be coNP-complete, multi-dimensional total-payoff games are undecidable. We introduce conservative approximations of these objectives, where the payoff is considered over a local finite window sliding along a play, instead of the whole play. For single dimension, we show that (i) if the window size is polynomial, deciding the winner takes polynomial time, and (ii) the existence of a bounded window can be decided in NP ∩ coNP, and is at least as hard as solving mean-payoff games. For multiple dimensions, we show that (i) the problem with fixed window size is EXPTIME-complete, and (ii) there is no primitive-recursive algorithm to decide the existence of a bounded window.
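A finite-prefix illustration of the sliding-window idea for a single dimension: a good window starts at position j if some window of length at most l_max beginning there has nonnegative total weight, hence nonnegative mean payoff. The function name is hypothetical, and the check does not decide the game: the objectives in the paper are defined over infinite plays on a weighted graph.

```python
def window_status(weights, l_max):
    """For each position of a finite play prefix, report whether a good window starts there.

    True: some window of length 1..l_max starting at j has total weight >= 0.
    False: no such window exists within l_max steps.
    None: the prefix is too short to decide.
    """
    status = []
    for j in range(len(weights)):
        total, good = 0, False
        for length in range(1, l_max + 1):
            if j + length > len(weights):
                break
            total += weights[j + length - 1]
            if total >= 0:
                good = True
                break
        if not good and j + l_max > len(weights):
            status.append(None)   # the window could still close beyond the prefix
        else:
            status.append(good)
    return status
```

Enlarging the window can only help: on the prefix (-2, 1, 1), position 0 has no good window of length at most 2, but it does with length 3.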

Posted Content
TL;DR: The optimization problem faced by a perfectly informed principal in a Bayesian game, who reveals information to the players about the state of nature to obtain a desirable equilibrium is studied, and it is shown that it is NP-hard to obtain an additive FPTAS.
Abstract: We study the optimization problem faced by a perfectly informed principal in a Bayesian game, who reveals information to the players about the state of nature to obtain a desirable equilibrium. This signaling problem is the natural design question motivated by uncertainty in games and has attracted much recent attention. We present new hardness results for signaling problems in (a) Bayesian two-player zero-sum games, and (b) Bayesian network routing games. For Bayesian zero-sum games, when the principal seeks to maximize the equilibrium utility of a player, we show that it is NP-hard to obtain an additive FPTAS. Our hardness proof exploits duality and the equivalence of separation and optimization in a novel way. Further, we rule out an additive PTAS assuming planted clique hardness, which states that no polynomial time algorithm can recover a planted clique from an Erdős–Rényi random graph. Complementing these, we obtain a PTAS for a structured class of zero-sum games (where obtaining an FPTAS is still NP-hard) when the payoff matrices obey a Lipschitz condition. Previous results ruled out an FPTAS assuming planted-clique hardness, and a PTAS only for implicit games with quasi-polynomial-size strategy sets. For Bayesian network routing games, wherein the principal seeks to minimize the average latency of the Nash flow, we show that it is NP-hard to obtain a (multiplicative) $(4/3 - \epsilon)$-approximation, even for linear latency functions. This is the optimal inapproximability result for linear latencies, since we show that full revelation achieves a $(4/3)$-approximation for linear latencies.

Journal ArticleDOI
TL;DR: In this paper, the authors study coordination games under general type spaces, characterize rationalizable actions in terms of the properties of the belief hierarchies, and show that there is a unique rationalizable action played whenever there is approximate common certainty of rank beliefs, defined as the probability the players assign to their payoff parameters being higher than their opponents'.
Abstract: We study coordination games under general type spaces. We characterize rationalizable actions in terms of the properties of the belief hierarchies and show that there is a unique rationalizable action played whenever there is approximate common certainty of rank beliefs, defined as the probability the players assign to their payoff parameters being higher than their opponents’. We argue that this is the driving force behind selection results for the specific type spaces in the global games literature.

Journal ArticleDOI
Jianhua Liu, Shigen Shen, Guangxue Yue, Risheng Han, Hongjie Li
01 May 2015
TL;DR: The proposed SECG strategy in the virtual-sensor-service attack-defense game is shown to achieve much better performance than strategies obtained from the evolutionary coalition game or stochastic game, since it successfully accommodates the environment dynamics and the strategic behavior of the service attackers.
Abstract: Graphical abstract: The figure shows an attack model for VSSNs, which includes end users, malicious users, physical sensors, and virtual sensors. The physical sensor network is mapped to VSSNs. Malicious users attack virtual-sensor-service nodes by scanning service vulnerabilities and attacking virtual capacities in order to damage service composites and reduce the service reliability. Virtual-sensor-service nodes' adaptive defense optimization problem can be modeled by several representations such as coalition players' decision problem (MCDP), the stochastic and evolutionary coalition game, and interactive partially observable Markov decision process for attackers and defenders. A number of attackers and defenders interact with the virtual sensor service at discrete time stages. At each time stage, every player takes a defense action, which makes the virtual sensor service transition to the next state. At the next time stage, each player receives a local observation of the virtual capacity. State transition and update learning model the dynamics of the service reliability, while the accumulative payoff by self-learning optimizes the value of taking action in a particular state in coalitions.
Highlights: Formulate a novel SECG defense model against attackers for VSSNs. Propose a mechanism of coalition formation based on the Barabási–Albert (BA) model. Expand the technique of Markov decision process (MDP) for the SECG. Obtain the optimal defense strategies using minimax-Q learning.
Various intrusion detection systems (IDSs) have been proposed in recent years to provide safe and reliable services in cloud computing. However, few of them have considered the existence of service attackers who can adapt their attacking strategies to the topology-varying environment and service providers' strategies.
In this paper, we investigate the security and dependability mechanism when service providers are facing service attacks of software and hardware, and propose a stochastic evolutionary coalition game (SECG) framework for secure and reliable defenses in virtual sensor services. At each stage of the game, service providers observe the resource availability, the quality of service (QoS), and the attackers' strategies from cloud monitoring systems (CMSs) and IDSs. According to these observations, they will decide how evolutionary coalitions should be dynamically formed for reliable virtual-sensor-service composites to deliver data and how to adaptively defend in the face of uncertain attack strategies. Using the evolutionary coalition game, virtual-sensor-service nodes can form a reliable service composite by a reliability update function. With the Markov chain constructed, virtual-sensor-service nodes can gradually learn the optimal strategy and evolutionary coalition structure through the minimax-Q learning, which maximizes the expected sum of discounted payoffs defined as QoS for virtual-sensor-service composites. The proposed SECG strategy in the virtual-sensor-service attack-defense game is shown to achieve much better performance than strategies obtained from the evolutionary coalition game or stochastic game, which only maximizes each stage's payoff and optimizes a defense strategy of stochastic evolutionary, since it successfully accommodates the environment dynamics and the strategic behavior of the service attackers.
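Minimax-Q learning, which the abstract names as the learning rule, can be sketched for a generic two-player zero-sum stochastic game. In this sketch, the defender maximizes and the attacker minimizes. After each observed transition, the defender moves $Q(s,a,o)$ toward $r + \gamma V(s')$. It then re-solves the stage game $Q(s,\cdot,\cdot)$ to update $V(s)$. The sketch makes several assumptions. It is the textbook minimax-Q update, not the paper's SECG model: coalition formation, exploration and action selection are omitted. It also restricts each player to two actions so that the stage-game value has a closed form. General action sets would need a linear program. All names are hypothetical.

```python
def matrix_game_value(m):
    """Value, for the row (maximizing) player, of a 2x2 zero-sum game."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin  # pure saddle point
    # No saddle point: both players mix and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def minimax_q_update(Q, V, s, a, o, r, s_next, alpha, gamma):
    """One minimax-Q step for the defender (maximizer) taking action a
    while the attacker (minimizer) takes action o in state s."""
    target = r + gamma * V[s_next]
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * target
    # Re-solve the stage game defined by the updated Q-values of state s.
    V[s] = matrix_game_value(Q[s])


# Usage: a single state whose stage game has no pure saddle point.
R = [[2.0, -1.0], [-1.0, 1.0]]
Q = {0: [[0.0, 0.0], [0.0, 0.0]]}
V = {0: 0.0}
for a in range(2):
    for o in range(2):
        minimax_q_update(Q, V, 0, a, o, R[a][o], 0, alpha=1.0, gamma=0.0)
print(Q[0], V[0])
```

With `alpha=1.0` and `gamma=0.0`, one sweep copies the reward matrix into `Q[0]`. `V[0]` is then the mixed value of that matrix game, 0.2, rather than the best pure-strategy guarantee of -1. This gap is the point of using a minimax backup instead of ordinary Q-learning against an adaptive attacker.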

Journal ArticleDOI
TL;DR: In this paper, a model of strategic network formation with local complementarities in effort levels and positive local externalities is presented for a general class of payoff functions, which subsumes the linear-quadratic specification frequently used in the literature.
Abstract: This paper presents a model of strategic network formation with local complementarities in effort levels and positive local externalities. Results are obtained for a general class of payoff functions, which subsumes the linear-quadratic specification frequently used in the literature. We assume homogeneous agents and characterize equilibria for two-sided and one-sided link formation. (Pairwise) Nash equilibrium networks are shown to be nested split graphs, which are a strict subset of core-periphery networks. The relevance of the convexity of the value function in obtaining these structures is highlighted. In equilibrium, more central agents exert more effort and obtain higher gross payoffs. However, net of linking cost, central agents may obtain strictly lower net payoffs. Under additional assumptions on payoffs, we show that the only efficient network structures are also nested split graphs. These findings are relevant for a wide range of social and economic phenomena, such as educational attainment, criminal activity, labor market participation, and R&D expenditures of firms.
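Whether a given network is a nested split graph can be checked mechanically. Nested split graphs are also known as threshold graphs. Such graphs can be built by repeatedly adding either an isolated vertex or a vertex adjacent to all current vertices. A graph is therefore a nested split graph exactly when repeatedly deleting such vertices empties it. The sketch assumes an undirected graph without self-loops, stored as a dict of neighbor sets. The function name is hypothetical.

```python
def is_nested_split_graph(adj):
    """Recognize a nested split (threshold) graph by repeatedly removing
    an isolated or a dominating vertex of the remaining graph."""
    remaining = set(adj)
    nbrs = {v: set(adj[v]) for v in adj}  # working copy of the adjacency
    while remaining:
        size = len(remaining)
        pick = next((v for v in remaining
                     if len(nbrs[v]) == 0 or len(nbrs[v]) == size - 1), None)
        if pick is None:
            return False  # neither isolated nor dominating vertex exists
        remaining.remove(pick)
        for u in nbrs.pop(pick):
            nbrs[u].discard(pick)
    return True


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_nested_split_graph(star), is_nested_split_graph(path4))
```

The star, an extreme core-periphery network, passes the check. The path on four vertices is not a nested split graph: its two middle vertices have non-nested neighborhoods, so it cannot arise as an equilibrium network under the result stated above. Because the class is closed under taking induced subgraphs, removing any available isolated or dominating vertex is safe. The greedy order therefore does not matter.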

Journal ArticleDOI
TL;DR: It is established that general zero-sum games with separable definable transition functions have a uniform value, and applications to nonlinear maps arising in risk sensitive control and Perron-Frobenius theory are given.
Abstract: Definable zero-sum stochastic games involve a finite number of states and action sets, and reward and transition functions, that are definable in an o-minimal structure. Prominent examples of such games are finite, semi-algebraic, or globally subanalytic stochastic games. We prove that the Shapley operator of any definable stochastic game with separable transition and reward functions is definable in the same structure. Definability in the same structure does not hold systematically: we provide a counterexample of a stochastic game with semi-algebraic data yielding a non-semi-algebraic but globally subanalytic Shapley operator. Our definability results on Shapley operators are used to prove that any separable definable game has a uniform value; in the case of polynomially bounded structures, we also provide convergence rates. Using an approximation procedure, we actually establish that general zero-sum games with separable definable transition functions have a uniform value. These results highlight the key role played by the tame structure of transition functions. As particular cases of our main results, we obtain that stochastic games with polynomial transitions, definable games with finite actions on one side, and definable games with perfect information or switching controls have a uniform value. Applications to nonlinear maps arising in risk sensitive control and Perron-Frobenius theory are also given.
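The Shapley operator maps a value vector $v$ to $T(v)(s) = \mathrm{val}\big[r(s,a,b) + \sum_{s'} P(s' \mid s,a,b)\, v(s')\big]$. Iterating it from zero gives the $n$-stage values $T^n(0)$. When a uniform value exists, it equals the limit of the normalized values $T^n(0)/n$. The sketch makes several assumptions. It uses a finite game, which is trivially semi-algebraic, with 2×2 action sets so that each stage game has a closed-form value. It only computes these normalized values and does not establish the existence of a uniform value. The names and the toy games are hypothetical.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row maximizer."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin  # pure saddle point
    return (a * d - b * c) / (a + d - b - c)  # mixed equilibrium


def shapley_operator(v, reward, trans):
    """T(v)(s) = val[ r(s, a, b) + sum_t P(t | s, a, b) v(t) ]."""
    return {
        s: matrix_game_value([
            [reward[s][a][b] + sum(p * v[t] for t, p in trans[s][a][b].items())
             for b in range(2)]
            for a in range(2)
        ])
        for s in reward
    }


def normalized_values(reward, trans, n):
    """Return T^n(0) / n, the average per-stage value of the n-stage game."""
    v = {s: 0.0 for s in reward}
    for _ in range(n):
        v = shapley_operator(v, reward, trans)
    return {s: v[s] / n for s in v}


# Usage: state 0 pays 0 and moves to state 1; state 1 pays 1 forever.
reward = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[1.0, 1.0], [1.0, 1.0]]}
trans = {s: [[{1: 1.0}, {1: 1.0}], [{1: 1.0}, {1: 1.0}]] for s in (0, 1)}
print(normalized_values(reward, trans, 10))
```

In the usage example, starting in state 0 costs one stage of payoff, so the normalized value there is $(n-1)/n$, which tends to the uniform value 1 as $n$ grows. The paper's results concern when such limits exist and how fast they are reached for definable data.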

Posted Content
TL;DR: This work provides two main technical results that lift the conclusion that coarse correlated equilibria, which characterize outcomes resulting from no-regret learning dynamics, have near-optimal welfare from games of complete information to games of incomplete information, a.k.a. Bayesian games.
Abstract: Recent price-of-anarchy analyses of games of complete information suggest that coarse correlated equilibria, which characterize outcomes resulting from no-regret learning dynamics, have near-optimal welfare. This work provides two main technical results that lift this conclusion to games of incomplete information, a.k.a., Bayesian games. First, near-optimal welfare in Bayesian games follows directly from the smoothness-based proof of near-optimal welfare in the same game when the private information is public. Second, no-regret learning dynamics converge to Bayesian coarse correlated equilibrium in these incomplete information games. These results are enabled by interpretation of a Bayesian game as a stochastic game of complete information.
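The complete-information link between no-regret learning and coarse correlated equilibria, which this work lifts to Bayesian games, can be checked numerically. In the sketch, both players run Hedge (multiplicative weights) with full-information feedback on a bimatrix game with payoffs in $[0,1]$. No player can gain by committing to a fixed action instead of following the time-averaged joint distribution of play. Specifically, the gain is bounded by the per-round regret, at most $\sqrt{\ln n / (2T)}$ for Hedge with step size $\eta = \sqrt{8 \ln n / T}$. The sketch makes several assumptions. Hedge is one representative no-regret algorithm chosen here. The game and function names are hypothetical. The Bayesian construction from the paper is not implemented.

```python
import math


def hedge_probs(cum_gains, eta):
    """Hedge / multiplicative weights: p_i proportional to exp(eta * G_i)."""
    top = max(cum_gains)  # subtract the max for numerical stability
    w = [math.exp(eta * (g - top)) for g in cum_gains]
    z = sum(w)
    return [x / z for x in w]


def run_hedge(u1, u2, T):
    """Both players run Hedge for T rounds on a bimatrix game with payoffs
    in [0, 1]; returns the time-averaged joint distribution of play."""
    n1, n2 = len(u1), len(u1[0])
    eta1 = math.sqrt(8 * math.log(n1) / T)
    eta2 = math.sqrt(8 * math.log(n2) / T)
    G1, G2 = [0.0] * n1, [0.0] * n2
    joint = [[0.0] * n2 for _ in range(n1)]
    for _ in range(T):
        p, q = hedge_probs(G1, eta1), hedge_probs(G2, eta2)
        for a in range(n1):
            for b in range(n2):
                joint[a][b] += p[a] * q[b] / T
        # Full-information feedback: expected gain of every pure action
        # against the opponent's current mixed strategy.
        for a in range(n1):
            G1[a] += sum(q[b] * u1[a][b] for b in range(n2))
        for b in range(n2):
            G2[b] += sum(p[a] * u2[a][b] for a in range(n1))
    return joint


def cce_gap(u1, u2, joint):
    """Largest gain any player gets by committing to a fixed action instead
    of following the joint distribution (<= 0 means an exact CCE)."""
    n1, n2 = len(u1), len(u1[0])
    cells = [(a, b) for a in range(n1) for b in range(n2)]
    v1 = sum(joint[a][b] * u1[a][b] for a, b in cells)
    v2 = sum(joint[a][b] * u2[a][b] for a, b in cells)
    best1 = max(sum(joint[a][b] * u1[x][b] for a, b in cells) for x in range(n1))
    best2 = max(sum(joint[a][b] * u2[a][y] for a, b in cells) for y in range(n2))
    return max(best1 - v1, best2 - v2)


# An asymmetric matching-pennies game with payoffs in [0, 1].
u1 = [[1.0, 0.0], [0.0, 0.5]]
u2 = [[0.0, 1.0], [1.0, 0.0]]
joint = run_hedge(u1, u2, T=2000)
print(cce_gap(u1, u2, joint))
```

For `T=2000` and two actions per player, the regret bound caps the gap at about 0.013. The averaged play is therefore an approximate coarse correlated equilibrium, which is the object the price-of-anarchy arguments then bound.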