Showing papers on "Stochastic game" published in 2001


Journal ArticleDOI
TL;DR: This work defines Markov strategy and Markov perfect equilibrium and shows that an MPE is generically robust: if payoffs of a generic game are perturbed, there exists an almost Markovian equilibrium in the perturbed game near the initial MPE.

822 citations


Journal ArticleDOI
TL;DR: The core of a class of coalition formation games in which every player's payoff depends only on the members of her coalition is analyzed, and two top-coalition properties, each of which guarantees the existence of a core allocation, are introduced.
Abstract: We analyze the core of a class of coalition formation games in which every player's payoff depends only on the members of her coalition. We first consider anonymous games and additively separable games. Neither of these strong properties guarantees the existence of a core allocation, even if additional strong properties are imposed. We then introduce two top-coalition properties, each of which guarantees existence. We show that these properties are independent of the Scarf-balancedness condition. Finally, we give several economic applications.

504 citations


Journal ArticleDOI
TL;DR: In this paper, the authors report laboratory data for games that are played only once, and show that a change in the payoff structure produces a large inconsistency between theoretical predictions and observed behavior, which is consistent with simple intuition based on the interaction of payoff asymmetries and noisy introspection about others' decisions.
Abstract: This paper reports laboratory data for games that are played only once. These games span the standard categories: static and dynamic games with complete and incomplete information. For each game, the treasure is a treatment in which behavior conforms nicely to predictions of the Nash equilibrium or relevant refinement. In each case, however, a change in the payoff structure produces a large inconsistency between theoretical predictions and observed behavior. These contradictions are generally consistent with simple intuition based on the interaction of payoff asymmetries and noisy introspection about others' decisions.

483 citations


Proceedings Article
04 Aug 2001
TL;DR: This paper introduces two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence, and contributes a new learning algorithm, WoLF policy hillclimbing, that is proven to be rational.
Abstract: This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hillclimbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.

333 citations
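
The "learn quickly while losing, slowly while winning" principle from the WoLF abstract above can be sketched for the simplest case of a single-state (repeated matrix) game. This is a minimal illustration rather than the authors' implementation: the function name `wolf_phc_step`, the step sizes `delta_win` and `delta_lose`, and the omission of the next-state term in the Q-update are assumptions made for this example.

```python
import random


def wolf_phc_step(q, pi, avg_pi, count, action, reward,
                  alpha=0.1, delta_win=0.01, delta_lose=0.04):
    """One WoLF policy hill-climbing update for a single-state game.

    q, pi and avg_pi are lists indexed by action and are modified in place.
    Returns the updated visit count. All parameter values are illustrative.
    """
    n = len(q)
    # Q-learning update; with a single state the next-state term is dropped
    # (a simplifying assumption of this sketch).
    q[action] += alpha * (reward - q[action])
    # Running average of the policies played so far.
    count += 1
    for a in range(n):
        avg_pi[a] += (pi[a] - avg_pi[a]) / count
    # "Winning" means the current policy does better under Q than the average policy.
    current_value = sum(p * v for p, v in zip(pi, q))
    average_value = sum(p * v for p, v in zip(avg_pi, q))
    delta = delta_win if current_value > average_value else delta_lose
    # Hill-climb: shift probability mass toward the greedy action,
    # slowly when winning and quickly when losing.
    best = max(range(n), key=lambda a: q[a])
    for a in range(n):
        if a != best:
            step = min(pi[a], delta / (n - 1))
            pi[a] -= step
            pi[best] += step
    return count


# Toy usage: action 0 always pays 1, action 1 always pays 0.
rng = random.Random(0)
q, pi, avg_pi, count = [0.0, 0.0], [0.5, 0.5], [0.5, 0.5], 0
for _ in range(200):
    action = rng.choices([0, 1], weights=pi)[0]
    count = wolf_phc_step(q, pi, avg_pi, count, action, 1.0 if action == 0 else 0.0)
```

Each update moves probability only between existing actions, so `pi` stays a valid distribution; the variable step size is the only place where the "win or learn fast" idea enters.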


Posted Content
TL;DR: In this paper, the authors study games with strategic complementarities, arbitrary numbers of players and actions, and slightly noisy payoff signals, and prove limit uniqueness: as the signal noise vanishes, the game has a unique strategy profile that survives iterative dominance.
Abstract: We study games with strategic complementarities, arbitrary numbers of players and actions, and slightly noisy payoff signals. We prove limit uniqueness: as the signal noise vanishes, the game has a unique strategy profile that survives iterative dominance. This generalizes a result of Carlsson and van Damme (1993) for two-player, two-action games. The surviving profile, however, may depend on fine details of the structure of the noise. We provide sufficient conditions on payoffs for there to be noise-independent selection.

294 citations


Journal ArticleDOI
TL;DR: In this article, the authors characterize a class of simple adaptive strategies, in the repeated play of a game, having the Hannan-consistency property: in the long run, the player is guaranteed an average payoff as large as the best-reply payoff to the empirical distribution of play of the other players; i.e., there is no regret.

233 citations


Journal ArticleDOI
TL;DR: It is demonstrated experimentally that using these new utility functions can result in significantly improved performance over that of previously investigated COIN payoff utilities, over and above those previous utilities' superiority to the conventional team game utility.
Abstract: We consider the problem of designing (perhaps massively distributed) collectives of computational processes to maximize a provided "world utility" function. We consider this problem when the behavior of each process in the collective can be cast as striving to maximize its own payoff utility function. For such cases the central design issue is how to initialize/update those payoff utility functions of the individual processes so as to induce behavior of the entire collective having good values of the world utility. Traditional "team game" approaches to this problem simply assign to each process the world utility as its payoff utility function. In previous work we used the "Collective Intelligence" (COIN) framework to derive a better choice of payoff utility functions, one that results in world utility performance up to orders of magnitude superior to that ensuing from the use of the team game utility. In this paper, we extend these results using a novel mathematical framework. Under that new framework we review the derivation of the general class of payoff utility functions that both (i) are easy for the individual processes to try to maximize, and (ii) have the property that if good values of them are achieved, then we are assured a high value of world utility. These are the "Aristocrat Utility" and a new variant of the "Wonderful Life Utility" that was introduced in the previous COIN work. We demonstrate experimentally that using these new utility functions can result in significantly improved performance over that of previously investigated COIN payoff utilities, over and above those previous utilities' superiority to the conventional team game utility. These results also illustrate the substantial superiority of these payoff functions to perhaps the most natural version of the economics technique of "endogenizing externalities."

232 citations


01 Jan 2001
TL;DR: A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
Abstract: In the multi-armed bandit problem, a gambler must decide which arm of non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate O(T^(-1/2)). We show by a matching lower bound that this is best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of N strategies then our algorithm approaches the per-round payoff of the strategy at the rate O((log N)^(1/2) T^(-1/2)). Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate O(T^(-1/2)).

204 citations
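
The adversarial bandit setting in the abstract above is commonly illustrated with an exponential-weights scheme in the style of Exp3. The sketch below is a hedged illustration, not a transcription of the paper's pseudocode: the names `exp3` and `exp3_probabilities`, the callback `reward_fn`, the exploration rate `gamma`, and the weight normalization are all assumptions of this example. Rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    k = len(weights)
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3(reward_fn, num_arms, num_rounds, gamma=0.1, rng=None):
    """Run an Exp3-style learner; reward_fn(t, arm) must return a value in [0, 1]."""
    rng = rng or random.Random(0)
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance weighting keeps the reward estimate unbiased even though
        # only the chosen arm's payoff is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to avoid overflow; the probabilities are unchanged by this.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, weights


# Toy usage: arm 1 always pays 1, the other arms pay nothing.
total, final_weights = exp3(lambda t, arm: 1.0 if arm == 1 else 0.0,
                            num_arms=3, num_rounds=2000)
```

The uniform exploration term guarantees every arm keeps probability at least `gamma / num_arms`, which bounds the importance-weighted estimates and is what lets the method cope with payoffs chosen by an adversary.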


Patent
07 Sep 2001
TL;DR: In this article, a multi-player game is played on a large visual display and a plurality of player terminals coupled with the visual display, where players can place wagers, make any necessary selections and play the game at their own pace.
Abstract: A multi-player gaming platform includes a large visual display and a plurality of player terminals coupled to the visual display. The visual display indicates a game of chance including a single play field and a plurality of movable game pieces. The game pieces are associated with the respective player terminals. In response to a wager placed at one of the player terminals, the game piece associated with that terminal moves along or near the play field and generates a game outcome. The game outcome may be defined by the game piece itself or where the game piece lands on the play field. The game awards a payoff if the game outcome meets winning criteria. The wager placed at the one of the player terminals is independent of any other wagers placed at the other player terminals and is independent of when the other wagers are placed. Similarly, the game piece associated with the one of the player terminals operates independent of the game pieces associated with the other player terminals. There is no actual player-to-player interaction--the players merely make use of the same play field. Therefore, the gaming platform allows players to join the game of chance at any time and to place wagers, make any necessary selections, and play the game at their own pace.

180 citations


Book
31 Mar 2001
TL;DR: This book covers social and economic networks in cooperative situations, including variants on the basic model and a one-stage model of network formation and payoff division.
Abstract: Preface. Part I: Social and Economic Networks in Cooperative Situations. 1. Games and Networks. 2. Restricted Cooperation in Games. 3. Inheritance of Properties in Communication Situations. 4. Variants on the Basic Model. Part II: Network Formation. 5. Noncooperative Games. 6. A Network-Formation Model in Extensive Form. 7. A Network-Formation Model in Strategic Form. 8. Network Formation with Costs for Establishing Links. 9. A One-Stage Model of Network Formation and Payoff Division. 10. Network Formation and Potential Games. 11. Network Formation and Reward Functions. References. Notations. Index.

180 citations


Journal ArticleDOI
TL;DR: It is shown that a simple threat-based strategy is optimal and its competitive ratio is determined which yields, for realistic values of the problem parameters, surprisingly low competitive ratios.
Abstract: This paper is concerned with the time series search and one-way trading problems. In the (time series) search problem a player is searching for the maximum (or minimum) price in a sequence that unfolds sequentially, one price at a time. Once during this game the player can decide to accept the current price p, in which case the game ends and the player's payoff is p. In the one-way trading problem a trader is given the task of trading dollars to yen. Each day, a new exchange rate is announced and the trader must decide how many dollars to convert to yen according to the current rate. The game ends when the trader trades his entire dollar wealth to yen and his payoff is the number of yen acquired.

Posted Content
TL;DR: Global games are games of incomplete information whose type space is determined by the players each observing a noisy signal of the underlying state; with strategic complementarities they often have a unique, dominance solvable equilibrium, allowing analysis of a number of economic models of coordination failure.
Abstract: Global games are games of incomplete information whose type space is determined by the players each observing a noisy signal of the underlying state. With strategic complementarities, global games often have a unique, dominance solvable equilibrium, allowing analysis of a number of economic models of coordination failure. For symmetric binary action global games, equilibrium strategies in the limit (as noise becomes negligible) are simple to characterize in terms of 'diffuse' beliefs over the actions of others. We describe a number of economic applications that fall in this category. We also explore the distinctive roles of public and private information in this setting, review results for general global games, discuss the relationship between global games and a literature on higher order beliefs in game theory and describe the relationship to local interaction games and dynamic games with payoff shocks.

Journal ArticleDOI
TL;DR: In this paper, the authors explore a dynamic model of product innovation, extending the work of Dutta, Lach and Rustichini (1995), and show that if R&D costs for quality improvements are low, the dynamic competition is structured as a race for being the pioneer firm with payoff equalization in equilibrium, but switches to a waiting game with a second-mover advantage in equilibrium if costs are high.
Abstract: This paper explores a dynamic model of product innovation, extending the work of Dutta, Lach and Rustichini (1995). It is shown that if R&D costs for quality improvements are low, the dynamic competition is structured as a race for being the pioneer firm with payoff equalization in equilibrium, but switches to a waiting game with a second-mover advantage in equilibrium if R&D costs are high. Moreover, the second-mover advantage increases monotonically as R&D becomes more costly.

Journal ArticleDOI
TL;DR: A population of players is randomly matched to play a normal form game G, where each individual has preferences over the outcomes in the game and chooses an optimal action with respect to those preferences.

Journal ArticleDOI
TL;DR: In this article, an intertemporal decomposition scheme for the total side payment is proposed, which has the following individual rationality property: in each subgame that starts along the cooperative trajectory, one country is guaranteed to receive a higher payoff in the cooperative solution than in the disagreement solution.

Journal ArticleDOI
TL;DR: In this paper, the authors consider two-person zero-sum stochastic games and give sufficient conditions, in terms of the operator Φ(α,f) and its derivative at 0, for lim v_n and lim v_λ to exist and be equal, with applications to absorbing games with compact action spaces and incomplete information games.
Abstract: We consider two-person zero-sum stochastic games. The recursive formula for the values v_λ (resp. v_n) of the discounted (resp. finitely repeated) version can be written in terms of a single basic operator Φ(α,f), where α is the weight on the present payoff and f the future payoff. We give sufficient conditions in terms of Φ(α,f) and its derivative at 0 for lim v_n and lim v_λ to exist and to be equal. We apply these results to obtain such convergence properties for absorbing games with compact action spaces and incomplete information games.

Posted Content
TL;DR: In this paper, the authors study decision problems under uncertainty where a decision-maker observes an imperfect signal about the true state of the world, and analyze the information preferences and information demand of such decision-makers, based on properties of their payoff functions.
Abstract: February 2001. This paper studies decision problems under uncertainty where a decision-maker observes an imperfect signal about the true state of the world. We analyze the information preferences and information demand of such decision-makers, based on properties of their payoff functions. We restrict attention to "monotone decision problems," whereby the posterior beliefs induced by the signal can be ordered so that higher actions are chosen in response to higher signal realizations. Monotone decision problems are frequently encountered in economic modeling. We provide necessary and sufficient conditions for all decision-makers with different classes of payoff functions to prefer one information structure to another. We also provide conditions under which two decision-makers in a given class can be ranked in terms of their marginal value for information and hence information demand. Applications and examples are given.

Journal ArticleDOI
31 Aug 2001-Chaos
TL;DR: The parameter space in which the paradoxical effect of Parrondo's games occurs is found and a winning rate analysis is carried out; the combination of the games can be considered a discrete-time Brownian ratchet.
Abstract: Parrondo’s games present an apparently paradoxical situation where individually losing games can be combined to win. In this article we analyze the case of two coin tossing games. Game B is played with two biased coins and has state-dependent rules based on the player’s current capital. Game B can exhibit detailed balance or even negative drift (i.e., loss), depending on the chosen parameters. Game A is played with a single biased coin that produces a loss or negative drift in capital. However, a winning expectation is achieved by randomly mixing A and B. One possible interpretation pictures game A as a source of “noise” that is rectified by game B to produce overall positive drift—as in a Brownian ratchet. Game B has a state-dependent rule that favors a losing coin, but when this state dependence is broken up by the noise introduced by game A, a winning coin is favored. In this article we find the parameter space in which the paradoxical effect occurs and carry out a winning rate analysis. The significance of Parrondo’s games is that they are physically motivated and were originally derived by considering a Brownian ratchet—the combination of the games can be therefore considered as a discrete-time Brownian ratchet. We postulate the use of games of this type as a toy model for a number of physical and biological processes and raise a number of open questions for future research.
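
The paradox described in the abstract above can be checked numerically without simulation by tracking the player's capital modulo 3. The sketch below is only illustrative: the abstract gives no numbers, so the parameters (bias EPS = 0.005, win probability 1/2 - EPS for game A, and 1/10 - EPS or 3/4 - EPS for game B depending on whether the capital is divisible by 3) are the commonly quoted choice and are assumptions here, as are the function names.

```python
def stationary_mod3(win_probs, iterations=5000):
    """Stationary distribution of capital mod 3.

    win_probs[s] is the probability of winning one unit when capital mod 3 == s.
    Found by repeatedly applying the transition rule to a starting distribution.
    """
    dist = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iterations):
        new = [0.0, 0.0, 0.0]
        for s in range(3):
            new[(s + 1) % 3] += dist[s] * win_probs[s]        # win: capital + 1
            new[(s - 1) % 3] += dist[s] * (1 - win_probs[s])  # loss: capital - 1
        dist = new
    return dist


def drift(win_probs):
    """Expected change in capital per round in the long run."""
    dist = stationary_mod3(win_probs)
    return sum(d * (2 * p - 1) for d, p in zip(dist, win_probs))


EPS = 0.005  # assumed bias, not taken from the abstract
game_a = [0.5 - EPS] * 3                      # one biased coin, no state dependence
game_b = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]  # coin depends on capital mod 3
# Choosing A or B with probability 1/2 each round averages the win probabilities.
mixed = [(a + b) / 2 for a, b in zip(game_a, game_b)]

drifts = {"A": drift(game_a), "B": drift(game_b), "random mix": drift(mixed)}
```

Under these assumed parameters both games have negative drift on their own while the random mixture has positive drift, because mixing changes how often the capital sits in the state where game B uses its unfavorable coin.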

Posted ContentDOI
TL;DR: In this article, the authors characterize the equilibrium sets of an intrinsic common agency game with direct externalities between principals both under complete and asymmetric information, and show that a unique equilibrium may be selected by conveniently perturbing the information structure.
Abstract: This paper characterizes the equilibrium sets of an intrinsic common agency game with direct externalities between principals both under complete and asymmetric information. Direct externalities arise when the contracting variable of one principal affects directly the other principal's payoff. Out-of-equilibrium messages are used by principals to precommit themselves to distort their strategic behavior. We characterize pure-strategy symmetric equilibria arising in such games under complete information and show their multiplicity. We then introduce asymmetric information to refine the set of feasible conjectures. We show that a unique equilibrium may be selected by conveniently perturbing the information structure. Both under complete and asymmetric information, we show that the equilibrium outputs of the intrinsic common agency game are also equilibrium outputs of the delegated common agency game, although the two games differ in terms of the distribution of surplus they involve.

Journal ArticleDOI
TL;DR: Repeated interactions among a fixed set of “low rationality” players, who have status quo actions, randomly sample other actions, and change their status quo if the sampled action yields a higher payoff, generate a random process: the better-reply dynamics.

Journal ArticleDOI
01 May 2001
TL;DR: This paper proves the existence of a subgame-perfect uniform ε-equilibrium under some assumptions on the payoff structure and presents a new method for dealing with n-player games.
Abstract: Quitting games are n-player sequential games in which, at any stage, each player has the choice between continuing and quitting. The game ends as soon as at least one player chooses to quit; player i then receives a payoff r_S^i, which depends on the set S of players that did choose to quit. If the game never ends, the payoff to each player is 0. The paper has four goals: (i) we prove the existence of a subgame-perfect uniform ε-equilibrium under some assumptions on the payoff structure; (ii) we study the structure of the ε-equilibrium strategies; (iii) we present a new method for dealing with n-player games; and (iv) we study an example of a four-player quitting game where the "simplest" equilibrium is cyclic with period 2. We also discuss the relation to Dynkin's stopping games and provide a generalization of our result to these games.

Journal ArticleDOI
TL;DR: This paper provides a set of theoretical results to identify the structure of equilibrium payoffs with special attention to the payoff of the agent and illustrates this user guide on a wide and diverse family of applications including auctions, competition for an input, economic influence, and private production of public goods.

Posted Content
TL;DR: In this paper, the adaptive experience-weighted attraction (EWA) learning model was extended to capture sophisticated learning and strategic teaching in repeated games and the generalized model was used for reputation formation.
Abstract: Most learning models assume players are adaptive (i.e., they respond only to their own previous experience and ignore others' payoff information) and behavior is not sensitive to the way in which players are matched. Empirical evidence suggests otherwise. In this paper, we extend our adaptive experience-weighted attraction (EWA) learning model to capture sophisticated learning and strategic teaching in repeated games. The generalized model assumes there is a mixture of adaptive learners and sophisticated players. An adaptive learner adjusts his behavior the EWA way. A sophisticated player rationally best-responds to her forecasts of all other behaviors. A sophisticated player can be either myopic or farsighted. A farsighted player develops multiple-period rather than single-period forecasts of others' behaviors and chooses to "teach" the other players by choosing a strategy scenario that gives her the highest discounted net present value. We estimate the model using data from p-beauty contests and repeated trust games with incomplete information. The generalized model is better than the adaptive EWA model in describing and predicting behavior. Including teaching also allows an empirical learning-based approach to reputation formation which predicts better than a quantal-response extension of the standard type-based approach.
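
For readers unfamiliar with how an adaptive learner "adjusts his behavior the EWA way", the sketch below shows one update of the basic adaptive EWA rule as it is usually stated, followed by a logit choice rule. The abstract does not give the formulas, so the parameterization (decay `phi`, experience decay `rho`, weight `delta` on forgone payoffs, logit sensitivity `lam`) and the function names are assumptions of this example; the sophisticated and teaching extensions discussed in the abstract are not modeled.

```python
import math


def ewa_update(attractions, experience, chosen, payoffs,
               phi=0.9, rho=0.9, delta=0.5):
    """One adaptive EWA update for a single player.

    payoffs[j] is what strategy j would have earned against the opponents'
    actual play this round; chosen is the index of the strategy played.
    Returns (new_attractions, new_experience). Parameter values are illustrative.
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for j, (attraction, payoff) in enumerate(zip(attractions, payoffs)):
        # The chosen strategy is reinforced by its full payoff; unchosen
        # strategies are reinforced by a fraction delta of their forgone payoff.
        weight = 1.0 if j == chosen else delta
        updated = (phi * experience * attraction + weight * payoff) / new_experience
        new_attractions.append(updated)
    return new_attractions, new_experience


def logit_choice(attractions, lam=1.0):
    """Map attractions to choice probabilities with a logit (softmax) rule."""
    top = max(attractions)  # subtract the maximum for numerical stability
    exps = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]


# Toy usage: two strategies, the player chose strategy 0 this round.
attractions, experience = ewa_update([0.0, 0.0], 1.0, chosen=0, payoffs=[3.0, 4.0])
choice_probs = logit_choice(attractions)
```

Setting `delta` to 0 makes only realized payoffs matter, which resembles reinforcement learning, while `delta` equal to 1 weights forgone and realized payoffs equally, which resembles belief-based learning.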

Journal ArticleDOI
TL;DR: In this article, the authors study a coordination game with randomly changing payoffs and small frictions in changing actions, and they find that players must coordinate on the risk-dominant equilibrium.
Abstract: We study a coordination game with randomly changing payoffs and small frictions in changing actions. Using only backwards induction, we find that players must coordinate on the risk-dominant equilibrium. More precisely, a continuum of fully rational players are randomly matched to play a symmetric 2 x 2 game. The payoff matrix changes according to a random walk. Players observe these payoffs and the population distribution of actions as they evolve. The game has frictions: opportunities to change strategies arrive from independent random processes, so that the players are locked into their actions for some time. As the frictions disappear, each player ignores what the others are doing and switches at her first opportunity to the risk-dominant action. History dependence emerges in some cases when frictions remain positive.
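
Risk dominance in a symmetric 2 x 2 coordination game, the selection criterion named in the abstract above, reduces to comparing deviation losses. The sketch below assumes a payoff labeling that is not in the source: `a` is the payoff to playing A against A, `b` to A against B, `c` to B against A, and `d` to B against B, with both (A, A) and (B, B) strict equilibria. The function name is hypothetical.

```python
def risk_dominant_action(a, b, c, d):
    """Return "A", "B", or None (tie) for a symmetric 2 x 2 coordination game.

    a = u(A, A), b = u(A, B), c = u(B, A), d = u(B, B) for the row player.
    """
    if not (a > c and d > b):
        raise ValueError("both (A, A) and (B, B) must be strict equilibria")
    loss_from_leaving_a = a - c  # cost of deviating from (A, A)
    loss_from_leaving_b = d - b  # cost of deviating from (B, B)
    if loss_from_leaving_a > loss_from_leaving_b:
        return "A"
    if loss_from_leaving_b > loss_from_leaving_a:
        return "B"
    return None


# Toy usage: a stag hunt where A = stag and B = hare.
stag_hunt_choice = risk_dominant_action(a=4, b=0, c=3, d=3)
```

The same comparison says that the risk-dominant action is the best reply to an opponent who plays each action with probability 1/2, which is why it can differ from the payoff-dominant equilibrium, as in the stag hunt example.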

Journal ArticleDOI
TL;DR: It is concluded that there are strategic situations in which it is impossible in principle for perfectly rational agents to learn to predict the future behavior of other perfectly rational agents based solely on their observed actions.
Abstract: A foundational assumption in economics is that people are rational: they choose optimal plans of action given their predictions about future states of the world. In games of strategy this means that each player's strategy should be optimal given his or her prediction of the opponents' strategies. We demonstrate that there is an inherent tension between rationality and prediction when players are uncertain about their opponents' payoff functions. Specifically, there are games in which it is impossible for perfectly rational players to learn to predict the future behavior of their opponents (even approximately) no matter what learning rule they use. The reason is that in trying to predict the next-period behavior of an opponent, a rational player must take an action this period that the opponent can observe. This observation may cause the opponent to alter his next-period behavior, thus invalidating the first player's prediction. The resulting feedback loop has the property that, a positive fraction of the time, the predicted probability of some action next period differs substantially from the actual probability with which the action is going to occur. We conclude that there are strategic situations in which it is impossible in principle for perfectly rational agents to learn to predict the future behavior of other perfectly rational agents based solely on their observed actions.

Journal ArticleDOI
TL;DR: A generalized framework is presented which can describe the long-term dynamics of the Ultimatum Game and also explain the evolution of fairness in a one-parameter Ultimatum Game.

Book ChapterDOI
01 Aug 2001
TL;DR: In this article, the authors consider the possibility of autonomous agents engaging in implicit negotiation via their tacit interactions in repeated general-sum games, where an agent using a "best response" strategy maximizes its own payoff assuming its behavior has no effect on its opponent.
Abstract: In business-related interactions such as the on-going high-stakes FCC spectrum auctions, explicit communication among participants is regarded as collusion, and is therefore illegal. In this paper, we consider the possibility of autonomous agents engaging in implicit negotiation via their tacit interactions. In repeated general-sum games, our testbed for studying this type of interaction, an agent using a "best response" strategy maximizes its own payoff assuming its behavior has no effect on its opponent. This notion of best response requires some degree of learning to determine the fixed opponent behavior. Against an unchanging opponent, the best-response agent performs optimally, and can be thought of as a "follower," since it adapts to its opponent. However, pairing two best-response agents in a repeated game can result in suboptimal behavior. We demonstrate this suboptimality in several different games using variants of Q-learning as an example of a best-response strategy. We then examine two "leader" strategies that induce better performance from opponent followers via stubbornness and threats. These tactics are forms of implicit negotiation in that they aim to achieve a mutually beneficial outcome without using explicit communication outside of the game.

Journal Article
TL;DR: In this paper, the authors apply abstraction to solve large stochastic imperfect-information games, specifically variants of poker, and examine several different medium-size poker variants and give encouraging results for abstraction-based methods on these games.
Abstract: Abstraction is a method often applied to keep the combinatorial explosion under control and to solve problems of large complexity. Our work focuses on applying abstraction to solve large stochastic imperfect-information games, specifically variants of poker. We examine several different medium-size poker variants and give encouraging results for abstraction-based methods on these games.

Posted Content
TL;DR: This paper investigates the phase-transition-like behavior of quantum games and suggests a method that helps illuminate its origin; for the generalized Prisoners' Dilemma, different numerical payoff settings leave the classical game's behavior unchanged while the quantum game exhibits different and interesting phase-transition-like behavior.
Abstract: The discontinuous dependence of the properties of a quantum game on its entanglement has been shown to be very much like phase transitions viewed in the entanglement-payoff diagram [J. Du et al., Phys. Rev. Lett. 88, 137902 (2002)]. In this paper we investigate such phase-transition-like behavior of quantum games, by suggesting a method which would help to illuminate the origin of such kind of behavior. For the particular case of the generalized Prisoners' Dilemma, we find that, for different settings of the numerical values in the payoff table, even though the classical game behaves the same, the quantum game exhibits different and interesting phase-transition-like behavior.