
Showing papers on "Stochastic game" published in 1995


Journal ArticleDOI
TL;DR: This article studied the effect of word-of-mouth communication on the behavior of a population of identical players in a stochastic decision environment and found that the structure of the communication process determines whether all agents end up making identical choices, with less communication making this conformity more likely.
Abstract: This paper studies the way that word-of-mouth communication aggregates the information of individual agents. We find that the structure of the communication process determines whether all agents end up making identical choices, with less communication making this conformity more likely. Despite the players' naive decision rules and the stochastic decision environment, word-of-mouth communication may lead all players to adopt the action that is on average superior. These socially efficient outcomes tend to occur when each agent samples only a few others. I. INTRODUCTION Economic agents must often make decisions without knowing the costs and benefits of the possible choices. Given the frequency with which such situations arise, it is understandable that agents often choose not to perform studies or experiments, but instead rely on whatever information they have obtained via casual word-of-mouth communication. Reliance on this sort of easily obtained information appears to be common in circumstances ranging from consumers choosing restaurants or auto mechanics to business managers evaluating alternative organizational structures. This paper studies two related environments in arguing that individuals' reliance on word-of-mouth communication has interesting implications for their aggregate behavior. First, motivated by the diffusion of new technologies, we consider a choice between two competing products with unequal qualities or payoffs, and show that the structure of communication is important in determining whether the population as a whole is likely to learn to use the superior product. Second, we consider a choice between two products or practices that are equally good, and ask whether consumers are likely to "herd" onto a single choice, or whether "diversity" will obtain even in the long run. We explore the implications of word-of-mouth communication in a simple nonstrategic environment. 
There is a large population of identical players, each of whom repeatedly chooses between two possible actions. Each player's payoff is determined by his own

903 citations


Proceedings ArticleDOI
23 Oct 1995
TL;DR: A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
Abstract: In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the expected per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/3})$, and we give an improved rate of convergence when the best arm has fairly low payoff. We also consider a setting in which the player has a team of "experts" advising him on which arm to play; here, we give a strategy that will guarantee expected payoff close to that of the best expert. Finally, we apply our result to the problem of learning to play an unknown repeated matrix game against an all-powerful adversary.

807 citations
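
The adversarial bandit setting described in this abstract can be illustrated with a minimal sketch of an exponential-weights strategy that uses importance-weighted reward estimates. This is not taken from the paper: the function name `exp3`, the callback `reward_fn`, the mixing parameter `gamma`, and the assumption that rewards lie in [0, 1] are all illustrative choices.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Exponential-weights bandit sketch.

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1];
    it may be chosen by an adversary. Returns (total reward, final weights).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(n_rounds):
        w_sum = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Only the played arm is observed, so divide by its probability
        # to keep the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
    return total, weights
```

A call such as `exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000)` runs the sketch against a fixed payoff sequence; an adversarial `reward_fn` could depend on `t` in any way.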


Patent
22 Sep 1995
TL;DR: In this article, a chip receptacle is provided at each player's location of a blackjack table for accepting the side bet, and a player's key operated display selects a predetermined number of consecutive wins.
Abstract: A betting apparatus incorporated into a game of chance enabling a player to make a side bet. A chip receptacle is provided at each player's location of a blackjack table for accepting the side bet. A player's key operated display selects a predetermined number of consecutive wins. A microprocessor cooperating with a sensor identifies the denomination of one or more chips placed in the chip receptacle and, together with a number of consecutive wins selected by the player, displays a payoff amount for a selected number of consecutive wins. The hands are played following conventional rules. The betting receptacle cover seals the chips after completion of a betting phase, under control of the dealer, and signals the beginning of a new game. Each player's location is provided with a Loss button, operated by the dealer when a player loses. A Push button may be provided for each player position when that player has a hand equal in value to a dealer's hand to indicate a tie. The microprocessor adds one to the consecutive win count display when a player wins a game, each time the dealer's game button is operated. When the number of consecutive wins displayed equals the number of consecutive wins selected, an audio/visual alarm indicates a win. Other embodiments incorporate the betting apparatus in all casino games, including table games, slot machines and video games. The chip receptacle may be substituted by a coin receptacle in slot machine and video games.

305 citations


Journal ArticleDOI
TL;DR: This article showed that a contrite version of tit-for-tat is even more effective at quickly restoring mutual cooperation without the risk of exploitation when the other players have adapted to noise.
Abstract: Noise in the form of random errors in implementing a choice is a common problem in real-world interactions. Recent research has identified three approaches to coping with noise: adding generosity to a reciprocating strategy; adding contrition to a reciprocating strategy; and using an entirely different strategy, Pavlov, based on the idea of switching choice whenever the previous payoff was low. Tournament studies, ecological simulation, and theoretical analysis demonstrate (1) a generous version of tit-for-tat is a highly effective strategy when the players it meets have not adapted to noise; (2) if the other players have adapted to noise, a contrite version of tit-for-tat is even more effective at quickly restoring mutual cooperation without the risk of exploitation; and (3) Pavlov is not robust.

298 citations
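
The strategies named in this abstract can be sketched as a small noisy iterated prisoner's dilemma. The payoff values (5, 3, 1, 0), the 10% generosity level, the 1% noise rate, and all function names are assumptions made for illustration; contrite tit-for-tat is omitted because it needs extra bookkeeping about each player's standing.

```python
import random

C, D = "C", "D"
# Assumed prisoner's dilemma payoffs to the first player (T=5, R=3, P=1, S=0).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(my_last, their_last, rng):
    """Cooperate first, then copy the other player's previous move."""
    return C if their_last is None else their_last


def generous_tft(my_last, their_last, rng, generosity=0.1):
    """Tit-for-tat that forgives a defection with probability `generosity`."""
    if their_last == D and rng.random() < generosity:
        return C
    return tit_for_tat(my_last, their_last, rng)


def pavlov(my_last, their_last, rng):
    """Repeat the previous choice after a high payoff, switch after a low one."""
    if my_last is None:
        return C
    if PAYOFF[(my_last, their_last)] >= 3:
        return my_last
    return D if my_last == C else C


def play(strategy_a, strategy_b, rounds=200, noise=0.01, seed=0):
    """Total payoffs when each intended move is flipped with probability `noise`."""
    rng = random.Random(seed)
    a_last = b_last = None
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(a_last, b_last, rng)
        b = strategy_b(b_last, a_last, rng)
        # Implementation noise: the move played may differ from the one intended.
        if rng.random() < noise:
            a = D if a == C else C
        if rng.random() < noise:
            b = D if b == C else C
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        a_last, b_last = a, b
    return score_a, score_b
```

For example, `play(generous_tft, pavlov, noise=0.05)` returns the two total scores for one seeded run; comparing strategies in any meaningful way would require many seeds and opponents.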


Journal ArticleDOI
TL;DR: In this article, the existence of a saddle point in the bounded case is obtained if the Isaacs' condition holds, and this technique is also a very simple approach for finding an optimal strategy in the case of controlled diffusions.

266 citations


Journal ArticleDOI
TL;DR: In this paper, a folk theorem for stochastic games is presented, which subsumes a number of results obtained earlier and applies to a wide range of games studied in the economics literature.

256 citations


Journal ArticleDOI
TL;DR: The EPI approach provides an understanding of the relationship between measurement and physical law and derives the Lagrangian that implies both the Klein-Gordon equation and the Dirac equation of quantum mechanics.
Abstract: The Lagrangians of physics arise out of a mathematical game between a "smart" measurer and nature (personified by a demon). Each contestant wants to maximize his level of Fisher information I. The game is zero sum, by conservation of information in the closed system. The payoff of the game introduces a variational principle, extreme physical information (EPI), which fixes both the Lagrangian and the physical constant of each scenario. The EPI approach provides an understanding of the relationship between measurement and physical law. EPI also defines a prescription for constructing Lagrangians. The prior knowledge required for this purpose is a rule of symmetry or conservation that implies a unitary transformation for which I remains invariant. As an example, when applied to the smart measurement of the space-time coordinate of a particle, the symmetry used is that between position-time space and momentum-energy space. Then the unitary transformation is the Fourier one, and EPI derives the following: the equivalence of energy, momentum, and mass; the constancy of Planck's parameter h; and the Lagrangian that implies both the Klein-Gordon equation and the Dirac equation of quantum mechanics.

237 citations


Journal ArticleDOI
TL;DR: In this paper, the results of an experiment studying the choices of subjects playing mixed extensions of three variants of simple 2 × 2 non-constant sum, strictly competitive games of the same form (Matching Pennies) are presented.

216 citations


Journal ArticleDOI
TL;DR: In this article, the authors compare the use of Harsanyi and Selten's risk dominance and payoff dominance as equilibrium selection criteria, and present experimental evidence that suggests the existence of a payoff dominated risk dominant equilibrium is a necessary and sufficient condition for coordination failure.

186 citations


Journal ArticleDOI
A. Jorge Padilla
TL;DR: In this paper, the degree of collusiveness of a market with consumer switching costs is analyzed in an infinite-horizon model of duopolistic competition, where firms compete for the demand for a homogeneous good by setting prices simultaneously in each period.

153 citations


Journal ArticleDOI
TL;DR: In this paper, the authors characterize the set of strategies that are stable with respect to a stochastic dynamic adaptive process in a finite two-player game played by a population of players.
Abstract: We add a round of pre-play communication to a finite two-player game played by a population of players. Pre-play communication is cheap talk in the sense that it does not directly enter the payoffs. The paper characterizes the set of strategies that are stable with respect to a stochastic dynamic adaptive process. Periodically players have an opportunity to change their strategy with a strategy that is more successful against the current population. Any strategy that weakly improves upon the current poorest performer in the population enters with positive probability. When there is no conflict of interest between the players, only the efficient outcome is stable with respect to these dynamics. For general games the set of stable payoffs is typically large. Every efficient payoff recurs infinitely often.

Journal ArticleDOI
TL;DR: In this article, the results of two experiments designed to study tacit coordination in a class of market entry games with linear payoff functions, binary decisions, and zero entry costs are reported, in which each of n = 20 players must decide on each trial whether or not to enter a market whose capacity is public knowledge.

Journal ArticleDOI
TL;DR: In this paper, a group of individuals repeatedly play a fixed extensive-form game, using past play to forecast future actions, and each player maximizes his own immediate expected payoff, believing that others' play corresponds to the historical frequencies of past play.
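
The belief rule described here (best-respond to the historical frequencies of others' play) can be sketched as fictitious play. The sketch below uses a two-player normal-form game rather than the extensive-form setting of the paper, and the function name, the uniform initial counts, and the tie-breaking rule (lowest index wins) are all assumptions.

```python
def fictitious_play(payoff_row, payoff_col, rounds=100):
    """Each player best-responds to the empirical frequencies of the other's past play.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs to the row and column
    player when row plays i and column plays j. Returns the list of action pairs.
    """
    n_row, n_col = len(payoff_row), len(payoff_row[0])
    # Counts of observed actions, started at 1 (an assumed uniform prior).
    row_counts = [1] * n_row
    col_counts = [1] * n_col
    history = []
    for _ in range(rounds):
        # Expected payoff is proportional to the count-weighted sum,
        # so normalising the counts is unnecessary for the argmax.
        row_action = max(
            range(n_row),
            key=lambda i: sum(payoff_row[i][j] * col_counts[j] for j in range(n_col)),
        )
        col_action = max(
            range(n_col),
            key=lambda j: sum(payoff_col[i][j] * row_counts[i] for i in range(n_row)),
        )
        history.append((row_action, col_action))
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return history
```

In a coordination game such as `fictitious_play([[2, 0], [0, 1]], [[2, 0], [0, 1]])`, both players start on and stay with the first action, because the shared uniform prior already favors it.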

Journal ArticleDOI
TL;DR: In this article, it was shown that the negotiation game can have multiple perfect equilibria, including inefficient ones, provided that players are sufficiently patient, and the length of delay depends only on the payoff structure of the disagreement game and not on the discount factor.
Abstract: Rubinstein's alternating-offers bargaining model is enriched by assuming that players' payoffs in disagreement periods are determined by a normal form game. It is shown that such a model can have multiple perfect equilibria, including inefficient ones, provided that players are sufficiently patient. Delay is possible even though there is perfect information and the players are fully rational. The length of delay depends only on the payoff structure of the disagreement game and not on the discount factor. Not all feasible and individually rational payoffs of the disagreement game can be supported as average disagreement payoffs. Indeed, some negotiation games have a unique perfect equilibrium with immediate agreement.

Journal ArticleDOI
TL;DR: This work describes a randomized algorithm for the simple stochastic game problem that requires $2^{O(\sqrt{n})}$ expected operations for games with n vertices and is the first subexponential time algorithm for this problem.
Abstract: We describe a randomized algorithm for the simple stochastic game problem that requires $2^{O(\sqrt{n})}$ expected operations for games with n vertices. This is the first subexponential time algorithm for this problem.

Journal ArticleDOI
TL;DR: In this paper, order independent equilibria (OIE) were introduced for noncooperative games of coalition formation, based on an underlying game in coalitional form, and a strategy profile is an OIE if, for any specification of first movers in the sequential game, it remains an equilibrium and leads to the same payoff.

Journal Article
TL;DR: In this article, the authors characterized hyperstable $(n_1, n_2)$-solutions for repeated alternating-move 2 x 2 games with finite action spaces, where the memory capacity of the players has no influence on the set of solutions.

Journal ArticleDOI
TL;DR: In this article, the authors show that an $(n_1, n_2)$-equilibrium solution always exists and is cyclical, and the memory capacity of the players has no influence on the set of solutions.

Journal ArticleDOI
TL;DR: In this paper, the authors studied repeated games with a finite number of players, a fixed number of actions, discounted payoffs, and perfect recall, and the players' initial expectations were given by a common prior distribution over player types, a type being a discount rate and payoff matrix.

Book
01 Jan 1995
TL;DR: In this paper, the authors propose a linear pursuit evasion game with a state-constraint for a highly maneuverable evader and a turnpike theory for infinite-horizon Open-Loop Differential Games with Decoupled Controls.
Abstract: I. Minimax control.- Expected Values, Feared Values, and Partial Information Optimal Control.- H∞-Control of Nonlinear Singularly Perturbed Systems and Invariant Manifolds.- A Hybrid (Differential-Stochastic) Zero-Sum Game with Fast Stochastic Part.- H∞-Control of Markovian Jump Systems and Solutions to Associated Piecewise-Deterministic Differential Games.- The Big Match on the Integers.- II. Pursuit evasion.- Synthesis of Optimal Strategies for Differential Games by Neural Networks.- A Linear Pursuit-Evasion Game with a State Constraint for a Highly Maneuverable Evader.- Three-Dimensional Air Combat: Numerical Solution of Complex Differential Games.- Control of Informational Sets in a Pursuit Problem.- Decision Support System for Medium Range Aerial Duels Combining Elements of Pursuit-Evasion Game Solutions with AI Techniques.- Optimal Selection of Observation Times in a Costly Information Game.- Pursuit Games with Costly Information: Application to the ASW Helicopter Versus Submarine Game.- Linear Avoidance in the Case of Interaction of Controlled Objects Groups.- III. Solution methods.- Convergence of Discrete Schemes for Discontinuous Value Functions of Pursuit-Evasion Games.- Undiscounted Zero Sum Differential Games with Stopping Times.- Guarantee Result in Differential Games with Terminal Payoff.- IV. Nonzero-sum games, theory.- Lyapunov Iterations for Solving Coupled Algebraic Riccati Equations of Nash Differential Games and the Algebraic Riccati Equation of Zero-Sum Games.- A Turnpike Theory for Infinite Horizon Open-Loop Differential Games with Decoupled Controls.- Team-Optimal Closed-Loop Stackelberg Strategies for Discrete-Time Descriptor Systems.- On Independence of Irrelevant Alternatives and Dynamic Programming in Dynamic Bargaining Games.- The Shapley Value for Differential Games.- V. Nonzero-sum games, applications.- Dynamic Game Theory and Management Strategy.- Endogenous Growth as a Dynamic Game.- Searching for Degenerate Dynamics in Animal Conflict Game Models involving Sexual Reproduction.

Journal ArticleDOI
TL;DR: In this article, the authors introduce a dynamical model of mutation in evolutionary games, in which all possible mixtures of n pure strategies are admitted, and the case of n = 2 pure strategies is investigated in detail.

Journal ArticleDOI
TL;DR: In this paper, it was shown that monotonicity of the best-response functions in a two-player game is not sufficient to derive predictions about the order of moves.

Journal ArticleDOI
TL;DR: In this article, a class of two-person games with ratio payoff functions can be solved using equivalent primal-dual linear programming formulations, which may be used to conduct the efficiency evaluation currently done by the CCR ratio model of Data Envelopment Analysis (DEA).
Abstract: This paper demonstrates that a class of two-person games with ratio payoff functions can be solved using equivalent primal-dual linear programming formulations. The game’s solution contains specialized information which may be used to conduct the efficiency evaluation currently done by the CCR ratio model of Data Envelopment Analysis (DEA). Consequently, a rigorous connection between DEA’s CCR model and the theory of games is established. Interpretations of these new solutions are discussed in the context of current ongoing applications.

Journal ArticleDOI
TL;DR: In this article, the authors consider the undiscounted repeated game obtained by the infinite repetition of such a two-player stage game and show that if supergame strategies are restricted to be computable within Church's thesis, the only pair of payoffs which survives any computable tremble with sufficiently large support is the Pareto-efficient pair.
Abstract: A common interest game is a game in which there exists a unique pair of payoffs which strictly Pareto-dominates all other payoffs. We consider the undiscounted repeated game obtained by the infinite repetition of such a two-player stage game. We show that if supergame strategies are restricted to be computable within Church's thesis, the only pair of payoffs which survives any computable tremble with sufficiently large support is the Pareto-efficient pair. The result is driven by the ability of the players to use the early stages of the game to communicate their intention to play cooperatively in the future.

Journal ArticleDOI
TL;DR: A survey of stochastic games in queues, where both tools and applications are considered, and the structural properties of best policies of the controller, worst-case policies of nature, and of the value function are illustrated.
Abstract: Zero-sum stochastic games model situations where two persons, called players, control some dynamic system, and both have opposite objectives. One player wishes typically to minimize a cost which has to be paid to the other player. Such a game may also be used to model problems with a single controller who has only partial information on the system: the dynamic of the system may depend on some parameter that is unknown to the controller, and may vary in time in an unpredictable way. A worst-case criterion may be considered, where the unknown parameter is assumed to be chosen by “nature” (called player 1), and the objective of the controller (player 2) is then to design a policy that guarantees the best performance under worst-case behaviour of nature. The purpose of this paper is to present a survey of stochastic games in queues, where both tools and applications are considered. The first part is devoted to the tools. We present some existing tools for solving finite horizon and infinite horizon discounted Markov games with unbounded cost, and develop new ones that are typically applicable in queueing problems. We then present some new tools and theory of expected average cost stochastic games with unbounded cost. In the second part of the paper we present a survey on existing results on worst-case control of queues, and illustrate the structural properties of best policies of the controller, worst-case policies of nature, and of the value function. Using the theory developed in the first part of the paper, we extend some of the above results, which were known to hold for finite horizon costs or for the discounted cost, to the expected average cost.

Journal ArticleDOI
TL;DR: In this article, an analytic expression for the integral of the option payoffs over the conditional density is available, and the remaining integration amounts to valuing the payoff function given by the results of the first integration.
Abstract: Spread options are options whose payoff is based on the difference in the prices of two underlying assets. The price of a spread option is the (discounted) double integral of the option payoffs over the risk-neutral joint distribution of the terminal prices of the two underlying assets. Analytic expressions for the values of spread puts and calls in a Black-Scholes environment are not known, and various numerical algorithms must be used. This article presents an accurate and efficient approach for pricing European-style spread options on equities, foreign currencies, and commodities. The key to the approach is to recognize that the joint density of the terminal prices of the underlying assets can be factored into the product of univariate marginal and conditional densities, and that an analytic expression for the integral of the option payoffs over the conditional density is available. The remaining integration amounts to valuing the payoff function given by the results of the first integration. This payoff function is approximated by a portfolio of ordinary puts and calls, and valued accordingly. The approach is more accurate than existing bivariate binomial schemes, and fast enough for practical applications. It also allows for accurate and efficient computation of the partial derivatives of the option price, i.e., the Greek letter risks.
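
The factorization described in this abstract can be written out as a worked equation. The notation is assumed for illustration and does not come from the article: $S_1, S_2$ are the terminal prices, $g$ is the payoff, $f$ is the risk-neutral joint density, $r$ is a constant interest rate, $T$ is the maturity, and the strike $K$ in the call example is hypothetical.

```latex
\begin{align}
V &= e^{-rT} \int_0^\infty \!\! \int_0^\infty g(s_1, s_2)\, f(s_1, s_2)\, ds_1\, ds_2 \\
  &= e^{-rT} \int_0^\infty \underbrace{\left[ \int_0^\infty g(s_1, s_2)\, f(s_1 \mid s_2)\, ds_1 \right]}_{h(s_2)} f_2(s_2)\, ds_2 ,
\qquad f(s_1, s_2) = f(s_1 \mid s_2)\, f_2(s_2).
\end{align}
% Example payoff for a spread call with strike K:
%   g(s_1, s_2) = \max(s_1 - s_2 - K,\, 0)
```

The inner integral $h(s_2)$ is the part the abstract says has an analytic expression; the outer integral then values $h$ as a payoff on the second asset alone, which is the function the article approximates with a portfolio of ordinary puts and calls.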

Book ChapterDOI
24 Aug 1995
TL;DR: A pseudopolynomial time algorithm for the solution of mean payoff games, a family of perfect information games introduced by Ehrenfeucht and Mycielski, the decision problem for which is in NP ∩ co-NP.
Abstract: We study the complexity of finding the values and optimal strategies of mean payoff games, a family of perfect information games introduced by Ehrenfeucht and Mycielski. We describe a pseudopolynomial time algorithm for the solution of such games, the decision problem for which is in NP ∩ co-NP. Finally, we describe a polynomial reduction from mean payoff games to the simple stochastic games studied by Condon. These games are also known to be in NP ∩ co-NP, but no polynomial or pseudo-polynomial time algorithm is known for them.
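
A common way to approximate the value of a mean payoff game is to compute the value of the k-step truncated game by backward induction and divide by k. The sketch below illustrates that idea only; the graph encoding, the function names, and the choice of k are assumptions, and the result is an estimate rather than the exact value that a full pseudopolynomial procedure would recover.

```python
def k_step_values(edges, owner, k):
    """Value of the game truncated after k moves, for every start vertex.

    edges: {vertex: [(successor, weight), ...]} with at least one edge per vertex.
    owner: {vertex: "max" or "min"}, the player who moves at that vertex.
    """
    values = {v: 0 for v in edges}
    for _ in range(k):
        new_values = {}
        for v, out in edges.items():
            candidates = [w + values[u] for u, w in out]
            # The maximizer picks the best continuation, the minimizer the worst.
            new_values[v] = max(candidates) if owner[v] == "max" else min(candidates)
        values = new_values
    return values


def estimate_mean_payoffs(edges, owner, k=1000):
    """Approximate the mean payoff value by the average payoff over k steps."""
    return {v: total / k for v, total in k_step_values(edges, owner, k).items()}


# Hypothetical two-vertex game: "a" belongs to the maximizer, "b" to the minimizer.
example_edges = {"a": [("a", 1), ("b", 5)], "b": [("a", 0), ("b", 3)]}
example_owner = {"a": "max", "b": "min"}
```

In the example game, the maximizer prefers the cycle a, b, a (average weight 2.5) over the self-loop of weight 1, and the minimizer prefers returning to "a" over its self-loop of weight 3, so the estimates for both vertices approach 2.5 as k grows.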

Journal ArticleDOI
TL;DR: In this paper, equilibrium solutions defined in terms of the degree of attainment of a fuzzy goal are examined for games in fuzzy and multiobjective environments.
Abstract: Equilibrium solutions in terms of the degree of attainment of a fuzzy goal for games in fuzzy and multiobjective environments are examined. We introduce a fuzzy goal for a payoff in order to incorporate ambiguity of human judgments and assume that a player tries to maximize his degree of attainment of the fuzzy goal. A fuzzy goal for a payoff and the equilibrium solution with respect to the degree of attainment of a fuzzy goal are defined. Two basic methods, one by weighting coefficients and the other by a minimum component, are employed to aggregate multiple fuzzy goals. When the membership functions are linear, computational methods for the equilibrium solutions are developed. It is shown that the equilibrium solutions are equal to the optimal solutions of mathematical programming problems in both cases. The relations between the equilibrium solutions for multiobjective bimatrix games incorporating fuzzy goals and the Pareto-optimal equilibrium solutions are considered.

Journal ArticleDOI
TL;DR: In an anonymous game, the payoff of a player depends upon the player's own action and the action distribution of all the players as discussed by the authors, and it is shown that if the set of actions is finite, or countably infinite and compact then there is a symmetric equilibrium distribution.

Proceedings ArticleDOI
23 Oct 1995
TL;DR: This work examines games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and therefore require carefully planned actions on the part of the learning algorithm.
Abstract: We examine the problem of learning to play various games optimally against resource-bounded adversaries, with an explicit emphasis on the computational efficiency of the learning algorithm. We are especially interested in providing efficient algorithms for games other than penny-matching (in which payoff is received for matching the adversary's action in the current round), and for adversaries other than the classically studied finite automata. In particular, we examine games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and therefore require carefully planned actions on the part of the learning algorithm. For example, in the game we call contract, both sides play 0 or 1 on each round, but our side receives payoff only if we play 1 in synchrony with the adversary; unlike penny-matching, playing 0 in synchrony with the adversary pays nothing. The name of the game is derived from the example of signing a contract, which becomes valid only if both parties sign (play 1).
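
The difference between the two per-round payoff rules in this abstract can be made concrete with a short sketch. The unit payoff of 1 and the function names are assumptions; the abstract only says that payoff is or is not received.

```python
def penny_matching_payoff(ours, theirs):
    """Payoff for matching the adversary's action, whichever action it is."""
    return 1 if ours == theirs else 0


def contract_payoff(ours, theirs):
    """Payoff only when both sides play 1 (both parties sign)."""
    return 1 if ours == 1 and theirs == 1 else 0


# All four action pairs, showing where the two games differ.
for ours in (0, 1):
    for theirs in (0, 1):
        print(ours, theirs, penny_matching_payoff(ours, theirs), contract_payoff(ours, theirs))
```

The two rules agree on every pair except (0, 0), where penny-matching pays and contract does not.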