
Showing papers on "Stochastic game published in 1986"


Journal ArticleDOI
TL;DR: This article shows that, when either there are only two players or a full-dimensionality condition holds, any individually rational payoff vector of the one-shot game can arise in a perfect equilibrium of the infinitely repeated game with discounting, provided players are sufficiently patient.
Abstract: When either there are only two players or a "full dimensionality" condition holds, any individually rational payoff vector of a one-shot game of complete information can arise in a perfect equilibrium of the infinitely-repeated game if players are sufficiently patient. In contrast to earlier work, mixed strategies are allowed in determining the individually rational payoffs (even when only realized actions are observable). Any individually rational payoffs of a one-shot game can be approximated by sequential equilibrium payoffs of a long but finite game of incomplete information, where players' payoffs are almost certainly as in the one-shot game. THAT STRATEGIC RIVALRY in a long-term relationship may differ from that of a one-shot game is by now quite a familiar idea. Repeated play allows players to respond to each other's actions, and so each player must consider the reactions of his opponents in making his decision. The fear of retaliation may thus lead to outcomes that otherwise would not occur. The most dramatic expression of this phenomenon is the celebrated "Folk Theorem" for repeated games. An outcome that Pareto dominates the minimax point is called individually rational. The Folk Theorem asserts that any individually rational outcome can arise as a Nash equilibrium in infinitely repeated games with sufficiently little discounting. As Aumann and Shapley [3] and Rubinstein [20] have shown, the same result is true when we replace the word "Nash" by "(subgame) perfect" and assume no discounting at all. Because the Aumann-Shapley/Rubinstein result supposes literally no discounting, one may wonder whether the exact counterpart of the Folk Theorem holds for perfect equilibrium, i.e., whether as the discount factor tends to one, the set of perfect equilibrium outcomes converges to the individually rational set. After all, agents in most games of economic interest are not completely patient; the no-discounting case is of interest as an approximation. It turns out that this counterpart is false. There can be a discontinuity (formally, a failure of lower hemicontinuity) where the discount factor, δ, equals one, as we show in Example 3. Nonetheless the games in which discontinuities occur are quite degenerate, and, in the end, we can give a qualified "yes" (Theorem 2) to the question of whether the Folk Theorem holds with discounting. In particular, it always holds in two-player games (Theorem 1). This last result contrasts with the recent work of Radner-Myerson-Maskin [18] showing that, even in two-player games, the equilibrium set may not be continuous at δ = 1.
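As a schematic restatement of the two-player result described above (the notation is chosen here for illustration and is not taken from the paper): if $$v=(v_1,v_2)$$ is a feasible payoff vector of the one-shot game with $$v_i > \underline{v}_i$$ for each player, where $$\underline{v}_i$$ denotes player i's minimax payoff in mixed strategies, then there exists a threshold $$\underline{\delta} < 1$$ such that for every discount factor $$\delta \in (\underline{\delta}, 1)$$ the infinitely repeated game has a subgame perfect equilibrium whose average discounted payoff vector is v.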

1,909 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied two-person supergames in which each player is restricted to carrying out his strategies by finite automata, and showed that cooperation cannot be the outcome of a solution of the infinitely repeated prisoner's dilemma.

635 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an example of a repeated partnership game with imperfect monitoring in which all supergame equilibria with positive discount rates are bounded away from full efficiency uniformly in the discount rate, provided the latter is strictly positive.
Abstract: In this note we present an example of a repeated partnership game with imperfect monitoring in which all supergame equilibria with positive discount rates are bounded away from full efficiency uniformly in the discount rate, provided the latter is strictly positive. On the other hand, if the players do not discount the future, then every efficient one-period payoff vector that dominates the one-period equilibrium payoff vector can be attained by an equilibrium of the repeated game. Thus the correspondence that maps the players' discount rate into the corresponding set of repeated-game equilibrium payoff vectors is discontinuous at the point at which the discount rate is zero.

191 citations


Journal ArticleDOI
TL;DR: This work proves several properties of the sets of feasible payoffs and Nash equilibrium payoffs for the n-stage game and for the λ-discounted game, and determines the set of equilibrium payoffs for the Prisoner's Dilemma corresponding to the critical value of the discount factor.
Abstract: We consider N person repeated games with complete information and standard signalling. We first prove several properties of the sets of feasible payoffs and Nash equilibrium payoffs for the n-stage game and for the λ-discounted game. In the second part we determine the set of equilibrium payoffs for the Prisoner's Dilemma corresponding to the critical value of the discount factor.
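For orientation, the textbook benchmark for such a critical value in the Prisoner's Dilemma is the grim-trigger threshold; under the assumption of one-shot payoffs T > R > P > S this illustrates the kind of critical discount factor meant, and is not necessarily the value computed in the paper. Mutual cooperation is sustainable by reverting permanently to defection after any deviation exactly when $$\frac{R}{1-\delta} \ge T + \frac{\delta P}{1-\delta}, \quad \text{i.e.} \quad \delta \ge \delta^{*} = \frac{T-R}{T-P}.$$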

177 citations


Journal ArticleDOI
Roy Radner
TL;DR: In this paper, it is shown that if a partnership game is repeated infinitely and each player's supergame payoff is the long-run average of his expected one-period utilities, then efficient combinations of one-period actions can be sustained as Nash equilibria of the supergame, even if the players cannot observe other players' actions or information, but can only observe the resulting consequences.
Abstract: In a partnership game, each player's utility depends on the other players' actions through a commonly observed consequence (e.g. output, profit, price), which is itself a function of the players' actions and an exogenous stochastic environment. If a partnership game is repeated infinitely, and each player's payoff in the infinite game (supergame) is the long-run average of his expected one-period utilities, then efficient combinations of one-period actions can be sustained as Nash equilibria of the supergame even if the players cannot observe other players' actions or information, but can only observe the resulting consequences.

175 citations


Journal ArticleDOI
TL;DR: In this article, a new model of a cooperative game with a continuum of players is developed, in which only finite coalitions, i.e., coalitions containing only finitely many players, are permitted to form.

102 citations


Journal ArticleDOI
TL;DR: It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step.
Abstract: The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented.
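The abstract does not spell out the learning scheme, so the following Python sketch shows a generic linear reward-inaction automaton of the kind such decentralized decision makers typically use; the class name, step size, and the assumption of rewards normalized to [0, 1] are illustrative choices, not details taken from the paper.

import random

class LinearRewardInactionAutomaton:
    """One decision maker attached to a state of the Markov chain: it keeps a
    probability vector over its actions and nudges probability toward actions
    that earn high normalized reward (a generic L_{R-I} scheme)."""

    def __init__(self, n_actions, step_size=0.01):
        self.probs = [1.0 / n_actions] * n_actions
        self.step_size = step_size  # "sufficiently small steps"

    def choose_action(self):
        return random.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is assumed normalized to [0, 1]; a zero reward changes nothing
        # (the "inaction" part of reward-inaction).
        theta = self.step_size * reward
        for a in range(len(self.probs)):
            if a == action:
                self.probs[a] += theta * (1.0 - self.probs[a])
            else:
                self.probs[a] -= theta * self.probs[a]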

102 citations


Journal ArticleDOI
TL;DR: This article generalizes Schmeidler's results on the equilibrium points of nonatomic games from strategy sets in Euclidean n-space to strategy sets in a Banach space, relaxing the earlier assumptions of separability and of a dual with the Radon-Nikodym property, which are not satisfied for the infinite-dimensional spaces typically studied in economics; see Bewley (2) and Mas-Colell (18).
Abstract: Schmeidler's results on the equilibrium points of nonatomic games with strategy sets in Euclidean n-space are generalized to nonatomic games with strategy sets in a Banach space. Our results also extend previous work of the author which assumed the Banach space to be separable and its dual to possess the Radon-Nikodym property. Our proofs use recent results in functional analysis. In this paper we study Cournot-Nash equilibria of games with a continuum of players, each of whom has a strategy set in a Banach space. The importance of such games for economic theory has been recently underscored by Dubey, Mas-Colell and Shubik (7); see their Example 2, in particular. However, (7) is primarily concerned with the equivalence of Walrasian and Cournot-Nash equilibria. Our focus here is on the existence theory. The results reported here can be seen as a continuation of research initiated by Schmeidler (19), who formulated and studied nonatomic games over n-dimensional Euclidean space. They also extend earlier work of the author in at least three important respects. To begin with, (14) assumed that each player's strategy set lies in a separable Banach space whose dual has the Radon-Nikodym property. This condition is not satisfied for the infinite-dimensional spaces typically studied in economics; see Bewley (2) and Mas-Colell (18). It is of interest that such an assumption can be relaxed at the cost of a rather mild uniformity assumption on the distribution of strategy sets. Secondly, it seems desirable to have results for conjugate Banach spaces which are not separable and for which the strategy sets are not weakly compact but only norm bounded and weak* closed. We also present such results here. They are especially relevant for commodity spaces studied in (2 and 18). Finally, unlike (14), our results on the existence of mixed-strategy equilibria rely only on preference orderings rather than on payoff functions. It is not known to us whether our conditions on preferences are sufficient for them to be represented as jointly measurable payoff functions. Such theorems are difficult to obtain even for a finite-dimensional setting; see Wesley (21).

58 citations


Journal ArticleDOI
TL;DR: In this paper, an example of a non-zero sum stochastic game is given where the set of Nash equilibrium payoffs in the finitely repeated game and in the discounted game reduces to the threat point, whereas the corresponding set for the infinitely repeated game is disjoint from this point and equals the set of feasible, individually rational and Pareto optimal payoffs.
Abstract: An example of a non-zero sum stochastic game is given where: i) the set of Nash equilibrium payoffs in the finitely repeated game and in the discounted game is reduced to the threat point; ii) the corresponding set for the infinitely repeated game is disjoint from this point and equals the set of feasible, individually rational and Pareto optimal payoffs.

51 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that competitive equilibrium can be seen as the limit of a hierarchical game in which the rights (or information) of the players can be strictly ordered.

42 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider an auction or fair division game where every bidder knows his true value of the single object but is only incompletely informed about the true values of his competitors.
Abstract: Consider an auction or fair division game where every bidder knows his true value of the single object but is only incompletely informed about the true values of his competitors. By imposing the axiom of envy freeness with respect to stated preferences, the set of pricing rules is restricted to the prices between the highest and second highest bid. Whereas for auctions one can also satisfy incentive compatibility, the same is not true for fair division games. We analyse and compare the different pricing rules, partly incentive compatible and partly not, by deriving the optimal bidding strategies. By comparing the payoff expectations induced by the various pricing rules we can prove directly a special equivalence statement saying that expected payoffs do not depend on the pricing rule. It is interesting that in fair division games equivalence of pricing rules is only valid if information is sufficiently incomplete.
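A minimal sketch of the family of pricing rules described above: any admissible price lies between the second-highest and the highest bid, so it can be written as a convex combination of the two. The function name and the weight parameter alpha are illustrative assumptions, not notation from the paper.

def envy_free_price(bids, alpha):
    """Price the single object at p = alpha * b(1) + (1 - alpha) * b(2),
    where b(1) and b(2) are the highest and second-highest bids and alpha
    lies in [0, 1].  alpha = 0 is the second-price rule (incentive
    compatible for auctions), alpha = 1 the first-price rule."""
    assert 0.0 <= alpha <= 1.0 and len(bids) >= 2
    ranked = sorted(bids, reverse=True)
    return alpha * ranked[0] + (1.0 - alpha) * ranked[1]

# Example: with bids 10, 7 and 4, the admissible prices range from 7 to 10.
print(envy_free_price([10, 7, 4], alpha=0.0))  # 7.0 (second-price)
print(envy_free_price([10, 7, 4], alpha=0.5))  # 8.5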


Journal ArticleDOI
TL;DR: In this paper, the authors considered a differential game in which players accumulate capital, payoff functions depend upon the capital stocks of both players, and cost functions are convex; they showed that every equilibrium of the infinite-horizon game converges to the unique stationary equilibrium.

Book ChapterDOI
TL;DR: Various algorithms for the numerical solution of discounted stochastic games are presented, together with a new mathematical programming formulation which permits the numerical solution of a game by using a non-linear programming code.

Book ChapterDOI
01 Jan 1986
TL;DR: The paradox of the Prisoner's Dilemma is that each player gets a higher payoff by playing d, independently of what the other plays; to defect is thus a so-called dominant strategy.
Abstract: The simplest model of a conflict between two parties is a 2 × 2 game. Each player has two strategies, say c (to cooperate) and d (to defect). If one takes only the rank order of the payoffs into account there are 78 nonequivalent 2 × 2 games (Rapoport, Guyer, Gordon 1976). The most extensively analysed of these 78 is Prisoner's Dilemma (see Table I). Following the taxonomy of Rapoport (Rapoport et al. 1976) we define a 2 × 2 game as Prisoner's Dilemma whenever $$T > R > P > S$$ (1) The paradox of Prisoner's Dilemma is that each player gets, independently of what the other plays, a higher payoff when playing d. To defect is a so-called dominant strategy. But when both players act individually rationally and choose d, both suffer from their individual rationality. Mutual cooperation would be collectively rational. The most outstanding situation which has (on an abstract level) such a structure is the arms race between the two superpowers.
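A quick numeric check of condition (1) with the conventional payoffs T = 5, R = 3, P = 1, S = 0 (these particular numbers are an illustrative assumption, not values from the chapter): d dominates c for the row player against either column choice, yet mutual cooperation beats mutual defection.

# Row player's payoffs in a symmetric 2 x 2 Prisoner's Dilemma.
T, R, P, S = 5, 3, 1, 0
payoff = {
    ("c", "c"): R,  # both cooperate
    ("c", "d"): S,  # row cooperates, column defects ("sucker" payoff)
    ("d", "c"): T,  # row defects, column cooperates ("temptation")
    ("d", "d"): P,  # both defect
}

assert T > R > P > S                              # condition (1)
# d is a dominant strategy for the row player ...
assert payoff[("d", "c")] > payoff[("c", "c")]
assert payoff[("d", "d")] > payoff[("c", "d")]
# ... yet mutual cooperation Pareto-dominates mutual defection.
assert payoff[("c", "c")] > payoff[("d", "d")]
print("Prisoner's Dilemma structure verified")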

Journal ArticleDOI
TL;DR: It is shown that an undiscounted stochastic game possesses optimal stationary strategies if and only if a global minimum with objective value zero can be found to an appropriate nonlinear program with linear constraints.
Abstract: We show that an undiscounted stochastic game possesses optimal stationary strategies if and only if a global minimum with objective value zero can be found to an appropriate nonlinear program with linear constraints. This nonlinear program arises as a method for solving a certain bilinear system, satisfaction of which is also equivalent to finding a stationary optimal solution for the game. The objective function of the program is a nonnegatively valued quadric polynomial.
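Schematically, and with symbols chosen here for illustration rather than taken from the paper, the characterization has the shape $$\min_{z}\; Q(z) \quad \text{subject to} \quad Az = b,\; z \ge 0,$$ where z collects the stationary strategies and value variables of both players, the constraints are linear, and Q is a nonnegatively valued quadratic polynomial assembled from the game data; the game then possesses optimal stationary strategies if and only if the minimum is zero, and any global minimizer attaining zero yields such strategies.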

Book
01 Jan 1986
TL;DR: This book presents a general description of models with non-coincident interests, including noncooperative games that take into account the mutual information of co-players about their interests, together with principles for choosing rational strategies.
Abstract: I / A General Description of Models with Non-Coincident Interests.- 1. Typical Examples of Games with Non-opposed Interests.- 2. General Description of Noncooperative Games Taking into Account Mutual Information of Co-players About Their Interests.- 3. Situations, Strategies, and Players' Knowledge About Each Other's Moves.- 4. On the Problem of Rational Choice of Strategy.- 5. Exchange of Information and Extension of the Strategy Concept.- 6. Formal Description of Compromises and Coalitions.- II / Some Principles Involved in Choosing Rational Strategies.- 7. Optimization and Averaging.- 8. The Maximin.- 9. Absolutely Optimal Strategies and Penalizing Strategies.- 10. The Best-Guaranteed Payoff Under Exchange of Information and Fixed Order of Moves.- 11. Equilibria.- 12. Advantages and Disadvantages of Coalitions.- 13. Stability of Coalitional Decisions in Repetitive Games.- III / The Guaranteed Payoff Principle in Two-Person Games.- 14. Qualitative Games.- 15. Games with Excluded Situations.- 16. Games with Fixed Order of Moves Without Excluded Situations.- 17. Exact Information About the Co-player's Choice: Inexact Information About His Interests.- 18. Player 1 Ignorant of Player 2's Move.- 19. The Problem of the Maximal Guaranteed Payoff and Approximation of Games.- 20. Other Cases Involving State of Knowledge Metagames.- 21. Games with Auxiliary Performance Functions.- 22. Dynamics of Two-person Games.- 23. Remarks.- IV / Some Game Models for Multiple Players.- 24. On the Theory of Three-person Games.- 25. Equilibria and Stable Collective Decisions in Repeated Games.- 26. Side Payments as Means of Control in Hierarchical Systems.

Journal ArticleDOI
TL;DR: In this article, a Borel space framework for zero-sum discrete-time stochastic games is introduced, which is a game theoretic extension of some nonstationary dynamic programming models in the sense of Hinderer.

Journal ArticleDOI
TL;DR: In this article, a general minimax theorem is given whose assumptions and conclusions are phrased only in terms of the data of the problem. The conclusions require that players have ε-optimal strategies with finite support, both because those are the easiest to describe in intrinsic terms, and because in any game where the value would not exist in strategies with finite support, all known general minimax theorems implicitly select as 'value' either the sup inf or the inf sup, by in effect restricting either player I or player II arbitrarily to strategies having finite support.
Abstract: The author's aim is to get a 'general minimax theorem' whose assumptions and conclusions are phrased only in terms of the data of the problem, i.e., the pair of pure strategy sets S and T and the payoff function on S × T. For the assumptions, this means that one wants to avoid any assumption of the type 'there exists a topology (or a measurable structure) on S and (or) T such that ...'. For the conclusions, one is led to require that players have ε-optimal strategies with finite support, both because those are the easiest to describe in intrinsic terms, and because in any game where the value would not exist in strategies with finite support, all known general minimax theorems implicitly select as 'value' either the sup inf or the inf sup by in effect restricting either player I or player II arbitrarily to strategies with finite support, so that the resulting 'value' is completely arbitrary and misleading.

Journal ArticleDOI
01 May 1986
TL;DR: The notion of reward scheme safety is formalized, an asymptotically safe reward scheme is exhibited, and its safety is demonstrated by an analog of the proof of Fisher's fundamental theorem of natural selection.
Abstract: The reward-allocation problem for production systems in delayed-payoff situations is formalized in a conceptual model in which the environment of the system is a finite automaton. The environment state and the state of the system's local memory determine which productions are in the current conflict set. Productions are selected from the conflict set with probabilities proportional to their activations. Each selected production updates the local memory and furnishes the next input symbol to the environment. A reward scheme examines the payoff that is output by the environment and adjusts the activations in an attempt to increase average payoff per unit time. A reward scheme is safe if it is generally biased towards improvement. The notion of reward scheme safety is formalized, an asymptotically safe reward scheme is exhibited, and its safety is demonstrated. The demonstration is an analog of the proof of Fisher's fundamental theorem of natural selection.
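A minimal sketch of the selection mechanism the abstract describes: a production is drawn from the conflict set with probability proportional to its activation, and the reward scheme then adjusts activations using the observed payoff. The multiplicative update below is a deliberately simple stand-in, not the paper's asymptotically safe scheme, and the payoff normalization is an assumption.

import random

def select_production(conflict_set, activations):
    """Draw a production from the conflict set with probability proportional
    to its activation."""
    weights = [activations[p] for p in conflict_set]
    return random.choices(conflict_set, weights=weights)[0]

def reward_scheme(activations, fired, payoff, rate=0.1):
    """Toy reward scheme: scale the fired production's activation up or down
    according to the payoff (assumed normalized around 1.0)."""
    activations[fired] *= 1.0 + rate * (payoff - 1.0)
    activations[fired] = max(activations[fired], 1e-6)  # keep activations positive

# Example: three productions competing in one conflict set.
acts = {"p1": 1.0, "p2": 1.0, "p3": 1.0}
fired = select_production(["p1", "p2", "p3"], acts)
reward_scheme(acts, fired, payoff=1.5)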

Journal ArticleDOI
TL;DR: This paper attempts to decompose a game v into two different components, one lying in the null space of the Shapley value and the other in its orthogonal complement, and arrives at a new explicit formula for the Shapley value which, unlike the typical one, involves averages of player worths across coalition sizes.
Abstract: In this paper we attempt to decompose a game v into two different components, one lying in the null space of the Shapley value and the other in its orthogonal complement. We observe that the Shapley value of the former must be 0, so that the Shapley value of the latter coincides with the value of the original game. In this way we arrive at a new explicit formula for the Shapley value which, unlike the typical one, involves averages of player worths across coalition sizes. Central to our ideas is the game-theoretic contrast between the spaces in which each component lies.
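For comparison with the paper's new formula, here is a brute-force implementation of the standard ("typical") Shapley value, averaging marginal contributions over all player orderings; the three-player game in the example is an arbitrary illustration, not one from the paper.

from itertools import permutations
from math import factorial

def shapley_value(players, v):
    """Standard Shapley value: average each player's marginal contribution
    over all orderings of the players.  v maps a frozenset coalition to its worth."""
    value = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            value[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: value[p] / n_orders for p in players}

# Illustrative 3-player game: a coalition is worth 1 if it has at least 2 members.
players = ["a", "b", "c"]
worth = lambda S: 1.0 if len(S) >= 2 else 0.0
print(shapley_value(players, worth))  # roughly 1/3 for each player, by symmetry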

Journal ArticleDOI
TL;DR: For an alternating-offer bargaining game with an uncertain horizon, the authors show that a player's equilibrium payoff is greater the less risk averse he is relative to his opponent, and that the equilibrium partition converges to the Nash bargaining solution as the horizon becomes infinite in length.

Journal ArticleDOI
TL;DR: In this paper, it was shown that an optimal solution to an appropriately constructed quadratic program provides a stationary Nash equilibrium point of a two-person, general-sum, single-controller stochastic game.

Journal ArticleDOI
TL;DR: In this article, the authors considered the D-solution, which is a more general concept than the usual Pareto minimum solution in a game process, and used a convex cone instead of the nonnegative orthant to define the domination structure for a multiobjective decision problem.

Journal ArticleDOI
TL;DR: In this article, a game form with perfect information (GFPI) is defined as a pair (K, P) of a finite game tree and a player partition, with the set of all endpoints of the game tree as the outcome space; players are assumed to choose only pure strategies.
Abstract: A game in extensive form with perfect information and without chance moves is defined as a triple (K, P, h) of a finite game tree K, a player partition P, and a payoff function h. The pair (K, P) is called a game form with perfect information, or simply a GFPI. Here, the outcome space is the set of all endpoints of the tree K. Cooperative behavior in game (K, P, h) is studied. Players are assumed to choose only pure strategies. Moulin has shown that the α-core and the β-core of game (K, P, h) are identical; they are, therefore, called the core of (K, P, h). A GFPI (K, P) is called stable if for every h the game (K, P, h) has a nonempty core. A complete characterization of the class of all stable GFPI's is provided in this paper.

Book ChapterDOI
01 Jan 1986
TL;DR: In this chapter the activities of a safeguards authority are modeled in terms of a noncooperative two-person game with the authority as the first and the operator as the second player.
Abstract: In this chapter the activities of a safeguards authority are modeled in terms of a noncooperative two-person game with the authority as the first and the operator as the second player. Both for sequential and for nonsequential safeguards procedures it is shown that the general game can be analyzed by means of two auxiliary games. In the nonsequential case the first auxiliary game no longer contains the payoff parameters of the two players, and thus, dealing only with random sampling and measurement error problems, provides solutions which are suited for practical applications. In the sequential case the situation is more complicated: Only under rather restrictive assumptions does one again get the independence of the payoff parameter values.

Journal ArticleDOI
TL;DR: In this article, a stochastic pursuit-evasion differential game involving two players, E and P, moving in the plane is considered, where player E (the evader) has complete observation of the position and velocity of player P, whereas player P (the pursuer) can measure the distance d(P, E) between P and E but receives noise-corrupted measurements of the bearing β of E from P.
Abstract: A stochastic pursuit-evasion differential game involving two players, E and P, moving in the plane is considered. It is assumed that player E (the evader) has complete observation of the position and velocity of player P, whereas player P (the pursuer) can measure the distance d(P, E) between P and E but receives noise-corrupted measurements of the bearing β of E from P. Three cases are dealt with: (a) using the noise-corrupted measurements of β, player P applies the proportional navigation guidance law; (b) P has complete observation of d(P, E) and β (this case is treated for the sake of completeness); (c) using the noise-corrupted measurements of β, P applies an erroneous line-of-sight guidance law. For each of the cases, sufficient conditions on optimal strategies are derived. In each of the cases, these conditions require the solution of a nonlinear partial differential equation in ℝ². Finally, optimal strategies are computed by solving the corresponding equations numerically.
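For context, case (a)'s proportional navigation guidance law in its textbook form (standard notation, not necessarily the paper's): the pursuer's commanded lateral acceleration is proportional to the measured line-of-sight rate, $$a_c = N\, V_c\, \dot{\beta},$$ where N is the navigation constant, V_c the closing velocity, and \dot{\beta} the (here noise-corrupted) rate of change of the bearing of E from P.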

Journal ArticleDOI
TL;DR: Four stochastic pursuit-evasion differential games involving two players, P and E, moving in the plane are considered, and the results indicate that the correct measurement of the direction of the segment PE is more important than the measurement of the distance d(P, E).
Abstract: Four stochastic pursuit-evasion differential games involving two players, P and E, moving in the plane are considered. The difference between the games lies in their information structures. In each of the games, sufficient conditions on optimal feedback strategies, in the cases of complete information, and on weak optimal feedback strategies, in the cases of incomplete information, are derived. Optimal strategies are computed for the cases of complete information and weak suboptimal strategies for the cases of incomplete information. The results indicate that the correct measurement of the direction of the segment PE is more important than the measurement of the distance d(P, E).

Journal ArticleDOI
TL;DR: In this paper, the effect of perturbations on the solution and values of a two-player non-zero-sum bimatrix game is analyzed. But the perturbation is not considered in this paper.

Journal ArticleDOI
TL;DR: It is proved that, for undiscounted stochastic games with finite state and action spaces, for each player there exists a nonempty set of easy initial states, i.e., initial states for which the player possesses an optimal stationary strategy.
Abstract: This paper deals with undiscounted infinite stage two-person zero-sum stochastic games with finite state and action spaces. It was recently shown that such games possess a value. But in general there are no optimal strategies. We prove that for each player there exists a nonempty set of easy initial states, i.e., starting states for which the player possesses an optimal stationary strategy. This result is proved with the aid of facts derived by Bewley and Kohlberg for the limit discount equation for stochastic games.