
Joint Strategy Fictitious Play With Inertia for Potential Games
Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma
Abstract—We consider multi-player repeated games involving a large number of players with large strategy spaces and enmeshed utility structures. In these “large-scale” games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This disqualifies some of the well known decision making models such as “Fictitious Play” (FP), in which each player must monitor the individual actions of every other player and must optimize over a high dimensional probability space. We will show that Joint Strategy Fictitious Play (JSFP), a close variant of FP, alleviates both the informational and computational burden of FP. Furthermore, we introduce JSFP with inertia, i.e., a probabilistic reluctance to change strategies, and establish the convergence to a pure Nash equilibrium in all generalized ordinal potential games in both cases of averaged or exponentially discounted historical data. We illustrate JSFP with inertia on the specific class of congestion games, a subset of generalized ordinal potential games. In particular, we illustrate the main results on a distributed traffic routing problem and derive tolling procedures that can lead to optimized total traffic congestion.
Index Terms—Fictitious play (FP), joint strategy fictitious play (JSFP).
I. INTRODUCTION

We consider “large-scale” repeated games involving a large number of players, each of whom selects a strategy from a possibly large strategy set. A player's reward, or utility, depends on the actions taken by all players. The game is repeated over multiple stages, which allows players to adapt their strategies in response to the available information gathered over prior stages. This setup falls under the general subject of “learning in games” [2], [3], and there are a variety of algorithms, with accompanying analysis, that examine the long-term behavior of these algorithms.
Manuscript received December 07, 2006. Current version published February 11, 2009. This work was supported by NSF Grants CMS-0339228, ECS-0501394, and ECCS-0547692, and ARO Grant W911NF-04-1-0316. This paper appeared in part at the 44th IEEE Conference on Decision and Control, 2005. Recommended by Associate Editor F. Bullo.

J. R. Marden is with the Social and Information Sciences Laboratory, California Institute of Technology, Pasadena, CA 91107 USA (e-mail: marden@caltech.edu).

G. Arslan is with the Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI 96822 USA (e-mail: gurdal@hawaii.edu).

J. S. Shamma is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA (e-mail: shamma@gatech.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2008.2010885
In large-scale games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in such large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This limits the feasibility of different learning algorithms. For example, the well-studied algorithm “Fictitious Play” (FP) requires each player to individually monitor the actions of all other players and to optimize its strategy according to a probability distribution over the joint actions of the other players. Clearly, such information gathering and processing is not feasible in a large-scale game.
The main objective of this paper [1] is to study a variant of FP called Joint Strategy Fictitious Play (JSFP) [2], [4], [5]. We will argue that JSFP is a plausible decision making model for certain large-scale games. We will introduce a modification of JSFP to include inertia, in which there is a probabilistic reluctance of any player to change strategies. We will establish that JSFP with inertia converges to a pure Nash equilibrium for a class of games known as generalized ordinal potential games, which includes so-called congestion games as a special case [6].
Our motivating example for a large-scale congestion game is distributed traffic routing [7], in which a large number of vehicles make daily routing decisions to optimize their own objectives in response to their own observations. In this setting, observing and responding to the individual actions of all vehicles on a daily basis would be a formidable task for any individual driver. A more realistic measure of the information tracked and processed by an individual driver is the daily aggregate congestion on the roads that are of interest to that driver [8]. It turns out that JSFP accommodates such information aggregation.
We will now review some of the well known decision making models and discuss their limitations in large-scale games. See the monographs [2], [3], [9]–[11] and the survey article [12] for a more comprehensive review.
The well known FP algorithm requires that each player view all other players as independent decision makers [2]. In the FP framework, each player observes the decisions made by all other players and computes the empirical frequencies (i.e., running averages) of these observed decisions. Then, each player best responds to the empirical frequencies of other players' decisions by first computing the expected utility for each strategy choice under the assumption that the other players will independently make their decisions probabilistically according to the observed empirical frequencies. FP is known to be convergent to a Nash equilibrium in potential games, but need not converge for other classes of games. General convergence issues are discussed in [13]–[15].
The paper [16] introduces a version of FP, called “sampled FP”, that seeks to avoid computing an expected utility based on the empirical frequencies, because, for large-scale games, this expected utility computation can be prohibitively demanding. In sampled FP, each player selects samples from the strategy space of every other player according to the empirical frequencies of that player's past decisions. A player then computes an average utility for each strategy choice based on these samples. Each player still has to observe the decisions made by all other players to compute the empirical frequencies of these observed decisions. Sampled FP is proved to be convergent in identical interest games, but the number of samples needed to guarantee convergence grows unboundedly.
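To make the sampling idea concrete, here is a minimal Python sketch of a sampled-FP utility estimate. The encoding is our own, not that of [16]: players are indexed 0..n-1, freqs[j] maps player j's actions to empirical frequencies, and utility(i, y) returns player i's utility for the joint action tuple y.

import random

def sampled_fp_utility(i, a_i, freqs, utility, k=100):
    """Monte Carlo estimate of player i's expected utility for action a_i,
    drawing k joint samples of opponents' actions from their empirical
    frequencies instead of computing the exact expectation."""
    total = 0.0
    for _ in range(k):
        joint = []
        for j, f in enumerate(freqs):
            if j == i:
                joint.append(a_i)
            else:
                actions = list(f)
                weights = [f[a] for a in actions]
                joint.append(random.choices(actions, weights=weights)[0])
        total += utility(i, tuple(joint))
    return total / k

The estimate replaces an exponentially large exact expectation with k utility evaluations, at the cost of the growing sample-size requirement noted above.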
There are convergent learning algorithms for a large class of coordination games called “weakly acyclic” games [9]. In adaptive play [17], players have finite recall and respond to the recent history of other players. Adaptive play requires each player to track the individual behavior of all other players for recall window lengths greater than one. Thus, as the size of player memory grows, adaptive play suffers from the same computational setback as FP.
It turns out that there is a strong similarity between the JSFP discussed herein and the regret matching algorithm [18]. A player's regret for a particular choice is defined as the difference between 1) the utility that would have been received if that particular choice had been played for all the previous stages and 2) the average utility actually received in the previous stages. A player using the regret matching algorithm updates a regret vector for each possible choice and selects actions with probabilities proportional to positive regret. In JSFP, a player chooses an action by myopically maximizing the anticipated utility based on past observations, which is effectively equivalent to regret modulo a bias term. A current open question is whether player choices would converge in coordination-type games when all players use the regret matching algorithm (except for the special case of two-player games [19]). There are finite memory versions of the regret matching algorithm and various generalizations [3], such as playing best or better responses to regret over a fixed number of recent stages, that are proven to be convergent in weakly acyclic games when players use some sort of inertia. These finite memory algorithms do not require each player to track the behavior of other players individually. Rather, each player needs to remember the utilities actually received and the utilities that could have been received over those recent stages. In contrast, a player using JSFP best responds according to accumulated experience over the entire history by using a simple recursion, which can also incorporate exponential discounting of the historical data.
There are also payoff based dynamics, where each player observes only the actual utilities received and uses a Reinforcement Learning (RL) algorithm [20], [21] to make future choices. Convergence of player choices when all players use an RL-like algorithm is proved for identical interest games [22]–[24], assuming that learning takes place at multiple time scales. Finally, the payoff based dynamics with finite memory presented in [25] leads to a Pareto-optimal outcome in generic common interest games.
Regarding the distributed routing setting of Section IV, there are papers that analyze different routing strategies in congestion games with “infinitesimal” players, i.e., a continuum of players as opposed to a large, but finite, number of players. References [26]–[28] analyze the convergence properties of a class of routing strategies that is a variation of the replicator dynamics in congestion games, also referred to as symmetric games, under a variety of settings. Reference [29] analyzes the convergence properties of no-regret algorithms in such congestion games and also considers congestion games with discrete players, as considered in this paper, but the results hold only for a highly structured symmetric game.
The remainder of the paper is organized as follows. Section II sets up JSFP and goes on to establish convergence to a pure Nash equilibrium for JSFP with inertia in all generalized ordinal potential games. Section III presents a fading memory variant of JSFP and likewise establishes convergence to a pure Nash equilibrium. Section IV presents an illustrative example for traffic congestion games; it goes on to illustrate the use of tolls to achieve a socially optimal equilibrium and derives conditions for this equilibrium to be unique. Finally, Section V presents some concluding remarks.
II. JOINT STRATEGY FICTITIOUS PLAY WITH INERTIA
A. Setup
Consider a finite game with $n$-player set $\mathcal{P} := \{P_1, \dots, P_n\}$, where each player $P_i \in \mathcal{P}$ has an action set $Y_i$ and a utility function $U_i : Y \to \mathbb{R}$, where $Y := Y_1 \times \cdots \times Y_n$. For $y = (y_1, \dots, y_n) \in Y$, let $y_{-i}$ denote the profile of player actions other than player $P_i$, i.e.,
$$y_{-i} = (y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n).$$
With this notation, we will sometimes write a profile $y$ of actions as $(y_i, y_{-i})$. Similarly, we may write $Y$ as $Y_i \times Y_{-i}$, where $Y_{-i} := \prod_{j \neq i} Y_j$.

A profile $y^* \in Y$ of actions is called a pure Nash equilibrium¹ if, for all players $P_i \in \mathcal{P}$,
$$U_i(y_i^*, y_{-i}^*) = \max_{y_i \in Y_i} U_i(y_i, y_{-i}^*). \tag{1}$$

¹We will henceforth refer to a pure Nash equilibrium simply as an equilibrium.
We will consider the class of games known as “generalized ordinal potential games”, defined as follows.

Definition 2.1 (Potential Games): A finite $n$-player game with action sets $\{Y_i\}_{i=1}^{n}$ and utility functions $\{U_i\}_{i=1}^{n}$ is a potential game if, for some potential function $\phi : Y \to \mathbb{R}$,
$$U_i(y_i', y_{-i}) - U_i(y_i'', y_{-i}) = \phi(y_i', y_{-i}) - \phi(y_i'', y_{-i})$$
for every player $P_i \in \mathcal{P}$, for every $y_{-i} \in Y_{-i}$, and for every $y_i', y_i'' \in Y_i$. It is a generalized ordinal potential game if, for some potential function $\phi : Y \to \mathbb{R}$,
$$U_i(y_i', y_{-i}) - U_i(y_i'', y_{-i}) > 0 \;\Longrightarrow\; \phi(y_i', y_{-i}) - \phi(y_i'', y_{-i}) > 0$$
for every player $P_i \in \mathcal{P}$, for every $y_{-i} \in Y_{-i}$, and for every $y_i', y_i'' \in Y_i$.
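As a concrete instance, consider the congestion games of [6]; the following is the standard construction (due to Rosenthal), stated here only for illustration. Suppose each player selects a set $y_i$ of resources and receives utility $U_i(y) = -\sum_{r \in y_i} c_r(n_r(y))$, where $n_r(y)$ counts the players using resource $r$ and $c_r(\cdot)$ is that resource's congestion cost. Then
$$\phi(y) = -\sum_{r} \sum_{k=1}^{n_r(y)} c_r(k)$$
is an exact potential: a unilateral change in $y_i$ changes $U_i$ and $\phi$ by the same amount, since only the terms of the resources player $P_i$ joins or leaves are affected.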
In a repeated version of this setup, at every stage $t \in \{0, 1, 2, \dots\}$, each player $P_i \in \mathcal{P}$ selects an action $y_i(t) \in Y_i$. This selection is a function of the information available to player $P_i$ up to stage $t$. Both the action selection function and the available information depend on the underlying learning process.
B. Fictitious Play
We start with the well known Fictitious Play (FP) process [2]. Define the empirical frequency $q_i^{y_i}(t)$ as the percentage of stages at which player $P_i$ has chosen the action $y_i \in Y_i$ prior to time $t$, i.e.,
$$q_i^{y_i}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y_i(\tau) = y_i\}$$
where $y_i(\tau)$ is player $P_i$'s action at time $\tau$ and $I\{\cdot\}$ is the indicator function. Now define the empirical frequency vector for player $P_i$ as
$$q_i(t) := \begin{bmatrix} q_i^{y_i^1}(t) \\ \vdots \\ q_i^{y_i^{|Y_i|}}(t) \end{bmatrix}$$
where $|Y_i|$ is the cardinality of the action set $Y_i$.

The action of player $P_i$ at time $t$ is based on the (incorrect) presumption that other players are playing randomly and independently according to their empirical frequencies. Under this presumption, the expected utility for the action $y_i \in Y_i$ is
$$U_i(y_i, q_{-i}(t)) := \sum_{y_{-i} \in Y_{-i}} U_i(y_i, y_{-i}) \prod_{j \neq i} q_j^{y_j}(t) \tag{2}$$
where $q_{-i}(t) = (q_1(t), \dots, q_{i-1}(t), q_{i+1}(t), \dots, q_n(t))$. In the FP process, player $P_i$ uses this expected utility by selecting an action at time $t$ from the set
$$BR_i(q_{-i}(t)) := \Big\{ y_i \in Y_i : U_i(y_i, q_{-i}(t)) = \max_{y_i' \in Y_i} U_i(y_i', q_{-i}(t)) \Big\}.$$
The set $BR_i(q_{-i}(t))$ is called player $P_i$'s best response to $q_{-i}(t)$. In case of a non-unique best response, player $P_i$ makes a random selection from $BR_i(q_{-i}(t))$.
It is known that the empirical frequencies generated by FP converge to a Nash equilibrium in potential games [30].

Note that FP as described above requires each player to observe the actions made by every other individual player. Moreover, choosing an action based on the predictions (2) amounts to enumerating all possible joint actions in $Y_{-i}$ at every stage for each player. Hence, FP is computationally prohibitive as a decision making model in large-scale games.
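To make this burden visible, here is a minimal Python sketch of the exact computation in (2). The encoding is our own (integer player indices, freqs[j] as a dict mapping player j's actions to frequencies, a utility(i, y) callback); the point is that the loop ranges over all of $Y_{-i}$.

import itertools

def fp_expected_utility(i, a_i, action_sets, freqs, utility):
    """Exact expected utility (2): enumerate every joint action of the
    opponents and weight it by the product of their empirical frequencies."""
    others = [j for j in range(len(action_sets)) if j != i]
    total = 0.0
    for combo in itertools.product(*(action_sets[j] for j in others)):
        joint = list(combo)
        joint.insert(i, a_i)          # rebuild the full joint action profile
        prob = 1.0
        for j, a_j in zip(others, combo):
            prob *= freqs[j][a_j]     # FP's independence presumption
        total += prob * utility(i, tuple(joint))
    return total

The loop body executes $\prod_{j \neq i} |Y_j|$ times, which grows exponentially with the number of players.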
C. JSFP
In JSFP, each player tracks the empirical frequencies of the joint actions of all other players. In contrast to FP, the action of player $P_i$ at time $t$ is based on the (still incorrect) presumption that other players are playing randomly but jointly according to their joint empirical frequencies, i.e., each player views all other players as a collective group.

Let $z^y(t)$ be the percentage of stages at which all players chose the joint action profile $y \in Y$ up to time $t$, i.e.,
$$z^y(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y(\tau) = y\}. \tag{3}$$
Let $z(t)$ denote the empirical frequency vector formed by the components $z^y(t)$. Note that the dimension of $z(t)$ is the cardinality $|Y|$.

Similarly, let $z_{-i}^{y_{-i}}(t)$ be the percentage of stages at which players other than player $P_i$ have chosen the joint action profile $y_{-i} \in Y_{-i}$ up to time $t$, i.e.,
$$z_{-i}^{y_{-i}}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y_{-i}(\tau) = y_{-i}\} \tag{4}$$
which, given $z(t)$, can also be expressed as
$$z_{-i}^{y_{-i}}(t) = \sum_{y_i \in Y_i} z^{(y_i, y_{-i})}(t).$$
Let $z_{-i}(t)$ denote the empirical frequency vector formed by the components $z_{-i}^{y_{-i}}(t)$. Note that the dimension of $z_{-i}(t)$ is the cardinality $|Y_{-i}|$.

Similarly to FP, player $P_i$'s action at time $t$ is based on an expected utility for the action $y_i \in Y_i$, but now based on the joint action model of opponents given by²
$$U_i(y_i, z_{-i}(t)) := \sum_{y_{-i} \in Y_{-i}} U_i(y_i, y_{-i}) \, z_{-i}^{y_{-i}}(t). \tag{5}$$
In the JSFP process, player $P_i$ uses this expected utility by selecting an action at time $t$ from the set
$$BR_i(z_{-i}(t)) := \Big\{ y_i \in Y_i : U_i(y_i, z_{-i}(t)) = \max_{y_i' \in Y_i} U_i(y_i', z_{-i}(t)) \Big\}.$$
Note that the utility as expressed in (5) is linear in $z_{-i}(t)$. When written in this form, JSFP appears to have a computational burden for each player that is even higher than that of FP, since tracking the empirical frequencies $z_{-i}(t) \in \Delta(Y_{-i})$ of the joint actions of the other players is more demanding for player $P_i$ than tracking the empirical frequencies $q_j(t) \in \Delta(Y_j)$, $j \neq i$, of the actions of the other players individually, where $\Delta(\cdot)$ denotes the set of probability distributions on a finite set. However, it is possible to rewrite JSFP to significantly reduce the computational burden on each player.

To choose an action at any time $t$, player $P_i$ using JSFP needs only the predicted utilities $U_i(y_i, z_{-i}(t))$ for each $y_i \in Y_i$. Substituting (4) into (5) results in
$$U_i(y_i, z_{-i}(t)) = \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(y_i, y_{-i}(\tau))$$
which is the average utility player $P_i$ would have received if action $y_i$ had been chosen at every stage up to time $t$ and other players used the same actions. Let $V_i^{y_i}(t) := U_i(y_i, z_{-i}(t))$. This average utility, $V_i^{y_i}(t)$, admits the following simple recursion:
$$V_i^{y_i}(t+1) = \frac{t}{t+1} V_i^{y_i}(t) + \frac{1}{t+1} U_i(y_i, y_{-i}(t)).$$

²Note that we use the same notation for the related quantities $U_i(y_i, y_{-i})$, $U_i(y_i, q_{-i})$, and $U_i(y_i, z_{-i})$, where the latter two are derived from the first as defined in (2) and (5), respectively.
The important implication is that JSFP dynamics can be implemented without requiring each player to track the empirical frequencies of the joint actions of the other players and without requiring each player to compute an expectation over the space of the joint actions of all other players. Rather, each player using JSFP merely updates the predicted utilities for each available action using the recursion above, and chooses an action at each stage with maximal predicted utility.
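A minimal Python sketch of this update (the names are ours, not the paper's): V_i maps each of player i's actions to its predicted utility $V_i^{y_i}(t)$, and y_minus_i is the opponents' realized joint action at stage t.

def jsfp_update(V_i, t, i, y_minus_i, utility):
    """One step of the running-average recursion for the predicted
    utilities V_i[a] = U_i(a, z_{-i})."""
    for a in V_i:
        V_i[a] = (t * V_i[a] + utility(i, a, y_minus_i)) / (t + 1)
    return V_i

def jsfp_best_responses(V_i):
    """The set of actions with maximal predicted utility."""
    best = max(V_i.values())
    return [a for a, v in V_i.items() if v == best]

Each stage costs only $O(|Y_i|)$ utility evaluations per player, in contrast with the enumeration over $Y_{-i}$ required by (2).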
An interesting feature of JSFP is that each strict Nash equilibrium has an “absorption” property, as summarized in Proposition 2.1.

Proposition 2.1: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a JSFP process is a strict Nash equilibrium, then $y(\tau) = y(t)$ for all $\tau \geq t$.

Proof: For each player $P_i$ and for all actions $y_i \in Y_i$,
$$V_i^{y_i}(t+1) = \frac{t}{t+1} V_i^{y_i}(t) + \frac{1}{t+1} U_i(y_i, y_{-i}(t)).$$
Since $y(t)$ is a strict Nash equilibrium, we know that for all actions $y_i \neq y_i(t)$,
$$U_i(y_i(t), y_{-i}(t)) > U_i(y_i, y_{-i}(t)).$$
By writing $V_i^{y_i}(t+1)$ in terms of $V_i^{y_i}(t)$ and $U_i(y_i, y_{-i}(t))$, and noting that $y_i(t) \in BR_i(z_{-i}(t))$ implies $V_i^{y_i(t)}(t) \geq V_i^{y_i}(t)$, we obtain, for all $y_i \neq y_i(t)$,
$$V_i^{y_i(t)}(t+1) > V_i^{y_i}(t+1).$$
Therefore, $y_i(t)$ is the only best response to $z_{-i}(t+1)$, i.e., $y(t+1) = y(t)$, and the claim follows by induction.

A strict Nash equilibrium need not possess this absorption property in general for standard FP when there are more than two players.³ The convergence properties of JSFP, even for potential games, in the case of more than two players are unresolved.⁴ We will establish convergence of JSFP in the case where players use some sort of inertia, i.e., players are reluctant to switch to a better action.

³To see this, consider the following three-player identical interest game. For all $P_i \in \mathcal{P}$, let $Y_i = \{a, b\}$. Let the utility be defined as follows: $U(a,b,a) = U(b,a,a) = 1$, $U(a,a,a) = U(b,b,a) = 0$, $U(a,a,b) = U(b,b,b) = 1$, $U(a,b,b) = -1$, $U(b,a,b) = -100$. Suppose the first action played is $y(1) = (a,a,a)$. In the FP process each player will seek to deviate in the ensuing stage, so $y(2) = (b,b,b)$. The joint action $(b,b,b)$ is a strict Nash equilibrium. One can easily verify that the ensuing action in a FP process will be $y(3) = (a,b,a)$. Therefore, a strict Nash equilibrium is not absorbing in the FP process with more than two players.

⁴For two-player games, JSFP and standard FP are equivalent; hence, the convergence results for FP hold for JSFP.
D. JSFP With Inertia
The JSFP with inertia process is defined as follows. Players choose their actions according to the following rules:

JSFP-1: If the action $y_i(t-1)$ chosen by player $P_i$ at time $t-1$ belongs to $BR_i(z_{-i}(t))$, then $y_i(t) = y_i(t-1)$.
JSFP-2: Otherwise, player $P_i$ chooses an action, $y_i(t)$, at time $t$ according to the probability distribution
$$\alpha_i(t) \, x_i^{BR}(t) + (1 - \alpha_i(t)) \, v_{y_i(t-1)}$$
where $\alpha_i(t) \in (0,1)$ is a parameter representing player $P_i$'s willingness to optimize at time $t$, $x_i^{BR}(t) \in \Delta(Y_i)$ is any probability distribution whose support is contained in the set $BR_i(z_{-i}(t))$, and $v_{y_i(t-1)} \in \Delta(Y_i)$ is the probability distribution with full support on the action $y_i(t-1)$, i.e.,
$$v_{y_i(t-1)} = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T$$
where the “1” occurs in the coordinate of $v_{y_i(t-1)}$ associated with $y_i(t-1)$.
According to these rules, player $P_i$ will stay with the previous action $y_i(t-1)$ with probability $1 - \alpha_i(t)$ even when there is a perceived opportunity for utility improvement. We make the following standing assumption on the players' willingness to optimize.
Assumption 2.1: There exist constants $\varepsilon$ and $\bar{\varepsilon}$ such that, for all times $t$ and for all players $P_i \in \mathcal{P}$,
$$0 < \varepsilon < \alpha_i(t) < \bar{\varepsilon} < 1.$$
This assumption implies that players are always willing to optimize with some nonzero inertia.⁵

⁵This assumption can be relaxed to holding for sufficiently large $t$, as opposed to all $t$.
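A minimal Python sketch of rules JSFP-1 and JSFP-2 (our encoding: V_i holds the predicted utilities from the recursion above, and alpha stands for the willingness-to-optimize parameter, assumed to satisfy Assumption 2.1):

import random

def jsfp_with_inertia_step(prev_action, V_i, alpha):
    """One action choice under rules JSFP-1 / JSFP-2."""
    best = max(V_i.values())
    best_responses = [a for a, v in V_i.items() if v == best]
    if prev_action in best_responses:
        return prev_action                    # JSFP-1: repeat the action
    if random.random() < alpha:               # JSFP-2: optimize w.p. alpha
        return random.choice(best_responses)
    return prev_action                        # JSFP-2: inertia w.p. 1 - alpha

Here the uniform choice over best_responses is one admissible instance of the distribution $x_i^{BR}(t)$; any distribution supported on the best response set would do.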
The following result shows a similar absorption property of pure Nash equilibria in a JSFP with inertia process.

Proposition 2.2: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a JSFP with inertia process is 1) a pure Nash equilibrium and 2) such that $y_i(t) \in BR_i(z_{-i}(t))$ for all players $P_i$, then $y(\tau) = y(t)$ for all $\tau \geq t$.

We will omit the proof of Proposition 2.2 as it follows very closely the proof of Proposition 2.1.
E. Convergence to Nash Equilibrium
The following establishes the main result regarding the convergence of JSFP with inertia. We will assume that no player is indifferent between distinct strategies.⁶

⁶One could alternatively assume that all pure equilibria are strict.
Assumption 2.2: Player utilities satisfy the following: for all players $P_i \in \mathcal{P}$, for all actions $y_i', y_i'' \in Y_i$ with $y_i' \neq y_i''$, and for all joint actions $y_{-i} \in Y_{-i}$,
$$U_i(y_i', y_{-i}) \neq U_i(y_i'', y_{-i}). \tag{6}$$

Theorem 2.1: In any finite generalized ordinal potential game in which no player is indifferent between distinct strategies as in Assumption 2.2, the action profiles $y(t)$ generated by JSFP with inertia under Assumption 2.1 converge to a pure Nash equilibrium almost surely.
We provide a complete proof of Theorem 2.1 in the Appendix. We encourage the reader to first review the proof of fading memory JSFP with inertia in Theorem 3.1 of the following section.
F. Relationship Between Regret Matching and JSFP

It turns out that JSFP is strongly related to the learning algorithm regret matching from [18], in which players choose their actions based on their regret for not choosing particular actions in the past steps.

Define the average regret of player $P_i$ for an action $y_i \in Y_i$ at time $t$ as
$$R_i^{y_i}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} \big[ U_i(y_i, y_{-i}(\tau)) - U_i(y(\tau)) \big]. \tag{7}$$
In other words, player $P_i$'s average regret for $y_i$ would represent the average improvement in his utility if he had chosen $y_i$ in all past steps and all other players' actions had remained unaltered. Notice that the average regret in (7) can also be expressed in terms of empirical frequencies, i.e.,
$$R_i^{y_i}(t) = U_i(y_i, z_{-i}(t)) - \bar{U}_i(t)$$
where
$$\bar{U}_i(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(y(\tau)).$$
In regret matching, once player $P_i$ computes his average regret for each action $y_i \in Y_i$, he chooses an action $y_i(t)$ at time $t$ according to the probability distribution $p_i(t)$ defined as
$$p_i^{y_i}(t) := \frac{\big[ R_i^{y_i}(t) \big]^+}{\sum_{y_i' \in Y_i} \big[ R_i^{y_i'}(t) \big]^+}$$
for any $y_i \in Y_i$, where $[\cdot]^+ := \max\{\cdot, 0\}$, provided that the denominator above is positive; otherwise, $p_i(t)$ is the uniform distribution over $Y_i$. Roughly speaking, a player using regret matching chooses a particular action at any step with probability proportional to the average regret for not choosing that particular action in the past steps. This is in contrast to JSFP, where each player would only select the action that yielded the highest regret.
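For comparison, a minimal Python sketch of the regret-matching draw (the dict encoding is ours; regret maps each of a player's actions to its average regret from (7)):

import random

def regret_matching_choice(regret):
    """Draw an action with probability proportional to positive average
    regret; fall back to uniform when no action has positive regret."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total <= 0.0:
        return random.choice(list(positive))
    return random.choices(list(positive), weights=list(positive.values()))[0]

A JSFP player, by contrast, would apply an argmax over the same quantities (up to the bias term noted above), together with inertia.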
If all players use regret matching, then the empirical frequency $z(t)$ of the joint actions converges almost surely to the set of coarse correlated equilibria, a generalization of Nash equilibria, in any game [18]. We prove that if all players use JSFP with inertia, then the action profile converges almost surely to a pure Nash equilibrium, albeit in the special class of generalized ordinal potential games. The convergence properties of regret matching (with or without inertia) in potential games remain an open question.
III. FADING MEMORY JSFP WITH INERTIA
We now analyze the case where players view recent information as more important. In fading memory JSFP with inertia, players replace true empirical frequencies with weighted empirical frequencies defined by the recursion
$$\tilde{z}^y(t+1) = (1 - \rho) \, \tilde{z}^y(t) + \rho \, I\{y(t) = y\}$$
for all times $t \geq 1$, where $\rho \in (0, 1]$ is a parameter, with $1 - \rho$ being the discount factor. Let $\tilde{z}(t)$ denote the weighted empirical frequency vector formed by the components $\tilde{z}^y(t)$. Note that the dimension of $\tilde{z}(t)$ is the cardinality $|Y|$.

One can identify the limiting cases of the discount factor. When $\rho = 1$ we have “Cournot” beliefs, where only the most recent information matters. In the case when $\rho$ is not a constant, but rather $\rho(t) = 1/(t+1)$, all past information is given equal importance, as analyzed in Section II.
Utility prediction and action selection with fading memory are done in the same way as in Section II and, in particular, in accordance with rules JSFP-1 and JSFP-2. To make a decision, player $P_i$ needs only the weighted average utility that would have been received for each action, which is defined for action $y_i \in Y_i$ as
$$\tilde{V}_i^{y_i}(t) := U_i(y_i, \tilde{z}_{-i}(t)).$$
One can easily verify that the weighted average utility $\tilde{V}_i^{y_i}(t)$ for action $y_i$ admits the recursion
$$\tilde{V}_i^{y_i}(t+1) = (1 - \rho) \, \tilde{V}_i^{y_i}(t) + \rho \, U_i(y_i, y_{-i}(t)).$$
Once again, player $P_i$ is not required to track the weighted empirical frequency vector $\tilde{z}_{-i}(t)$ or to compute expectations over $Y_{-i}$.
As before, pure Nash equilibria have an absorption property under fading memory JSFP with inertia.

Proposition 3.1: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a fading memory JSFP with inertia process is 1) a pure Nash equilibrium and 2) such that $y_i(t) \in BR_i(\tilde{z}_{-i}(t))$ for all players $P_i$, then $y(\tau) = y(t)$ for all $\tau \geq t$.

We will omit the proof of Proposition 3.1 as it follows very closely the proof of Proposition 2.1.
The following theorem establishes convergence to Nash equilibrium for fading memory JSFP with inertia.

Theorem 3.1: In any finite generalized ordinal potential game in which no player is indifferent between distinct strategies as in Assumption 2.2, the action profiles $y(t)$ generated by a fading memory JSFP with inertia process satisfying Assumption 2.1 converge to a pure Nash equilibrium almost surely.

Proof: The proof follows a similar structure to the proof of Theorem 6.2 in [3]. At time $t_0$, let $y^0 := y(t_0)$. There exists a positive constant $T$, independent of $t_0$, such that if the current action $y^0$ is repeated for $T$ consecutive stages, i.e., $y(t_0) = \cdots = y(t_0 + T - 1) = y^0$, then
