
Joint Strategy Fictitious Play With Inertia for Potential Games
Jason R. Marden, Gürdal Arslan, and Jeff S. Shamma
Abstract—We consider multi-player repeated games involving a large number of players with large strategy spaces and enmeshed utility structures. In these “large-scale” games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This disqualifies some of the well known decision making models such as “Fictitious Play” (FP), in which each player must monitor the individual actions of every other player and must optimize over a high dimensional probability space. We will show that Joint Strategy Fictitious Play (JSFP), a close variant of FP, alleviates both the informational and computational burden of FP. Furthermore, we introduce JSFP with inertia, i.e., a probabilistic reluctance to change strategies, and establish the convergence to a pure Nash equilibrium in all generalized ordinal potential games in both cases of averaged or exponentially discounted historical data. We illustrate JSFP with inertia on the specific class of congestion games, a subset of generalized ordinal potential games. In particular, we illustrate the main results on a distributed traffic routing problem and derive tolling procedures that can lead to optimized total traffic congestion.
Index Terms—Fictitious play (FP), joint strategy fictitious play (JSFP).
I. INTRODUCTION

We consider “large-scale” repeated games involving a large number of players, each of whom selects a strategy from a possibly large strategy set. A player's reward, or utility, depends on the actions taken by all players. The game is repeated over multiple stages, which allows players to adapt their strategies in response to the available information gathered over prior stages. This setup falls under the general subject of “learning in games” [2], [3], and there are a variety of algorithms, with accompanying analysis, that examine the long-term behavior of these algorithms.
Manuscript received December 07, 2006. Current version published February 11, 2009. This work was supported by NSF Grants CMS-0339228, ECS-0501394, and ECCS-0547692, and ARO Grant W911NF-04-1-0316. This paper appeared in part at the 44th IEEE Conference on Decision and Control, 2005. Recommended by Associate Editor F. Bullo.

J. R. Marden is with the Social and Information Sciences Laboratory, California Institute of Technology, Pasadena, CA 91107 USA (e-mail: marden@caltech.edu).

G. Arslan is with the Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI 96822 USA (e-mail: gurdal@hawaii.edu).

J. S. Shamma is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA (e-mail: shamma@gatech.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2008.2010885
In large-scale games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in such large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This limits the feasibility of different learning algorithms. For example, the well-studied algorithm “Fictitious Play” (FP) requires each player to individually monitor the actions of all other players and to optimize its strategy according to a probability distribution over the joint actions of the other players. Clearly, such information gathering and processing is not feasible in a large-scale game.
The main objective of this paper [1] is to study a variant of FP called Joint Strategy Fictitious Play (JSFP) [2], [4], [5]. We will argue that JSFP is a plausible decision making model for certain large-scale games. We will introduce a modification of JSFP to include inertia, in which there is a probabilistic reluctance of any player to change strategies. We will establish that JSFP with inertia converges to a pure Nash equilibrium for a class of games known as generalized ordinal potential games, which includes so-called congestion games as a special case [6].
Our motivating example for a large-scale congestion game is distributed traffic routing [7], in which a large number of vehicles make daily routing decisions to optimize their own objectives in response to their own observations. In this setting, observing and responding to the individual actions of all vehicles on a daily basis would be a formidable task for any individual driver. A more realistic measure of the information tracked and processed by an individual driver is the daily aggregate congestion on the roads that are of interest to that driver [8]. It turns out that JSFP accommodates such information aggregation.
We will now review some of the well known decision making models and discuss their limitations in large-scale games. See the monographs [2], [3], [9]–[11] and the survey article [12] for a more comprehensive review.
The well known FP algorithm requires that each player view all other players as independent decision makers [2]. In the FP framework, each player observes the decisions made by all other players and computes the empirical frequencies (i.e., running averages) of these observed decisions. Then, each player best responds to the empirical frequencies of other players' decisions by first computing the expected utility for each strategy choice under the assumption that the other players will independently make their decisions probabilistically according to the observed empirical frequencies. FP is known to be convergent to a Nash equilibrium in potential games, but need not converge for other classes of games. General convergence issues are discussed in [13]–[15].
The paper [16] introduces a version of FP, called “sampled FP”, that seeks to avoid computing an expected utility based on the empirical frequencies, because, for large-scale games, this expected utility computation can be prohibitively demanding. In sampled FP, each player selects samples from the strategy space of every other player according to the empirical frequencies of that player's past decisions. A player then computes an average utility for each strategy choice based on these samples. Each player still has to observe the decisions made by all other players to compute the empirical frequencies of these observed decisions. Sampled FP is proved to be convergent in identical interest games, but the number of samples needed to guarantee convergence grows unboundedly.
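To make the sampling idea concrete, here is a minimal Python sketch of a sampled-FP utility estimate. The encoding is our own, not that of [16]: players are indexed 0..n-1, freqs[j] maps player j's actions to empirical frequencies, and utility(i, y) returns player i's utility for the joint action tuple y.

import random

def sampled_fp_utility(i, a_i, freqs, utility, k=100):
    """Monte Carlo estimate of player i's expected utility for action a_i,
    drawing k joint samples of opponents' actions from their empirical
    frequencies instead of computing the exact expectation."""
    total = 0.0
    for _ in range(k):
        joint = []
        for j, f in enumerate(freqs):
            if j == i:
                joint.append(a_i)
            else:
                actions = list(f)
                weights = [f[a] for a in actions]
                joint.append(random.choices(actions, weights=weights)[0])
        total += utility(i, tuple(joint))
    return total / k

The estimate replaces an exponentially large exact expectation with k utility evaluations, at the cost of the growing sample-size requirement noted above.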
There are convergent learning algorithms for a large class of coordination games called “weakly acyclic” games [9]. In adaptive play [17], players have finite recall and respond to the recent history of other players. Adaptive play requires each player to track the individual behavior of all other players for recall window lengths greater than one. Thus, as the size of player memory grows, adaptive play suffers from the same computational setback as FP.
It turns out that there is a strong similarity between the JSFP discussed herein and the regret matching algorithm [18]. A player's regret for a particular choice is defined as the difference between 1) the utility that would have been received if that particular choice had been played for all the previous stages and 2) the average utility actually received in the previous stages. A player using the regret matching algorithm updates a regret vector for each possible choice and selects actions with probabilities proportional to positive regret. In JSFP, a player chooses an action by myopically maximizing the anticipated utility based on past observations, which is effectively equivalent to regret modulo a bias term. A current open question is whether player choices would converge in coordination-type games when all players use the regret matching algorithm (except for the special case of two-player games [19]). There are finite memory versions of the regret matching algorithm and various generalizations [3], such as playing best or better responses to regret over a fixed number of recent stages, that are proven to be convergent in weakly acyclic games when players use some sort of inertia. These finite memory algorithms do not require each player to track the behavior of other players individually. Rather, each player needs to remember the utilities actually received and the utilities that could have been received over those recent stages. In contrast, a player using JSFP best responds according to accumulated experience over the entire history by using a simple recursion, which can also incorporate exponential discounting of the historical data.
There are also payoff based dynamics, where each player observes only the actual utilities received and uses a Reinforcement Learning (RL) algorithm [20], [21] to make future choices. Convergence of player choices when all players use an RL-like algorithm is proved for identical interest games [22]–[24], assuming that learning takes place at multiple time scales. Finally, the payoff based dynamics with finite memory presented in [25] leads to a Pareto-optimal outcome in generic common interest games.
Regarding the distributed routing setting of Section IV, there are papers that analyze different routing strategies in congestion games with “infinitesimal” players, i.e., a continuum of players as opposed to a large, but finite, number of players. References [26]–[28] analyze the convergence properties of a class of routing strategies that is a variation of the replicator dynamics in congestion games, also referred to as symmetric games, under a variety of settings. Reference [29] analyzes the convergence properties of no-regret algorithms in such congestion games and also considers congestion games with discrete players, as considered in this paper, but the results hold only for a highly structured symmetric game.
The remainder of the paper is organized as follows. Section II sets up JSFP and goes on to establish convergence to a pure Nash equilibrium for JSFP with inertia in all generalized ordinal potential games. Section III presents a fading memory variant of JSFP and likewise establishes convergence to a pure Nash equilibrium. Section IV presents an illustrative example for traffic congestion games; it goes on to illustrate the use of tolls to achieve a socially optimal equilibrium and derives conditions for this equilibrium to be unique. Finally, Section V presents some concluding remarks.
II. JOINT STRATEGY FICTITIOUS PLAY WITH INERTIA
A. Setup
Consider a finite game with $n$-player set $\mathcal{P} := \{P_1, \dots, P_n\}$, where each player $P_i \in \mathcal{P}$ has an action set $Y_i$ and a utility function $U_i : Y \to \mathbb{R}$, where $Y := Y_1 \times \cdots \times Y_n$. For $y = (y_1, \dots, y_n) \in Y$, let $y_{-i}$ denote the profile of player actions other than player $P_i$, i.e.,
$$y_{-i} = (y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n).$$
With this notation, we will sometimes write a profile $y$ of actions as $(y_i, y_{-i})$. Similarly, we may write $Y$ as $Y_i \times Y_{-i}$, where $Y_{-i} := \prod_{j \neq i} Y_j$.

A profile $y^* \in Y$ of actions is called a pure Nash equilibrium¹ if, for all players $P_i \in \mathcal{P}$,
$$U_i(y_i^*, y_{-i}^*) = \max_{y_i \in Y_i} U_i(y_i, y_{-i}^*). \tag{1}$$

¹We will henceforth refer to a pure Nash equilibrium simply as an equilibrium.
We will consider the class of games known as “generalized ordinal potential games”, defined as follows.

Definition 2.1 (Potential Games): A finite $n$-player game with action sets $\{Y_i\}_{i=1}^{n}$ and utility functions $\{U_i\}_{i=1}^{n}$ is a potential game if, for some potential function $\phi : Y \to \mathbb{R}$,
$$U_i(y_i', y_{-i}) - U_i(y_i'', y_{-i}) = \phi(y_i', y_{-i}) - \phi(y_i'', y_{-i})$$
for every player $P_i \in \mathcal{P}$, for every $y_{-i} \in Y_{-i}$, and for every $y_i', y_i'' \in Y_i$. It is a generalized ordinal potential game if, for some potential function $\phi : Y \to \mathbb{R}$,
$$U_i(y_i', y_{-i}) - U_i(y_i'', y_{-i}) > 0 \;\Longrightarrow\; \phi(y_i', y_{-i}) - \phi(y_i'', y_{-i}) > 0$$
for every player $P_i \in \mathcal{P}$, for every $y_{-i} \in Y_{-i}$, and for every $y_i', y_i'' \in Y_i$.
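As a concrete instance, consider the congestion games of [6]; the following is the standard construction (due to Rosenthal), stated here only for illustration. Suppose each player selects a set $y_i$ of resources and receives utility $U_i(y) = -\sum_{r \in y_i} c_r(n_r(y))$, where $n_r(y)$ counts the players using resource $r$ and $c_r(\cdot)$ is that resource's congestion cost. Then
$$\phi(y) = -\sum_{r} \sum_{k=1}^{n_r(y)} c_r(k)$$
is an exact potential: a unilateral change in $y_i$ changes $U_i$ and $\phi$ by the same amount, since only the terms of the resources player $P_i$ joins or leaves are affected.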
In a repeated version of this setup, at every stage $t \in \{0, 1, 2, \dots\}$, each player $P_i \in \mathcal{P}$ selects an action $y_i(t) \in Y_i$. This selection is a function of the information available to player $P_i$ up to stage $t$. Both the action selection function and the available information depend on the underlying learning process.
B. Fictitious Play
We start with the well known Fictitious Play (FP) process [2]. Define the empirical frequency $q_i^{y_i}(t)$ as the percentage of stages at which player $P_i$ has chosen the action $y_i \in Y_i$ prior to time $t$, i.e.,
$$q_i^{y_i}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y_i(\tau) = y_i\}$$
where $y_i(\tau)$ is player $P_i$'s action at time $\tau$ and $I\{\cdot\}$ is the indicator function. Now define the empirical frequency vector for player $P_i$ as
$$q_i(t) := \begin{bmatrix} q_i^{y_i^1}(t) \\ \vdots \\ q_i^{y_i^{|Y_i|}}(t) \end{bmatrix}$$
where $|Y_i|$ is the cardinality of the action set $Y_i$.

The action of player $P_i$ at time $t$ is based on the (incorrect) presumption that other players are playing randomly and independently according to their empirical frequencies. Under this presumption, the expected utility for the action $y_i \in Y_i$ is
$$U_i(y_i, q_{-i}(t)) := \sum_{y_{-i} \in Y_{-i}} U_i(y_i, y_{-i}) \prod_{j \neq i} q_j^{y_j}(t) \tag{2}$$
where $q_{-i}(t) = (q_1(t), \dots, q_{i-1}(t), q_{i+1}(t), \dots, q_n(t))$. In the FP process, player $P_i$ uses this expected utility by selecting an action at time $t$ from the set
$$BR_i(q_{-i}(t)) := \Big\{ y_i \in Y_i : U_i(y_i, q_{-i}(t)) = \max_{y_i' \in Y_i} U_i(y_i', q_{-i}(t)) \Big\}.$$
The set $BR_i(q_{-i}(t))$ is called player $P_i$'s best response to $q_{-i}(t)$. In case of a non-unique best response, player $P_i$ makes a random selection from $BR_i(q_{-i}(t))$.
It is known that the empirical frequencies generated by FP converge to a Nash equilibrium in potential games [30].

Note that FP as described above requires each player to observe the actions made by every other individual player. Moreover, choosing an action based on the predictions (2) amounts to enumerating all possible joint actions in $Y_{-i}$ at every stage for each player. Hence, FP is computationally prohibitive as a decision making model in large-scale games.
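To make this burden visible, here is a minimal Python sketch of the exact computation in (2). The encoding is our own (integer player indices, freqs[j] as a dict mapping player j's actions to frequencies, a utility(i, y) callback); the point is that the loop ranges over all of $Y_{-i}$.

import itertools

def fp_expected_utility(i, a_i, action_sets, freqs, utility):
    """Exact expected utility (2): enumerate every joint action of the
    opponents and weight it by the product of their empirical frequencies."""
    others = [j for j in range(len(action_sets)) if j != i]
    total = 0.0
    for combo in itertools.product(*(action_sets[j] for j in others)):
        joint = list(combo)
        joint.insert(i, a_i)          # rebuild the full joint action profile
        prob = 1.0
        for j, a_j in zip(others, combo):
            prob *= freqs[j][a_j]     # FP's independence presumption
        total += prob * utility(i, tuple(joint))
    return total

The loop body executes $\prod_{j \neq i} |Y_j|$ times, which grows exponentially with the number of players.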
C. JSFP
In JSFP, each player tracks the empirical frequencies of the joint actions of all other players. In contrast to FP, the action of player $P_i$ at time $t$ is based on the (still incorrect) presumption that other players are playing randomly but jointly according to their joint empirical frequencies, i.e., each player views all other players as a collective group.

Let $z^y(t)$ be the percentage of stages at which all players chose the joint action profile $y \in Y$ up to time $t$, i.e.,
$$z^y(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y(\tau) = y\}. \tag{3}$$
Let $z(t)$ denote the empirical frequency vector formed by the components $z^y(t)$. Note that the dimension of $z(t)$ is the cardinality $|Y|$.

Similarly, let $z_{-i}^{y_{-i}}(t)$ be the percentage of stages at which players other than player $P_i$ have chosen the joint action profile $y_{-i} \in Y_{-i}$ up to time $t$, i.e.,
$$z_{-i}^{y_{-i}}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} I\{y_{-i}(\tau) = y_{-i}\} \tag{4}$$
which, given $z(t)$, can also be expressed as
$$z_{-i}^{y_{-i}}(t) = \sum_{y_i \in Y_i} z^{(y_i, y_{-i})}(t).$$
Let $z_{-i}(t)$ denote the empirical frequency vector formed by the components $z_{-i}^{y_{-i}}(t)$. Note that the dimension of $z_{-i}(t)$ is the cardinality $|Y_{-i}|$.

Similarly to FP, player $P_i$'s action at time $t$ is based on an expected utility for the action $y_i \in Y_i$, but now based on the joint action model of opponents given by²
$$U_i(y_i, z_{-i}(t)) := \sum_{y_{-i} \in Y_{-i}} U_i(y_i, y_{-i}) \, z_{-i}^{y_{-i}}(t). \tag{5}$$
In the JSFP process, player $P_i$ uses this expected utility by selecting an action at time $t$ from the set
$$BR_i(z_{-i}(t)) := \Big\{ y_i \in Y_i : U_i(y_i, z_{-i}(t)) = \max_{y_i' \in Y_i} U_i(y_i', z_{-i}(t)) \Big\}.$$
Note that the utility as expressed in (5) is linear in $z_{-i}(t)$. When written in this form, JSFP appears to have a computational burden for each player that is even higher than that of FP, since tracking the empirical frequencies $z_{-i}(t) \in \Delta(Y_{-i})$ of the joint actions of the other players is more demanding for player $P_i$ than tracking the empirical frequencies $q_j(t) \in \Delta(Y_j)$, $j \neq i$, of the actions of the other players individually, where $\Delta(\cdot)$ denotes the set of probability distributions on a finite set. However, it is possible to rewrite JSFP to significantly reduce the computational burden on each player.

To choose an action at any time $t$, player $P_i$ using JSFP needs only the predicted utilities $U_i(y_i, z_{-i}(t))$ for each $y_i \in Y_i$. Substituting (4) into (5) results in
$$U_i(y_i, z_{-i}(t)) = \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(y_i, y_{-i}(\tau))$$
which is the average utility player $P_i$ would have received if action $y_i$ had been chosen at every stage up to time $t$ and other players used the same actions. Let $V_i^{y_i}(t) := U_i(y_i, z_{-i}(t))$. This average utility, $V_i^{y_i}(t)$, admits the following simple recursion:
$$V_i^{y_i}(t+1) = \frac{t}{t+1} V_i^{y_i}(t) + \frac{1}{t+1} U_i(y_i, y_{-i}(t)).$$

²Note that we use the same notation for the related quantities $U_i(y_i, y_{-i})$, $U_i(y_i, q_{-i})$, and $U_i(y_i, z_{-i})$, where the latter two are derived from the first as defined in (2) and (5), respectively.
The important implication is that JSFP dynamics can be implemented without requiring each player to track the empirical frequencies of the joint actions of the other players and without requiring each player to compute an expectation over the space of the joint actions of all other players. Rather, each player using JSFP merely updates the predicted utilities for each available action using the recursion above, and chooses an action at each stage with maximal predicted utility.
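A minimal Python sketch of this update (the names are ours, not the paper's): V_i maps each of player i's actions to its predicted utility $V_i^{y_i}(t)$, and y_minus_i is the opponents' realized joint action at stage t.

def jsfp_update(V_i, t, i, y_minus_i, utility):
    """One step of the running-average recursion for the predicted
    utilities V_i[a] = U_i(a, z_{-i})."""
    for a in V_i:
        V_i[a] = (t * V_i[a] + utility(i, a, y_minus_i)) / (t + 1)
    return V_i

def jsfp_best_responses(V_i):
    """The set of actions with maximal predicted utility."""
    best = max(V_i.values())
    return [a for a, v in V_i.items() if v == best]

Each stage costs only $O(|Y_i|)$ utility evaluations per player, in contrast with the enumeration over $Y_{-i}$ required by (2).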
An interesting feature of JSFP is that each strict Nash equilibrium has an “absorption” property, as summarized in Proposition 2.1.

Proposition 2.1: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a JSFP process is a strict Nash equilibrium, then $y(\tau) = y(t)$ for all $\tau \geq t$.

Proof: For each player $P_i$ and for all actions $y_i \in Y_i$,
$$V_i^{y_i}(t+1) = \frac{t}{t+1} V_i^{y_i}(t) + \frac{1}{t+1} U_i(y_i, y_{-i}(t)).$$
Since $y(t)$ is a strict Nash equilibrium, we know that for all actions $y_i \neq y_i(t)$,
$$U_i(y_i(t), y_{-i}(t)) > U_i(y_i, y_{-i}(t)).$$
By writing $V_i^{y_i}(t+1)$ in terms of $V_i^{y_i}(t)$ and $U_i(y_i, y_{-i}(t))$, and noting that $y_i(t) \in BR_i(z_{-i}(t))$ implies $V_i^{y_i(t)}(t) \geq V_i^{y_i}(t)$, we obtain, for all $y_i \neq y_i(t)$,
$$V_i^{y_i(t)}(t+1) > V_i^{y_i}(t+1).$$
Therefore, $y_i(t)$ is the only best response to $z_{-i}(t+1)$, i.e., $y(t+1) = y(t)$, and the claim follows by induction.

A strict Nash equilibrium need not possess this absorption property in general for standard FP when there are more than two players.³ The convergence properties of JSFP, even for potential games, in the case of more than two players are unresolved.⁴ We will establish convergence of JSFP in the case where players use some sort of inertia, i.e., players are reluctant to switch to a better action.

³To see this, consider the following three-player identical interest game. For all $P_i \in \mathcal{P}$, let $Y_i = \{a, b\}$. Let the utility be defined as follows: $U(a,b,a) = U(b,a,a) = 1$, $U(a,a,a) = U(b,b,a) = 0$, $U(a,a,b) = U(b,b,b) = 1$, $U(a,b,b) = -1$, $U(b,a,b) = -100$. Suppose the first action played is $y(1) = (a,a,a)$. In the FP process each player will seek to deviate in the ensuing stage, so $y(2) = (b,b,b)$. The joint action $(b,b,b)$ is a strict Nash equilibrium. One can easily verify that the ensuing action in a FP process will be $y(3) = (a,b,a)$. Therefore, a strict Nash equilibrium is not absorbing in the FP process with more than two players.

⁴For two-player games, JSFP and standard FP are equivalent; hence, the convergence results for FP hold for JSFP.
D. JSFP With Inertia
The JSFP with inertia process is defined as follows. Players choose their actions according to the following rules:

JSFP-1: If the action $y_i(t-1)$ chosen by player $P_i$ at time $t-1$ belongs to $BR_i(z_{-i}(t))$, then $y_i(t) = y_i(t-1)$.
JSFP-2: Otherwise, player $P_i$ chooses an action, $y_i(t)$, at time $t$ according to the probability distribution
$$\alpha_i(t) \, x_i^{BR}(t) + (1 - \alpha_i(t)) \, v_{y_i(t-1)}$$
where $\alpha_i(t) \in (0,1)$ is a parameter representing player $P_i$'s willingness to optimize at time $t$, $x_i^{BR}(t) \in \Delta(Y_i)$ is any probability distribution whose support is contained in the set $BR_i(z_{-i}(t))$, and $v_{y_i(t-1)} \in \Delta(Y_i)$ is the probability distribution with full support on the action $y_i(t-1)$, i.e.,
$$v_{y_i(t-1)} = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T$$
where the “1” occurs in the coordinate of $v_{y_i(t-1)}$ associated with $y_i(t-1)$.
According to these rules, player $P_i$ will stay with the previous action $y_i(t-1)$ with probability $1 - \alpha_i(t)$ even when there is a perceived opportunity for utility improvement. We make the following standing assumption on the players' willingness to optimize.
Assumption 2.1: There exist constants $\varepsilon$ and $\bar{\varepsilon}$ such that, for all times $t$ and for all players $P_i \in \mathcal{P}$,
$$0 < \varepsilon < \alpha_i(t) < \bar{\varepsilon} < 1.$$
This assumption implies that players are always willing to optimize with some nonzero inertia.⁵

⁵This assumption can be relaxed to holding for sufficiently large $t$, as opposed to all $t$.
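A minimal Python sketch of rules JSFP-1 and JSFP-2 (our encoding: V_i holds the predicted utilities from the recursion above, and alpha stands for the willingness-to-optimize parameter, assumed to satisfy Assumption 2.1):

import random

def jsfp_with_inertia_step(prev_action, V_i, alpha):
    """One action choice under rules JSFP-1 / JSFP-2."""
    best = max(V_i.values())
    best_responses = [a for a, v in V_i.items() if v == best]
    if prev_action in best_responses:
        return prev_action                    # JSFP-1: repeat the action
    if random.random() < alpha:               # JSFP-2: optimize w.p. alpha
        return random.choice(best_responses)
    return prev_action                        # JSFP-2: inertia w.p. 1 - alpha

Here the uniform choice over best_responses is one admissible instance of the distribution $x_i^{BR}(t)$; any distribution supported on the best response set would do.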
The following result shows a similar absorption property of pure Nash equilibria in a JSFP with inertia process.

Proposition 2.2: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a JSFP with inertia process is 1) a pure Nash equilibrium and 2) such that $y_i(t) \in BR_i(z_{-i}(t))$ for all players $P_i$, then $y(\tau) = y(t)$ for all $\tau \geq t$.

We will omit the proof of Proposition 2.2 as it follows very closely the proof of Proposition 2.1.
E. Convergence to Nash Equilibrium
The following establishes the main result regarding the convergence of JSFP with inertia. We will assume that no player is indifferent between distinct strategies.⁶

⁶One could alternatively assume that all pure equilibria are strict.
Assumption 2.2: Player utilities satisfy the following: for all players $P_i \in \mathcal{P}$, for all actions $y_i', y_i'' \in Y_i$ with $y_i' \neq y_i''$, and for all joint actions $y_{-i} \in Y_{-i}$,
$$U_i(y_i', y_{-i}) \neq U_i(y_i'', y_{-i}). \tag{6}$$

Theorem 2.1: In any finite generalized ordinal potential game in which no player is indifferent between distinct strategies as in Assumption 2.2, the action profiles $y(t)$ generated by JSFP with inertia under Assumption 2.1 converge to a pure Nash equilibrium almost surely.
We provide a complete proof of Theorem 2.1 in the Appendix. We encourage the reader to first review the proof of fading memory JSFP with inertia in Theorem 3.1 of the following section.
F. Relationship Between Regret Matching and JSFP

It turns out that JSFP is strongly related to the learning algorithm regret matching from [18], in which players choose their actions based on their regret for not choosing particular actions in the past steps.

Define the average regret of player $P_i$ for an action $y_i \in Y_i$ at time $t$ as
$$R_i^{y_i}(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} \big[ U_i(y_i, y_{-i}(\tau)) - U_i(y(\tau)) \big]. \tag{7}$$
In other words, player $P_i$'s average regret for $y_i$ would represent the average improvement in his utility if he had chosen $y_i$ in all past steps and all other players' actions had remained unaltered. Notice that the average regret in (7) can also be expressed in terms of empirical frequencies, i.e.,
$$R_i^{y_i}(t) = U_i(y_i, z_{-i}(t)) - \bar{U}_i(t)$$
where
$$\bar{U}_i(t) := \frac{1}{t} \sum_{\tau=0}^{t-1} U_i(y(\tau)).$$
In regret matching, once player $P_i$ computes his average regret for each action $y_i \in Y_i$, he chooses an action $y_i(t)$ at time $t$ according to the probability distribution $p_i(t)$ defined as
$$p_i^{y_i}(t) := \frac{\big[ R_i^{y_i}(t) \big]^+}{\sum_{y_i' \in Y_i} \big[ R_i^{y_i'}(t) \big]^+}$$
for any $y_i \in Y_i$, where $[\cdot]^+ := \max\{\cdot, 0\}$, provided that the denominator above is positive; otherwise, $p_i(t)$ is the uniform distribution over $Y_i$. Roughly speaking, a player using regret matching chooses a particular action at any step with probability proportional to the average regret for not choosing that particular action in the past steps. This is in contrast to JSFP, where each player would only select the action that yielded the highest regret.
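For comparison, a minimal Python sketch of the regret-matching draw (the dict encoding is ours; regret maps each of a player's actions to its average regret from (7)):

import random

def regret_matching_choice(regret):
    """Draw an action with probability proportional to positive average
    regret; fall back to uniform when no action has positive regret."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total <= 0.0:
        return random.choice(list(positive))
    return random.choices(list(positive), weights=list(positive.values()))[0]

A JSFP player, by contrast, would apply an argmax over the same quantities (up to the bias term noted above), together with inertia.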
If all players use regret matching, then the empirical frequency $z(t)$ of the joint actions converges almost surely to the set of coarse correlated equilibria, a generalization of Nash equilibria, in any game [18]. We prove that if all players use JSFP with inertia, then the action profile converges almost surely to a pure Nash equilibrium, albeit in the special class of generalized ordinal potential games. The convergence properties of regret matching (with or without inertia) in potential games remain an open question.
III. FADING MEMORY JSFP WITH INERTIA
We now analyze the case where players view recent information as more important. In fading memory JSFP with inertia, players replace true empirical frequencies with weighted empirical frequencies defined by the recursion
$$\tilde{z}^y(t+1) = (1 - \rho) \, \tilde{z}^y(t) + \rho \, I\{y(t) = y\}$$
for all times $t \geq 1$, where $\rho \in (0, 1]$ is a parameter, with $1 - \rho$ being the discount factor. Let $\tilde{z}(t)$ denote the weighted empirical frequency vector formed by the components $\tilde{z}^y(t)$. Note that the dimension of $\tilde{z}(t)$ is the cardinality $|Y|$.

One can identify the limiting cases of the discount factor. When $\rho = 1$ we have “Cournot” beliefs, where only the most recent information matters. In the case when $\rho$ is not a constant, but rather $\rho(t) = 1/(t+1)$, all past information is given equal importance, as analyzed in Section II.
Utility prediction and action selection with fading memory are done in the same way as in Section II and, in particular, in accordance with rules JSFP-1 and JSFP-2. To make a decision, player $P_i$ needs only the weighted average utility that would have been received for each action, which is defined for action $y_i \in Y_i$ as
$$\tilde{V}_i^{y_i}(t) := U_i(y_i, \tilde{z}_{-i}(t)).$$
One can easily verify that the weighted average utility $\tilde{V}_i^{y_i}(t)$ for action $y_i$ admits the recursion
$$\tilde{V}_i^{y_i}(t+1) = (1 - \rho) \, \tilde{V}_i^{y_i}(t) + \rho \, U_i(y_i, y_{-i}(t)).$$
Once again, player $P_i$ is not required to track the weighted empirical frequency vector $\tilde{z}_{-i}(t)$ or to compute expectations over $Y_{-i}$.
As before, pure Nash equilibria have an absorption property under fading memory JSFP with inertia.

Proposition 3.1: In any finite $n$-person game, if at any time $t > 0$ the joint action $y(t)$ generated by a fading memory JSFP with inertia process is 1) a pure Nash equilibrium and 2) such that $y_i(t) \in BR_i(\tilde{z}_{-i}(t))$ for all players $P_i$, then $y(\tau) = y(t)$ for all $\tau \geq t$.

We will omit the proof of Proposition 3.1 as it follows very closely the proof of Proposition 2.1.
The following theorem establishes convergence to Nash equilibrium for fading memory JSFP with inertia.

Theorem 3.1: In any finite generalized ordinal potential game in which no player is indifferent between distinct strategies as in Assumption 2.2, the action profiles $y(t)$ generated by a fading memory JSFP with inertia process satisfying Assumption 2.1 converge to a pure Nash equilibrium almost surely.

Proof: The proof follows a similar structure to the proof of Theorem 6.2 in [3]. At time $t_0$, let $y^0 := y(t_0)$. There exists a positive constant $T$, independent of $t_0$, such that if the current action $y^0$ is repeated for $T$ consecutive stages, i.e., $y(t_0) = \cdots = y(t_0 + T - 1) = y^0$, then
