
Optimal Learning by Experimentation

Abstract
This paper considers a problem of optimal learning by experimentation by a single decision maker. Most of the analysis is concerned with the characterisation of limit beliefs and actions. We take a two-stage approach to this problem: first, understand the case where the agent's payoff function is deterministic; then, address the additional issues arising when noise is present. Our analysis indicates that local properties of the payoff function (such as smoothness) are crucial in determining whether the agent eventually attains the true maximum payoff or not. The paper also makes a limited attempt at characterising optimal experimentation strategies.



OPTIMAL LEARNING BY EXPERIMENTATION
9104
by
Philippe Aghion*
Patrick Bolton**
Christopher Harris†
Bruno Jullien+
April 1990
Revised December 1990

We would like to thank Drew Fudenberg, Jerry Green, Andreu Mas-Colell, Eric Maskin, Margaret Meyer, John Moore, Jean-Charles Rochet, Iraj Saniee, and Jean Tirole for helpful comments. We also benefitted from discussions with seminar participants at Stanford, Harvard, Chicago, UCLA, Cornell, Berkeley, Toulouse, and Paris.

* DELTA and HEC
** Laboratoire d'Econometrie, Ecole Polytechnique
† Nuffield College, Oxford
+ CEPREMAP

ABSTRACT
OPTIMAL LEARNING BY EXPERIMENTATION

This paper analyses the dynamic decision problem of an agent who is initially uncertain as to the true shape of his payoff function, but who obtains information about it over time by observing the outcome of his past decisions. In the long run, the action is a short-run optimum given the beliefs, but may not be an optimum for the true payoff function. We derive conditions under which the limit action is optimal for the true payoff function and establish the robustness of the results. Finally, we study the adjustment process in an example where such complete learning is not achieved in the long run.

Journal of Economic Literature classification: 020
Keywords: Learning, Experimentation.
RÉSUMÉ (French summary, translated)
LEARNING BY EXPERIMENTATION

The paper analyses the dynamic choice problem of an individual who initially does not know his payoff function, but who obtains information over time by observing the outcome of his past decisions. In the long run, the chosen action is a short-run optimum given the beliefs, but may not be optimal for the true payoff function. We exhibit conditions under which the limit action is an optimum for the true payoff function and establish the robustness of the results. Finally, we study the adjustment process in an example where learning remains incomplete in the long run.

Journal of Economic Literature classification: 020
Keywords: learning, experimentation

1 Introduction
This paper analyses the dynamic decision problem of an agent who is initially uncertain as to the true shape of his payoff function, but who obtains information about it over time by observing the outcome of his past decisions. The agent must select an action every period from the same choice set over an infinite number of periods; his decision problem changes over time only to the extent that his information about his true payoff function improves. As long as the agent has not learnt all relevant aspects of his objective function, he will be in pursuit of two conflicting objectives: the maximisation of his expected short-run payoff, and the maximisation of the informational content of the current action.¹ We are primarily interested in the limit outcomes of this problem. Under what conditions will the agent's expected short-run payoff converge to his true optimum payoff?
We believe that this question is of importance in many areas of economics. For example, the theory of imperfect competition generally assumes that individual firms know all relevant aspects of the demand function. This assumption is often defended with the argument that if the true demand function is initially unknown, but remains fixed over time, firms eventually learn all relevant aspects of demand from past experience. Thus, if one is primarily interested in the nature of long-run imperfect competition, one can usefully simplify the analysis by supposing at the outset that firms know perfectly the demand function they face. A clear statement of this line of argument can already be found in Clower (1959):

"So long as one deals with a fixed demand function, it is reasonably sensible to suppose that the profit and price calculations of the monopolist are made with reference to this situation in which, following various trial-and-error experiments with different prices, the monopolist knows the precise character of market demand (at least within some relevant range of price and output quantities)." (Clower (1959), pp. 707-708.)

¹ This trade-off arises in many contexts. See, for example, Grossman, Kihlstrom and Mirman (1977) and Kihlstrom, Mirman and Postlewaite (1984).
While it is fairly obvious that trial-and-error experiments improve a firm's knowledge about demand, it is much less clear that in the course of optimal experimentation the firm ends up knowing the exact shape of market demand. Experimentation is costly, and optimal learning may dictate that experimentation be stopped before all relevant aspects of demand are known. In fact, there exist several examples in the literature demonstrating the possibility that optimal experimentation may not result in adequate learning, most notably Rothschild (1974), McLennan (1984), and Easley and Kiefer (1988). (Adequate learning occurs when, with probability one, the agent acquires enough information to allow him to obtain the true maximum payoff.) On the other hand, it is not too difficult to construct plausible examples where optimal experimentation does result in adequate learning. Our paper is a first attempt at characterising those situations where adequate learning obtains and those where it does not.
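To make the distinction between adequate and inadequate learning concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a finite set of candidate deterministic payoff functions (the hypotheses `H1` and `H2` are hypothetical), a three-point action set, and a myopic agent who maximises expected current payoff and ignores the informational value of his action. Myopia is the extreme case of heavy discounting, not the optimal experimentation studied in the paper. Because the payoff is deterministic, Bayesian updating reduces to discarding every hypothesis that predicts the wrong payoff at the chosen action.

```python
from typing import Callable, Dict, List, Tuple

Payoff = Callable[[float], float]


def update_beliefs(beliefs: Dict[str, float], hypotheses: Dict[str, Payoff],
                   action: float, observed: float, tol: float = 1e-9) -> Dict[str, float]:
    """Bayes' rule with a deterministic payoff: drop every hypothesis whose
    prediction at `action` disagrees with the observation, then renormalise.
    Assumes the true payoff function is among the hypotheses."""
    kept = {h: p for h, p in beliefs.items()
            if abs(hypotheses[h](action) - observed) <= tol}
    total = sum(kept.values())
    return {h: p / total for h, p in kept.items()}


def myopic_action(beliefs: Dict[str, float], hypotheses: Dict[str, Payoff],
                  actions: List[float]) -> float:
    """Short-run optimum: maximise expected current payoff given beliefs."""
    return max(actions,
               key=lambda a: sum(p * hypotheses[h](a) for h, p in beliefs.items()))


def simulate(true_h: str, beliefs: Dict[str, float], hypotheses: Dict[str, Payoff],
             actions: List[float], periods: int = 20) -> Tuple[float, Dict[str, float]]:
    """Run the myopic agent for `periods` periods; return last action and beliefs."""
    action = actions[0]
    for _ in range(periods):
        action = myopic_action(beliefs, hypotheses, actions)
        observed = hypotheses[true_h](action)  # deterministic payoff, no noise
        beliefs = update_beliefs(beliefs, hypotheses, action, observed)
    return action, beliefs


ACTIONS = [0.0, 0.5, 1.0]
PRIOR = {"H1": 0.5, "H2": 0.5}

# Confounded case: both hypotheses predict payoff -0.25 at the compromise action 0.5.
confounded = {"H1": lambda a: -(a - 0.0) ** 2, "H2": lambda a: -(a - 1.0) ** 2}
# Distinguishable case: the hypotheses disagree at a = 0.5.
distinguishable = {"H1": lambda a: -(a - 0.0) ** 2, "H2": lambda a: -(a - 1.0) ** 2 + 0.1}

if __name__ == "__main__":
    print(simulate("H1", PRIOR, confounded, ACTIONS))       # (0.5, {'H1': 0.5, 'H2': 0.5})
    print(simulate("H1", PRIOR, distinguishable, ACTIONS))  # (0.0, {'H1': 1.0})
```

In the confounded case the agent's chosen action produces the same payoff under both hypotheses, so his beliefs never move and he earns -0.25 forever instead of the true maximum of 0: learning is inadequate. In the second case a single observation at 0.5 rules out `H2`, and the agent moves to the true optimum.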
We suggest a two-stage approach to the problem of determining under what conditions optimal experimentation leads to adequate learning: first, understand the case where the agent's payoff function is deterministic, so that the agent's inference problem is not complicated by the presence of noise; then extend this understanding to take into account the additional issues that arise when noise is present.

In this paper we concentrate primarily on the first step of this approach. On the positive side, we show that adequate learning obtains if:
(a) the payoff function is analytic;
(b) the payoff function is smooth and quasiconcave;
(c) there is no discounting.
It is worth pointing out the intuition behind cases (a) and (b): in each of these cases the agent can learn how to obtain the true maximum payoff from information gathered by local experimentation. In case (b), such experimentation gives him an arbitrarily precise estimate of the slope of the true payoff function at any given point, so that he learns, roughly speaking, in which direction he should change his action in order to increase his short-run payoff. Eventually he converges to a point where the estimate of the slope is zero; at this point he obtains the true maximum payoff. Similarly, in case (a), local experimentation provides arbitrarily precise global information about the payoff function, so that the agent eventually learns where the maximum payoff is located by incurring arbitrarily small experimentation costs.
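The two intuitions above can be sketched in Python under assumptions that are ours rather than the paper's: the payoff is deterministic and observed without noise; in case (b), "local experimentation" is modelled as a central-difference slope estimate from two nearby actions, and the agent moves uphill, halving his step whenever he overshoots; in case (a), the payoff is taken to be a polynomial of known degree, the simplest analytic example (a general analytic function is pinned down only by infinitely many local observations, e.g. values along a sequence converging to a point). The payoffs `bell` and `quartic` and all function names are hypothetical.

```python
import math
from fractions import Fraction


# Case (b): smooth, quasiconcave payoff.
def local_slope(f, x, h=1e-6):
    """Central-difference slope from two experiments at x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


def climb(f, x0, step=0.5, tol=1e-8, max_iter=10_000):
    """Move in the direction of the estimated slope; halve the step on overshoot."""
    x, direction = x0, 0
    for _ in range(max_iter):
        slope = local_slope(f, x)
        new_direction = (slope > 0) - (slope < 0)  # sign of the estimated slope
        if new_direction == 0 or step < tol:
            break  # estimated slope is zero, or steps have become negligible
        if direction and new_direction != direction:
            step /= 2  # crossed the peak: take smaller steps
        direction = new_direction
        x += direction * step
    return x


# Case (a): analytic payoff, here a polynomial of known degree. degree + 1
# experiments in an arbitrarily small interval determine it everywhere;
# exact rational arithmetic sidesteps round-off in this illustration.
def recover_from_local_experiments(f, degree, x0, eps):
    xs = [Fraction(x0) + Fraction(eps) * i / degree for i in range(degree + 1)]
    ys = [f(x) for x in xs]

    def p(x):
        # Lagrange interpolation through the observed (action, payoff) pairs
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return p


bell = lambda x: math.exp(-(x - 2.0) ** 2)  # smooth, quasiconcave, peak at 2
print(climb(bell, x0=0.0))                  # approximately 2.0

quartic = lambda x: -x**4 + 4 * x**2 + x    # analytic, two local maxima
p = recover_from_local_experiments(quartic, degree=4, x0=0, eps=Fraction(1, 100))
grid = [Fraction(k, 100) for k in range(-300, 301)]
best = max(grid, key=p)
print(float(best), p(best) == quartic(best))  # 1.47 True
```

In the second example all five experiments lie in [0, 0.01], yet the recovered function coincides with the quartic everywhere, so the agent can locate the global maximum near 1.47 without ever experimenting there.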
On the negative side, we give examples to show that inadequate learning may obtain when the payoff function is:
(a) smooth but not analytic;
(b) smooth but not quasiconcave;
(c) quasiconcave but not smooth.
(Inadequate learning occurs when, with probability one, the agent fails to acquire enough information to allow him to obtain the true maximum payoff.) Inadequate learning may obtain because local experimentation either does not provide all relevant information (cases (a) and (b)) or does not provide enough information to compensate for the costs involved (case (c)).
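A stylized calculation, ours and not taken from the paper, suggests why smoothness matters on the cost side of case (c). Let V denote the agent's expected short-run payoff under his current beliefs, let x* be its maximiser, and consider a small experiment at x* + ε with ε > 0:

```latex
\begin{aligned}
&\text{smooth, } V'(x^*) = 0: && V(x^*) - V(x^* + \varepsilon) = -\tfrac{1}{2}\, V''(x^*)\,\varepsilon^2 + o(\varepsilon^2),\\
&\text{kink, } V'_-(x^*) > 0 > V'_+(x^*): && V(x^*) - V(x^* + \varepsilon) = -V'_+(x^*)\,\varepsilon + o(\varepsilon).
\end{aligned}
```

Near a smooth optimum a small experiment therefore costs only of order ε², whereas at a kink it costs of order ε, which is one way to see why local experimentation may fail to pay for itself when the payoff function is not smooth.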
Case (b) can be understood with the help of the following analogy, due to Alchian (1950):
"A nearsighted grasshopper on a mound of rocks can crawl to the top of a particular rock. But there is no assurance that he can also get to the top of the mound, for he might have to descend for a while or hop to new rocks." (Alchian (1950), p. 31.)
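A minimal sketch of the grasshopper, assuming a hypothetical payoff `two_rocks` that is smooth but not quasiconcave (a low peak at 0 and a high peak at 3) and a greedy rule of our own choosing: the agent looks only at the adjacent actions x - h and x + h and stops as soon as neither improves his payoff.

```python
import math


def nearsighted_climb(f, x0, h=0.01, max_steps=100_000):
    """Greedy local search: compare x with x - h and x + h, move to the best,
    and stop when neither neighbour is better (a local maximum)."""
    x = x0
    for _ in range(max_steps):
        best = max((x - h, x, x + h), key=f)
        if best == x:
            return x
        x = best
    return x


def two_rocks(x):
    # Hypothetical smooth, non-quasiconcave payoff: low rock at 0, high rock at 3.
    return math.exp(-x ** 2) + 2 * math.exp(-(x - 3) ** 2)


stuck = nearsighted_climb(two_rocks, x0=-1.0)  # climbs the low rock, stops near 0
lucky = nearsighted_climb(two_rocks, x0=2.0)   # starts on the high rock, reaches about 3
print(round(stuck, 2), round(two_rocks(stuck), 3))
print(round(lucky, 2), round(two_rocks(lucky), 3))
```

Starting from -1, the agent reaches the top of the low rock with payoff about 1 and never discovers the higher peak worth about 2: purely local information does not reveal that he "might have to descend for a while".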
The possibility of both adequate and inadequate learning leads to the question of which is more likely. One way of posing the question more precisely at the theoretical level is to ask what the generic outcome is. We argue that, in the deterministic problem,

References
Alchian, A. A. (1950). Uncertainty, Evolution, and Economic Theory. Journal of Political Economy.
Controlled Stochastic Processes (book).