
On time-inconsistent stochastic control in continuous time

01 Apr 2017 · Finance and Stochastics (Springer Berlin Heidelberg) · Vol. 21, Iss. 2, pp. 331–360

Finance Stoch (2017) 21:331–360
DOI 10.1007/s00780-017-0327-5
On time-inconsistent stochastic control in continuous time

Tomas Björk (1) · Mariana Khapko (2) · Agatha Murgoci (3)

Received: 9 April 2014 / Accepted: 29 November 2016 / Published online: 13 March 2017
© The Author(s) 2017. This article is published with open access at Springerlink.com
Abstract In this paper, which is a continuation of the discrete-time paper (Björk and Murgoci in Finance Stoch. 18:545–592, 2014), we study a class of continuous-time stochastic control problems which, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle. We study these problems within a game-theoretic framework, and we look for Nash subgame perfect equilibrium points. For a general controlled continuous-time Markov process and a fairly general objective functional, we derive an extension of the standard Hamilton–Jacobi–Bellman equation, in the form of a system of nonlinear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. The main theoretical result is a verification theorem. As an application of the general theory, we study a time-inconsistent linear-quadratic regulator. We also present a study of time-inconsistency within the framework of a general equilibrium production economy of Cox–Ingersoll–Ross type (Cox et al. in Econometrica 53:363–384, 1985).
T. Björk
tomas.bjork@hhs.se

M. Khapko
mariana.khapko@rotman.utoronto.ca

A. Murgoci
agatha.murgoci@econ.au.dk

1 Department of Finance, Stockholm School of Economics, Box 6501, 113 83 Stockholm, Sweden
2 Department of Management (UTSc), Rotman School of Management, University of Toronto, 105 St. George Street, Toronto, ON, M5S 3E6, Canada
3 Department of Economics and Business Economics, Aarhus University, Fuglesangs Allé 4, 8210 Aarhus V, Denmark

Keywords Time-consistency · Time-inconsistency · Time-inconsistent control ·
Dynamic programming · Stochastic control · Bellman equation · Hyperbolic
discounting · Mean-variance · Equilibrium
Mathematics Subject Classification 49L99 · 49N90 · 60J70 · 91A10 · 91A80 ·
91B02 · 91B25 · 91B51 · 91G80
JEL Classification C61 · C72 · C73 · D5 · G11 · G12
1 Introduction
The purpose of this paper is to study a class of stochastic control problems in con-
tinuous time which have the property of being time-inconsistent in the sense that
they do not allow a Bellman optimality principle. As a consequence, the very con-
cept of optimality becomes problematic, since a strategy which is optimal given a
specific starting point in time and space may be non-optimal when viewed from a
later date and a different state. In this paper, we attack a fairly general class of time-
inconsistent problems by using a game-theoretic approach; so instead of searching for
optimal strategies, we search for subgame perfect Nash equilibrium strategies. The
paper presents a continuous-time version of the discrete-time theory developed in our
previous paper [5]. Since we build heavily on the discrete-time paper, the reader is
referred to that for motivating examples and more detailed discussions on conceptual
issues.
1.1 Previous literature
For a detailed discussion of the game-theoretic approach to time-inconsistency using
Nash equilibrium points as above, the reader is referred to [5]. A list of some of the most important papers on the subject is given by [2, 6, 8–14, 16, 18–25].
All the papers above deal with particular model choices, and different authors
use different methods in order to solve the problems. To our knowledge, the present
paper, which is the continuous-time part of the working paper [4], is the first attempt
to study a reasonably general (albeit Markovian) class of time-inconsistent control
problems in continuous time. We should, however, like to stress that for the present
paper, we have been greatly inspired by [2, 9, 11].
1.2 Structure of the paper
The structure of the paper is roughly as follows.
In Sect. 2, we present the basic setup, and in Sect. 3, we discuss the concept of equi-
librium. This replaces in our setting the optimality concept for a standard stochastic
control problem, and in Definition 3.4, we give a precise definition of the equilib-
rium control and the equilibrium value function.

Since the equilibrium concept in continuous time is quite delicate, we build the
continuous-time theory on the discrete-time theory previously developed in [5]. In
Sect. 4, we start to study the continuous-time problem by going to the limit for a
discretized problem, and using the results from [5]. This leads to an extension of
the standard HJB equation to a system of equations with an embedded static opti-
mization problem. The limiting procedure described above is done in an informal
manner. It is largely heuristic, and it thus remains to clarify precisely how the de-
rived extended HJB system is related to the precisely defined equilibrium problem
under consideration.
The needed clarification is in fact delivered in Sect. 5. In Theorem 5.2, which is
the main theoretical result of the paper, we give a precise statement and proof of a
verification theorem. This theorem says that a solution to the extended HJB system
does indeed deliver the equilibrium control and equilibrium value function to our
original problem.
In Sect. 6, the results of Sect. 5 are extended to a more general reward functional.
Section 7 treats the infinite-horizon case.
In Sect. 8, we study a time-inconsistent version of the linear-quadratic regulator to
illustrate how the theory works in a concrete case.
Section 9 is devoted to a rather detailed study of a general equilibrium model for a
production economy with time-inconsistent preferences.
In Sect. 10, we review some remaining open problems.
For extensions of the theory as well as worked out examples such as point process
models, non-exponential discounting, mean-variance control, and state-dependent
risk, see the working paper overview [3].
2 The model
We now turn to the formal continuous-time theory. In order to present this, we need
some input data.
Definition 2.1 The following objects are given exogenously:

1. A drift mapping $\mu : \mathbb{R}_+ \times \mathbb{R}^n \times \mathbb{R}^k \to \mathbb{R}^n$.
2. A diffusion mapping $\sigma : \mathbb{R}_+ \times \mathbb{R}^n \times \mathbb{R}^k \to M(n,d)$, where $M(n,d)$ denotes the set of all $n \times d$ matrices.
3. A control constraint mapping $U : \mathbb{R}_+ \times \mathbb{R}^n \to 2^{\mathbb{R}^k}$.
4. A mapping $F : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.
5. A mapping $G : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.
We now consider, on the time interval $[0,T]$, a controlled SDE of the form
$$\mathrm{d}X_t = \mu(t, X_t, u_t)\,\mathrm{d}t + \sigma(t, X_t, u_t)\,\mathrm{d}W_t, \tag{2.1}$$
where the state process $X$ is $n$-dimensional, the Wiener process $W$ is $d$-dimensional, and the control process $u$ is $k$-dimensional, with the constraint $u_t \in U(t, X_t)$.

Loosely speaking, our objective is to maximize, for every initial point $(t,x)$, a reward functional of the form
$$E_{t,x}\big[F(x, X_T)\big] + G\big(x, E_{t,x}[X_T]\big).$$
This functional is not of a form which is suitable for dynamic programming, and this will be discussed in detail below, but first we need to specify our class of controls. In this paper, we restrict the controls to admissible feedback control laws.
Definition 2.2 An admissible control law is a map $\mathbf{u} : [0,T] \times \mathbb{R}^n \to \mathbb{R}^k$ satisfying the following conditions:

1. For each $(t,x) \in [0,T] \times \mathbb{R}^n$, we have $\mathbf{u}(t,x) \in U(t,x)$.
2. For each initial point $(s,y) \in [0,T] \times \mathbb{R}^n$, the SDE
$$\mathrm{d}X_t = \mu\big(t, X_t, \mathbf{u}(t, X_t)\big)\,\mathrm{d}t + \sigma\big(t, X_t, \mathbf{u}(t, X_t)\big)\,\mathrm{d}W_t, \qquad X_s = y,$$
has a unique strong solution, denoted by $X^{\mathbf{u}}$.

The class of admissible control laws is denoted by $\mathcal{U}$. We sometimes use the notation $\mathbf{u}_t(x)$ instead of $\mathbf{u}(t,x)$.
We now go on to define the controlled infinitesimal generator of the SDE above. In the present paper, we use the (somewhat non-standard) convention that the infinitesimal operator acts on the time variable as well as on the space variable, so it includes the term $\frac{\partial}{\partial t}$.
Definition 2.3 Consider the SDE (2.1), and let $^{\top}$ denote matrix transpose.

For any fixed $u \in \mathbb{R}^k$, the functions $\mu^u$, $\sigma^u$ and $C^u$ are defined by
$$\mu^u(t,x) = \mu(t,x,u), \qquad \sigma^u(t,x) = \sigma(t,x,u), \qquad C^u(t,x) = \sigma(t,x,u)\,\sigma(t,x,u)^{\top}.$$

For any admissible control law $\mathbf{u}$, the functions $\mu^{\mathbf{u}}$, $\sigma^{\mathbf{u}}$, $C^{\mathbf{u}}$ are defined by
$$\mu^{\mathbf{u}}(t,x) = \mu\big(t,x,\mathbf{u}(t,x)\big), \qquad \sigma^{\mathbf{u}}(t,x) = \sigma\big(t,x,\mathbf{u}(t,x)\big), \qquad C^{\mathbf{u}}(t,x) = \sigma\big(t,x,\mathbf{u}(t,x)\big)\,\sigma\big(t,x,\mathbf{u}(t,x)\big)^{\top}.$$

For any fixed $u \in \mathbb{R}^k$, the operator $\mathbf{A}^u$ is defined by
$$\mathbf{A}^u = \frac{\partial}{\partial t} + \sum_{i=1}^n \mu_i^u(t,x)\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^n C_{ij}^u(t,x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}.$$

For any admissible control law $\mathbf{u}$, the operator $\mathbf{A}^{\mathbf{u}}$ is defined by
$$\mathbf{A}^{\mathbf{u}} = \frac{\partial}{\partial t} + \sum_{i=1}^n \mu_i^{\mathbf{u}}(t,x)\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^n C_{ij}^{\mathbf{u}}(t,x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}.$$
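The defining property of the generator (Dynkin's formula) gives a quick numerical sanity check: $\big(E[f(t+h, X_{t+h})] - f(t,x)\big)/h \to (\mathbf{A}^u f)(t,x)$ as $h \to 0$, where the $\frac{\partial}{\partial t}$ term appears precisely because $f$ also depends on time. The one-dimensional coefficients and test function below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Illustrative 1-d data (not from the paper): mu^u(t,x) = u*x, sigma^u(t,x) = s*x,
# with a fixed control value u, and test function f(t,x) = exp(-t) * x^2.
u, s = 0.1, 0.2
mu = lambda t, x: u * x
sigma = lambda t, x: s * x
f = lambda t, x: np.exp(-t) * x**2

# Closed form from Definition 2.3 (C^u = (sigma^u)^2 in one dimension):
# A^u f = f_t + mu^u f_x + (1/2) C^u f_xx = exp(-t) x^2 (2u + s^2 - 1).
t0, x0, h = 0.0, 1.0, 0.01
exact = np.exp(-t0) * x0**2 * (2 * u + s**2 - 1)  # = -0.76 at (0, 1)

# Monte Carlo over one Euler step of length h:
rng = np.random.default_rng(0)
Z = rng.normal(size=1_000_000)
Xh = x0 + mu(t0, x0) * h + sigma(t0, x0) * np.sqrt(h) * Z
estimate = (f(t0 + h, Xh).mean() - f(t0, x0)) / h
print(exact, round(estimate, 3))
```

The estimate differs from the closed form only by the $O(h)$ discretization bias and Monte Carlo noise; dropping the $f_t$ term would miss the contribution $-e^{-t}x^2$ entirely, which is the point of the non-standard convention.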

3 Problem formulation
In order to formulate our problem, we need an objective functional. We thus consider
the two functions F and G from Definition 2.1.
Definition 3.1 For a fixed $(t,x) \in [0,T] \times \mathbb{R}^n$ and a fixed admissible control law $\mathbf{u}$, the corresponding reward functional $J$ is defined by
$$J(t,x,\mathbf{u}) = E_{t,x}\big[F(x, X_T^{\mathbf{u}})\big] + G\big(x, E_{t,x}[X_T^{\mathbf{u}}]\big). \tag{3.1}$$
Remark 3.2 In Sect. 6, we consider a more general reward functional. The restriction
to the functional (3.1) above is done in order to minimize the notational complexity
of the derivations below, which otherwise would be somewhat messy.
In order to have a nondegenerate problem, we need a formal integrability assumption.

Assumption 3.3 We assume that for each initial point $(t,x) \in [0,T] \times \mathbb{R}^n$ and each admissible control law $\mathbf{u}$, we have
$$E_{t,x}\big[|F(x, X_T^{\mathbf{u}})|\big] < \infty, \qquad E_{t,x}\big[|X_T^{\mathbf{u}}|\big] < \infty,$$
and hence
$$G\big(x, E_{t,x}[X_T^{\mathbf{u}}]\big) < \infty.$$
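A functional of the form (3.1) is straightforward to estimate by Monte Carlo. The mean-variance choice below is purely illustrative (it is the standard example from the related literature, not a specification made in this section): with $F(x,y) = y - \frac{\gamma}{2}y^2$ and $G(x,m) = \frac{\gamma}{2}m^2$, the functional reduces to $J = E[X_T] - \frac{\gamma}{2}\,\mathrm{Var}(X_T)$.

```python
import numpy as np

def reward_J(F, G, XT, x):
    """Monte Carlo estimate of J(t,x,u) = E[F(x, X_T)] + G(x, E[X_T]) as in (3.1),
    given samples XT of X_T^u started from (t, x)."""
    return F(x, XT).mean() + G(x, XT.mean())

# Illustrative mean-variance specification with risk aversion gamma = 2:
gamma = 2.0
F = lambda x, y: y - 0.5 * gamma * y**2
G = lambda x, m: 0.5 * gamma * m**2

# Stand-in samples of X_T^u: here simply N(1, 0.5^2), so that
# J = E[X_T] - (gamma/2) Var(X_T) = 1 - 1 * 0.25 = 0.75.
rng = np.random.default_rng(1)
XT = rng.normal(1.0, 0.5, size=100_000)
Jhat = reward_J(F, G, XT, x=1.0)
print(round(Jhat, 3))  # ≈ 0.75
```

Note that $G$ is applied to the sample mean: because $G$ is nonlinear, $J$ is not the expectation of any single path functional, which is exactly the feature that breaks the tower property and hence the Bellman principle discussed next.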
Our objective is loosely that of maximizing J(t,x,u) for each (t, x), but concep-
tually this turns out to be far from trivial, so instead of optimal controls we will study
equilibrium controls. The equilibrium concept is made precise in Definition 3.4 be-
low, but in order to motivate that definition, we need a brief discussion concerning
the reward functional above.
We immediately note that, in contrast to a standard optimal control problem, the family of reward functionals above is not connected by a Bellman optimality principle. The reasons for this are as follows:

- The present state $x$ appears in the function $F$.
- In the second term, we have (even apart from the appearance of the present state $x$) a nonlinear function $G$ operating on the expected value $E_{t,x}[X_T^{\mathbf{u}}]$.
Since we do not have a Bellman optimality principle, it is in fact unclear what we
should mean by the term “optimal”, since the optimality concept would differ at dif-
ferent initial times t and for different initial states x.
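A classic toy illustration of this phenomenon, not taken from the present paper but matching its keyword "hyperbolic discounting", is quasi-hyperbolic consumption choice: with per-period utility $\log c$ and present bias $\beta = 0.5$, the plan that is optimal at time 0 is no longer optimal when re-evaluated at time 1. All numbers below are assumptions of this toy example.

```python
import numpy as np

# Quasi-hyperbolic preferences: at time t the agent values
# log(c_t) + beta * sum_{s > t} log(c_s), with beta < 1 (present bias).
beta = 0.5

def plan_from_time0(w=1.0, n=801):
    """Time-0 self: grid search over (c0, c1), with c2 = w - c0 - c1."""
    c0, c1 = np.meshgrid(np.linspace(1e-4, w, n), np.linspace(1e-4, w, n))
    c2 = w - c0 - c1
    val = np.where(c2 > 0,
                   np.log(c0) + beta * (np.log(c1) + np.log(np.maximum(c2, 1e-12))),
                   -np.inf)
    i = np.unravel_index(np.argmax(val), val.shape)
    return c0[i], c1[i], c2[i]

def choice_at_time1(w1, n=801):
    """Time-1 self re-optimizes log(c1) + beta * log(c2) given remaining wealth w1."""
    c1 = np.linspace(1e-4, w1 - 1e-4, n)
    val = np.log(c1) + beta * np.log(w1 - c1)
    return c1[np.argmax(val)]

c0, c1_planned, c2 = plan_from_time0()
c1_actual = choice_at_time1(1.0 - c0)
print(round(c1_planned, 3), round(c1_actual, 3))  # ≈ 0.25 vs ≈ 0.333
```

The time-0 plan allocates $c_1 = 1/4$, but the time-1 self, facing the same remaining wealth, prefers $c_1 = 1/3$: the "optimal" strategy is overturned by a later self, which is why the paper replaces optimality by subgame perfect equilibrium.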
The approach of this paper is to adopt a game-theoretic perspective and look for
subgame perfect Nash equilibrium points. Loosely speaking, we view the game as
follows:
- Consider a non-cooperative game where we have one player for each point in time $t$. We refer to this player as "Player $t$".
- For each fixed $t$, Player $t$ can only control the process $X$ exactly at time $t$. He/she does that by choosing a control function $\mathbf{u}(t,\cdot)$; so the action taken at time $t$ with state $X_t$ is given by $\mathbf{u}(t, X_t)$.
