Showing papers on "Bellman equation published in 2003"


Journal ArticleDOI
TL;DR: In this article, the authors use a statistical theory of detection to quantify how much model misspecification the decision maker should fear, given his historical data record, and establish a tight link between the market price of uncertainty and a bound on the error in statistically discriminating between an approximating and a worst case model.
Abstract: A representative agent fears that his model, a continuous time Markov process with jump and diffusion components, is misspecified and therefore uses robust control theory to make decisions. Under the decision maker’s approximating model, cautious behavior puts adjustments for model misspecification into market prices for risk factors. We use a statistical theory of detection to quantify how much model misspecification the decision maker should fear, given his historical data record. A semigroup is a collection of objects connected by something like the law of iterated expectations. The law of iterated expectations defines the semigroup for a Markov process, while similar laws define other semigroups. Related semigroups describe (1) an approximating model; (2) a model misspecification adjustment to the continuation value in the decision maker’s Bellman equation; (3) asset prices; and (4) the behavior of the model detection statistics that we use to calibrate how much robustness the decision maker prefers. Semigroups 2, 3, and 4 establish a tight link between the market price of uncertainty and a bound on the error in statistically discriminating between an approximating and a worst case model. (JEL: C00, D51, D81, E1, G12)
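
For readers unfamiliar with the construction, the first of these semigroups is just the family of conditional-expectation operators of the Markov process. In generic notation (introduced here for illustration, not the authors'):

```latex
% Conditional-expectation semigroup of a Markov state process x_t (illustrative)
(\mathcal{T}_t f)(x) \;=\; E\big[\, f(x_t) \mid x_0 = x \,\big],
\qquad
\mathcal{T}_{t+s} \;=\; \mathcal{T}_t\,\mathcal{T}_s .
```

The semigroup property is exactly the law of iterated expectations, since E[ E[ f(x_{t+s}) | x_s ] | x_0 = x ] = E[ f(x_{t+s}) | x_0 = x ]; the paper's other semigroups are built analogously from suitably adjusted conditional expectations.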

534 citations


Journal ArticleDOI
TL;DR: In this paper, a new characterization of excessive functions for arbitrary one-dimensional regular diffusion processes is provided, using the notion of concavity, and a new perspective and new facts about the principle of smooth-fit in the context of optimal stopping are presented.

289 citations


Proceedings Article
Rémi Munos1
21 Aug 2003
TL;DR: In this article, the authors provide error bounds for approximate policy iteration using quadratic norms, and illustrate those results in the case of feature-based linear function approximation, where most function approximators (such as linear regression) select the best fit in a given class of parameterized functions by minimizing some (weighted) quadratic norm.
Abstract: In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results -in discounted problems- from a contraction property of the back-up operator, guaranteeing convergence to its fixed-point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by successive policy improvement steps as a function of the maximum norm of value determination errors during policy evaluation steps. Unfortunately, such results have limited practical range since most function approximators (such as linear regression) select the best fit in a given class of parameterized functions by minimizing some (weighted) quadratic norm. In this paper, we provide error bounds for Approximate Policy Iteration using quadratic norms, and illustrate those results in the case of feature-based linear function approximation.
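
As a purely illustrative companion to this abstract, the sketch below evaluates a fixed policy by minimizing a weighted quadratic norm of the Bellman residual over a linear feature class; the toy chain, features and state weights are assumptions made here, not taken from the paper.

```python
import numpy as np

def evaluate_policy_quadratic_norm(P_pi, r_pi, Phi, gamma, weights):
    """Fit V(s) ~ Phi[s] @ w by minimizing the weighted quadratic norm
    || Phi w - (r_pi + gamma * P_pi Phi w) ||_D^2 of the Bellman residual."""
    D = np.diag(weights)                 # state weighting of the quadratic norm
    M = Phi - gamma * P_pi @ Phi         # residual operator applied to the features
    w = np.linalg.solve(M.T @ D @ M, M.T @ D @ r_pi)
    return Phi @ w                       # approximate value of the fixed policy

# Toy 3-state chain with two features per state (illustrative numbers)
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.9, 0.1],
                 [0.0, 0.0, 1.0]])
r_pi = np.array([1.0, 0.0, 0.0])
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
V_hat = evaluate_policy_quadratic_norm(P_pi, r_pi, Phi, gamma=0.95,
                                       weights=np.array([0.4, 0.4, 0.2]))
```

In an approximate policy iteration loop, a fit of this kind replaces the exact policy evaluation step; the paper's contribution is to bound how such weighted quadratic evaluation errors propagate to the quality of the final policy.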

288 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study the problem of expected utility maximization in incomplete markets and prove that a necessary and sufficient condition on both the utility function and the model is that the value function of the dual problem is finite.
Abstract: Following [10] we continue the study of the problem of expected utility maximization in incomplete markets. Our goal is to find minimal conditions on a model and a utility function for the validity of several key assertions of the theory to hold true. In [10] we proved that a minimal condition on the utility function alone, i.e. a minimal market-independent condition, is that the asymptotic elasticity of the utility function is strictly less than 1. In this paper we show that a necessary and sufficient condition on both the utility function and the model is that the value function of the dual problem is finite.
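
For reference (standard notation, not quoted from the paper), the market-independent condition from [10] and the dual object referred to above can be written as follows:

```latex
% Asymptotic elasticity of the utility function U (the condition from [10])
AE(U) \;=\; \limsup_{x \to \infty} \frac{x\,U'(x)}{U(x)} \;<\; 1 .

% Convex conjugate of U; the paper's joint condition on utility and model is
% that the value function of the dual problem built from it be finite.
V(y) \;=\; \sup_{x > 0}\ \big[\, U(x) - x y \,\big].
```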

282 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define a Stackelberg equilibrium with robust decision makers in which the leader and the follower have different worst-case models despite sharing a common approximating model.

162 citations


Journal ArticleDOI
TL;DR: It is shown that a finite-horizon version of the robust control criterion appearing in recent papers by Hansen, Sargent, and their coauthors can be described as recursive utility, which in continuous time takes the form of the Stochastic Differential Utility of Duffie and Epstein (1992).
Abstract: This paper shows that a finite-horizon version of the robust control criterion appearing in recent papers by Hansen, Sargent, and their coauthors can be described as recursive utility, which in continuous time takes the form of the Stochastic Differential Utility (SDU) of Duffie and Epstein (1992). While it has previously been noted that Bellman equations arising in robust control settings are of the same form as Bellman equations arising from SDU maximization, here this connection is shown directly without reference to any underlying dynamics, or Markov structure.

122 citations


Journal ArticleDOI
TL;DR: An optimal consumption and investment model in continuous time is considered, which is an extension of Merton's original problem, and the asset prices are affected by correlated economic factors, modelled as diffusion processes.
Abstract: We consider an optimal consumption and investment model in continuous time, which is an extension of Merton's original problem. In the proposed model, the asset prices are affected by correlated economic factors, modelled as diffusion processes. By writing the value function in a special form, it can be seen that a second optimal control problem is involved; studying its associated HJB equation yields smoothness properties of the original value function as well as optimal policies.
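
As a baseline for comparison (the classical problem, without the correlated factors studied in the paper), Merton's HJB equation for wealth x, consumption c, fraction π invested in the risky asset, discount rate ρ, interest rate r, drift μ and volatility σ reads:

```latex
% Classical infinite-horizon Merton HJB (illustrative baseline, no factors)
0 \;=\; \sup_{c \ge 0,\ \pi \in \mathbb{R}}
   \Big\{\, u(c) \;-\; \rho\, V(x)
   \;+\; \big[\, r x + \pi(\mu - r)x - c \,\big] V'(x)
   \;+\; \tfrac{1}{2}\, \pi^{2} \sigma^{2} x^{2}\, V''(x) \,\Big\}.
```

In the paper's extension, the value function depends on the factor levels as well, and the HJB equation gains the corresponding factor drift, diffusion and correlation terms.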

118 citations


Journal ArticleDOI
TL;DR: A framework for the condition-based maintenance optimization of a technical system which can be in one of N operational states or in a failure state is considered and an algorithm for the calculation of the value function is presented.
Abstract: In this paper, we present a framework for condition-based maintenance optimization. A technical system which can be in one of N operational states or in a failure state is considered. The system state is not observable, except for the failure state. The information that is stochastically related to the system state is obtained through condition monitoring at equidistant inspection times. The system can be replaced at any time; a preventive replacement is less costly than a failure replacement. The objective is to find a replacement policy minimizing the long-run expected average cost per unit time. The replacement problem is formulated as an optimal stopping problem with partial information and transformed into a problem with complete information by applying the projection theorem to a smooth semimartingale process in the objective function. The dynamic equation is derived and analyzed in the piecewise deterministic Markov process stopping framework. The contraction property is shown, and an algorithm for the calculation of the value function is presented and illustrated by an example.

113 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the problem of the existence and uniqueness of solutions to the Bellman equation in the presence of unbounded returns and provided sufficient conditions for the existence of solutions that can be applied to fairly general models.
Abstract: We study the problem of the existence and uniqueness of solutions to the Bellman equation in the presence of unbounded returns. We introduce a new approach based both on consideration of a metric on the space of all continuous functions over the state space, and on the application of some metric fixed point theorems. With appropriate conditions we prove uniqueness of solutions with respect to the whole space of continuous functions. Furthermore, the paper provides new sufficient conditions for the existence of solutions that can be applied to fairly general models. It is also proven that the fixed point coincides with the value function and that it can be approached by successive iterations of the Bellman operator.
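
A minimal numerical sketch of the successive-approximation statement at the end of the abstract, in a deliberately bounded toy setting (a deterministic growth model with log utility; the grid, technology and discount factor are illustrative assumptions, not the paper's unbounded-returns framework): iterating the Bellman operator from an arbitrary initial guess converges to its fixed point, the value function.

```python
import numpy as np

# Successive application of the Bellman operator on a toy growth model:
#   (TV)(k) = max over k' with c = k^alpha - k' > 0 of [ log(c) + beta * V(k') ]
beta, alpha = 0.95, 0.3
grid = np.linspace(0.05, 2.0, 200)          # capital grid (illustrative)
output = grid ** alpha                      # production at each grid point
V = np.zeros_like(grid)                     # arbitrary initial guess

def bellman_operator(V):
    TV = np.empty_like(V)
    for i, y in enumerate(output):
        c = y - grid                        # consumption for each choice of k'
        vals = np.where(c > 0, np.log(np.maximum(c, 1e-12)) + beta * V, -np.inf)
        TV[i] = vals.max()
    return TV

for _ in range(1000):
    V_new = bellman_operator(V)
    if np.max(np.abs(V_new - V)) < 1e-8:    # contraction => geometric convergence
        V = V_new
        break
    V = V_new
```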

98 citations


Journal ArticleDOI
Huyên Pham1
TL;DR: This work considers an investment model where the objective is to overperform a given benchmark or index and uses large deviations techniques to characterize the value function of this criterion of outperformance management, providing an objective probabilistic interpretation of the usually subjective degree of risk aversion in the CRRA utility function.
Abstract: We consider an investment model where the objective is to overperform a given benchmark or index. We study this portfolio management problem over a long-term horizon. This asymptotic criterion leads to a large deviation probability control problem. Its dual problem is an ergodic risk-sensitive control problem on the optimal logarithmic moment generating function, which is explicitly derived. A careful study of its domain and of its behavior at the boundary of the domain is required. We then use large deviations techniques to characterize the value function of this criterion of outperformance management. This in turn provides an objective probabilistic interpretation of the usually subjective degree of risk aversion in the CRRA utility function.
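
Schematically (in generic notation introduced here, not the paper's), the duality invoked above relates the long-horizon outperformance probability to the ergodic logarithmic moment generating function of the log-performance L_T relative to the benchmark:

```latex
% Ergodic log moment generating function and its Legendre transform (illustrative)
\Lambda(\theta) \;=\; \lim_{T \to \infty} \frac{1}{T}
      \log E\big[\, e^{\theta L_T} \,\big],
\qquad
I(x) \;=\; \sup_{\theta}\ \big\{\, \theta x - \Lambda(\theta) \,\big\}.
```

By a Gärtner–Ellis-type argument, the probability of beating the benchmark by a rate of at least x typically decays exponentially with rate I(x); maximizing that probability over strategies is then dual to a risk-sensitive control problem formulated on Λ.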

84 citations


Journal ArticleDOI
TL;DR: It is proved that the Bellman equation to the problem of optimal investment for an insurer with an insurance business modelled by a compound Poisson or a compound Cox process has a classical solution.
Abstract: An optimal control problem is considered where a risky asset is used for investment, and this investment is financed by initial wealth as well as by a state-dependent income. The objective function is accumulated discounted expected utility of wealth, where the utility function is nondecreasing and bounded. This problem is investigated for a constant as well as a stochastic discount rate, where the stochastic model is a time-homogeneous finite-state Markov process. We prove that the Bellman equation for this optimization problem has a classical solution and give a verification argument. Based on this, we deal with the problem of optimal investment for an insurer whose insurance business is modelled by a compound Poisson or a compound Cox process, in the presence of a constant as well as a (finite-state Markov) stochastic interest rate.

Journal ArticleDOI
TL;DR: The method presented is purely analytic and rather general and is able to handle finite difference methods with variable diffusion coefficients without the reduction of order of convergence observed by Krylov in the nonlinear case.
Abstract: We provide estimates on the rate of convergence for approximation schemes for Bellman equations associated with optimal stopping of controlled diffusion processes. These results extend (and slightly improve) the recent results by Barles & Jakobsen to the more difficult time-dependent case. The added difficulties are due to the presence of boundary conditions (initial conditions!) and the new structure of the equation which is now a parabolic variational inequality. The method presented is purely analytic and rather general and is based on earlier work by Krylov and Barles & Jakobsen. As applications we consider so-called control schemes based on the dynamic programming principle and finite difference methods (though not in the most general case). In the optimal stopping case these methods are similar to the Brennan & Schwartz scheme. A simple observation allows us to obtain the optimal rate 1/2 for the finite difference methods, and this is an improvement over previous results by Krylov and Barles & Jakobsen. Finally, we present an idea that allows us to improve all the above-mentioned results in the linear case. In particular, we are able to handle finite difference methods with variable diffusion coefficients without the reduction of order of convergence observed by Krylov in the nonlinear case.
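
As a purely illustrative companion to the abstract, the sketch below implements one of the simplest schemes of the kind analysed: an explicit finite-difference step for a one-dimensional optimal stopping (parabolic obstacle) problem, with the stopping constraint enforced by projection in the spirit of the Brennan & Schwartz scheme. The coefficients, payoff and grid are assumptions made here, not the paper's.

```python
import numpy as np

# Explicit scheme for  min( -u_t - (1/2) sigma^2 u_xx - b u_x + r u ,  u - g ) = 0,
# marched backwards from the terminal condition u(T, x) = g(x).
sigma, b, r = 0.4, 0.02, 0.05
x = np.linspace(0.0, 2.0, 201)
dx = x[1] - x[0]
dt = 0.4 * dx**2 / sigma**2                 # respects the explicit-scheme CFL bound
g = np.maximum(1.0 - x, 0.0)                # obstacle / stopping payoff (illustrative)
u = g.copy()                                # terminal condition

for _ in range(int(1.0 / dt)):              # march backwards over a horizon T = 1
    uxx = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    ux = (u[2:] - u[:-2]) / (2 * dx)
    cont = u[1:-1] + dt * (0.5 * sigma**2 * uxx + b * ux - r * u[1:-1])
    u[1:-1] = np.maximum(cont, g[1:-1])     # projection onto the obstacle
    u[0], u[-1] = g[0], g[-1]               # simple Dirichlet boundary values
```

The paper's results concern the rate at which schemes of this type converge to the solution of the corresponding Bellman variational inequality as the mesh sizes shrink.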

Journal ArticleDOI
TL;DR: The method is capable of generating global control solutions when state and control constraints are present; it is global in the sense that controls for all initial conditions in a region of the state space are obtained.

Journal ArticleDOI
TL;DR: In this article, the authors study Merton's classical portfolio optimization problem for an investor who can trade in a risk-free bond and a stock, where the goal of the investor is to allocate money so that her expected utility from terminal wealth is maximized.
Abstract: We study Merton's classical portfolio optimization problem for an investor who can trade in a risk-free bond and a stock. The goal of the investor is to allocate money so that her expected utility from terminal wealth is maximized. The special feature of the problem studied in this paper is the inclusion of stochastic volatility in the dynamics of the risky asset. The model we use is driven by a superposition of non-Gaussian Ornstein-Uhlenbeck processes and it was recently proposed and intensively investigated for real market data by Barndorff-Nielsen and Shephard (2001). Using the dynamic programming method, explicit trading strategies and expressions for the value function via Feynman-Kac formulas are derived and verified for power utilities. Some numerical examples are also presented.

Journal ArticleDOI
TL;DR: This paper proposes a simple analytical model called M time scale Markov decision process (MMDPs) for hierarchically structured sequential decision making processes, where decisions in each level in the M-level hierarchy are made in M different discrete time scales.
Abstract: This paper proposes a simple analytical model called the M time scale Markov decision process (MMDP) for hierarchically structured sequential decision making processes, where decisions at each level of the M-level hierarchy are made on M different discrete time scales. In this model, the state space and the control space of each level in the hierarchy are nonoverlapping with those of the other levels, and the hierarchy is structured in a "pyramid" sense such that a decision made at a level-m (slower time scale) state affects the evolution of the decision making process at the lower level m+1 (faster time scale) until a new decision is made at the higher level, while the lower-level decisions themselves do not affect the transition dynamics of the higher levels. The performance produced by the lower-level decisions does, however, affect the higher-level decisions. A hierarchical objective function is defined such that the finite-horizon value of following a (nonstationary) policy at level m+1 over a decision epoch of level m, plus an immediate reward at level m, is the single-step reward for the decision making process at level m. From this we define a "multi-level optimal value function" and derive a "multi-level optimality equation." We discuss how to solve MMDPs exactly and study some approximation methods, along with heuristic sampling-based schemes, to solve MMDPs.
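
Schematically (with notation introduced here for illustration, not the authors'), the construction makes the optimal finite-horizon value of the faster level act as part of the one-step reward of the slower level:

```latex
% Two-level sketch of a multi-level optimality equation (illustrative notation)
V_m(x) \;=\; \max_{a}\ \Big\{\, r_m(x,a) \;+\; W_{m+1}^{H}(x,a)
   \;+\; \beta \sum_{x'} p_m(x' \mid x, a)\, V_m(x') \,\Big\},
```

where W_{m+1}^{H}(x,a) stands for the optimal value of the level-(m+1) process over the H faster-time-scale steps that make up one level-m decision epoch, and β is a generic discount factor; the paper's exact hierarchical objective may differ in detail.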

Proceedings ArticleDOI
09 Dec 2003
TL;DR: An auxiliary partial differential equation is proposed with which one can evaluate multiple additive cost metrics for paths which are generated by value functions; solving this auxiliary equation adds little more work to the value function computation.
Abstract: We examine the problem of planning a path through a low dimensional continuous state space subject to upper bounds on several additive cost metrics. For the single cost case, previously published research has proposed constructing the paths by gradient descent on a local minima free value function. This value function is the solution of the Eikonal partial differential equation, and efficient algorithms have been designed to compute it. In this paper we propose an auxiliary partial differential equation with which we can evaluate multiple additive cost metrics for paths which are generated by value functions; solving this auxiliary equation adds little more work to the value function computation. We then propose an algorithm which generates paths whose costs lie on the Pareto optimal surface for each possible destination location, and we can choose from these paths those which satisfy the constraints. The procedure is practical when the sum of the state space dimension and number of cost metrics is roughly six or below.
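
A discrete, purely illustrative analogue of the idea: compute the value function (the minimal primary cost-to-go) on a grid with a Dijkstra sweep, and accumulate a second additive cost along the same value-function-generated paths at essentially no extra work. The grid, cost fields and goal are assumptions made here; note that this only evaluates the secondary cost of the primary-optimal paths, whereas the paper goes further and constructs the Pareto-optimal surface of path costs.

```python
import heapq
import numpy as np

# Illustrative cost fields on a 40x40 grid: a primary cost that defines the value
# function, and a secondary cost to be evaluated along the resulting paths.
rng = np.random.default_rng(1)
primary = np.ones((40, 40))                       # e.g. travel time per cell
secondary = rng.uniform(0.5, 1.5, size=(40, 40))  # e.g. exposure or risk per cell

goal = (0, 0)
value = np.full(primary.shape, np.inf)   # value function (primary cost-to-go)
aux = np.full(primary.shape, np.inf)     # secondary cost along the chosen paths
value[goal] = aux[goal] = 0.0
heap = [(0.0, goal)]

while heap:
    v, (i, j) = heapq.heappop(heap)
    if v > value[i, j]:
        continue                                  # stale heap entry
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < 40 and 0 <= nj < 40:
            cand = v + primary[ni, nj]
            if cand < value[ni, nj]:
                value[ni, nj] = cand
                aux[ni, nj] = aux[i, j] + secondary[ni, nj]
                heapq.heappush(heap, (cand, (ni, nj)))
```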

Journal ArticleDOI
TL;DR: This work considers a stochastic control problem that has emerged in the economics literature as an investment model under uncertainty and finds that this has a priori rather unexpected features.
Abstract: We consider a stochastic control problem that has emerged in the economics literature as an investment model under uncertainty. This problem combines features of both stochastic impulse control and optimal stopping. The aim is to discover the form of the optimal strategy. It turns out that this has a priori rather unexpected features. The results that we establish are of an explicit nature. We also construct an example whose value function does not possess C^1 regularity.

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of minimizing a hedging error, measured by a positive convex random function, in an incomplete financial market model, where the dynamics of asset prices is given by an Rd-valued continuous semimartingale.
Abstract: We consider a problem of minimization of a hedging error, measured by a positive convex random function, in an incomplete financial market model, where the dynamics of asset prices is given by an Rd-valued continuous semimartingale. Under some regularity assumptions we derive a backward stochastic PDE for the value function of the problem and show that the strategy is optimal if and only if the corresponding wealth process satisfies a certain forward-SDE. As an example the case of mean-variance hedging is considered.

Journal ArticleDOI
TL;DR: This work applies price-directed control to the problem of replenishing inventory to subsets of products/locations, such as in the distribution of industrial gases, so as to minimize long-run time average replenishment costs.
Abstract: The idea of price-directed control is to use an operating policy that exploits optimal dual prices from a mathematical programming relaxation of the underlying control problem. We apply it to the problem of replenishing inventory to subsets of products/locations, such as in the distribution of industrial gases, so as to minimize long-run time average replenishment costs. Given a marginal value for each product/location, whenever there is a stockout the dispatcher compares the total value of each feasible replenishment with its cost, and chooses one that maximizes the surplus. We derive this operating policy using a linear functional approximation to the optimal value function of a semi-Markov decision process on continuous spaces. This approximation also leads to a math program whose optimal dual prices yield values and whose optimal objective value gives a lower bound on system performance. We use duality theory to show that optimal prices satisfy several structural properties and can be interpreted as estimates of lowest achievable marginal costs. On real-world instances, the price-directed policy achieves superior, near optimal performance as compared with other approaches.
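
A minimal sketch of the surplus-maximizing dispatch rule described above, with the marginal values assumed to come from the dual prices of a mathematical-programming relaxation; here they are simply given numbers, and all names and costs are illustrative.

```python
from itertools import combinations

# Marginal value (dual price) per product/location, assumed to be supplied by
# the relaxation; the numbers are illustrative.
value = {"A": 70.0, "B": 30.0, "C": 12.0}

def replenishment_cost(subset):
    """Illustrative trip cost: a fixed dispatch cost plus a per-stop cost."""
    return 50.0 + 15.0 * len(subset)

def price_directed_dispatch(candidates):
    """On a stockout, choose the feasible replenishment maximizing value minus cost."""
    best, best_surplus = None, float("-inf")
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            surplus = sum(value[i] for i in subset) - replenishment_cost(subset)
            if surplus > best_surplus:
                best, best_surplus = subset, surplus
    return best, best_surplus

print(price_directed_dispatch(["A", "B", "C"]))   # -> (('A', 'B'), 20.0)
```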

Journal ArticleDOI
TL;DR: In this article, the authors consider optimal control problems where the state X(t) at time t of the system is given by a stochastic differential delay equation, and derive an associated (finite dimensional) Hamilton-Jacobi-Bellman equation for the value function of such problems.
Abstract: We consider optimal control problems where the state X(t) of the system at time t is given by a stochastic differential delay equation. The growth at time t depends not only on the present value X(t), but also on X(t-δ) and some sliding average of previous values. Moreover, this dependence may be nonlinear. Using the dynamic programming principle, we derive an associated (finite dimensional) Hamilton-Jacobi-Bellman equation for the value function of such problems. This (finite dimensional) HJB equation has solutions if and only if the coefficients satisfy a particular system of first order PDEs. We introduce viscosity solutions for the type of HJB equations that we consider, and prove that under certain conditions the value function is the unique viscosity solution of the HJB equation. We also give numerical examples for two cases where the HJB equation reduces to a finite dimensional one.

Journal ArticleDOI
TL;DR: A key structural property for the decision function is proved, and this property is exploited in the development of continuous value function approximations that form the basis of an approximate dispatch rule.
Abstract: We address the problem of dispatching a vehicle with different product classes. There is a common dispatch cost, but holding costs that vary by product class. The problem exhibits multidimensional state, outcome and action spaces, and as a result is computationally intractable using either discrete dynamic programming methods, or even as a deterministic integer program. We prove a key structural property for the decision function, and exploit this property in the development of continuous value function approximations that form the basis of an approximate dispatch rule. Comparisons on single product-class problems, where optimal solutions are available, demonstrate solutions that are within a few percent of optimal. The algorithm is then applied to a problem with 100 product classes, and comparisons against a carefully tuned myopic heuristic demonstrate significant improvements. © 2003 Wiley Periodicals, Inc. Naval Research Logistics 50: 742–769, 2003.

Journal ArticleDOI
TL;DR: In this article, an approach based on simulation, function approximation and evolutionary improvement aimed towards simplifying online optimization is presented, where closed loop data from a suboptimal control law, such as MPC based on successive linearization, are used to obtain an approximation of the cost-to-go function, which is subsequently improved through iterations of the Bellman equation.
Abstract: Optimal control of systems with complex nonlinear behaviour, such as steady state multiplicity, results in a nonlinear optimization problem that needs to be solved online at each sample time. We present an approach based on simulation, function approximation and evolutionary improvement aimed at simplifying online optimization. Closed-loop data from a suboptimal control law, such as MPC based on successive linearization, are used to obtain an approximation of the ‘cost-to-go’ function, which is subsequently improved through iterations of the Bellman equation. Using this offline-computed cost approximation, an infinite horizon problem is converted to an equivalent single stage problem, substantially reducing the computational burden. This approach is tested on a continuous culture of microbes growing on a nutrient medium containing two substrates, a system that exhibits steady state multiplicity. Extrapolation of the cost-to-go function approximator can lead to deterioration of online performance. Some remedies to prevent such problems caused by extrapolation are proposed. Copyright © 2003 John Wiley & Sons, Ltd.
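
A minimal sketch of the offline step described above, under assumptions made here for illustration (a quadratic stage cost, quadratic features, synthetic closed-loop data and a discount factor added for numerical stability): fit a cost-to-go approximator to closed-loop data and then improve it by repeatedly applying a Bellman-style backup over the sampled transitions. The paper's improvement step additionally minimizes over the control using the process model; the backup below simply follows the recorded closed-loop actions.

```python
import numpy as np

# Synthetic stand-in for closed-loop data (state s_k, input u_k, next state)
# collected under a suboptimal controller such as successive-linearization MPC.
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 2))
u = rng.normal(size=(500, 1))
s_next = 0.9 * s + 0.1 * u                    # illustrative closed-loop behaviour

def stage_cost(s, u):
    return np.sum(s**2, axis=1) + 0.1 * np.sum(u**2, axis=1)

def features(s):                              # quadratic features for the cost-to-go
    return np.column_stack([np.ones(len(s)), s, s**2, s[:, :1] * s[:, 1:]])

gamma = 0.95                                  # discount factor (illustrative choice)
theta = np.zeros(features(s).shape[1])        # initial cost-to-go parameters

for _ in range(50):                           # iterate the Bellman-style backup
    targets = stage_cost(s, u) + gamma * (features(s_next) @ theta)
    theta, *_ = np.linalg.lstsq(features(s), targets, rcond=None)
```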

Journal ArticleDOI
TL;DR: In this paper, the authors analyse the logarithmic penalty method for converting an optimal control problem into an unconstrained one, the latter being solved by a shooting algorithm.
Abstract: The paper deals with optimal control problems for ordinary differential equations with bound constraints on the control. We analyse the logarithmic penalty method for converting the problem into an unconstrained one, the latter being solved by a shooting algorithm. Convergence of the value function and of the optimal controls is obtained for linear-quadratic problems, and more generally when the control variable enters linearly in the state equation and quadratically in the cost function. We display some numerical results on two examples: an aircraft maneuver and the stabilization of an oscillating system. Copyright © 2003 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the H∞ problem for a nonlinear system is considered, and an entirely new class of methods for obtaining the "correct" solution of such PDEs is developed, based on the linearity of the associated semigroup over the max-plus (or, in some cases, min-plus) algebra.
Abstract: The H∞ problem for a nonlinear system is considered. The corresponding dynamic programming equation is a fully nonlinear, first-order, steady-state partial differential equation (PDE), possessing a term which is quadratic in the gradient. The solutions are typically nonsmooth, and further, there is nonuniqueness among the class of viscosity solutions. In the case where one tests a feedback control to see if it yields an H∞ controller, the PDE is a Hamilton-Jacobi-Bellman equation. In the case where the "optimal" feedback control is being determined as well, the problem takes the form of a differential game, and the PDE is, in general, an Isaacs equation. The computation of the solution of a nonlinear, steady-state, first-order PDE is typically quite difficult. In this paper, we develop an entirely new class of methods for obtaining the "correct" solution of such PDEs. These methods are based on the linearity of the associated semigroup over the max-plus (or, in some cases, min-plus) algebra. In particular, solution of the PDE is reduced to solution of a max-plus (or min-plus) eigenvector problem for the known unique eigenvalue 0 (the max-plus multiplicative identity). It is demonstrated that the eigenvector is unique, and that the power method converges to it. An example is included.
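
A minimal sketch of the max-plus linear algebra behind the method, in a finite-state setting: in the max-plus algebra, "matrix-vector multiplication" replaces sums by maxima and products by sums, and the power method iterates that product with additive normalization until the eigenvector stops changing. The small matrix below (with max-plus eigenvalue 0, since its largest cycle mean is 0) is an illustrative stand-in for the discretized semigroup operator.

```python
import numpy as np

def maxplus_matvec(A, v):
    """Max-plus product: (A (x) v)_i = max_j (A[i, j] + v[j])."""
    return np.max(A + v[None, :], axis=1)

def maxplus_power_method(A, iters=200):
    """Power method in the max-plus algebra, normalized additively so that an
    eigenvector for eigenvalue 0 (the max-plus multiplicative identity) is found."""
    v = np.zeros(A.shape[0])
    for _ in range(iters):
        w = maxplus_matvec(A, v)
        w = w - w[0]                 # additive normalization ('scaling' in max-plus)
        if np.allclose(w, v):
            break
        v = w
    return v

A = np.array([[ 0.0, -1.0, -3.0],
              [-2.0,  0.0, -1.0],
              [-4.0, -2.0,  0.0]])   # illustrative matrix; maximal cycle mean is 0
print(maxplus_power_method(A))       # a max-plus eigenvector for eigenvalue 0
```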

Journal ArticleDOI
TL;DR: In this article, the authors studied the relationship between an optimal regulation problem on the infinite horizon and stabilizability in affine control systems, where the value function of the optimal regulation is not smooth and feedback laws involved in stabilization may be discontinuous.
Abstract: For affine control systems, we study the relationship between an optimal regulation problem on the infinite horizon and stabilizability. We are interested in the case where the value function of the optimal regulation problem is not smooth and the feedback laws involved in stabilization may be discontinuous.

Journal ArticleDOI
TL;DR: In this article, the authors develop a method of proof that allows them to dispense with the assumption that returns are bounded from above; their assumptions only imply that long-run average (expected) growth is sufficiently discounted, in contrast with classical assumptions that either absolutely bound growth or bound each period's (instead of long-run) maximum (instead of average) growth.
Abstract: Finding solutions to the Bellman equation often relies on restrictive boundedness assumptions. In this paper we develop a method of proof that allows us to dispense with the assumption that returns are bounded from above. In applications, our assumptions only imply that long-run average (expected) growth is sufficiently discounted, in sharp contrast with classical assumptions that either absolutely bound growth or bound each period's (instead of long-run) maximum (instead of average) growth. We discuss our work in relation to the literature and provide several examples.

Journal ArticleDOI
TL;DR: In this article, the authors prove that the optimal cost of a Mayer problem for a control system with singular perturbations converges to the optimal cost of the Mayer problem associated with the control system obtained by averaging.
Abstract: In the present paper, we prove that the optimal cost of a Mayer problem for a control system with singular perturbations converges to the optimal cost of the Mayer problem associated with the control system obtained by averaging. The main novelty of our result lies in the fact that we do not require any continuity assumption on the final cost.

Journal ArticleDOI
TL;DR: This work presents existence and uniqueness results for an equilibrium in an M-person Nash game with quadratic performance criteria and a linear difference equation as constraint, describing the system dynamics under an open-loop information pattern.
Abstract: We present existence and uniqueness results for an equilibrium in an M-person Nash game with quadratic performance criteria and a linear difference equation as constraint, describing the system dynamics under an open-loop information pattern. The approach used is the construction of a value function which leads to existence assertions in terms of solvability of certain symmetric and nonsymmetric Riccati difference equations.
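
For orientation (standard notation, not the paper's coupled game system), the symmetric Riccati difference equation behind this kind of quadratic value-function construction in the single-decision-maker LQ case is:

```latex
% Discrete-time Riccati recursion for the LQ value function V_t(x) = x^T P_t x
P_t \;=\; Q \;+\; A^{\top} P_{t+1} A
   \;-\; A^{\top} P_{t+1} B\,\big( R + B^{\top} P_{t+1} B \big)^{-1} B^{\top} P_{t+1} A ,
\qquad P_T \;=\; Q_T .
```

In the M-person open-loop game, M coupled recursions of this type arise, some of them in general nonsymmetric, and it is the solvability of that coupled system that delivers the existence and uniqueness assertions for the Nash equilibrium.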

Journal ArticleDOI
01 Nov 2003
TL;DR: In this article, the authors consider the problem of optimal dividend payment under the constraint that the controlled risk process has a ruin probability which does not exceed a given bound; the solution to this constrained optimization problem is given by a modified Hamilton-Jacobi-Bellman (HJB) equation.
Abstract: We consider optimal dividend payment under the constraint that the controlled risk process has a ruin probability which does not exceed a given bound. The underlying simple model has independent identically distributed total claims per year and a constant yearly premium, all integers. The solution to this constrained optimization problem is given by a modified Hamilton-Jacobi-Bellman (HJB) equation. It is shown that this equation has a solution, and a verification argument is given showing that the solution of the HJB equation is the value function of the optimization problem. The optimal dividend payment strategy is given in the usual feedback form.

Journal ArticleDOI
TL;DR: The equality of 1/β* with the maximal Perron/Frobenius eigenvalue of the MDP links the problem and the results to topics studied intensively in the literature.
Abstract: This paper deals with a Markovian decision process with an absorbing set J0. We are interested in the largest number β* ≥ 1, called the critical discount factor, such that for all discount factors β smaller than β* the limit V of the N-stage value function V_N for N → ∞ exists and is finite for each choice of the one-stage reward function. Several representations of β* are given. The equality of 1/β* with the maximal Perron/Frobenius eigenvalue of the MDP links our problem and our results to topics studied intensively (mostly for β=1) in the literature. We derive in a unified way a large number of conditions, some of which are known, which are equivalent either to β* = 1 or to β* > 1. In particular, the latter is equivalent to transience of the MDP. A few of our findings are extended with the aid of results in Rieder (1976) to models with standard Borel state and action spaces. We also complement an algorithm of policy iteration type, due to Mandl/Seneta (1969), for the computation of β*. Finally, we determine β* explicitly in two models with stochastically monotone transition law.
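
A minimal numerical illustration of the link stated above, for a single fixed policy (an assumption made here; the paper's β* also involves the optimization over policies): the critical discount factor is the reciprocal of the maximal Perron/Frobenius eigenvalue of the substochastic transition matrix restricted to the non-absorbing states.

```python
import numpy as np

# Substochastic transition matrix among non-absorbing states (row sums < 1;
# the missing mass flows into the absorbing set J0). Illustrative numbers.
Q = np.array([[0.5, 0.3],
              [0.2, 0.6]])

rho = max(abs(np.linalg.eigvals(Q)))   # maximal (Perron/Frobenius) eigenvalue: 0.8
beta_star = 1.0 / rho                  # critical discount factor: 1.25
# For beta < beta_star the discounted sums of beta^n * Q^n stay finite for every
# one-stage reward; for beta > beta_star they can diverge.
print(beta_star)
```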