
Showing papers on "Bellman equation published in 1989"


Book
01 Jan 1989
TL;DR: In this book, deterministic and stochastic models of optimal growth are developed within a recursive dynamic programming framework, with applications ranging from optimal growth with linear utility to linear systems and linear approximations.
Abstract: I. THE RECURSIVE APPROACH
1. Introduction
2. An Overview: 2.1 A Deterministic Model of Optimal Growth 2.2 A Stochastic Model of Optimal Growth 2.3 Competitive Equilibrium Growth 2.4 Conclusions and Plans
II. DETERMINISTIC MODELS
3. Mathematical Preliminaries: 3.1 Metric Spaces and Normed Vector Spaces 3.2 The Contraction Mapping Theorem 3.3 The Theorem of the Maximum
4. Dynamic Programming under Certainty: 4.1 The Principle of Optimality 4.2 Bounded Returns 4.3 Constant Returns to Scale 4.4 Unbounded Returns 4.5 Euler Equations
5. Applications of Dynamic Programming under Certainty: 5.1 The One-Sector Model of Optimal Growth 5.2 A "Cake-Eating" Problem 5.3 Optimal Growth with Linear Utility 5.4 Growth with Technical Progress 5.5 A Tree-Cutting Problem 5.6 Learning by Doing 5.7 Human Capital Accumulation 5.8 Growth with Human Capital 5.9 Investment with Convex Costs 5.10 Investment with Constant Returns 5.11 Recursive Preferences 5.12 Theory of the Consumer with Recursive Preferences 5.13 A Pareto Problem with Recursive Preferences 5.14 An (s, S) Inventory Problem 5.15 The Inventory Problem in Continuous Time 5.16 A Seller with Unknown Demand 5.17 A Consumption-Savings Problem
6. Deterministic Dynamics: 6.1 One-Dimensional Examples 6.2 Global Stability: Liapounov Functions 6.3 Linear Systems and Linear Approximations 6.4 Euler Equations 6.5 Applications
III. STOCHASTIC MODELS
7. Measure Theory and Integration: 7.1 Measurable Spaces 7.2 Measures 7.3 Measurable Functions 7.4 Integration 7.5 Product Spaces 7.6 The Monotone Class Lemma

2,991 citations


Book ChapterDOI
01 Jan 1989
TL;DR: In the long history of mathematics, stochastic optimal control is a rather recent development using Bellman's Principle of Optimality along with measure-theoretic and functional-analytic methods.
Abstract: In the long history of mathematics, stochastic optimal control is a rather recent development. Using Bellman's Principle of Optimality along with measure-theoretic and functional-analytic methods, several mathematicians such as H. Kushner, W. Fleming, R. Rishel, W.M. Wonham and J.M. Bismut, among many others, made important contributions to this new area of mathematical research during the 1960s and early 1970s. For a complete mathematical exposition of the continuous time case see Fleming and Rishel (1975) and for the discrete time case see Bertsekas and Shreve (1978).

415 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM).
Abstract: The paper provides conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM). The paper also extends the equilibrium characterization of interest rates of Cox, Ingersoll, and Ross (1985) to multi-agent economies. We do not use a Markovian state assumption. THIS WORK PROVIDES sufficient conditions on agents' primitives for the validity of the Consumption-Based Capital Asset Pricing Model (CCAPM) of Breeden (1979). As a necessary condition, Breeden showed that in a continuous-time equilibrium satisfying certain regularity conditions, one can characterize returns on securities as follows. The expected "instantaneous" rate of return on any security in excess of the riskless interest rate (the security's expected excess rate of return) is a multiple common to all securities of the "instantaneous covariance" of this excess return with aggregate consumption increments. This common multiple is the Arrow-Pratt measure of risk aversion of a representative agent. (Rubinstein (1976) published a discrete-time precursor of this result.) The existence of equilibria satisfying Breeden's regularity conditions had been an open issue. We also show that the validity of the CCAPM does not depend on Breeden's assumption of Markov state information, and present a general asset pricing model extending the results of Cox, Ingersoll, and Ross (1985) as well as the discrete-time results of Rubinstein (1976) and Lucas (1978) to a multi-agent environment. Since the CCAPM was first proposed, much effort has been directed at finding sufficient conditions on the model primitives: the given assets, the agents' preferences, the agents' consumption endowments, and (in a production economy) the feasible production sets. Conditions sufficient for the existence of continuous-time equilibria were shown in Duffie (1986), but the equilibria demonstrated were not shown to satisfy the additional regularity required for the CCAPM. In particular, Breeden assumed that all agents choose pointwise interior consumption rates, in order to characterize asset prices via the first order conditions of the Bellman equation. Interiority was also assumed by Huang (1987) in demonstrating a representative agent characterization of equilibrium, an approach exploited here. The use of dynamic programming and the Bellman equation, aside from the difficulty it imposes in verifying the existence of interior ...
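For readers who want the relation spelled out, the CCAPM characterization described above can be written schematically as follows (the notation is ours, not the paper's):

```latex
% Schematic CCAPM relation (our notation, not quoted from the paper):
% R^i is the cumulative return of security i, r the riskless rate,
% C aggregate consumption, and gamma_t the Arrow-Pratt risk-aversion
% coefficient of the representative agent.
\[
  \mathbb{E}_t\!\left[\mathrm{d}R^i_t\right] - r_t\,\mathrm{d}t
  \;=\; \gamma_t \,\operatorname{Cov}_t\!\left(\mathrm{d}R^i_t,\ \mathrm{d}C_t\right),
\]
% i.e. every security's expected excess rate of return is the same multiple
% gamma_t of the instantaneous covariance of its return with consumption increments.
```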

215 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider infinite state Markov decision processes with unbounded costs and give simple conditions, based on the optimal discounted value function, that guarantee the existence of an expected average cost optimal stationary policy; the conditions require a distinguished state of smallest discounted value and a single stationary policy inducing an irreducible, ergodic Markov chain for which the average cost of a first passage from any state to the distinguished state is finite.
Abstract: We deal with infinite state Markov decision processes with unbounded costs. Three simple conditions, based on the optimal discounted value function, guarantee the existence of an expected average cost optimal stationary policy. Sufficient conditions are the existence of a distinguished state of smallest discounted value and a single stationary policy inducing an irreducible, ergodic Markov chain for which the average cost of a first passage from any state to the distinguished state is finite. A result to verify this is also given. Two examples illustrate the ease of applying the criteria.
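For context, the average cost optimality equation that such conditions are used to justify typically takes the following standard form, obtained by a vanishing-discount argument (our notation; the paper's conditions are phrased in terms of the discounted value function and a distinguished state):

```latex
% Standard average cost optimality equation (our notation, not quoted from the paper):
% rho is the optimal expected average cost, h a relative value function,
% c(x,a) the one-stage cost, and p(y|x,a) the transition probabilities.
\[
  \rho + h(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) \;+\; \sum_{y} p(y \mid x,a)\, h(y) \Big],
\]
% with h typically obtained as a limit (along subsequences) of the differences
% V_alpha(.) - V_alpha(x_0) as the discount factor alpha increases to 1,
% where x_0 is the distinguished state.
```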

199 citations


DOI
01 Jan 1989
TL;DR: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes and proves that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm.

Abstract: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only the discounted reward case is considered. For the finite horizon problem, two new algorithms are developed. The first algorithm is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second algorithm, called the linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. It is shown that these two algorithms are more efficient than the one-pass algorithm. For the infinite horizon problem, it is first shown that the approximation version of the linear support algorithm can be used in place of the policy improvement step in a standard successive approximation method to obtain an $\epsilon$-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDPs can be applied here. Finally, we prove that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with a finite number of signals. This result is then extended to the case where the signal processes are step functions. Since step functions can easily be used to approximate most probability distributions, this method can be used to approximate most problems with continuous signals. Finally, we present conditions which guarantee that the linear support can be computed for any given state; the methods developed for the finite signal case can then be easily modified and applied to problems for which these conditions hold.
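All of these algorithms exploit the standard fact that a finite-horizon POMDP value function is piecewise linear and convex in the belief state, i.e. the upper envelope of finitely many linear "supports" (alpha-vectors). A minimal sketch of that representation, with toy data of our own rather than anything from the thesis:

```python
import numpy as np

# A POMDP value function over beliefs b (probability vectors over n states) is
# piecewise linear and convex: V(b) = max_alpha <alpha, b> over a finite set of
# supports (alpha-vectors).  The algorithms in the thesis differ in how they
# discover this finite set; here we only illustrate the representation itself.

def value(belief, supports):
    """Evaluate the upper envelope of the supports at a belief point."""
    return max(float(np.dot(alpha, belief)) for alpha in supports)

def best_support(belief, supports):
    """Return the support attaining the maximum at this belief (its support region contains the belief)."""
    return max(supports, key=lambda alpha: float(np.dot(alpha, belief)))

# Hypothetical two-state example: three candidate supports, one of them dominated.
supports = [np.array([1.0, 0.0]),
            np.array([0.0, 1.0]),
            np.array([0.4, 0.4])]   # never the maximum anywhere on the simplex

b = np.array([0.3, 0.7])
print(value(b, supports))          # 0.7
print(best_support(b, supports))   # [0. 1.]
```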

173 citations


Journal ArticleDOI
TL;DR: In this article, the optimal control process is constructed by solving the Skorokhod problem of reflecting the two-dimensional Brownian motion along a free boundary in the $-\nabla V$ direction.
Abstract: It is desired to control a two-dimensional Brownian motion by adding a (possibly singularly) continuous process to it so as to minimize an expected infinite-horizon discounted running cost. The Hamilton–Jacobi–Bellman characterization of the value function V is a variational inequality which has a unique twice continuously differentiable solution. The optimal control process is constructed by solving the Skorokhod problem of reflecting the two-dimensional Brownian motion along a free boundary in the $-\nabla V$ direction.

133 citations



Journal ArticleDOI
TL;DR: In this article, the existence of optimal trajectories associated with a generalized solution to the Hamilton-Jacobi-Bellman equation arising in optimal control was studied and the value function of an optimal control problem verifies these "contingent inequalities".
Abstract: In this paper we study the existence of optimal trajectories associated with a generalized solution to the Hamilton-Jacobi-Bellman equation arising in optimal control. In general, we cannot expect such solutions to be differentiable. But, in a way analogous to the use of distributions in PDE, we replace the usual derivatives with "contingent epiderivatives" and the Hamilton-Jacobi equation by two "contingent Hamilton-Jacobi inequalities." We show that the value function of an optimal control problem verifies these "contingent inequalities." Our approach allows the following three results: (a) The upper semicontinuous solutions to contingent inequalities are monotone along the trajectories of the dynamical system. (b) With every continuous solution V of the contingent inequalities, we can associate an optimal trajectory along which V is constant. (c) For such solutions, we can construct optimal trajectories through the corresponding optimal feedback. They are also "viscosity solutions" of a Hamilton-Jacobi equation. Finally, we prove a relationship between superdifferentials of solutions introduced by Crandall et al. [10] and the Pontryagin principle and discuss the link of viscosity solutions with Clarke's approach to the Hamilton-Jacobi equation.

110 citations


Book ChapterDOI
01 Jan 1989
TL;DR: In this article, the authors consider the value function corresponding to the optimal control of Zakai's equation and study various regularity properties of this value function in the context of L2 distributions.
Abstract: We consider here the value function corresponding to the optimal control of Zakai's equation and we study various regularity properties of this value function in the context of L2 distributions. In particular, we show that it is the unique viscosity solution of the corresponding infinite dimensional Hamilton-Jacobi-Bellman equation.

92 citations


Journal ArticleDOI
TL;DR: In this article, the Dirichlet problem for nonlinear elliptic equations, including Bellman equations with coefficients in a Hölder space, is considered, and solvability is proved under a suitable smallness condition.

Abstract: The Dirichlet problem is considered for nonlinear elliptic equations, including Bellman equations, with coefficients in a Hölder space. It is proved that if a certain positive parameter is sufficiently small, then this problem is solvable; under additional assumptions the solution enjoys higher regularity. Bibliography: 18 titles.

66 citations


Journal ArticleDOI
TL;DR: In this paper, an estimate for the solutions of the continuous time versus the discrete time Hamilton-Jacobi-Bellman equations is given, and the technique used is more analytic than probabilistic.
Abstract: Some estimates for the approximation of optimal stochastic control problems by discrete time problems are obtained. In particular an estimate for the solutions of the continuous time versus the discrete time Hamilton–Jacobi–Bellman equations is given. The technique used is more analytic than probabilistic.

Journal ArticleDOI
TL;DR: A technique for approximating the viscosity solution of the Bellman equation in deterministic control problems, based on discrete dynamic programming, leads to monotonically converging schemes and makes it possible to prove a priori error estimates.
Abstract: This paper presents a technique for approximating the viscosity solution of the Bellman equation in deterministic control problems. This technique, based on discrete dynamic programming, leads to monotonically converging schemes and makes it possible to prove a priori error estimates. Several computational algorithms leading to monotone convergence are reviewed and compared.
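A minimal sketch of the kind of scheme analyzed here, on a toy one-dimensional discounted problem of our own (not one of the paper's examples): the semi-discrete dynamic programming equation is solved on a grid by fixed-point iteration, with linear interpolation supplying values off the grid.

```python
import numpy as np

# Toy sketch of a discrete dynamic-programming approximation of a discounted
# deterministic control problem: time step h and a space grid give
#   v(x_i) = min_a [ (1 - lam*h) * I[v](x_i + h f(x_i, a)) + h * l(x_i, a) ],
# where I[v] is linear interpolation on the grid.  The operator is a
# contraction (factor 1 - lam*h), so fixed-point iteration converges; starting
# from 0 with a nonnegative running cost the iterates increase monotonically.

lam, h = 1.0, 0.05                    # discount rate and time step
grid = np.linspace(-1.0, 1.0, 201)    # space grid on [-1, 1]
controls = [-1.0, 0.0, 1.0]           # finite control set
f = lambda x, a: a                    # toy dynamics  y' = a
l = lambda x, a: x**2                 # toy running cost

v = np.zeros_like(grid)
for _ in range(2000):
    candidates = []
    for a in controls:
        x_next = np.clip(grid + h * f(grid, a), grid[0], grid[-1])
        candidates.append((1 - lam * h) * np.interp(x_next, grid, v) + h * l(grid, a))
    v_new = np.min(candidates, axis=0)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

print(v[100])   # approximate value at x = 0 (should be close to 0)
```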

Journal ArticleDOI
TL;DR: A numerical technique for computing optimal impulse controls for P.D.P.s. under general conditions is presented and it is shown that iteration of the single-jump-or-intervention operator generates a sequence of functions converging to the value function of the problem.
Abstract: In a recent paper we presented a numerical technique for solving the optimal stopping problem for a piecewise-deterministic process (P.D.P.) by discretization of the state space. In this paper we apply these results to the impulse control problem. In the first part of the paper we study the impulse control of P.D.P.s. under general conditions. We show that iteration of the single-jump-or-intervention operator generates a sequence of functions converging to the value function of the problem. In the second part of the paper we present a numerical technique for computing optimal impulse controls for P.D.P.s. This technique reduces the problem to a sequence of one-dimensional minimizations. We conclude by presenting some numerical examples.

Journal ArticleDOI
TL;DR: In this article, a necessary condition in terms of lower directional Dini derivates of the value function is given, and a strengthened version of the necessary condition gives an optimal feedback control and a procedure for approximating optimal controls.
Abstract: Optimal control problems governed by ordinary differential equations with control constraints that are not necessarily compact are considered. Conditions imposed on the data and on the structure of the terminal sets imply that the minimum is attained and that the value function is locally Lipschitz. A necessary condition in terms of lower directional Dini derivates of the value function is given. The condition reduces to the Bellman–Hamilton–Jacobi (BHJ) condition at points of differentiability of the value, and for a subclass of the problems considered implies that the value is a viscosity solution of the BHJ equation. A strengthened version of the necessary condition gives an optimal feedback control and a procedure for approximating optimal controls.

Journal ArticleDOI
TL;DR: In this paper, several assertions concerning viscosity solutions of the Hamilton-Jacobi-Bellman equation for the optimal control problem of steering a system to zero in minimal time are proved.
Abstract: In this paper several assertions concerning viscosity solutions of the Hamilton–Jacobi–Bellman equation for the optimal control problem of steering a system to zero in minimal time are proved. First two rather general uniqueness theorems are established, asserting that any positive viscosity solution of the HJB equation must, in fact, agree with the minimal time function near zero; if also a boundary condition introduced by Bardi [SIAM J. Control Optim., 27 (1988), pp. 776–785] is satisfied, then the agreement is global. Additionally, the Hölder continuity of any subsolution of the HJB equation is proved in the case where the related dynamics satisfy a Hörmander-type hypothesis. This last assertion amounts to a "half-derivative" analogue of a theorem of Crandall and Lions [Trans. Amer. Math. Soc., 277 (1983), pp. 1–42] concerning Lipschitz viscosity solutions.

Journal ArticleDOI
TL;DR: The value function for a problem in the economics of the optimal accumulation of information is calculated as a fixed point of a contraction mapping by direct numerical iteration as discussed by the authors, and the optimal policy function is obtained along with the function defined as the sum of the current expected reward and the discounted expected value of following the optimal policy in the future.

Journal ArticleDOI
TL;DR: The possibilistic linear program in this paper is an unconstrained linear program with several objective functions whose coefficients are represented by possibility distributions.
Abstract: In this paper, a possibilistic linear program is formulated when a measurable multiattribute value function is given. The possibilistic linear program in this paper is an unconstrained linear program with several objective functions whose coefficients are represented by possibility distributions. A possibility measure and a necessity measure are derived from a possibility distribution. Using fuzzy integrals of the measurable multiattribute value function with respect to the possibility measure and the necessity measure, the possible value and the necessary value are defined respectively. In analogy with expected utility, the principles of maximizing the possible value and the necessary value are considered as decision procedures under a possibility distribution. The possibilistic linear program is formulated based on these decision procedures and reduced to a nonlinear program. A solution method using linear programming techniques is proposed.
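As a small illustration of the first step above, the following toy snippet (our own data) derives a possibility measure and a necessity measure from a possibility distribution on a finite outcome set; the paper then integrates a multiattribute value function against these measures to obtain the possible and necessary values.

```python
# Deriving a possibility measure Pi and a necessity measure N from a
# possibility distribution pi on a finite set of outcomes (standard definitions):
#   Pi(A) = sup_{x in A} pi(x),    N(A) = 1 - Pi(complement of A).

pi = {"low": 0.2, "medium": 1.0, "high": 0.6}   # hypothetical possibility distribution

def possibility(event):
    """Pi(A): the largest possibility degree of any outcome in the event."""
    return max(pi[x] for x in event) if event else 0.0

def necessity(event):
    """N(A): one minus the possibility of the complementary event."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(complement)

A = {"medium", "high"}
print(possibility(A))   # 1.0
print(necessity(A))     # 1 - pi("low") = 0.8
```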

Journal ArticleDOI
TL;DR: In this article, optimal control problems for systems governed by nonlinear "parabolic" state equations are studied using a dynamic programming approach, and it is proved that the value function is a generalized viscosity solution of the associated Hamilton–Jacobi equation.
Abstract: Optimal control problems, with no discount, are studied for systems governed by nonlinear "parabolic" state equations, using a dynamic programming approach. If the dynamics are stabilizable with respect to the cost, it is proved that the value function is a generalized viscosity solution of the associated Hamilton–Jacobi equation. This yields the feedback formula. Moreover, uniqueness is obtained under suitable stability assumptions.

Journal ArticleDOI
TL;DR: In this article, a dynamic programming equation is derived for general semi-Markov models under the long-run average reward criterion, focusing on it as a limiting case of optimization under discounting as the discount factor goes to one.

Journal ArticleDOI
TL;DR: In this article, the authors present a maximum theorem under convex structures but with weaker continuity requirements, and illustrate the usefulness of their results by an application to a problem encountered in the theory of optimal intertemporal allocation.

Journal ArticleDOI
TL;DR: In this paper, the authors study the infinite-horizon deterministic control problem and the minimal long-run average-cost growth rate, which they compute explicitly and prove equal to $\min_x L(x, 0)$ in two cases: the scalar case $n = 1$ and the case where the integrand is in a separated form.
Abstract: We study the infinite-horizon deterministic control problem of minimizing $\int_0^T L(z, \dot z)\,dt$, $T \to \infty$, where $L(z, \cdot)$ is convex in $\dot z$ for fixed $z$ but not necessarily jointly convex in $(z, \dot z)$. We prove the existence of a solution to the infinite-horizon Bellman equation and use it to define a differential inclusion, which reduces in certain cases to an ordinary differential equation. We discuss cases where solutions of this differential inclusion (equation) provide optimal solutions (in the overtaking optimality sense) to the optimization problem. A quantity of special interest is the minimal long-run average-cost growth rate. We compute it explicitly and show that it is equal to $\min_x L(x, 0)$ in the following two cases: one is the scalar case $n = 1$ and the other is when the integrand is in a separated form $l(x) + g(\dot x)$.
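A concrete illustration of the last statement, using our own separated integrand rather than one from the paper:

```latex
% Our own illustrative example of the separated case l(x) + g(\dot x):
\[
  L(x,\dot x) \;=\; \underbrace{(x^2-1)^2}_{l(x)} \;+\; \underbrace{\dot x^{\,2}}_{g(\dot x)},
  \qquad
  \min_x L(x,0) \;=\; \min_x (x^2-1)^2 \;=\; 0 \quad (x = \pm 1),
\]
% so the minimal long-run average-cost growth rate is 0: trajectories that settle
% at x = 1 or x = -1 accumulate cost at an asymptotically vanishing average rate.
```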

Journal ArticleDOI
TL;DR: In this article, it is proved that Bellman equations are solvable in smooth strictly convex domains.
Abstract: It is proved that Bellman equations are solvable in smooth strictly convex domains. Bibliography: 6 titles.

Journal ArticleDOI
TL;DR: In this article, the authors give very accurate conditions to have Lipschitz behavior for the optimal solutions of a mathematical programming problem with natural perturbations in some fixed direction, and then use these conditions to obtain the directional derivative of the optimal value function.
Abstract: This paper gives very accurate conditions to have Lipschitz behaviour for the optimal solutions of a mathematical programming problem with natural perturbations in some fixed direction. This result is then used to obtain the directional derivative for the optimal value function.

Posted Content
TL;DR: In this paper, the accuracy of two versions of the procedure proposed by Kydland and Prescott (1980, 1982) for approximating the optimal decision rules is studied in problems in which the objective fails to be quadratic and the constraints linear.
Abstract: This paper studies the accuracy of two versions of the procedure proposed by Kydland and Prescott (1980, 1982) for approximating the optimal decision rules in problems in which the objective fails to be quadratic and the constraints linear. The analysis is carried out in the context of a particular example: a version of the Brock-Mirman (1972) model of optimal economic growth. Although the model is not linear quadratic, its solution can nevertheless be computed with arbitrary accuracy using a variant of the value function iteration procedures described in Bertsekas (1976). I find that the Kydland-Prescott approximate decision rules are very similar to those implied by value function iteration.
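As a reference point, the deterministic special case of the Brock-Mirman model (log utility, Cobb-Douglas production, full depreciation) has the closed-form optimal policy k' = αβk^α, so a grid-based value function iteration of the kind used as the benchmark here can be checked directly against it. A minimal sketch, using our own parameter values and grid rather than the paper's exact procedure:

```python
import numpy as np

# Deterministic Brock-Mirman benchmark: log utility, y = k**alpha, full
# depreciation, so the exact optimal policy is k' = alpha*beta*k**alpha.
# Value function iteration on a grid should recover it up to grid error.

alpha, beta = 0.36, 0.95
kgrid = np.linspace(0.05, 0.5, 300)            # capital grid around the steady state
c = kgrid[:, None]**alpha - kgrid[None, :]     # consumption if k -> k'
util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -1e10)   # infeasible choices penalized

v = np.zeros(len(kgrid))
for _ in range(1000):                          # Bellman operator is a beta-contraction
    v_new = np.max(util + beta * v[None, :], axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

policy = kgrid[np.argmax(util + beta * v[None, :], axis=1)]
exact = alpha * beta * kgrid**alpha
print(np.max(np.abs(policy - exact)))          # small; limited mainly by the grid spacing
```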

Journal ArticleDOI
TL;DR: In this paper, the Parameter Iteration Method (PIM) is used to obtain an approximate solution for the problem of optimal replacement policy for a multi-item Markovian system.
Abstract: Many practical problems involve making optimal decisions for systems whose state is characterized by many components. These problems lead to dynamic programming problems with a very large number of state variables, so an exact numerical computation of the optimal policy is not feasible because of the great amount of computer time and storage involved. This paper presents a practical method, denoted the Parameter Iteration Method, for obtaining an approximate solution to such problems. The computational difficulty caused by the tremendously large dimensionality of the state variable is overcome by means of an iterative method which combines simulation and recursive estimation to compute successive approximations of the value function. The implementation of the Parameter Iteration Method is illustrated for the problem of an optimal replacement policy for a multi-item Markovian system.
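The following toy snippet sketches the general simulate-and-regress idea behind such parameterized approximations, i.e. fitting the coefficients of a low-dimensional functional form to simulated one-step Bellman targets; it is our own generic illustration (toy scalar problem, feature choice and all names ours), not the paper's exact Parameter Iteration Method.

```python
import numpy as np

# Generic simulate-and-regress approximation of a value function:
# V(x) ~= theta . phi(x); each pass simulates transitions to form one-step
# Bellman targets and re-estimates theta by least squares.

rng = np.random.default_rng(0)
beta, actions = 0.95, np.array([-0.5, 0.0, 0.5])
phi = lambda x: np.stack([np.ones_like(x), x, x**2], axis=-1)   # features
cost = lambda x, a: x**2 + a**2                                 # toy one-stage cost
step = lambda x, a: 0.9 * x + a + 0.1 * rng.standard_normal(x.shape)  # toy dynamics

theta = np.zeros(3)
for _ in range(50):
    x = rng.uniform(-2.0, 2.0, size=500)                # sampled states
    q = np.empty((len(x), len(actions)))
    for j, a in enumerate(actions):
        # crude Monte-Carlo estimate of E[V(x')] with a few simulated successors
        ev = np.mean([phi(step(x, a)) @ theta for _ in range(5)], axis=0)
        q[:, j] = cost(x, a) + beta * ev
    targets = q.min(axis=1)                              # Bellman backup at the samples
    theta, *_ = np.linalg.lstsq(phi(x), targets, rcond=None)

print(theta)   # parameters of the fitted quadratic approximation
```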

01 Jan 1989
TL;DR: In this paper, the authors present a numerical procedure for optimal feedback control based on the discretization of the infinitesimal generator using finite difference techniques, and apply these techniques to the control of semi-active suspensions for road vehicles.
Abstract: We study a class of ergodic stochastic control problems for diffusion processes. We present a numerical approximation to the optimal feedback control based on the discretization of the infinitesimal generator using finite difference schemes. Finally, we apply these techniques to the control of semi-active suspensions for road vehicles. This paper deals with a numerical procedure for optimal stochastic control problems and its application to a non-trivial example. This procedure consists in approximating the nonlinear Hamilton-Jacobi-Bellman partial differential equation which is formally satisfied by the minimal cost function. We use finite difference techniques and, with a suitable choice of the schemes, the resulting discrete equation can be viewed as the dynamic programming equation for the minimal cost function for the optimal control of a certain Markov process with finite state space (14). In section 1, we introduce a particular class - denoted by C - of ergodic control problems. Some characteristics of this problem are non-classical (the diffusion is degenerate, the coefficients are nonlinear and discontinuous) and there is no available result concerning the HJB equation. This class of problems derives from a particular application in the control of suspension systems (3). In section 2, the approximation procedure is detailed in a more general context than the class C. For the special case of the class C we have already stated two types of results (3): an existence and uniqueness property for the discrete HJB equation (with convergence of the algorithm used for solving it) and a convergence property of the approximation as the discretization step tends to 0. Finally, we apply these techniques to the suspension problem (3, 2) and perform some numerical tests; related suboptimal and adaptive techniques may be found in (2).

Book
01 Nov 1989
TL;DR: In this article, the authors address the problem of controlling the production rate of a failure-prone manufacturing system so as to minimize the discounted inventory cost, where certain cost rates are specified for both positive and negative inventories and there is a constant demand rate for the commodity produced.
Abstract: We address the problem of controlling the production rate of a failure prone manufacturing system so as to minimize the discounted inventory cost, where certain cost rates are specified for both positive and negative inventories, and there is a constant demand rate for the commodity produced. The underlying theoretical problem is the optimal control of a continuous-time system with jump Markov disturbances, with an infinite horizon discounted cost criterion. We use two complementary approaches. First, proceeding informally, and using a combination of stochastic coupling, linear system arguments, stable and unstable eigenspaces, renewal theory, parametric optimization, etc., we arrive at a conjecture for the optimal policy. Then we address the previously ignored mathematical difficulties associated with differential equations with discontinuous right-hand sides, singularity of the optimal control problem, smoothness, and validity of the dynamic programming equation, etc., to give a rigorous proof of optimality of the conjectured policy. It is hoped that both approaches will find uses in other such problems also. We obtain the complete solution and show that the optimal solution is simply characterized by a certain critical number, which we call the optimal inventory level. If the current inventory level exceeds the optimal, one should not produce at all; if less, one should produce at the maximum rate; while if exactly equal, one should produce exactly enough to meet demand. We also give a simple explicit formula for the optimal inventory level.
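The critical-number policy described above is easy to state as a feedback law. The following sketch uses our own variable names; the optimal level z* itself would come from the paper's explicit formula, which is not reproduced here.

```python
# Hedging-point feedback law as described in the abstract (names are ours):
# produce at the maximum rate below the critical level, just meet demand at it,
# and do not produce above it.

def production_rate(inventory, z_star, max_rate, demand_rate, machine_up):
    """Production rate prescribed by the critical-number (hedging point) policy."""
    if not machine_up:
        return 0.0                 # a failed machine cannot produce
    if inventory < z_star:
        return max_rate            # build inventory back up as fast as possible
    if inventory > z_star:
        return 0.0                 # let demand draw the surplus down
    return demand_rate             # hold the inventory exactly at z*

print(production_rate(inventory=2.0, z_star=5.0, max_rate=10.0,
                      demand_rate=3.0, machine_up=True))   # 10.0
```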

Book ChapterDOI
01 Jan 1989
TL;DR: In this article, the authors study a case where the directional derivative is obtained with a nice formula when some corresponding optimal solutions have Lipschitzian or Hölderian directional behaviour.
Abstract: The conditions for the existence of the directional derivative of the optimal value function in mathematical programming are a difficult question that is still not completely solved. Here we study a case where the directional derivative is obtained with a nice formula when some corresponding optimal solutions have Lipschitzian or Hölderian directional behaviour. These calmness properties for optimal solutions are obtained under near-minimal assumptions and regularity conditions (constraint qualifications), as illustrated by examples.

Journal ArticleDOI
TL;DR: In this article, the optimal switching problem for systems governed by abstract semilinear evolution equations is considered, where the generator of the semigroup is allowed to depend on the decision parameter, and the nonlinear term in the evolution equation and the integral in the performance index are allowed to be unbounded.

Journal ArticleDOI
TL;DR: In this article, the value function of an optimally controlled stochastic switching process can be shown to satisfy a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB-) type.
Abstract: By the dynamic programming principle the value function of an optimally controlled stochastic switching process can be shown to satisfy a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB-) type. For the numerical solution of that HJB-equation we present a multi-grid algorithm whose main features are the use of nonlinear Gauss-Seidel iteration in the smoothing process and an adaptive local choice of prolongations and restrictions in the coarse-to-fine and fine-to-coarse transfers. Local convergence is proved by combining nonlinear multi-grid convergence theory and elementary subdifferential calculus. The efficiency of the algorithm is demonstrated for optimal advertising in stochastic dynamic sales response models of Vidale-Wolfe type.