
Showing papers on "Bellman equation" published in 1997


Book
18 Dec 1997
TL;DR: This book presents the main ideas on a model problem and develops the theory of continuous and discontinuous viscosity solutions of Hamilton-Jacobi-Bellman equations, with applications to optimal control problems, differential games, and numerical solution by dynamic programming.
Abstract: Preface.- Basic notations.- Outline of the main ideas on a model problem.- Continuous viscosity solutions of Hamilton-Jacobi equations.- Optimal control problems with continuous value functions: unrestricted state space.- Optimal control problems with continuous value functions: restricted state space.- Discontinuous viscosity solutions and applications.- Approximation and perturbation problems.- Asymptotic problems.- Differential Games.- Numerical solution of Dynamic Programming.- Nonlinear H-infinity control by Pierpaolo Soravia.- Bibliography.- Index

2,747 citations


Journal ArticleDOI
TL;DR: In this article, reflected solutions of one-dimensional backward stochastic differential equations are studied and the authors prove uniqueness and existence both by a fixed point argument and by approximation via penalization.
Abstract: We study reflected solutions of one-dimensional backward stochastic differential equations. The “reflection” keeps the solution above a given stochastic process. We prove uniqueness and existence both by a fixed point argument and by approximation via penalization. We show that when the coefficient has a special form, then the solution of our problem is the value function of a mixed optimal stopping–optimal stochastic control problem. We finally show that, when put in a Markovian framework, the solution of our reflected BSDE provides a probabilistic formula for the unique viscosity solution of an obstacle problem for a parabolic partial differential equation.

781 citations


Journal ArticleDOI
TL;DR: This paper states sufficient conditions that guarantee that the Galerkin approximation converges to the solution of the GHJB equation and that the resulting approximate control is stabilizing on the same region as the initial control.

580 citations


Proceedings Article
01 Aug 1997
TL;DR: It is found that incremental pruning is presently the most efficient exact method for solving POMDPs.
Abstract: Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.
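A minimal sketch (not from the paper) of the simplest ingredient behind such algorithms: pruning pointwise-dominated alpha-vectors from the piecewise-linear and convex value-function representation. Incremental pruning itself interleaves cross-sums with stronger LP-based dominance tests, which are omitted here; all data below are illustrative.

```python
import numpy as np

def prune_pointwise_dominated(alpha_vectors):
    """Drop alpha-vectors that are pointwise dominated by another vector.

    A POMDP value function is represented as V(b) = max_i b . alpha_i, so a
    vector that is <= some other vector at every state never attains the
    maximum and can be discarded.  (Exact algorithms such as incremental
    pruning also remove vectors dominated only by combinations of others,
    using linear programs; that step is omitted here.)
    """
    kept = []
    for i, a in enumerate(alpha_vectors):
        dominated = any(
            j != i and np.all(b >= a) and np.any(b > a)
            for j, b in enumerate(alpha_vectors)
        )
        if not dominated:
            kept.append(a)
    return kept

# Toy example with two states: the second vector is pointwise dominated.
vectors = [np.array([1.0, 3.0]), np.array([0.5, 2.0]), np.array([2.5, 0.5])]
print(prune_pointwise_dominated(vectors))
```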

441 citations


Book
30 Apr 1997
TL;DR: In this article, generalized solutions of Bellman's differential equation and the quantization of the Bellman equation with multiplicative asymptotics are presented, built on idempotent analysis and Maslov optimization theory.
Abstract: Preface. 1. Idempotent Analysis. 2. Analysis of Operators on Idempotent Semimodules. 3. Generalized Solutions of Bellman's Differential Equation. 4. Quantization of the Bellman Equation and Multiplicative Asymptotics. References. Appendix: (P. Del Moral) Maslov Optimization Theory. Optimality versus Randomness. Index.

425 citations


Journal ArticleDOI
TL;DR: In this article, the value function of the stochastic control problem is shown to be a smooth solution of the associated Hamilton-Jacobi-Bellman (HJB) equation; the optimal policy is shown to exist and is given in feedback form from the optimality conditions in the HJB equation.

256 citations


Journal ArticleDOI
TL;DR: This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics; algorithmic and exact solutions are indicated.
Abstract: In this review we consider an interdisciplinary problem of earthquake prediction involving economics. This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics. We formulate the problem as an optimal control problem: possessing the possibility to declare several types of alerts, it is necessary to find an optimal strategy for changing alert types; each successful prediction prevents a certain amount of losses; total expected losses are integrated over the semi-infinite time interval. The discount factor is included in the model. Algorithmic and exact solutions are indicated. This paper is based on the recent results by Molchan (1990, 1991, 1992).

184 citations


Book
01 Jan 1997
TL;DR: This book develops dynamic programming methods for inverse problems in engineering, combining dynamic programming with generalized cross-validation, and includes a computer program (DYNAVAL) for the solution of the general inverse problem.
Abstract: Dynamic Programming System: Introduction. The Simplest Exchange. Bellman's Principle of Optimality. First-Order Dynamic System. General Multidimensional System. Optimal Control as a Multistage Decision Process.- Matrices and Differential Equations: Introduction. Vector-Matrix Calculus. The Exponential Matrix. Approximations to the Exponential Matrix. Eigenvalue Reduction.- The General Inverse Problem: Introduction. Generalized Cross Validation. Dynamic Programming and Generalized Cross Validation. Chandrasekhar Equations.- The Inverse Heat Conduction Problem: Introduction. One-Dimensional Example. Two-Dimensional Example. Eigenvalue Reduction Technique. L-Curve Analysis.- The Inverse Structural Dynamics Problem: Introduction. Single-Degree-of-Freedom Cantilever Beam Problem. Two-Dimensional Plate Problem.- Smoothing and Differentiating Noisy Data: Introduction. Polynomial Approximation. Filtering a 60 Hz Signal. Frequency Analysis. Two-Dimensional Smoothing.- Nonlinear Systems: Introduction. Linearization Methods. Nonlinear Inverse Heat Conduction. Nonlinear Spring Example. Successive Approximation in Policy Space.- Sequential Estimation and System Identification: Introduction. Sequential Estimation. Multidimensional Sequential Estimation. Extended Levenberg-Marquardt's Method.- Bibliography.- Appendix A. DYNAVAL: A Computer Program for the Solution of the General Inverse Problem Using Dynamic Programming and Generalized Cross-Validation.- Index

145 citations


Journal ArticleDOI
Lars Grüne1
TL;DR: In this paper, an adaptive finite difference scheme for the solution of the discrete first-order Hamilton-Jacobi-Bellman equation is presented; it is based on local a posteriori error estimates, certain properties of which are proved.
Abstract: In this paper an adaptive finite difference scheme for the solution of the discrete first order Hamilton-Jacobi-Bellman equation is presented. Local a posteriori error estimates are established and certain properties of these estimates are proved. Based on these estimates an adapting iteration for the discretization of the state space is developed. An implementation of the scheme for two-dimensional grids is given and numerical examples are discussed.

126 citations


Journal ArticleDOI
TL;DR: New results using viability theory are provided that allow us to study optimal time functions free from the controllability assumptions classically made in the partial differential equations approach.
Abstract: We study optimal times to reach a given closed target for controlled systems with a state constraint. Our goal is to characterize these optimal time functions in such a way that it is possible to compute them numerically and we do not need to compute trajectories of the controlled system. In this paper we provide new results using viability theory. This allows us to study optimal time functions free from the controllability assumptions classically made in the partial differential equations approach.

113 citations


Journal ArticleDOI
TL;DR: The properties of algorithms that characterize the solution of the Bellman equation of a stochastic dynamic program as the solution to a linear program are reviewed; low-dimensional cubic-spline approximations to the value function are proposed, and it is shown that fitting this approximation through linear programming provides upper and lower bounds on the solution of the original problem.
Abstract: We review the properties of algorithms that characterize the solution of the Bellman equation of a stochastic dynamic program, as the solution to a linear program. The variables in this problem are the ordinates of the value function; hence, the number of variables grows with the state space. For situations in which this size becomes computationally burdensome, we suggest the use of low-dimensional cubic-spline approximations to the value function. We show that fitting this approximation through linear programming provides upper and lower bounds on the solution to the original large problem. The information contained in these bounds leads to inexpensive improvements in the accuracy of approximate solutions.
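As a sketch of the linear-programming characterization the abstract refers to, the following Python/SciPy snippet solves the Bellman equation of a small discounted MDP as an LP over the value-function ordinates. The problem data are made up, and the paper's cubic-spline approximation and bounding machinery are not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_by_lp(P, r, gamma):
    """Solve the Bellman equation of a discounted MDP as a linear program.

    P[a, s, s'] : transition probabilities, r[s, a] : rewards.
    The LP minimises sum_s V(s) subject to
        V(s) >= r(s, a) + gamma * sum_{s'} P[a, s, s'] V(s')   for all s, a,
    and its optimum is the optimal value function.
    """
    n_actions, n_states, _ = P.shape
    c = np.ones(n_states)                       # minimise the sum of value-function ordinates
    A_ub, b_ub = [], []
    for s in range(n_states):
        for a in range(n_actions):
            row = gamma * P[a, s, :].copy()
            row[s] -= 1.0                       # gamma * P V - V <= -r
            A_ub.append(row)
            b_ub.append(-r[s, a])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x

# Tiny 2-state, 2-action example (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(solve_mdp_by_lp(P, r, gamma=0.9))
```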

Proceedings Article
27 Jul 1997
TL;DR: Novel incremental versions of the grid-based linear interpolation method and of the simple lower bound method with Sondik's updates are introduced, together with a new method for computing an initial upper bound, the fast informed bound method.
Abstract: Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental methods for computing bounds on the value function for control problems with infinite discounted horizon criteria. The methods described and tested include novel incremental versions of the grid-based linear interpolation method and the simple lower bound method with Sondik's updates. Both of these can work with arbitrary points of the belief space and can be enhanced by various heuristic point selection strategies. Also introduced is a new method for computing an initial upper bound - the fast informed bound method. This method is able to improve significantly on the standard and commonly used upper bound computed by the MDP-based method. The quality of the resulting bounds is tested on a maze navigation problem with 20 states, 6 actions and 8 observations.
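A rough sketch of the "MDP-based" upper bound mentioned at the end of the abstract: value iteration on the underlying fully observable MDP yields V_MDP, and the POMDP value at a belief b is bounded above by the expectation of V_MDP under b. The fast informed bound tightens this by folding in the observation model; only the simpler bound is shown here, with illustrative data.

```python
import numpy as np

def mdp_upper_bound(P, r, gamma, n_iter=500):
    """Value iteration on the fully observable MDP underlying a POMDP.

    P[a, s, s'] : transition probabilities, r[s, a] : expected rewards.
    The fixed point V_MDP gives the standard upper bound on the POMDP value
    function:  V(b) <= sum_s b(s) * V_MDP(s).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = r + gamma * np.tensordot(P, V, axes=([2], [0])).T   # Q has shape (s, a)
        V = Q.max(axis=1)
    return V

# Upper bound at a belief state b (illustrative numbers only).
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.0, 0.5]])
V_mdp = mdp_upper_bound(P, r, gamma=0.95)
b = np.array([0.6, 0.4])
print("MDP-based upper bound on V(b):", b @ V_mdp)
```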

Journal ArticleDOI
TL;DR: In this paper, the authors studied the optimal production planning in a dynamic stochastic manufacturing system consisting of a single machine that is failure prone and facing a constant demand, and the objective was to choose the rate of production over time in order to minimize the long-run average cost of production and surplus.
Abstract: This paper is concerned with the optimal production planning in a dynamic stochastic manufacturing system consisting of a single machine that is failure prone and facing a constant demand. The objective is to choose the rate of production over time in order to minimize the long-run average cost of production and surplus. The analysis proceeds with a study of the corresponding problem with a discounted cost. It is shown using the vanishing discount approach that the Hamilton–Jacobi–Bellman equation for the average cost problem has a solution giving rise to the minimal average cost and the so-called potential function. The result helps in establishing a verification theorem. Finally, the optimal control policy is specified in terms of the potential function.

Dissertation
01 Jan 1997
TL;DR: Experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test.
Abstract: Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that include both action outcome uncertainty and imperfect observability. A control problem within the POMDP framework is expressed as a dynamic optimization problem with a value function that combines costs or rewards from multiple steps. Although the POMDP framework is more expressive than other simpler frameworks, like Markov decision processes (MDP), its associated optimization methods are more demanding computationally and only very small problems can be solved exactly in practice. The thesis focuses on two possible approaches that can be used to solve larger problems: approximation methods and exploitation of additional problem structure. First, a number of new efficient approximation methods and improvements of existing algorithms are proposed. These include (1) the fast informed bound method based on approximate dynamic programming updates that lead to piecewise linear and convex value functions with a constant number of linear vectors, (2) a grid-based point interpolation method that supports variable grids, (3) an incremental version of the linear vector method that updates value function derivatives, as well as (4) various heuristics for selecting grid-points. The new and existing methods are experimentally tested and compared on a set of three infinite discounted horizon problems of different complexity. The experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test. Second, the thesis presents a number of techniques for exploiting additional structure in the model of complex control problems. These are studied as applied to a medical therapy planning problem--the management of patients with chronic ischemic heart disease. The new extensions proposed include factored and hierarchically structured models that combine the advantages of the POMDP and MDP frameworks and cut down the size and complexity of the information state space.

Journal ArticleDOI
TL;DR: In this paper, the authors studied the long-run average cost minimization of a stochastic inventory problem with Markovian demand, fixed ordering cost, and convex surplus cost.
Abstract: This paper is concerned with long-run average cost minimization of a stochastic inventory problem with Markovian demand, fixed ordering cost, and convex surplus cost. The states of the Markov chain represent different possible states of the environment. Using a vanishing discount approach, a dynamic programming equation and the corresponding verification theorem are established. Finally, the existence of an optimal state-dependent (s, S) policy is proved.
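A toy illustration (not the paper's construction): discounted-cost value iteration for an inventory model with Markov-modulated demand, a fixed ordering cost and convex holding/shortage costs, whose optimal order quantities display the state-dependent (s, S) structure established in the paper. All parameters below are invented, and the vanishing-discount passage to the long-run average cost is not reproduced.

```python
import numpy as np

gamma = 0.95          # discount factor (the paper then lets it tend to 1)
K, c = 5.0, 1.0       # fixed and proportional ordering costs
h, p = 1.0, 4.0       # holding and shortage (lost-sales) costs
N = 20                # maximum inventory level

Q = np.array([[0.8, 0.2],        # transition matrix of the demand environment
              [0.3, 0.7]])
demand_pmf = [np.array([0.3, 0.4, 0.3, 0.0, 0.0]),   # demand law in environment state 0
              np.array([0.0, 0.1, 0.3, 0.4, 0.2])]   # demand law in environment state 1
n_env = Q.shape[0]

def expected_cost(i, y, V):
    """One-period cost plus discounted continuation when the period starts in
    environment state i with post-order inventory y (lost-sales dynamics)."""
    total = 0.0
    for d, prob in enumerate(demand_pmf[i]):
        if prob == 0.0:
            continue
        left = max(y - d, 0)
        total += prob * (h * left + p * max(d - y, 0) + gamma * Q[i] @ V[:, left])
    return total

# Value iteration over (environment state, inventory level).
V = np.zeros((n_env, N + 1))
for _ in range(500):
    V_new = np.empty_like(V)
    for i in range(n_env):
        for x in range(N + 1):
            V_new[i, x] = min(K * (q > 0) + c * q + expected_cost(i, x + q, V)
                              for q in range(N + 1 - x))
    V = V_new

# The optimal order quantities exhibit an (s, S)-type structure in each environment state.
for i in range(n_env):
    orders = [min(range(N + 1 - x),
                  key=lambda q: K * (q > 0) + c * q + expected_cost(i, x + q, V))
              for x in range(N + 1)]
    print("environment state", i, "optimal order quantity by inventory level:", orders)
```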

Journal ArticleDOI
Ralf Korn1
TL;DR: A generalised impulse control model for controlling a process governed by a stochastic differential equation is considered, and optimality results relating the value function to quasi-variational inequalities and to a formal optimal stopping problem are stated.
Abstract: We consider a generalised impulse control model for controlling a process governed by a stochastic differential equation. The controller can only choose a parameter of the probability distribution of the consequence of his control action which is therefore random. We state optimality results relating the value function to quasi-variational inequalities and a formal optimal stopping problem. We also remark that the value function is a viscosity solution of the quasi-variational inequalities which could lead to developments and convergence proofs of numerical schemes. Further, we give some explicit examples and an application in financial mathematics, the optimal control of the exchange rate.

Journal ArticleDOI
TL;DR: The existence of the ergodic attractor is shown in Theorems 1 and 2, and the qualitative properties behind the convergence of the terms λu_λ(x) and u(x,T)/T in the Hamilton-Jacobi-Bellman equations, as λ tends to +0 and T goes to +∞, to a unique constant are described.
Abstract: The problem of the convergence of the terms λu_λ(x) and u(x,T)/T in the Hamilton-Jacobi-Bellman equations (HJBs), as λ tends to +0 and T goes to +∞, to a unique constant is called the ergodic problem of the HJBs. We show in this paper what kind of qualitative properties lie behind this kind of convergence. The existence of the ergodic attractor is shown in Theorems 1 and 2. Our solutions of HJBs satisfy the equations in the viscosity solutions sense.

Book ChapterDOI
01 Jan 1997
TL;DR: In this article, the optimal set approach was used for sensitivity analysis for linear programming and it was shown that optimal partitions and optimal sets remain constant between two consecutive transition points of the optimal value function.
Abstract: In this chapter we describe the optimal set approach for sensitivity analysis for LP. We show that optimal partitions and optimal sets remain constant between two consecutive transition points of the optimal value function. The advantage of using this approach instead of the classical approach (using optimal bases) is shown. Moreover, we present an algorithm to compute the partitions, optimal sets and the optimal value function. This is a new algorithm and uses primal and dual optimal solutions. We also extend some of the results to parametric quadratic programming, and discuss differences and resemblances with the linear programming case.
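A small numerical illustration of the object being analysed: the optimal value of an LP as a function of a right-hand-side perturbation is piecewise linear, with breakpoints at the transition points between which optimal partitions and optimal sets stay constant. The LP data and the SciPy solver below are illustrative; the chapter's algorithm for computing the partitions is not implemented.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])                 # minimise c @ x
A = np.array([[-1.0, -1.0],              # -x1 - x2 <= -(1 + t)   i.e.  x1 + x2 >= 1 + t
              [ 1.0, -1.0]])             #  x1 - x2 <= 2
b = np.array([-1.0, 2.0])
db = np.array([-1.0, 0.0])               # direction of the right-hand-side perturbation

# The optimal value function t -> phi(t) is piecewise linear; here the slope
# changes at t = 1, where the active constraint set (optimal partition) changes.
for t in np.linspace(0.0, 3.0, 13):
    res = linprog(c, A_ub=A, b_ub=b + t * db, bounds=[(0, None), (0, None)])
    print(f"t = {t:4.2f}   optimal value = {res.fun:6.3f}   x* = {np.round(res.x, 3)}")
```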

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem.
Abstract: We consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem. We give comparison results between the value function and viscosity sub- and supersolutions of the Bellman equation, and prove uniqueness for this equation among locally Lipschitz functions bounded below. As an application we show that an optimal control for the LQ problem is nearly optimal for a large class of small unbounded nonlinear and non-quadratic perturbations of the same problem.
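For the LQ special case mentioned above, the value function of the Bellman equation is quadratic and can be obtained from a Riccati equation. The sketch below uses SciPy's algebraic Riccati solver for an infinite-horizon variant with invented matrices; the paper itself treats finite-horizon problems with unbounded nonlinear perturbations, which this toy computation does not capture.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double-integrator dynamics x' = A x + B u with quadratic costs (made-up data).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                        # state cost   x' Q x
R = np.array([[1.0]])                # control cost u' R u

# The value function of the infinite-horizon LQ problem is V(x) = x' P x,
# where P solves the algebraic Riccati equation; u = -K x is the optimal feedback.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("value function matrix P =\n", P)
print("optimal feedback gain K =", K)
```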

Journal ArticleDOI
TL;DR: In this paper, a necessary and sufficient condition for local Lipschitz continuity of the optimal value as a function of the initial position is given for optimal control problems in an arbitrary closed set, where the dynamics can depend in a measurable way on the time.
Abstract: For optimal control problems in R^n with given target and free final time, we obtain a necessary and sufficient condition for local Lipschitz continuity of the optimal value as a function of the initial position. The target can be an arbitrary closed set, and the dynamics can depend in a measurable way on the time. As a limit case of this condition, we obtain a characterization of the viability property of the target, in terms of perpendiculars to the target instead of tangent cones. As an application, we analyze the convergence of certain discretization schemes for time-optimal problems.

Journal ArticleDOI
Endre Pap1
TL;DR: A unified method is given for solving some nonlinear differential (ordinary and partial), difference, and optimization equations, using the pseudo-superposition principle and the pseudo-Laplace transform.

Journal ArticleDOI
TL;DR: It is shown that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations, and conditions for continuity with respect to suitable probability metrics are found.
Abstract: The present work deals with the comparison of discrete time Markov decision processes (MDPs), which differ only in their transition probabilities. We show that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations. We also find conditions for continuity with respect to suitable probability metrics. The results are applied to some well-known examples, including inventory control and optimal stopping.
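A small numerical check of the kind of comparison result stated above: two MDPs that differ only in their transition laws, with the second stochastically dominating the first and with rewards increasing in the state, produce value functions ordered pointwise. The data are invented and deliberately chosen so that the structural assumptions behind such monotone comparisons hold.

```python
import numpy as np

def value_iteration(P, r, gamma, n_iter=1000):
    """P[a, s, s'] transitions, r[s, a] rewards; returns the optimal value function."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        Q = r + gamma * np.tensordot(P, V, axes=([2], [0])).T
        V = Q.max(axis=1)
    return V

gamma = 0.9
r = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [2.0, 1.5]])            # rewards increasing in the state

# Two MDPs differing only in their transition law; every row of P_hi dominates
# the corresponding row of P_lo in the usual stochastic order (more mass on high states).
P_lo = np.array([[[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],
                 [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2], [0.3, 0.4, 0.3]]])
P_hi = np.array([[[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]],
                 [[0.4, 0.3, 0.3], [0.2, 0.4, 0.4], [0.1, 0.4, 0.5]]])

V_lo, V_hi = value_iteration(P_lo, r, gamma), value_iteration(P_hi, r, gamma)
print("V_lo =", np.round(V_lo, 3))
print("V_hi =", np.round(V_hi, 3))
print("pointwise comparison V_hi >= V_lo holds:", bool(np.all(V_hi >= V_lo - 1e-9)))
```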

Journal ArticleDOI
TL;DR: In this article, the authors study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem.
Abstract: In this paper we study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem. To obtain optimality conditions, we reformulate the bilevel dynamic problem as a single level optimal control problem that involves the value function of the lower-level problem. Sensitivity analysis of the lower-level problem with respect to the perturbation in the upper-level decision variable is given and first-order necessary optimality conditions are derived by using nonsmooth analysis. A constraint qualification of calmness type and a sufficient condition for the calmness are also given.

Journal ArticleDOI
TL;DR: In this article, a new verification theorem is derived within the framework of viscosity solutions without involving any derivatives of the value functions, which is shown to have wider applicability than the restrictive classical verification theorems, which require the associated dynamic programming equations to have smooth solutions.
Abstract: This paper studies controlled systems governed by Ito's stochastic differential equations in which control variables are allowed to enter both drift and diffusion terms. A new verification theorem is derived within the framework of viscosity solutions without involving any derivatives of the value functions. This theorem is shown to have wider applicability than the restrictive classical verification theorems, which require the associated dynamic programming equations to have smooth solutions. Based on the new verification result, optimal stochastic feedback controls are obtained by maximizing the generalized Hamiltonians over both the control regions and the superdifferentials of the value functions.

Proceedings Article
01 Dec 1997
TL;DR: An RL algorithm based on a finite-difference method is proposed and proved to converge to the optimal solution of the Hamilton-Jacobi-Bellman equation.
Abstract: This paper is concerned with the problem of Reinforcement Learning (RL) for continuous state space and time stochastic control problems. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a Finite-Difference method for designing a convergent approximation scheme. Then we propose a RL algorithm based on this scheme and prove its convergence to the optimal solution.
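A minimal sketch, assuming a one-dimensional deterministic problem, of the kind of grid discretization of the HJB equation on which such schemes are built. Here the discretized equation is solved by value iteration with a known model; the RL algorithm of the paper instead updates the same quantities from observed trajectories. All problem data are illustrative.

```python
import numpy as np

beta = 1.0                                  # discount rate
dt = 0.05                                   # time step of the discretization
xs = np.linspace(-2.0, 2.0, 201)            # state grid on [-2, 2]
controls = np.linspace(-1.0, 1.0, 21)       # admissible controls |u| <= 1

# Discretized HJB for dx/dt = u with running cost x^2 + u^2:
#   V(x) = min_u [ dt*(x^2 + u^2) + (1 - beta*dt) * V(x + dt*u) ],
# solved by value iteration with linear interpolation on the grid.
V = np.zeros_like(xs)
for _ in range(2000):
    V_new = np.full_like(V, np.inf)
    for u in controls:
        x_next = np.clip(xs + dt * u, xs[0], xs[-1])
        cost = dt * (xs ** 2 + u ** 2)
        V_new = np.minimum(V_new, cost + (1.0 - beta * dt) * np.interp(x_next, xs, V))
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("approximate value at x = 0, 1, 2:",
      np.round(np.interp([0.0, 1.0, 2.0], xs, V), 3))
```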

Journal ArticleDOI
TL;DR: In this paper, it was shown that the assumption that the abnormal part of the directional derivative of the optimal value function reduces to zero has to be replaced by the demand that a nonzero abnormal Lagrange multiplier does not exist.
Abstract: In Ref. 1, bilevel programming problems have been investigated using an equivalent formulation by use of the optimal value function of the lower level problem. In this comment, it is shown that Ref. 1 contains two incorrect results: in Proposition 2.1, upper semicontinuity instead of lower semicontinuity has to be used for guaranteeing existence of optimal solutions; in Theorem 5.1, the assumption that the abnormal part of the directional derivative of the optimal value function reduces to zero has to be replaced by the demand that a nonzero abnormal Lagrange multiplier does not exist.

01 Jan 1997
TL;DR: In this article, a numerical algorithm for the computation of the optimal control for the linear quadratic regulator problem with a positivity constraint on the admissible control set is presented, and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming.
Abstract: In this paper, the Linear Quadratic Regulator Problem with a positivity constraint on the admissible control set is addressed. Necessary and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming. The main results are concerned with smoothness of the optimal control and the value function. The maximum principle will be extended to the infinite horizon case. Based on these analytical methods, we propose a numerical algorithm for the computation of the optimal controls for the finite and infinite horizon problem. The numerical methods will be justified by convergence properties between the finite and infinite horizon case on one side and discretized optimal controls and the true optimal control on the other.
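A brute-force numerical sketch (not the paper's algorithm) of the constrained problem itself: a discretized finite-horizon LQ cost is minimized over controls subject to the positivity constraint u_k >= 0, using a generic box-constrained optimizer. All matrices and the horizon are invented.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
x0 = np.array([-1.0, 0.5])
horizon = 50

def cost(u_flat):
    """Finite-horizon LQ cost along the trajectory generated by the controls."""
    u = u_flat.reshape(horizon, 1)
    x, total = x0.copy(), 0.0
    for k in range(horizon):
        total += x @ Q @ x + u[k] @ R @ u[k]
        x = A @ x + B @ u[k]
    return total + x @ Q @ x                 # terminal cost

# Positivity constraint u_k >= 0 imposed through box bounds.
res = minimize(cost, np.zeros(horizon), method="L-BFGS-B",
               bounds=[(0.0, None)] * horizon)
print("optimal constrained cost:", round(float(res.fun), 4))
print("first few controls:", np.round(res.x[:8], 4))
```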

Proceedings Article
01 Dec 1997
TL;DR: It is demonstrated that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.
Abstract: We propose local error estimates together with algorithms for adaptive a-posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.

Journal ArticleDOI
TL;DR: In this article, the strong solution of a semilinear HJB equation associated with a stochastic optimal control problem in a Hilbert space H is considered, where a strong solution is defined as a solution in an L2(μ,H)-Sobolev space setting.
Abstract: We consider the strong solution of a semilinear HJB equation associated with a stochastic optimal control in a Hilbert space H. By strong solution we mean a solution in an L2(μ,H)-Sobolev space setting. Within this framework, the present problem can be treated in a similar fashion to that of a finite-dimensional case. Of independent interest, a related linear problem with an unbounded coefficient is studied, and an application to the stochastic control of a reaction-diffusion equation will be given.

Journal ArticleDOI
TL;DR: This paper considers the problem of determining a path that maximizes a multi-attribute, non-order-preserving value function and uses the best-first search algorithm BU* to determine optimal routes for both the q = 0 and q > 0 cases.
Abstract: In this paper, we consider the problem of determining a path that maximizes a multi-attribute, non-order-preserving value function. The motivating application is the determination of a most preferred path for transporting hazardous materials based on transportation cost and risk to population. A sub-path of an optimal path may not be optimal for a non-order-preserving value function, implying that a traditional application of dynamic programming may intentionally or unintentionally produce sub-optimal paths. We consider two approximation procedures for two general cases, the q = 0 case and the q > 0 case, where q is the number of required intermediate stops between origin and destination. The first approximation procedure involves applying dynamic programming as if a sub-path of an optimal path were always optimal. The second approximation procedure involves determining a linear order-preserving criterion that approximates the non-order-preserving value function and then applying dynamic programming. We u...