
Showing papers on "Bellman equation" published in 1997


Book
18 Dec 1997
TL;DR: This book presents the main ideas on a model problem and develops the theory of continuous and discontinuous viscosity solutions of Hamilton-Jacobi-Bellman equations, with applications to optimal control problems, differential games, and numerical solution by dynamic programming.
Abstract: Preface.- Basic notations.- Outline of the main ideas on a model problem.- Continuous viscosity solutions of Hamilton-Jacobi equations.- Optimal control problems with continuous value functions: unrestricted state space.- Optimal control problems with continuous value functions: restricted state space.- Discontinuous viscosity solutions and applications.- Approximation and perturbation problems.- Asymptotic problems.- Differential Games.- Numerical solution of Dynamic Programming.- Nonlinear H-infinity control by Pierpaolo Soravia.- Bibliography.- Index

2,747 citations


Journal ArticleDOI
TL;DR: In this article, reflected solutions of one-dimensional backward stochastic differential equations are studied and the authors prove uniqueness and existence both by a fixed point argument and by approximation via penalization.
Abstract: We study reflected solutions of one-dimensional backward stochastic differential equations. The “reflection” keeps the solution above a given stochastic process. We prove uniqueness and existence both by a fixed point argument and by approximation via penalization. We show that when the coefficient has a special form, then the solution of our problem is the value function of a mixed optimal stopping–optimal stochastic control problem. We finally show that, when put in a Markovian framework, the solution of our reflected BSDE provides a probabilistic formula for the unique viscosity solution of an obstacle problem for a parabolic partial differential equation.

781 citations


Journal ArticleDOI
TL;DR: This paper states sufficient conditions that guarantee that the Galerkin approximation converges to the solution of the GHJB equation and that the resulting approximate control is stabilizing on the same region as the initial control.

580 citations


Proceedings Article
01 Aug 1997
TL;DR: It is found that incremental pruning is presently the most efficient exact method for solving POMDPs.
Abstract: Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.
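A minimal sketch (not from the paper) of the simplest ingredient behind such algorithms: pruning pointwise-dominated alpha-vectors from the piecewise-linear and convex value-function representation. Incremental pruning itself interleaves cross-sums with stronger LP-based dominance tests, which are omitted here; all data below are illustrative.

```python
import numpy as np

def prune_pointwise_dominated(alpha_vectors):
    """Drop alpha-vectors that are pointwise dominated by another vector.

    A POMDP value function is represented as V(b) = max_i b . alpha_i, so a
    vector that is <= some other vector at every state never attains the
    maximum and can be discarded.  (Exact algorithms such as incremental
    pruning also remove vectors dominated only by combinations of others,
    using linear programs; that step is omitted here.)
    """
    kept = []
    for i, a in enumerate(alpha_vectors):
        dominated = any(
            j != i and np.all(b >= a) and np.any(b > a)
            for j, b in enumerate(alpha_vectors)
        )
        if not dominated:
            kept.append(a)
    return kept

# Toy example with two states: the second vector is pointwise dominated.
vectors = [np.array([1.0, 3.0]), np.array([0.5, 2.0]), np.array([2.5, 0.5])]
print(prune_pointwise_dominated(vectors))
```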

441 citations


Book
30 Apr 1997
TL;DR: In this article, generalized solutions of Bellman's differential equation and the quantization of the Bellman equation with multiplicative asymptotics are presented, built on idempotent analysis and Maslov optimization theory.
Abstract: Preface. 1. Idempotent Analysis. 2. Analysis of Operators on Idempotent Semimodules. 3. Generalized Solutions of Bellman's Differential Equation. 4. Quantization of the Bellman Equation and Multiplicative Asymptotics. References. Appendix: (P. Del Moral) Maslov Optimization Theory. Optimality versus Randomness. Index.

425 citations


Journal ArticleDOI
TL;DR: In this article, the value function of the stochastic control problem is shown to be a smooth solution of the associated Hamilton-Jacobi-Bellman (HJB) equation; the optimal policy is shown to exist and is given in feedback form from the optimality conditions in the HJB equation.

256 citations


Journal ArticleDOI
TL;DR: This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics; algorithmic and exact solutions are indicated.
Abstract: In this review we consider an interdisciplinary problem of earthquake prediction involving economics. This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics. We formulate the problem as an optimal control problem: possessing the possibility to declare several types of alerts, it is necessary to find an optimal strategy for changing alert types; each successful prediction prevents a certain amount of losses; total expected losses are integrated over the semi-infinite time interval. The discount factor is included in the model. Algorithmic and exact solutions are indicated. This paper is based on the recent results by Molchan (1990, 1991, 1992).

184 citations


Book
01 Jan 1997
TL;DR: This book develops dynamic programming methods for inverse problems in engineering, combining dynamic programming with generalized cross-validation, and includes a computer program (DYNAVAL) for the solution of the general inverse problem.
Abstract: Dynamic Programming System: Introduction. The Simplest Exchange. Bellman's Principle of Optimality. First-Order Dynamic System. General Multidimensional System. Optimal Control as a Multistage Decision Process.- Matrices and Differential Equations: Introduction. Vector-Matrix Calculus. The Exponential Matrix. Approximations to the Exponential Matrix. Eigenvalue Reduction.- The General Inverse Problem: Introduction. Generalized Cross Validation. Dynamic Programming and Generalized Cross Validation. Chandrasekhar Equations.- The Inverse Heat Conduction Problem: Introduction. One-Dimensional Example. Two-Dimensional Example. Eigenvalue Reduction Technique. L-Curve Analysis.- The Inverse Structural Dynamics Problem: Introduction. Single-Degree-of-Freedom Cantilever Beam Problem. Two-Dimensional Plate Problem.- Smoothing and Differentiating Noisy Data: Introduction. Polynomial Approximation. Filtering a 60 Hz Signal. Frequency Analysis. Two-Dimensional Smoothing.- Nonlinear Systems: Introduction. Linearization Methods. Nonlinear Inverse Heat Conduction. Nonlinear Spring Example. Successive Approximation in Policy Space.- Sequential Estimation and System Identification: Introduction. Sequential Estimation. Multidimensional Sequential Estimation. Extended Levenberg-Marquardt's Method.- Bibliography.- Appendix A. DYNAVAL: A Computer Program for the Solution of the General Inverse Problem Using Dynamic Programming and Generalized Cross-Validation.- Index

145 citations


Journal ArticleDOI
Lars Grüne1
TL;DR: In this paper, an adaptive finite difference scheme for the solution of the discrete first-order Hamilton-Jacobi-Bellman equation is presented; it is based on local a posteriori error estimates, certain properties of which are proved.
Abstract: In this paper an adaptive finite difference scheme for the solution of the discrete first order Hamilton-Jacobi-Bellman equation is presented. Local a posteriori error estimates are established and certain properties of these estimates are proved. Based on these estimates an adapting iteration for the discretization of the state space is developed. An implementation of the scheme for two-dimensional grids is given and numerical examples are discussed.

126 citations


Journal ArticleDOI
TL;DR: New results using viability theory are provided that allow us to study optimal time functions free from the controllability assumptions classically made in the partial differential equations approach.
Abstract: We study optimal times to reach a given closed target for controlled systems with a state constraint. Our goal is to characterize these optimal time functions in such a way that it is possible to compute them numerically and we do not need to compute trajectories of the controlled system. In this paper we provide new results using viability theory. This allows us to study optimal time functions free from the controllability assumptions classically made in the partial differential equations approach.

113 citations


Journal ArticleDOI
TL;DR: The properties of algorithms that characterize the solution of the Bellman equation of a stochastic dynamic program as the solution to a linear program are reviewed; low-dimensional cubic-spline approximations to the value function are proposed, and it is shown that fitting this approximation through linear programming provides upper and lower bounds on the solution of the original problem.
Abstract: We review the properties of algorithms that characterize the solution of the Bellman equation of a stochastic dynamic program, as the solution to a linear program. The variables in this problem are the ordinates of the value function; hence, the number of variables grows with the state space. For situations in which this size becomes computationally burdensome, we suggest the use of low-dimensional cubic-spline approximations to the value function. We show that fitting this approximation through linear programming provides upper and lower bounds on the solution to the original large problem. The information contained in these bounds leads to inexpensive improvements in the accuracy of approximate solutions.
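As a sketch of the linear-programming characterization the abstract refers to, the following Python/SciPy snippet solves the Bellman equation of a small discounted MDP as an LP over the value-function ordinates. The problem data are made up, and the paper's cubic-spline approximation and bounding machinery are not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_by_lp(P, r, gamma):
    """Solve the Bellman equation of a discounted MDP as a linear program.

    P[a, s, s'] : transition probabilities, r[s, a] : rewards.
    The LP minimises sum_s V(s) subject to
        V(s) >= r(s, a) + gamma * sum_{s'} P[a, s, s'] V(s')   for all s, a,
    and its optimum is the optimal value function.
    """
    n_actions, n_states, _ = P.shape
    c = np.ones(n_states)                       # minimise the sum of value-function ordinates
    A_ub, b_ub = [], []
    for s in range(n_states):
        for a in range(n_actions):
            row = gamma * P[a, s, :].copy()
            row[s] -= 1.0                       # gamma * P V - V <= -r
            A_ub.append(row)
            b_ub.append(-r[s, a])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x

# Tiny 2-state, 2-action example (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(solve_mdp_by_lp(P, r, gamma=0.9))
```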

Proceedings Article
27 Jul 1997
TL;DR: Novel incremental versions of the grid-based linear interpolation method and of the simple lower bound method with Sondik's updates are introduced, together with a new method for computing an initial upper bound, the fast informed bound method.
Abstract: Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental methods for computing bounds on the value function for control problems with infinite discounted horizon criteria. The methods described and tested include novel incremental versions of the grid-based linear interpolation method and the simple lower bound method with Sondik's updates. Both of these can work with arbitrary points of the belief space and can be enhanced by various heuristic point selection strategies. Also introduced is a new method for computing an initial upper bound - the fast informed bound method. This method is able to improve significantly on the standard and commonly used upper bound computed by the MDP-based method. The quality of the resulting bounds is tested on a maze navigation problem with 20 states, 6 actions and 8 observations.
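A rough sketch of the "MDP-based" upper bound mentioned at the end of the abstract: value iteration on the underlying fully observable MDP yields V_MDP, and the POMDP value at a belief b is bounded above by the expectation of V_MDP under b. The fast informed bound tightens this by folding in the observation model; only the simpler bound is shown here, with illustrative data.

```python
import numpy as np

def mdp_upper_bound(P, r, gamma, n_iter=500):
    """Value iteration on the fully observable MDP underlying a POMDP.

    P[a, s, s'] : transition probabilities, r[s, a] : expected rewards.
    The fixed point V_MDP gives the standard upper bound on the POMDP value
    function:  V(b) <= sum_s b(s) * V_MDP(s).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = r + gamma * np.tensordot(P, V, axes=([2], [0])).T   # Q has shape (s, a)
        V = Q.max(axis=1)
    return V

# Upper bound at a belief state b (illustrative numbers only).
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.0, 0.5]])
V_mdp = mdp_upper_bound(P, r, gamma=0.95)
b = np.array([0.6, 0.4])
print("MDP-based upper bound on V(b):", b @ V_mdp)
```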

Journal ArticleDOI
TL;DR: In this paper, the authors studied the optimal production planning in a dynamic stochastic manufacturing system consisting of a single machine that is failure prone and facing a constant demand, and the objective was to choose the rate of production over time in order to minimize the long-run average cost of production and surplus.
Abstract: This paper is concerned with the optimal production planning in a dynamic stochastic manufacturing system consisting of a single machine that is failure prone and facing a constant demand. The objective is to choose the rate of production over time in order to minimize the long-run average cost of production and surplus. The analysis proceeds with a study of the corresponding problem with a discounted cost. It is shown using the vanishing discount approach that the Hamilton–Jacobi–Bellman equation for the average cost problem has a solution giving rise to the minimal average cost and the so-called potential function. The result helps in establishing a verification theorem. Finally, the optimal control policy is specified in terms of the potential function.

Dissertation
01 Jan 1997
TL;DR: Experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test.
Abstract: Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that include both action outcome uncertainty and imperfect observability. A control problem within the POMDP framework is expressed as a dynamic optimization problem with a value function that combines costs or rewards from multiple steps. Although the POMDP framework is more expressive than other simpler frameworks, like Markov decision processes (MDP), its associated optimization methods are more demanding computationally and only very small problems can be solved exactly in practice. The thesis focuses on two possible approaches that can be used to solve larger problems: approximation methods and exploitation of additional problem structure. First, a number of new efficient approximation methods and improvements of existing algorithms are proposed. These include (1) the fast informed bound method based on approximate dynamic programming updates that lead to piecewise linear and convex value functions with a constant number of linear vectors, (2) a grid-based point interpolation method that supports variable grids, (3) an incremental version of the linear vector method that updates value function derivatives, as well as (4) various heuristics for selecting grid-points. The new and existing methods are experimentally tested and compared on a set of three infinite discounted horizon problems of different complexity. The experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test. Second, the thesis presents a number of techniques for exploiting additional structure in the model of complex control problems. These are studied as applied to a medical therapy planning problem--the management of patients with chronic ischemic heart disease. The new extensions proposed include factored and hierarchically structured models that combine the advantages of the POMDP and MDP frameworks and cut down the size and complexity of the information state space.

Journal ArticleDOI
TL;DR: In this paper, the authors studied the long-run average cost minimization of a stochastic inventory problem with Markovian demand, fixed ordering cost, and convex surplus cost.
Abstract: This paper is concerned with long-run average cost minimization of a stochastic inventory problem with Markovian demand, fixed ordering cost, and convex surplus cost. The states of the Markov chain represent different possible states of the environment. Using a vanishing discount approach, a dynamic programming equation and the corresponding verification theorem are established. Finally, the existence of an optimal state-dependent (s, S) policy is proved.
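A toy illustration (not the paper's construction): discounted-cost value iteration for an inventory model with Markov-modulated demand, a fixed ordering cost and convex holding/shortage costs, whose optimal order quantities display the state-dependent (s, S) structure established in the paper. All parameters below are invented, and the vanishing-discount passage to the long-run average cost is not reproduced.

```python
import numpy as np

gamma = 0.95          # discount factor (the paper then lets it tend to 1)
K, c = 5.0, 1.0       # fixed and proportional ordering costs
h, p = 1.0, 4.0       # holding and shortage (lost-sales) costs
N = 20                # maximum inventory level

Q = np.array([[0.8, 0.2],        # transition matrix of the demand environment
              [0.3, 0.7]])
demand_pmf = [np.array([0.3, 0.4, 0.3, 0.0, 0.0]),   # demand law in environment state 0
              np.array([0.0, 0.1, 0.3, 0.4, 0.2])]   # demand law in environment state 1
n_env = Q.shape[0]

def expected_cost(i, y, V):
    """One-period cost plus discounted continuation when the period starts in
    environment state i with post-order inventory y (lost-sales dynamics)."""
    total = 0.0
    for d, prob in enumerate(demand_pmf[i]):
        if prob == 0.0:
            continue
        left = max(y - d, 0)
        total += prob * (h * left + p * max(d - y, 0) + gamma * Q[i] @ V[:, left])
    return total

# Value iteration over (environment state, inventory level).
V = np.zeros((n_env, N + 1))
for _ in range(500):
    V_new = np.empty_like(V)
    for i in range(n_env):
        for x in range(N + 1):
            V_new[i, x] = min(K * (q > 0) + c * q + expected_cost(i, x + q, V)
                              for q in range(N + 1 - x))
    V = V_new

# The optimal order quantities exhibit an (s, S)-type structure in each environment state.
for i in range(n_env):
    orders = [min(range(N + 1 - x),
                  key=lambda q: K * (q > 0) + c * q + expected_cost(i, x + q, V))
              for x in range(N + 1)]
    print("environment state", i, "optimal order quantity by inventory level:", orders)
```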

Journal ArticleDOI
Ralf Korn1
TL;DR: A generalised impulse control model for controlling a process governed by a stochastic differential equation is considered, and optimality results relating the value function to quasi-variational inequalities and to a formal optimal stopping problem are stated.
Abstract: We consider a generalised impulse control model for controlling a process governed by a stochastic differential equation. The controller can only choose a parameter of the probability distribution of the consequence of his control action which is therefore random. We state optimality results relating the value function to quasi-variational inequalities and a formal optimal stopping problem. We also remark that the value function is a viscosity solution of the quasi-variational inequalities which could lead to developments and convergence proofs of numerical schemes. Further, we give some explicit examples and an application in financial mathematics, the optimal control of the exchange rate.

Journal ArticleDOI
TL;DR: The existence of the ergodic attractor is shown in Theorems 1 and 2, and the qualitative properties behind the convergence of the terms λu_λ(x) and u(x,T)/T in the Hamilton-Jacobi-Bellman equations, as λ tends to +0 and T goes to +∞, to a unique constant are described.
Abstract: The problem of the convergence of the terms λu_λ(x) and u(x,T)/T in the Hamilton-Jacobi-Bellman equations (HJBs), as λ tends to +0 and T goes to +∞, to a unique constant is called the ergodic problem of the HJBs. We show in this paper what kind of qualitative properties lie behind this kind of convergence. The existence of the ergodic attractor is shown in Theorems 1 and 2. Our solutions of HJBs satisfy the equations in the viscosity solutions sense.

Book ChapterDOI
01 Jan 1997
TL;DR: In this article, the optimal set approach was used for sensitivity analysis for linear programming and it was shown that optimal partitions and optimal sets remain constant between two consecutive transition points of the optimal value function.
Abstract: In this chapter we describe the optimal set approach for sensitivity analysis for LP. We show that optimal partitions and optimal sets remain constant between two consecutive transition points of the optimal value function. The advantage of using this approach instead of the classical approach (using optimal bases) is shown. Moreover, we present an algorithm to compute the partitions, optimal sets and the optimal value function. This is a new algorithm and uses primal and dual optimal solutions. We also extend some of the results to parametric quadratic programming, and discuss differences and resemblances with the linear programming case.
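A small numerical illustration of the object being analysed: the optimal value of an LP as a function of a right-hand-side perturbation is piecewise linear, with breakpoints at the transition points between which optimal partitions and optimal sets stay constant. The LP data and the SciPy solver below are illustrative; the chapter's algorithm for computing the partitions is not implemented.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])                 # minimise c @ x
A = np.array([[-1.0, -1.0],              # -x1 - x2 <= -(1 + t)   i.e.  x1 + x2 >= 1 + t
              [ 1.0, -1.0]])             #  x1 - x2 <= 2
b = np.array([-1.0, 2.0])
db = np.array([-1.0, 0.0])               # direction of the right-hand-side perturbation

# The optimal value function t -> phi(t) is piecewise linear; here the slope
# changes at t = 1, where the active constraint set (optimal partition) changes.
for t in np.linspace(0.0, 3.0, 13):
    res = linprog(c, A_ub=A, b_ub=b + t * db, bounds=[(0, None), (0, None)])
    print(f"t = {t:4.2f}   optimal value = {res.fun:6.3f}   x* = {np.round(res.x, 3)}")
```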

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem.
Abstract: We consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem. We give comparison results between the value function and viscosity sub- and supersolutions of the Bellman equation, and prove uniqueness for this equation among locally Lipschitz functions bounded below. As an application we show that an optimal control for the LQ problem is nearly optimal for a large class of small unbounded nonlinear and non-quadratic perturbations of the same problem.
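For the LQ special case mentioned above, the value function of the Bellman equation is quadratic and can be obtained from a Riccati equation. The sketch below uses SciPy's algebraic Riccati solver for an infinite-horizon variant with invented matrices; the paper itself treats finite-horizon problems with unbounded nonlinear perturbations, which this toy computation does not capture.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double-integrator dynamics x' = A x + B u with quadratic costs (made-up data).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                        # state cost   x' Q x
R = np.array([[1.0]])                # control cost u' R u

# The value function of the infinite-horizon LQ problem is V(x) = x' P x,
# where P solves the algebraic Riccati equation; u = -K x is the optimal feedback.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("value function matrix P =\n", P)
print("optimal feedback gain K =", K)
```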

Journal ArticleDOI
TL;DR: In this paper, a necessary and sufficient condition for local Lipschitz continuity of the optimal value as a function of the initial position is given for optimal control problems in an arbitrary closed set, where the dynamics can depend in a measurable way on the time.
Abstract: For optimal control problems in R^n with given target and free final time, we obtain a necessary and sufficient condition for local Lipschitz continuity of the optimal value as a function of the initial position. The target can be an arbitrary closed set, and the dynamics can depend in a measurable way on the time. As a limit case of this condition, we obtain a characterization of the viability property of the target, in terms of perpendiculars to the target instead of tangent cones. As an application, we analyze the convergence of certain discretization schemes for time-optimal problems.

Journal ArticleDOI
Endre Pap1
TL;DR: A unified method is given for solving some nonlinear differential (ordinary and partial), difference, and optimization equations, using the pseudo-superposition principle and the pseudo-Laplace transform.

Journal ArticleDOI
TL;DR: It is shown that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations, and conditions for continuity with respect to suitable probability metrics are found.
Abstract: The present work deals with the comparison of discrete time Markov decision processes (MDPs), which differ only in their transition probabilities. We show that the optimal value function of an MDP is monotone with respect to appropriately defined stochastic order relations. We also find conditions for continuity with respect to suitable probability metrics. The results are applied to some well-known examples, including inventory control and optimal stopping.
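A small numerical check of the kind of comparison result stated above: two MDPs that differ only in their transition laws, with the second stochastically dominating the first and with rewards increasing in the state, produce value functions ordered pointwise. The data are invented and deliberately chosen so that the structural assumptions behind such monotone comparisons hold.

```python
import numpy as np

def value_iteration(P, r, gamma, n_iter=1000):
    """P[a, s, s'] transitions, r[s, a] rewards; returns the optimal value function."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        Q = r + gamma * np.tensordot(P, V, axes=([2], [0])).T
        V = Q.max(axis=1)
    return V

gamma = 0.9
r = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [2.0, 1.5]])            # rewards increasing in the state

# Two MDPs differing only in their transition law; every row of P_hi dominates
# the corresponding row of P_lo in the usual stochastic order (more mass on high states).
P_lo = np.array([[[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],
                 [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2], [0.3, 0.4, 0.3]]])
P_hi = np.array([[[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]],
                 [[0.4, 0.3, 0.3], [0.2, 0.4, 0.4], [0.1, 0.4, 0.5]]])

V_lo, V_hi = value_iteration(P_lo, r, gamma), value_iteration(P_hi, r, gamma)
print("V_lo =", np.round(V_lo, 3))
print("V_hi =", np.round(V_hi, 3))
print("pointwise comparison V_hi >= V_lo holds:", bool(np.all(V_hi >= V_lo - 1e-9)))
```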

Journal ArticleDOI
TL;DR: In this article, the authors study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem.
Abstract: In this paper we study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem. To obtain optimality conditions, we reformulate the bilevel dynamic problem as a single level optimal control problem that involves the value function of the lower-level problem. Sensitivity analysis of the lower-level problem with respect to the perturbation in the upper-level decision variable is given and first-order necessary optimality conditions are derived by using nonsmooth analysis. A constraint qualification of calmness type and a sufficient condition for the calmness are also given.

Journal ArticleDOI
TL;DR: In this article, a new verification theorem is derived within the framework of viscosity solutions without involving any derivatives of the value functions, which is shown to have wider applicability than the restrictive classical verification theorems, which require the associated dynamic programming equations to have smooth solutions.
Abstract: This paper studies controlled systems governed by Ito's stochastic differential equations in which control variables are allowed to enter both drift and diffusion terms. A new verification theorem is derived within the framework of viscosity solutions without involving any derivatives of the value functions. This theorem is shown to have wider applicability than the restrictive classical verification theorems, which require the associated dynamic programming equations to have smooth solutions. Based on the new verification result, optimal stochastic feedback controls are obtained by maximizing the generalized Hamiltonians over both the control regions and the superdifferentials of the value functions.

Proceedings Article
01 Dec 1997
TL;DR: An RL algorithm based on a finite-difference method is proposed and proved to converge to the optimal solution of the Hamilton-Jacobi-Bellman equation.
Abstract: This paper is concerned with the problem of Reinforcement Learning (RL) for continuous state space and time stochastic control problems. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a Finite-Difference method for designing a convergent approximation scheme. Then we propose a RL algorithm based on this scheme and prove its convergence to the optimal solution.
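A minimal sketch, assuming a one-dimensional deterministic problem, of the kind of grid discretization of the HJB equation on which such schemes are built. Here the discretized equation is solved by value iteration with a known model; the RL algorithm of the paper instead updates the same quantities from observed trajectories. All problem data are illustrative.

```python
import numpy as np

beta = 1.0                                  # discount rate
dt = 0.05                                   # time step of the discretization
xs = np.linspace(-2.0, 2.0, 201)            # state grid on [-2, 2]
controls = np.linspace(-1.0, 1.0, 21)       # admissible controls |u| <= 1

# Discretized HJB for dx/dt = u with running cost x^2 + u^2:
#   V(x) = min_u [ dt*(x^2 + u^2) + (1 - beta*dt) * V(x + dt*u) ],
# solved by value iteration with linear interpolation on the grid.
V = np.zeros_like(xs)
for _ in range(2000):
    V_new = np.full_like(V, np.inf)
    for u in controls:
        x_next = np.clip(xs + dt * u, xs[0], xs[-1])
        cost = dt * (xs ** 2 + u ** 2)
        V_new = np.minimum(V_new, cost + (1.0 - beta * dt) * np.interp(x_next, xs, V))
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("approximate value at x = 0, 1, 2:",
      np.round(np.interp([0.0, 1.0, 2.0], xs, V), 3))
```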

Journal ArticleDOI
TL;DR: In this paper, it was shown that the assumption that the abnormal part of the directional derivative of the optimal value function reduces to zero has to be replaced by the demand that a nonzero abnormal Lagrange multiplier does not exist.
Abstract: In Ref. 1, bilevel programming problems have been investigated using an equivalent formulation by use of the optimal value function of the lower level problem. In this comment, it is shown that Ref. 1 contains two incorrect results: in Proposition 2.1, upper semicontinuity instead of lower semicontinuity has to be used for guaranteeing existence of optimal solutions; in Theorem 5.1, the assumption that the abnormal part of the directional derivative of the optimal value function reduces to zero has to be replaced by the demand that a nonzero abnormal Lagrange multiplier does not exist.

01 Jan 1997
TL;DR: In this article, a numerical algorithm for the computation of the optimal control for the linear quadratic regulator problem with a positivity constraint on the admissible control set is presented, and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming.
Abstract: In this paper, the Linear Quadratic Regulator Problem with a positivity constraint on the admissible control set is addressed. Necessary and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming. The main results are concerned with smoothness of the optimal control and the value function. The maximum principle will be extended to the infinite horizon case. Based on these analytical methods, we propose a numerical algorithm for the computation of the optimal controls for the finite and infinite horizon problem. The numerical methods will be justified by convergence properties between the finite and infinite horizon case on one side and discretized optimal controls and the true optimal control on the other.
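A brute-force numerical sketch (not the paper's algorithm) of the constrained problem itself: a discretized finite-horizon LQ cost is minimized over controls subject to the positivity constraint u_k >= 0, using a generic box-constrained optimizer. All matrices and the horizon are invented.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
x0 = np.array([-1.0, 0.5])
horizon = 50

def cost(u_flat):
    """Finite-horizon LQ cost along the trajectory generated by the controls."""
    u = u_flat.reshape(horizon, 1)
    x, total = x0.copy(), 0.0
    for k in range(horizon):
        total += x @ Q @ x + u[k] @ R @ u[k]
        x = A @ x + B @ u[k]
    return total + x @ Q @ x                 # terminal cost

# Positivity constraint u_k >= 0 imposed through box bounds.
res = minimize(cost, np.zeros(horizon), method="L-BFGS-B",
               bounds=[(0.0, None)] * horizon)
print("optimal constrained cost:", round(float(res.fun), 4))
print("first few controls:", np.round(res.x[:8], 4))
```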

Proceedings Article
01 Dec 1997
TL;DR: It is demonstrated that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.
Abstract: We propose local error estimates together with algorithms for adaptive a-posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman equation. For time refinement we propose a new criterion, based on consistency estimates of discrete solutions of the Bellman equation. We demonstrate that an optimal ratio of time to space discretization is crucial for optimal learning rates and accuracy of the approximate optimal value function.

Journal ArticleDOI
TL;DR: In this article, the strong solution of a semilinear HJB equation associated with a stochastic optimal control problem in a Hilbert space H is considered, where a strong solution is defined as a solution in an L2(μ,H)-Sobolev space setting.
Abstract: We consider the strong solution of a semilinear HJB equation associated with a stochastic optimal control in a Hilbert space H. By strong solution we mean a solution in an L2(μ,H)-Sobolev space setting. Within this framework, the present problem can be treated in a similar fashion to that of a finite-dimensional case. Of independent interest, a related linear problem with an unbounded coefficient is studied, and an application to the stochastic control of a reaction-diffusion equation will be given.

Journal ArticleDOI
TL;DR: This paper considers the problem of determining a path that maximizes a multi-attribute, non-order-preserving value function and uses the best-first search algorithm BU* to determine optimal routes for both the q = 0 and q > 0 cases.
Abstract: In this paper, we consider the problem of determining a path that maximizes a multi-attribute, non-order-preserving value function. The motivating application is the determination of a most preferred path for transporting hazardous materials based on transportation cost and risk to population. A sub-path of an optimal path may not be optimal for a non-order-preserving value function, implying that a traditional application of dynamic programming may intentionally or unintentionally produce sub-optimal paths. We consider two approximation procedures for two general cases, the q = 0 case and the q > 0 case, where q is the number of required intermediate stops between origin and destination. The first approximation procedure involves applying dynamic programming as if a sub-path of an optimal path were always optimal. The second approximation procedure involves determining a linear order-preserving criterion that approximates the non-order-preserving value function and then applying dynamic programming. We u...