Topic
Hamilton–Jacobi–Bellman equation
About: Hamilton–Jacobi–Bellman equation is a research topic. Over the lifetime, 2,802 publications have been published within this topic, receiving 50,916 citations. The topic is also known as: HJB equation & Hamilton–Jacobi equation.
Papers
Book
[...]
01 Jan 1999
TL;DR: In this book, the authors develop the theory of stochastic optimal control, revisiting the deterministic case before treating the stochastic maximum principle, dynamic programming and HJB equations (via viscosity solutions), the relationship between the two approaches, linear-quadratic problems, and backward stochastic differential equations.
Abstract (table of contents):
1. Basic Stochastic Calculus: Probability (probability spaces; random variables; conditional expectation; convergence of probabilities); Stochastic Processes (general considerations; Brownian motions); Stopping Times; Martingales; Ito's Integral (nondifferentiability of Brownian motion; definition of Ito's integral and basic properties; Ito's formula; martingale representation theorems); Stochastic Differential Equations (strong solutions; weak solutions; linear SDEs; other types of SDEs).
2. Stochastic Optimal Control Problems: Introduction; Deterministic Cases Revisited; Examples of Stochastic Control Problems (production planning; investment vs. consumption; reinsurance and dividend management; technology diffusion; queueing systems in heavy traffic); Formulations of Stochastic Optimal Control Problems (strong formulation; weak formulation); Existence of Optimal Controls (a deterministic result; existence under strong and weak formulations); Reachable Sets of Stochastic Control Systems (nonconvexity and noncloseness of the reachable sets); Other Stochastic Control Models (random duration; optimal stopping; singular and impulse controls; risk-sensitive controls; ergodic controls; partially observable systems); Historical Remarks.
3. Maximum Principle and Stochastic Hamiltonian Systems: Introduction; The Deterministic Case Revisited; Statement of the Stochastic Maximum Principle (adjoint equations; the maximum principle and stochastic Hamiltonian systems; a worked-out example); A Proof of the Maximum Principle (a moment estimate; Taylor expansions; duality analysis and completion of the proof); Sufficient Conditions of Optimality; Problems with State Constraints (formulation and the maximum principle; preliminary lemmas; a proof of Theorem 6.1); Historical Remarks.
4. Dynamic Programming and HJB Equations: Introduction; The Deterministic Case Revisited; The Stochastic Principle of Optimality and the HJB Equation (a stochastic framework for dynamic programming; principle of optimality; the HJB equation); Other Properties of the Value Function (continuous dependence on parameters; semiconcavity); Viscosity Solutions (definitions; some properties); Uniqueness of Viscosity Solutions (a uniqueness theorem; proofs of Lemmas 6.6 and 6.7); Historical Remarks.
5. The Relationship Between the Maximum Principle and Dynamic Programming: Introduction; Classical Hamilton-Jacobi Theory; Relationship for Deterministic Systems (adjoint variable and value function in the smooth and nonsmooth cases; economic interpretation; method of characteristics and the Feynman-Kac formula; verification theorems); Relationship for Stochastic Systems (smooth case; nonsmooth cases with differentials in the spatial and time variables); Stochastic Verification Theorems (smooth and nonsmooth cases); Optimal Feedback Controls; Historical Remarks.
6. Linear Quadratic Optimal Control Problems: Introduction; The Deterministic LQ Problems Revisited (formulation; a minimization problem of a quadratic functional; a linear Hamiltonian system; the Riccati equation and feedback optimal control); Formulation of Stochastic LQ Problems (statement of the problems; examples); Finiteness and Solvability; A Necessary Condition and a Hamiltonian System; Stochastic Riccati Equations; Global Solvability of Stochastic Riccati Equations (the standard case; the case C = 0, S = 0, and Q, G ≥ 0; the one-dimensional case); A Mean-Variance Portfolio Selection Problem; Historical Remarks.
7. Backward Stochastic Differential Equations: Introduction; Linear BSDEs; Nonlinear BSDEs (finite deterministic durations via the method of contraction mapping; random durations via the method of continuation); Feynman-Kac-Type Formulae (representation via SDEs; representation via BSDEs); Forward-Backward SDEs (general formulation and nonsolvability; the four-step scheme, a heuristic derivation; several solvable classes of FBSDEs); Option Pricing Problems (European call options and the Black-Scholes formula; other options); Historical Remarks.
References.
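For context, the HJB equation that Chapter 4 of such treatments centers on can be written, in standard notation (a schematic sketch, not the book's exact statement), as follows. For controlled dynamics $dX_s = b(s, X_s, u_s)\,ds + \sigma(s, X_s, u_s)\,dW_s$ and cost $\mathbb{E}\big[\int_t^T f(s, X_s, u_s)\,ds + h(X_T)\big]$ to be minimized, the value function $v(t,x)$ formally satisfies:

```latex
\[
  -\,\partial_t v(t,x)
  = \inf_{u \in U} \Big\{
      \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma\sigma^{\!\top}(t,x,u)\,\partial_{xx} v(t,x)\big)
      + b(t,x,u) \cdot \partial_x v(t,x)
      + f(t,x,u)
    \Big\},
  \qquad v(T,x) = h(x).
\]
```

Since $v$ need not be differentiable, this equation is interpreted in the viscosity sense, which is what the book's Chapter 4 sections on viscosity solutions and their uniqueness address.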
2,192 citations
[...]
TL;DR: The McKean-Vlasov NCE method presented in this paper has a close connection with the statistical physics of large particle systems: both identify a consistency relationship between the individual agent at the microscopic level and the mass of individuals at the macroscopic level.
Abstract: We consider stochastic dynamic games in large population conditions where multiclass agents are weakly coupled via their individual dynamics and costs. We approach this large population game problem by the so-called Nash Certainty Equivalence (NCE) Principle which leads to a decentralized control synthesis. The McKean-Vlasov NCE method presented in this paper has a close connection with the statistical physics of large particle systems: both identify a consistency relationship between the individual agent (or particle) at the microscopic level and the mass of individuals (or particles) at the macroscopic level. The overall game is decomposed into (i) an optimal control problem whose Hamilton-Jacobi-Bellman (HJB) equation determines the optimal control for each individual and which involves a measure corresponding to the mass effect, and (ii) a family of McKean-Vlasov (M-V) equations which also depend upon this measure. We designate the NCE Principle as the property that the resulting scheme is consistent (or soluble), i.e. the prescribed control laws produce sample paths which produce the mass effect measure. By construction, the overall closed-loop behaviour is such that each agent’s behaviour is optimal with respect to all other agents in the game theoretic Nash sense.
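Schematically, the two coupled components described in the abstract can be written, in generic mean-field notation (an illustrative sketch, not the paper's exact equations), as:

```latex
% (i) HJB equation for a representative agent, given the mass measure \mu_t:
\[
  -\,\partial_t v
  = \inf_{u}\Big\{ b(x, u, \mu_t) \cdot \partial_x v
      + \tfrac{\sigma^2}{2}\,\partial_{xx} v + L(x, u, \mu_t) \Big\},
\]
% (ii) McKean-Vlasov (Kolmogorov forward) equation for the mass effect,
%      driven by the resulting optimal feedback u^*(t,x):
\[
  \partial_t \mu_t
  = -\,\partial_x\big( b(x, u^*(t,x), \mu_t)\,\mu_t \big)
      + \tfrac{\sigma^2}{2}\,\partial_{xx} \mu_t .
\]
```

The NCE (consistency) property is that the measure $\mu_t$ assumed when solving (i) coincides with the measure generated by the closed-loop dynamics in (ii).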
1,002 citations
[...]
TL;DR: A reinforcement learning framework for continuous-time dynamical systems, without a priori discretization of time, state, and action, is presented; based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, algorithms are derived for estimating value functions and improving policies with function approximators.
Abstract: This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(lambda) algorithms are shown. For policy improvement, two methods, a continuous actor-critic method and a value-gradient-based greedy policy, are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. The advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. It is shown in the simulations that (1) the task is accomplished by the continuous actor-critic method in a number of trials several times fewer than by the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithms are then tested in a higher-dimensional task: cart-pole swing-up. This task is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
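The continuous-time TD error at the core of this framework can be sketched in a few lines. The toy below is illustrative only (a scalar system with dynamics dx/dt = -x and reward -x^2, a Gaussian radial-basis value approximator, and hypothetical parameter choices), not the paper's implementation; it uses a backward-Euler TD error of the form delta ≈ r + (V(x') - V(x))/Δt - V(x)/τ, where τ is the discount time constant.

```python
import numpy as np

def continuous_td_error(r, v, v_next, dt, tau):
    """Backward-Euler approximation of the continuous-time TD error
    delta(t) = r(t) + dV/dt - V(t)/tau, with dV/dt ~ (v_next - v)/dt."""
    return r + (v_next - v) / dt - v / tau

def features(x, centers=np.linspace(-2.0, 2.0, 9)):
    """Gaussian radial-basis features (an illustrative choice of approximator)."""
    return np.exp(-(x - centers) ** 2)

rng = np.random.default_rng(0)
w = np.zeros(9)                    # linear value weights: V(x) = w . features(x)
tau, dt, eta = 1.0, 0.02, 0.05     # discount time constant, step size, learning rate

# Toy stable dynamics dx/dt = -x with reward r = -x^2; the true discounted
# value is V(x) = -x^2 / (2 + 1/tau), most negative far from the origin.
x = 1.5
for _ in range(8000):
    x_next = x + dt * (-x)                         # Euler step of the dynamics
    r = -x ** 2
    v, v_next = w @ features(x), w @ features(x_next)
    delta = continuous_td_error(r, v, v_next, dt, tau)
    w += eta * delta * features(x)                 # semi-gradient TD update
    # restart an episode once the state has effectively reached the origin
    x = x_next if abs(x_next) > 1e-3 else rng.uniform(-2.0, 2.0)

print(w @ features(1.5) < w @ features(0.0))       # learned value lower far from 0
```

For policy improvement, the paper's value-gradient-based greedy policy would then choose actions from the learned value gradient ∂V/∂x together with a model of the input gain, rather than by discrete argmax over actions.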
868 citations