Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.
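For reference (the notation below is the standard textbook form, not taken from any particular paper on this page), the topic takes its name from the Bellman optimality equation of dynamic programming. For a discounted Markov decision process with reward r, transition probabilities P, and discount factor γ ∈ (0, 1), the optimal value function satisfies

\[
V^*(s) \;=\; \max_{a} \Big\{ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big\},
\]

and its continuous-time analogue is the Hamilton–Jacobi–Bellman (HJB) partial differential equation that appears throughout the papers below.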


Papers
Journal ArticleDOI
TL;DR: This paper studies a class of continuous-time stochastic control problems which are time-inconsistent in the sense that they do not admit a Bellman optimality principle, and derives an extension of the standard Hamilton–Jacobi–Bellman equation in the form of a system of nonlinear equations for the determination of the equilibrium strategy as well as the equilibrium value function.
Abstract: In this paper, which is a continuation of the discrete-time paper (Bjork and Murgoci in Finance Stoch. 18:545–592, 2014), we study a class of continuous-time stochastic control problems which, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle. We study these problems within a game-theoretic framework, and we look for Nash subgame perfect equilibrium points. For a general controlled continuous-time Markov process and a fairly general objective functional, we derive an extension of the standard Hamilton–Jacobi–Bellman equation, in the form of a system of nonlinear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. The main theoretical result is a verification theorem. As an application of the general theory, we study a time-inconsistent linear-quadratic regulator. We also present a study of time-inconsistency within the framework of a general equilibrium production economy of Cox–Ingersoll–Ross type (Cox et al. in Econometrica 53:363–384, 1985).
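For orientation, the classical time-consistent problem that this work generalizes is governed by a single HJB equation. In a one-dimensional sketch (the symbols μ, σ, C, and F below are illustrative and not the paper's notation), for controlled dynamics dX_t = μ(X_t, u_t) dt + σ(X_t, u_t) dW_t with running reward C and terminal reward F, the value function V formally satisfies

\[
\partial_t V(t,x) + \sup_{u}\Big\{ \mu(x,u)\,\partial_x V(t,x) + \tfrac{1}{2}\,\sigma(x,u)^2\,\partial_{xx} V(t,x) + C(t,x,u) \Big\} = 0,
\qquad V(T,x) = F(x).
\]

Under time-inconsistency, the paper replaces this single equation by a coupled system of nonlinear equations for the equilibrium strategy and the equilibrium value function.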

252 citations

Journal ArticleDOI
TL;DR: In this article, the authors studied the connections between deterministic exit time control problems and possibly discontinuous viscosity solutions of a first-order Hamilton-Jacobi (HJ) equation up to the boundary.
Abstract: The authors study the connections between deterministic exit time control problems and possibly discontinuous viscosity solutions of a first-order Hamilton-Jacobi (HJ) equation up to the boundary. This equation admits a maximum and a minimum solution that are the value functions associated to stopping time problems on the boundary. When these solutions are equal, they can be obtained through the vanishing viscosity method. Finally, when the HJ equation has a continuous solution, it is proved to be the value function for the first exit time of the domain. It is also the vanishing viscosity limit arising, in particular, in some large deviations problems.
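As a point of reference (the notation below is illustrative, not the paper's), a typical first-order equation of this type arises from a discounted deterministic exit-time problem: with dynamics ẏ = b(y, a), running cost ℓ, discount rate λ > 0, and a cost paid on exiting the open set Ω, the value function u formally satisfies

\[
\lambda\, u(x) + \sup_{a \in A}\big\{ -b(x,a)\cdot Du(x) - \ell(x,a) \big\} = 0 \quad \text{in } \Omega,
\]

together with boundary conditions on ∂Ω that must be understood in the viscosity sense; it is exactly this boundary behaviour that produces the maximal and minimal solutions discussed above.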

251 citations

Journal ArticleDOI
TL;DR: In this paper, a time-average reward for a discrete-time controlled Markov process subject to a time-average cost constraint is maximized over the class of all causal policies by using a Lagrange multiplier formulation involving the dynamic programming equation.
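A minimal sketch of the multiplier idea, not the paper's construction (the MDP, names, and numbers below are invented for illustration, and a discounted criterion is used in place of the time-average one for brevity): for each λ ≥ 0, solve the unconstrained dynamic programming problem with reward r − λc, then adjust λ until the resulting policy meets the cost constraint.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (all numbers made up for this sketch).
# P[a, s, s'] = transition probability, r[s, a] = reward, c[s, a] = cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [2.0, 0.5]])
c = np.array([[0.0, 1.0], [3.0, 0.2]])
gamma = 0.95          # discounted criterion used here for simplicity
cost_budget = 10.0    # constraint on the discounted cost


def solve(lam, n_iter=500):
    """Value iteration on the Lagrangian reward r - lam * c; returns a greedy policy."""
    V = np.zeros(2)
    for _ in range(n_iter):
        Q = (r - lam * c) + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


def discounted_cost(policy, n_iter=500):
    """Policy evaluation of the cost signal under a deterministic policy."""
    Vc = np.zeros(2)
    for _ in range(n_iter):
        Vc = np.array([c[s, policy[s]] + gamma * P[policy[s], s] @ Vc
                       for s in range(2)])
    return Vc.mean()


# Bisection on the multiplier: a larger lam makes the policy care more about cost.
lo, hi = 0.0, 10.0
for _ in range(30):
    lam = 0.5 * (lo + hi)
    pi = solve(lam)
    lo, hi = (lo, lam) if discounted_cost(pi) <= cost_budget else (lam, hi)

print("multiplier:", lam, "policy:", solve(lam))
```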

249 citations

Proceedings Article
07 Dec 2009
TL;DR: This work presents a Bellman error objective function and two gradient-descent TD algorithms that optimize it, and proves the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution.
Abstract: We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a, 2009b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient-descent on this function. These methods can be viewed as natural generalizations to previous TD methods, as they converge to the same limit points when used with linear function approximation methods. We generalize this work to nonlinear function approximation. We present a Bellman error objective function and two gradient-descent TD algorithms that optimize it. We prove the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution. The algorithms are incremental and the computational complexity per time step scales linearly with the number of parameters of the approximator. Empirical results obtained in the game of Go demonstrate the algorithms' effectiveness.
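A minimal sketch of the general idea, not the paper's nonlinear GTD2/TDC algorithms: run stochastic gradient descent on a Bellman-error-style objective with a smooth nonlinear value approximator. The residual-gradient objective used below is a simpler surrogate for the projected Bellman error analyzed in the paper, and the tiny tanh network and two-reward chain are invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain with a reward on reaching the terminal state
# (all numbers invented for this sketch).
n_states, gamma = 5, 0.9
features = np.eye(n_states)                 # one-hot state features

# A small smooth approximator: V(s) = w2 . tanh(W1 @ phi(s)).
W1 = 0.1 * rng.standard_normal((8, n_states))
w2 = 0.1 * rng.standard_normal(8)

def value_and_grads(phi):
    h = np.tanh(W1 @ phi)
    v = w2 @ h
    # Gradients of v with respect to w2 and W1 (chain rule through tanh).
    g_w2 = h
    g_W1 = np.outer(w2 * (1.0 - h ** 2), phi)
    return v, g_w2, g_W1

alpha = 0.05
for episode in range(2000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                                   # deterministic chain
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0
        v, g_w2, g_W1 = value_and_grads(features[s])
        v_next, gn_w2, gn_W1 = value_and_grads(features[s_next])
        delta = r + (0.0 if done else gamma * v_next) - v
        # Residual-gradient step: descend on 0.5 * delta^2, differentiating
        # through both v and (when not terminal) the bootstrapped v_next.
        w2 += alpha * delta * (g_w2 - (0.0 if done else gamma * gn_w2))
        W1 += alpha * delta * (g_W1 - (0.0 if done else gamma * gn_W1))
        s = s_next

print([round(value_and_grads(features[s])[0], 3) for s in range(n_states)])
```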

249 citations

Book
27 Sep 2012
TL;DR: This book develops stochastic control and dynamic programming, verification arguments for control problems, viscosity solutions of the dynamic programming equation, stochastic target problems, backward SDEs, and probabilistic numerical methods for nonlinear PDEs.
Abstract: Preface.- 1. Conditional Expectation and Linear Parabolic PDEs.- 2. Stochastic Control and Dynamic Programming.- 3. Optimal Stopping and Dynamic Programming.- 4. Solving Control Problems by Verification.- 5. Introduction to Viscosity Solutions.- 6. Dynamic Programming Equation in the Viscosity Sense.- 7. Stochastic Target Problems.- 8. Second Order Stochastic Target Problems.- 9. Backward SDEs and Stochastic Control.- 10. Quadratic Backward SDEs.- 11. Probabilistic Numerical Methods for Nonlinear PDEs.- 12. Introduction to Finite Differences Methods.- References.

244 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353