Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications have been published on this topic, receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: In this paper, the admissibility properties of the iterative control laws are developed for value iteration algorithms for the first time, and new termination criteria are established to guarantee the effectiveness of the iterative control laws.
Abstract: In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis guarantees that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function is monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and in each case converges to the optimum. For the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples illustrate the performance of the present method.

324 citations
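To make the value iteration scheme concrete, here is a minimal sketch in Python. It is not the paper's implementation: the paper uses neural networks to represent the value function and control law, whereas this sketch discretizes an assumed scalar system on grids with linear interpolation; the dynamics, stage cost, and all constants are illustrative assumptions.

```python
# Hedged sketch of undiscounted value iteration ADP on an assumed scalar
# system x_{k+1} = 0.9*sin(x_k) + u_k with stage cost x^2 + u^2. Grids plus
# linear interpolation stand in for the paper's neural networks.
import numpy as np

def f(x, u):
    return 0.9 * np.sin(x) + u          # assumed nonlinear dynamics

def cost(x, u):
    return x**2 + u**2                  # positive-definite stage cost

xs = np.linspace(-2.0, 2.0, 201)        # state grid (stands in for the critic)
us = np.linspace(-1.0, 1.0, 41)         # control grid (stands in for the actor)
X, Uc = np.meshgrid(xs, us, indexing="ij")

V = np.zeros_like(xs)                   # arbitrary positive semi-definite V_0

for sweep in range(500):
    # Bellman backup: V_{i+1}(x) = min_u [ cost(x, u) + V_i(f(x, u)) ]
    q = cost(X, Uc) + np.interp(f(X, Uc), xs, V)
    V_new = q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:    # termination criterion
        break
    V = V_new

u_star = us[q.argmin(axis=1)]           # greedy iterative control law
print(f"converged after {sweep} sweeps; V(0) = {np.interp(0.0, xs, V):.6f}")
```

Since V_0 = 0 underestimates the optimal cost here, the iterates are monotonically nondecreasing, matching one of the three monotonicity cases the abstract describes.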

Journal ArticleDOI
TL;DR: In this article, the authors consider a classical risk model and allow investment in a risky asset following a Black–Scholes model, as well as proportional reinsurance.
Abstract: We consider a classical risk model and allow investment in a risky asset following a Black–Scholes model, as well as (proportional) reinsurance. Via the Hamilton–Jacobi–Bellman approach we find a candidate for the optimal strategy and develop a numerical procedure to solve the HJB equation. We prove a verification theorem to show that any increasing solution to the HJB equation is bounded and solves the optimisation problem, and we prove that an increasing solution to the HJB equation exists. Finally, two numerical examples are discussed.

321 citations
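For orientation, problems of this type typically lead to an HJB equation of the following general form. The notation below is a hedged sketch assembled from the standard literature on this model class, not the paper's exact equation: surplus x, proportional retention level b, amount A invested in the risky asset, net premium rate c(b), claim intensity λ, generic claim size Y, and Black–Scholes drift μ and volatility σ.

\[
\sup_{b \in [0,1],\; A \ge 0} \left\{ \tfrac{1}{2}\sigma^2 A^2 V''(x) + \big(c(b) + \mu A\big) V'(x) + \lambda \Big( \mathbb{E}\big[V(x - bY)\big] - V(x) \Big) \right\} = 0 .
\]

Formally, the first-order condition in A gives A*(x) = -μ V'(x) / (σ² V''(x)), which is one reason the verification argument needs the solution V to be increasing with the right curvature.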

Journal ArticleDOI
01 Mar 1990
TL;DR: By representing value-function separability in the structure of the influence diagram's graph, formulation is simplified and operations on the model can take advantage of the separability, allowing the separability of a decision problem's value function to be exploited simply.
Abstract: The concept of a super value node is developed to extend the theory of influence diagrams to allow dynamic programming to be performed within this graphical modeling framework. The operations necessary to exploit the presence of these nodes and efficiently analyze the models are developed. The key result is that by representing value function separability in the structure of the graph of the influence diagram, formulation is simplified and operations on the model can take advantage of the separability. From the decision analysis perspective, this allows simple exploitation of separability in the value function of a decision problem. This allows algorithms to be designed to solve influence diagrams that automatically recognize the opportunity for applying dynamic programming. From the decision processes perspective, influence diagrams with super value nodes allow efficient formulation and solution of nonstandard decision process structures. They also allow the exploitation of conditional independence between state variables.

320 citations
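A minimal sketch of the separability idea (toy numbers, not the paper's influence-diagram algorithms): when the value attached to a super value node is a sum of sub-values, the inner maximization can be rolled back first, which is exactly the dynamic programming step the graph structure makes automatic.

```python
# Toy illustration of value separability: total value v1(d1) + v2(d1, d2).
# Joint enumeration and the rolled-back (dynamic programming) computation
# give the same optimum; the DP form only ever maximizes one decision at a
# time. All numbers are made up for illustration.
import itertools

D1, D2 = [0, 1], [0, 1, 2]
v1 = {d1: -abs(d1 - 0.3) for d1 in D1}
v2 = {(d1, d2): -(d2 - 2 * d1) ** 2 for d1 in D1 for d2 in D2}

# Joint enumeration over all (d1, d2) pairs.
best_joint = max(v1[d1] + v2[d1, d2]
                 for d1, d2 in itertools.product(D1, D2))

# Dynamic programming: roll back the inner max over d2, then max over d1.
inner = {d1: max(v2[d1, d2] for d2 in D2) for d1 in D1}
best_dp = max(v1[d1] + inner[d1] for d1 in D1)

assert best_joint == best_dp
print("optimal value:", best_dp)
```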

Journal ArticleDOI
TL;DR: An online learning algorithm is developed to solve the linear quadratic tracking (LQT) problem for partially unknown continuous-time systems, and it is shown that the value function is quadratic in terms of the state of the system and the command generator.
Abstract: In this technical note, an online learning algorithm is developed to solve the linear quadratic tracking (LQT) problem for partially-unknown continuous-time systems. It is shown that the value function is quadratic in terms of the state of the system and the command generator. Based on this quadratic form, an LQT Bellman equation and an LQT algebraic Riccati equation (ARE) are derived to solve the LQT problem. The integral reinforcement learning technique is used to find the solution to the LQT ARE online and without requiring the knowledge of the system drift dynamics or the command generator dynamics. The convergence of the proposed online algorithm to the optimal control solution is verified. To show the efficiency of the proposed approach, a simulation example is provided.

320 citations
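The quadratic value-function structure is easy to exhibit offline. The sketch below is not the paper's online integral reinforcement learning procedure, which avoids knowledge of the drift dynamics; it simply augments an assumed plant with an assumed command generator and solves the resulting algebraic Riccati equation with SciPy, showing that V(X) = XᵀPX on the augmented state X = [x; r]. All matrices and the discount rate are illustrative assumptions.

```python
# Hedged sketch: the LQT value function is quadratic on X = [x; r]. We solve
# the (discounted) LQT ARE offline with known, assumed matrices; the paper's
# contribution is solving it ONLINE without the drift dynamics.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # assumed plant: xdot = A x + B u
B = np.array([[0.0], [1.0]])
F = np.array([[0.0, 1.0], [-1.0, 0.0]])    # assumed command generator: rdot = F r
C = np.array([[1.0, 0.0]])                 # tracked output y = C x
H = np.array([[1.0, 0.0]])                 # assumed reference output y_d = H r

Q = np.array([[10.0]])                     # tracking-error weight
R = np.array([[1.0]])                      # control weight
gamma = 0.1                                # assumed discount rate

# Augmented dynamics and cost on X = [x; r]; tracking error e = C x - H r.
T = np.block([[A, np.zeros((2, 2))], [np.zeros((2, 2)), F]])
B1 = np.vstack([B, np.zeros((2, 1))])
M = np.hstack([C, -H])                     # e = M X
Q1 = M.T @ Q @ M

# Fold the discount into a standard ARE by shifting T by -gamma/2 * I:
# T'P + PT - gamma*P - P B1 R^{-1} B1' P + Q1 = 0.
P = solve_continuous_are(T - 0.5 * gamma * np.eye(4), B1, Q1, R)
K = np.linalg.solve(R, B1.T @ P)           # tracking control u = -K X
print("value-function matrix P:\n", P.round(3))
print("gain K:", K.round(3))
```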

Journal ArticleDOI
TL;DR: In this article, the authors develop a theory for stochastic control problems which are time inconsistent in the sense that they do not admit a Bellman optimality principle, attack these problems by viewing them within a game-theoretic framework, and look for Nash subgame-perfect equilibrium points.
Abstract: We develop a theory for stochastic control problems which, in various ways, are time inconsistent in the sense that they do not admit a Bellman optimality principle. We attack these problems by viewing them within a game-theoretic framework, and we look for Nash subgame-perfect equilibrium points. For a general controlled Markov process and a fairly general objective functional, we derive an extension of the standard Hamilton-Jacobi-Bellman equation, in the form of a system of non-linear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. All known examples of time inconsistency in the literature are easily seen to be special cases of the present theory. We also prove that for every time-inconsistent problem there exists an associated time-consistent problem such that the optimal control and the optimal value function for the consistent problem coincide with the equilibrium control and value function, respectively, for the time-inconsistent problem. We also study some concrete examples.

315 citations
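The canonical example of such time inconsistency, and one of the known cases the abstract alludes to, is dynamic mean–variance optimization:

\[
J(t, x, \mathbf{u}) \;=\; \mathbb{E}_{t,x}\big[X_T^{\mathbf{u}}\big] \;-\; \frac{\gamma}{2}\,\mathrm{Var}_{t,x}\big[X_T^{\mathbf{u}}\big] .
\]

Because the variance term is a nonlinear function of a conditional expectation, the tower property underlying the Bellman principle fails, and the equilibrium control characterized by the extended HJB system replaces the (nonexistent) dynamically optimal control.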


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations (87% related)
Bounded function: 77.2K papers, 1.3M citations (85% related)
Markov chain: 51.9K papers, 1.3M citations (85% related)
Linear system: 59.5K papers, 1.4M citations (84% related)
Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353