Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in L2-gain optimal control, suboptimal H∞ control, of nonlinear systems affine in input with the control policy having saturation constraints.
Abstract: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in L2-gain optimal control, suboptimal H∞ control, of nonlinear systems affine in input with the control policy having saturation constraints. The result is a closed-form representation, on a prescribed compact set chosen a priori, of the feedback strategies and the value function that solves the associated Hamilton-Jacobi-Isaacs (HJI) equation. The closed-loop stability, L2-gain disturbance attenuation of the neural network saturated control feedback strategy, and uniform convergence results are proven. Finally, this approach is applied to the rotational/translational actuator (RTAC) nonlinear benchmark problem under actuator saturation, offering guaranteed stability and disturbance attenuation.

173 citations

DOI
01 Jan 1989
TL;DR: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes and proves that the policy improvement step in iterative discretization procedure can be replaced by the approximation version of linear support algorithm.
Abstract: The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only discounted reward case is considered. For the finite horizon problem, two new algorithms are developed. The first algorithm is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second algorithm, called linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. It has been shown that these two algorithms are more efficient than the one-pass algorithm. For the infinite horizon problem, it is first shown that the approximation version of linear support algorithm can be used to substitute the policy improvement step in a standard successive approximation method to obtain an $\epsilon$-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDP can be applied here. Finally, we prove that the policy improvement step in iterative discretization procedure can be replaced by the approximation version of linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with finite number of signals. Then the result is extended to where the signal processes are step functions. 
Since step functions can be easily used to approximate most of the probability distributions, this method can be used to approximate most of the problems with continuous signals. Finally, we present some conditions which guarantee that the linear support can be computed for any given state; the methods developed for finite signal cases can then be easily modified and applied to problems for which the conditions hold.
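
The "standard successive approximation method" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's POMDP algorithm: it is plain value iteration on a fully observable, finite, discounted MDP, where the Bellman optimality operator is applied repeatedly until successive iterates are close enough to bound the error. The data layout (`P[a][s][t]`, `R[a][s]`), the function name, and the toy numbers are all assumptions made for illustration.

```python
def value_iteration(P, R, gamma, eps):
    """Successive approximation for a finite, fully observable discounted MDP.

    P[a][s][t]: probability of moving from state s to state t under action a
                (hypothetical layout chosen for this sketch).
    R[a][s]:    expected immediate reward for taking action a in state s.
    gamma:      discount factor, assumed to lie strictly between 0 and 1.
    eps:        target accuracy for the returned value function.
    """
    n_actions = len(R)
    n_states = len(R[0])
    V = [0.0] * n_states
    # Because the Bellman operator is a gamma-contraction in the sup norm,
    # stopping at this threshold guarantees ||V - V*||_inf < eps / 2.
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        # One application of the Bellman optimality operator.
        V_new = [
            max(
                R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < threshold:
            return V


# Toy two-state, two-action MDP with made-up numbers.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
R = [
    [0.0, 1.0],  # action 0 pays 1 only in state 1
    [0.0, 0.0],  # action 1 pays nothing
]
V = value_iteration(P, R, gamma=0.9, eps=1e-6)
```

In this toy problem the best plan is to move to state 1 and stay there, so the exact values are 9 for state 0 and 10 for state 1; the returned `V` approximates them to within the requested accuracy.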

173 citations

Journal ArticleDOI
TL;DR: This article proposes a new approach for computing a semi-explicit form of the solution to a class of Hamilton-Jacobi (HJ) partial differential equations (PDEs), using control techniques based on viability theory.
Abstract: This article proposes a new approach for computing a semi-explicit form of the solution to a class of Hamilton-Jacobi (HJ) partial differential equations (PDEs), using control techniques based on viability theory. We characterize the epigraph of the value function solving the HJ PDE as a capture basin of a target through an auxiliary dynamical system, called "characteristic system". The properties of capture basins enable us to define components as building blocks of the solution to the HJ PDE in the Barron/Jensen-Frankowska sense. These components can encode initial conditions, boundary conditions, and internal "boundary" conditions, which are the topic of this article. A generalized Lax-Hopf formula is derived, and enables us to formulate the necessary and sufficient conditions for a mixed initial and boundary conditions problem with multiple internal boundary conditions to be well posed. We illustrate the capabilities of the method with a data assimilation problem for reconstruction of highway traffic flow using Lagrangian measurements generated from Next Generation Simulation (NGSIM) traffic data.

172 citations

Journal ArticleDOI
TL;DR: In this paper, a new approach is presented for the problem of stochastic control of nonlinear systems, which takes into account the past observations and also the future observation program.
Abstract: A new approach is presented for the problem of stochastic control of nonlinear systems. It is well known that, except for the linear-quadratic problem, the optimal stochastic controller cannot be obtained in practice. In general it is the curse of dimensionality that makes the strict application of the principle of optimality infeasible. The two subproblems of stochastic control, estimation and control proper, are, except for the linear-quadratic case, intercoupled. As pointed out by Feldbaum, in addition to its effects on the state of the system, the control also affects the estimation performance. In this paper, the control problem is formulated such that this dual property of the control appears explicitly. The resulting control sequence exhibits the closed-loop property, i.e., it takes into account the past observations and also the future observation program. Thus, in addition to being adaptive, this control also plans its future learning according to the control objective. Some preliminary simulation results illustrate these properties of the control.

172 citations

Journal ArticleDOI
TL;DR: The nonsymmetric upper and lower bounds on the rate of convergence of general monotone approximation/numerical schemes for parabolic Hamilton-Jacobi-Bellman equations are obtained by introducing a new notion of consistency.
Abstract: We obtain nonsymmetric upper and lower bounds on the rate of convergence of general monotone approximation/numerical schemes for parabolic Hamilton-Jacobi-Bellman equations by introducing a new notion of consistency. Our results are robust and general - they improve and extend earlier results by Krylov, Barles, and Jakobsen. We apply our general results to various schemes including Crank-Nicolson type finite difference schemes, splitting methods, and the classical approximation by piecewise constant controls. In the first two cases our results are new, and in the last two cases the results are obtained by a new method which we develop here.

171 citations


Network Information
Related Topics (5)
Optimal control
68K papers, 1.2M citations
87% related
Bounded function
77.2K papers, 1.3M citations
85% related
Markov chain
51.9K papers, 1.3M citations
85% related
Linear system
59.5K papers, 1.4M citations
84% related
Optimization problem
96.4K papers, 2.1M citations
83% related
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  261
2022  537
2021  369
2020  411
2019  348
2018  353