scispace - formally typeset
Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.
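As a concrete illustration of the equation at the heart of this topic, here is a minimal value-iteration sketch on a toy two-state, two-action MDP (all numbers are illustrative, not taken from any of the papers below):

```python
# Solving the Bellman optimality equation by value iteration on a toy MDP.
import numpy as np

gamma = 0.9                                # discount factor
# P[a][s, s'] = transition probability; R[s, a] = expected reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # action 0
              [[0.5, 0.5], [0.3, 0.7]]])   # action 1
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:  # fixed point reached
        break
    V = V_new

policy = Q.argmax(axis=1)                  # greedy policy w.r.t. V
```

Because the Bellman backup is a gamma-contraction, the iteration converges to the unique fixed point V*, and the greedy policy with respect to V* is optimal.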


Papers
Journal ArticleDOI
TL;DR: Two neural networks, namely an actor NN and a critic NN, are tuned online and simultaneously to generate the optimal bounded control policy, with the input constraints encoded into the optimization problem using a nonquadratic performance function.
Abstract: This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function gives both feedback and feedforward parts of the control input simultaneously. This enables us to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and tracking Hamilton-Jacobi-Bellman (HJB) are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely, actor NN and critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.

242 citations
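The nonquadratic performance function mentioned above is commonly chosen so that minimizing it yields a saturated, and hence bounded, control law. A sketch of this standard construction (assumed form, not quoted from the paper; `lam` is an illustrative input bound):

```python
# Nonquadratic input cost used to encode a hard bound |u| <= lam.
import numpy as np

lam = 1.0  # assumed input bound

def input_cost(u, lam=lam):
    # W(u) = 2 * integral_0^u lam * atanh(v/lam) dv
    #      = 2*lam*u*atanh(u/lam) + lam**2 * log(1 - (u/lam)**2)
    z = u / lam
    return 2 * lam * u * np.arctanh(z) + lam**2 * np.log(1 - z**2)

# Minimizing a stage cost of the form Q(x) + W(u) along the dynamics
# yields a policy of the saturated form
#   u* = -lam * tanh( (1/(2*lam)) * g(x)^T dV/dx ),
# which automatically satisfies |u*| <= lam.
u_star = -lam * np.tanh(0.5)   # illustrative value of the gradient term
```

Note that W(u) is zero at u = 0, even in u, and grows without bound as |u| approaches lam, which is what forces the minimizer to respect the constraint.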

Journal ArticleDOI
TL;DR: A weak version of the dynamic programming principle is proved for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument.
Abstract: We prove a weak version of the dynamic programming principle for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument. In the Markov case, our result is tailor-made for the derivation of the dynamic programming equation in the sense of viscosity solutions.

242 citations
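For reference, the classical (strong) dynamic programming principle that the paper weakens reads, in typical notation (assumed here, not quoted from the paper):

```latex
V(t,x) \;=\; \sup_{\nu \in \mathcal{U}} \mathbb{E}\!\left[
  \int_t^{\theta} f\bigl(s, X_s^{t,x,\nu}, \nu_s\bigr)\,ds
  \;+\; V\bigl(\theta, X_\theta^{t,x,\nu}\bigr)
\right]
```

for all stopping times θ with values in [t, T]. The weak version, roughly, replaces V on the right-hand side by its semicontinuous envelopes and test functions; this is still enough to derive the viscosity-solution form of the dynamic programming equation, while sidestepping the measurable selection argument that the strong statement requires.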

Journal ArticleDOI
TL;DR: This paper studies an optimal stopping time problem for pricing perpetual American put options in a regime switching model using the "modified smooth fit" technique, and obtains an explicit optimal stopping rule and the corresponding value function in closed form.
Abstract: This paper studies an optimal stopping time problem for pricing perpetual American put options in a regime switching model. An explicit optimal stopping rule and the corresponding value function in a closed form are obtained using the "modified smooth fit" technique. The solution is then compared with the numerical results obtained via a dynamic programming approach and also with a two-point boundary-value differential equation (TPBVDE) method.

239 citations
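For context, the single-regime Black-Scholes base case of this problem has a well-known closed form obtained by ordinary smooth fit (the paper extends this to regime switching). A sketch with illustrative constants:

```python
# Perpetual American put, single-regime Black-Scholes base case.
# Smooth fit at the exercise boundary S* gives S* = gamma*K/(1+gamma),
# with gamma = 2r/sigma^2, and V(S) = (K - S*)(S/S*)^(-gamma) for S > S*.
r, sigma, K = 0.05, 0.2, 100.0       # illustrative parameters

gamma = 2 * r / sigma**2             # decay exponent of the ODE solution
S_star = gamma * K / (1 + gamma)     # optimal exercise boundary

def put_value(S):
    if S <= S_star:
        return K - S                                     # exercise region
    return (K - S_star) * (S / S_star) ** (-gamma)       # continuation region
```

Smooth fit means both the value and its first derivative are continuous at S*; the derivative of the continuation value at the boundary equals -1, matching the exercise payoff.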

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input.
Abstract: In this paper we consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input. We study a policy-iteration algorithm where the iterates are obtained via empirical risk minimization with a risk function that penalizes high magnitudes of the Bellman-residual. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set and the controllability properties of the MDP. Moreover, we prove that when a linear parameterization is used the new algorithm is equivalent to Least-Squares Policy Iteration. To the best of our knowledge this is the first theoretical result for off-policy control learning over continuous state-spaces using a single trajectory.

231 citations

Book ChapterDOI
01 Jan 1999
TL;DR: In this article, theoretical and numerical results for solving qualitative and quantitative control and differential game problems are treated in the framework of set-valued analysis and viability theory, which is rather well adapted to look at these several problems with a unified point of view.
Abstract: This chapter deals with theoretical and numerical results for solving qualitative and quantitative control and differential game problems. These questions are treated in the framework of set-valued analysis and viability theory. In a way, this approach is rather well adapted to look at these several problems with a unified point of view. The idea is to characterize the value function as a viability kernel instead of solving a Hamilton-Jacobi-Bellman equation. This allows us to easily take into account state constraints without any controllability assumptions on the dynamics, either at the boundary of targets or at the boundary of the constraint set. In the case of two-player differential games, the value function is characterized as a discriminating kernel. This allows dealing with a large class of systems with minimal regularity and convexity assumptions. Rigorous proofs of the convergence, including irregular cases, and completely explicit algorithms are provided.

229 citations
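The viability-kernel idea can be illustrated with a crude grid computation on a toy one-dimensional system (a sketch of the general principle, not the chapter's algorithm): starting from the constraint set, repeatedly discard states from which no admissible control keeps the next state inside the current candidate set.

```python
# Grid approximation of the viability kernel of K = [0, 1]
# for the toy controlled system x' = u with |u| <= 1.
import numpy as np

xs = np.linspace(-0.5, 1.5, 201)        # state grid
dt = 0.05                               # Euler time step
controls = [-1.0, 0.0, 1.0]             # discretized control set
in_K = (xs >= 0.0) & (xs <= 1.0)        # constraint set on the grid

viable = in_K.copy()
changed = True
while changed:
    changed = False
    new = viable.copy()
    for i, x in enumerate(xs):
        if not viable[i]:
            continue
        # Keep x only if some control maps it back into the viable set.
        ok = any(viable[np.argmin(np.abs(xs - (x + dt * u)))] for u in controls)
        if not ok:
            new[i] = False
            changed = True
    viable = new
```

For this system the control u = 0 holds every point of K in place, so the kernel is all of K; with less forgiving dynamics the iteration would peel away boundary states, and no controllability assumption at the boundary is needed, which is the point emphasized in the abstract.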


Network Information
Related Topics (5)
Optimal control
68K papers, 1.2M citations
87% related
Bounded function
77.2K papers, 1.3M citations
85% related
Markov chain
51.9K papers, 1.3M citations
85% related
Linear system
59.5K papers, 1.4M citations
84% related
Optimization problem
96.4K papers, 2.1M citations
83% related
Performance Metrics
No. of papers in the topic in previous years
Year	Papers
2023	261
2022	537
2021	369
2020	411
2019	348
2018	353