Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.


Papers
Journal ArticleDOI
TL;DR: The main advantage of the approach proposed is that it can be applied to a general class of target-hitting continuous dynamic games with nonlinear dynamics, and has very good properties in terms of its numerical solution, since the value function and the Hamiltonian of the system are both continuous.
Abstract: A new framework for formulating reachability problems with competing inputs, nonlinear dynamics, and state constraints as optimal control problems is developed. Such reach-avoid problems arise, among other settings, in the study of safety problems in hybrid systems. Earlier approaches to reach-avoid computations are either restricted to linear systems, or face numerical difficulties due to possible discontinuities in the Hamiltonian of the optimal control problem. The main advantage of the approach proposed in this paper is that it can be applied to a general class of target-hitting continuous dynamic games with nonlinear dynamics, and has very good properties in terms of its numerical solution, since the value function and the Hamiltonian of the system are both continuous. The performance of the proposed method is demonstrated by applying it to a case study, which involves the target-hitting problem of an underactuated underwater vehicle in the presence of obstacles.

193 citations
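
As a rough illustration of the reach-avoid idea, the sketch below sets up a hypothetical 1-D grid game (a crude discretization with made-up dynamics, not the paper's continuous Hamilton-Jacobi formulation): the control u tries to drive the state into the target cell, the disturbance d opposes it, obstacle cells are pinned to value 0, and the max-min fixed point marks the states from which the target can be reached safely under worst-case disturbance.

```python
import numpy as np

# Hypothetical discrete reach-avoid game on a 1-D grid. V[x] converges to 1
# exactly on the states from which the controller can force a path into the
# target while never touching the obstacle, whatever the disturbance does.
N = 20
TARGET = {N - 1}   # rightmost cell is the target
AVOID = {7}        # a single obstacle cell

V = np.zeros(N)
for x in TARGET:
    V[x] = 1.0

def step(x, u, d):
    """Clamped 1-D dynamics: control u plus disturbance d."""
    return min(max(x + u + d, 0), N - 1)

for _ in range(2 * N):
    V_new = V.copy()
    for x in range(N):
        if x in TARGET or x in AVOID:
            continue  # target stays at 1, obstacle stays at 0
        # Controller maximizes, disturbance minimizes (a max-min backup).
        V_new[x] = max(min(V[step(x, u, d)] for d in (-1, 0, 1))
                       for u in (-2, -1, 0, 1, 2))
    if np.allclose(V_new, V):
        break
    V = V_new

print("safe reach set:", [x for x in range(N) if V[x] > 0.5])
```

With these toy dynamics only the states to the right of the obstacle survive: from the left side the disturbance can always force a landing on the obstacle cell, so no guaranteed safe path to the target exists there.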

Journal ArticleDOI
TL;DR: A theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle is developed.
Abstract: We develop a theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle. We attack these problems by viewing them within a game-theoretic framework, and we look for subgame perfect Nash equilibrium points. For a general controlled Markov process and a fairly general objective functional, we derive an extension of the standard Bellman equation, in the form of a system of nonlinear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. Most known examples of time-inconsistent stochastic control problems in the literature are easily seen to be special cases of the present theory. We also prove that for every time-inconsistent problem, there exists an associated time-consistent problem such that the optimal control and the optimal value function for the consistent problem coincide with the equilibrium control and value function, respectively, for the time-inconsistent problem. To exemplify the theory, we study some concrete examples, such as hyperbolic discounting and mean–variance control.

188 citations
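
To make the time-inconsistency concrete, the following toy sketch (a hypothetical stopping problem with assumed rewards, far narrower than the paper's framework) computes the subgame perfect equilibrium under hyperbolic discounting by backward induction: each date-t "self" takes the stop/continue decisions of later selves as fixed and stops only if stopping now beats the reward those selves will eventually collect, discounted by its own hyperbolic weights.

```python
# Hypothetical optimal-stopping toy: the self at time t values a reward
# collected at s >= t as r[s] / (1 + k*(s - t)). Because these weights do
# not factor across time, plans that look optimal at t = 0 are abandoned
# later -- the problem admits no Bellman optimality principle, so we look
# for a subgame perfect equilibrium across "selves" instead.

k = 1.0
r = [1.0, 2.0, 4.0, 8.0, 16.0]   # reward if stopping at t = 0..4
T = len(r) - 1

stop = [False] * (T + 1)
stop[T] = True                   # the final self always stops

first_stop = T                   # earliest stopping date among later selves
for t in range(T - 1, -1, -1):
    # Continuing hands control to later selves, who stop at first_stop.
    continue_value = r[first_stop] / (1.0 + k * (first_stop - t))
    stop[t] = r[t] >= continue_value
    if stop[t]:
        first_stop = t

print("equilibrium stop decisions:", stop)
best_commit = max(range(T + 1), key=lambda s: r[s] / (1.0 + k * s))
print("time-0 pre-commitment would stop at:", best_commit)
```

Under these assumed numbers every self stops immediately (equilibrium reward 1 at t = 0), while a time-0 planner able to commit would wait until t = 4, worth 16/5 = 3.2 in time-0 units; this gap between equilibrium play and pre-commitment is what the extended Bellman system characterizes.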

Journal ArticleDOI
TL;DR: A broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition are considered, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming.
Abstract: We consider a broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition. These problems comprise multiple subproblems that are independent of each other except for a collection of coupling constraints on the action space. We fit an additively separable value function approximation using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming. We prove various results comparing the relaxations to each other and to the optimal problem value. We also provide a column generation algorithm for solving the LP-based relaxation to any desired optimality tolerance, and we report on numerical experiments on bandit-like problems. Our results provide insight into the complexity versus quality trade-off when choosing which of these relaxations to implement.

187 citations
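
A minimal sketch of the Lagrangian side of this decomposition, under assumed toy data (the LP/column-generation relaxation is omitted): pricing the coupling constraint sum_i a_i <= B with a multiplier lam >= 0 makes the N subproblems independent, and the sum of their values plus lam*B*T upper-bounds the optimal coupled value, so minimizing over lam tightens the bound.

```python
import numpy as np

# Hypothetical weakly coupled DP: N identical two-state subproblems over
# horizon T, binary actions a ("activate" or not), and a coupling budget of
# at most B activations per period. With multiplier lam >= 0 on the budget,
# each subproblem is solved alone with reward r(x, a) - lam * a, and
#   L(lam) = N * V_sub(lam) + lam * B * T
# is an upper bound on the optimal coupled value (weak duality).

T, N, B = 5, 4, 1
reward = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.2, (1, 1): 2.0}
# P[a][x] = distribution over the next state given action a in state x.
P = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.3, 0.7], 1: [0.1, 0.9]}}

def subproblem_value(lam, x0=0):
    """Finite-horizon backward induction for one priced subproblem."""
    V = np.zeros(2)
    for _ in range(T):
        V = np.array([max(reward[(x, a)] - lam * a + np.dot(P[a][x], V)
                          for a in (0, 1))
                      for x in (0, 1)])
    return V[x0]

def lagrangian_bound(lam):
    return N * subproblem_value(lam) + lam * B * T

# Coarse grid search over the multiplier (illustrative only).
lams = np.linspace(0.0, 3.0, 61)
best = min(lams, key=lagrangian_bound)
print(f"lam ~ {best:.2f}  ->  bound ~ {lagrangian_bound(best):.3f}")
```

The additively separable structure (one value function per subproblem) is what both relaxations in the paper exploit; they differ in how the multipliers or weights are chosen.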

Journal ArticleDOI
TL;DR: This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics; algorithmic and exact solutions are indicated.
Abstract: In this review we consider an interdisciplinary problem of earthquake prediction involving economics. This joint research aids in understanding the prediction problem as a whole and reveals additional requirements for seismostatistics. We formulate the problem as an optimal control problem: possessing the possibility of declaring several types of alerts, it is necessary to find an optimal strategy for switching between alert types; each successful prediction prevents a certain amount of losses; total expected losses are integrated over the semi-infinite time interval. The discount factor is included in the model. Algorithmic and exact solutions are indicated. This paper is based on the recent results by Molchan (1990, 1991, 1992).

184 citations
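
As a rough sketch of the optimal-control formulation, the toy model below (hypothetical numbers, and discrete time rather than the review's semi-infinite continuous-time setting) treats inferred hazard levels as Markov states, alert types as actions with maintenance costs, and runs discounted value iteration to minimize expected losses.

```python
import numpy as np

# Hypothetical alert-selection MDP. Hazard states evolve independently of
# the alert; each alert type costs something to maintain and prevents a
# fraction of the losses from a quake that occurs while it is active.
gamma = 0.95                                # discount factor
p_quake = np.array([0.001, 0.02, 0.15])     # quake prob. per hazard state
Ph = np.array([[0.95, 0.04, 0.01],          # hazard transition matrix
               [0.10, 0.80, 0.10],
               [0.05, 0.15, 0.80]])
alert_cost = np.array([0.0, 0.5, 2.0])      # cost of alert types 0/1/2
prevented = np.array([0.0, 0.4, 0.9])       # fraction of losses prevented
LOSS = 100.0                                # loss from an unprevented quake

V = np.zeros(3)
for _ in range(1000):                       # value iteration on losses
    Q = (alert_cost[None, :]
         + (1.0 - prevented)[None, :] * p_quake[:, None] * LOSS
         + gamma * (Ph @ V)[:, None])
    V_new = Q.min(axis=1)                   # cheapest alert type per state
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("alert type per hazard state:", Q.argmin(axis=1))
print("expected discounted losses :", np.round(V, 2))
```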

Journal ArticleDOI
TL;DR: A powerful new theorem is presented that can provide a unified analysis of value-function-based reinforcement-learning algorithms and allows the convergence of a complex asynchronous reinforcement- learning algorithm to be proved by verifying that a simpler synchronous algorithm converges.
Abstract: Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.

183 citations
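
The asynchronous-versus-synchronous distinction in the theorem can be seen directly in tabular Q-learning, sketched below on an assumed toy chain MDP: each step backs up only the single visited state-action pair (asynchronous), whereas the synchronous counterpart the theorem reduces to would back up every pair at once.

```python
import random

# Tabular Q-learning on a 5-state chain (generic illustration; the paper's
# contribution is the convergence analysis, not this algorithm). Moving
# right eventually reaches the goal state, which pays reward 1.
N_STATES, ACTIONS = 5, (0, 1)            # action 0 = left, 1 = right
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.2

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = [[0.0, 0.0] for _ in range(N_STATES)]
s = 0
for _ in range(20000):
    # Epsilon-greedy exploration keeps every pair visited infinitely often,
    # one of the usual conditions for asynchronous convergence.
    a = random.choice(ACTIONS) if random.random() < EPS \
        else max(ACTIONS, key=lambda a: Q[s][a])
    s2, r = step(s, a)
    # Asynchronous backup: only the visited (s, a) moves toward its
    # Bellman optimality target.
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    s = 0 if s2 == N_STATES - 1 else s2  # restart after reaching the goal

print([round(max(q), 2) for q in Q])     # ~ GAMMA ** (steps to goal - 1)
```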


Network Information
Related Topics (5)

Topic                  Papers   Citations   Related
Optimal control        68K      1.2M        87%
Bounded function       77.2K    1.3M        85%
Markov chain           51.9K    1.3M        85%
Linear system          59.5K    1.4M        84%
Optimization problem   96.4K    2.1M        83%
Performance Metrics
No. of papers in the topic in previous years

Year   Papers
2023   261
2022   537
2021   369
2020   411
2019   348
2018   353