Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems within the framework of graphical games, and shows that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader.
Abstract: This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and simultaneously find an improved policy. Actor and critic neural networks, along with a least-squares approach, are employed to approximate the target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method.

136 citations
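The core off-policy idea above (evaluating and improving a target policy using data generated by a separate behavior policy, via a Bellman equation) can be illustrated much more simply than in the paper's continuous-time graphical-game setting. Below is a minimal tabular sketch in Python using an expected-SARSA-style backup; the toy MDP, policies, and all parameters are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy MDP (illustrative only): 3 states, 2 actions.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                  # R[s, a]

behavior = np.full((n_states, n_actions), 1.0 / n_actions)   # data-generating policy
target = np.zeros((n_states, n_actions)); target[:, 0] = 1.0  # policy being evaluated

# Off-policy evaluation: learn Q for the target policy from behavior-policy data.
# The Bellman backup takes the expectation over the next action under the target
# policy, so no importance weights are needed.
Q = np.zeros((n_states, n_actions))
alpha, s = 0.1, 0
for _ in range(50_000):
    a = rng.choice(n_actions, p=behavior[s])   # action chosen by the behavior policy
    s_next = rng.choice(n_states, p=P[s, a])   # environment transition
    td_target = R[s, a] + gamma * np.dot(target[s_next], Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next

print(Q)  # approximate Q^pi of the target policy, learned off-policy
```

The same principle, with neural-network function approximation and least-squares updates in place of the tabular update, is what the paper applies per agent.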

Journal ArticleDOI
TL;DR: In this paper, a stochastic optimal switching and impulse control problem in a finite horizon is studied, and the continuity of the value function, which is by no means trivial, is proved.
Abstract: A stochastic optimal switching and impulse control problem in a finite horizon is studied. The continuity of the value function, which is by no means trivial, is proved. The Bellman dynamic programming principle is shown to be valid for such a problem. Moreover, the value function is characterized as the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation.

135 citations
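For orientation, the baseline Hamilton-Jacobi-Bellman equation for a finite-horizon controlled diffusion $dX_t = b(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dW_t$ with running cost $f$ and terminal cost $g$ is sketched below; the switching/impulse problem studied in the paper leads instead to a system of quasi-variational inequalities, so this is only the standard form that the paper's equation generalizes, not the paper's own equation.

$$
\partial_t v(t,x) + \inf_{a \in A}\Big\{ b(x,a)\cdot \nabla_x v(t,x) + \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma\sigma^{\top}(x,a)\, D_x^2 v(t,x)\big) + f(x,a) \Big\} = 0, \qquad v(T,x) = g(x).
$$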

Journal ArticleDOI
TL;DR: In this paper, the optimal control process is constructed by solving the Skorokhod problem of reflecting the two-dimensional Brownian motion along a free boundary in the $-\nabla V$ direction.
Abstract: It is desired to control a two-dimensional Brownian motion by adding a (possibly singularly) continuous process to it so as to minimize an expected infinite-horizon discounted running cost. The Hamilton–Jacobi–Bellman characterization of the value function V is a variational inequality which has a unique twice continuously differentiable solution. The optimal control process is constructed by solving the Skorokhod problem of reflecting the two-dimensional Brownian motion along a free boundary in the $-\nabla V$ direction.

133 citations

Book ChapterDOI
01 Jan 1990
TL;DR: In this article, the authors present theory, applications, and computational methods for Markov Decision Processes (MDPs), providing an optimality equation that characterizes the supremal value of the objective function, characterizing the form of an optimal policy, and developing efficient computational procedures for finding policies that are optimal or close to optimal.
Abstract: Publisher Summary This chapter presents theory, applications, and computational methods for Markov Decision Processes (MDPs). MDPs are a class of stochastic sequential decision processes in which the cost and transition functions depend only on the current state of the system and the current action. These models have been applied in a wide range of subject areas, most notably in queueing and inventory control. A sequential decision process is a model for a dynamic system under the control of a decision maker. Sequential decision processes are classified according to the times (epochs) at which decisions are made, the length of the decision-making horizon, the mathematical properties of the state and action spaces, and the optimality criteria. The focus of this chapter is on problems in which decisions are made periodically at discrete time points. The state and action sets are either finite, countable, compact, or Borel; their characteristics determine the form of the reward and transition probability functions. The optimality criteria considered in the chapter include finite and infinite horizon expected total reward, infinite horizon expected total discounted reward, and average expected reward. The main objectives in analyzing sequential decision processes in general and MDPs in particular include (1) providing an optimality equation that characterizes the supremal value of the objective function, (2) characterizing the form of an optimal policy if it exists, and (3) developing efficient computational procedures for finding policies that are optimal or close to optimal. The optimality or Bellman equation is the basic entity in MDP theory, and almost all existence, characterization, and computational results are based on its analysis.

132 citations
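The Bellman optimality equation referred to above, $V^*(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^*(s') \big]$ in the discounted case, is also the basis of the standard computational procedures. A minimal value-iteration sketch in Python on a made-up finite MDP (all sizes and numbers are illustrative, not from the chapter):

```python
import numpy as np

# Illustrative finite MDP (not from the chapter): 4 states, 2 actions.
n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                  # R[s, a]

# Value iteration: repeatedly apply the Bellman optimality operator
#   (T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
print(V, policy)
```

Because the Bellman optimality operator is a contraction with modulus gamma in the sup norm, the iteration converges geometrically to the unique fixed point V*.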

Journal ArticleDOI
TL;DR: In this paper, a class of risk-sensitive mean-field stochastic differential games with exponential cost functions is studied and the corresponding mean field equilibria are characterized in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations and HJB equations.
Abstract: In this paper, we study a class of risk-sensitive mean-field stochastic differential games. We show that under appropriate regularity conditions, the mean-field value of the stochastic differential game with exponentiated integral cost functional coincides with the value function satisfying a Hamilton-Jacobi-Bellman (HJB) equation with an additional quadratic term. We provide an explicit solution of the mean-field best response when the instantaneous cost functions are log-quadratic and the state dynamics are affine in the control. An equivalent mean-field risk-neutral problem is formulated, and the corresponding mean-field equilibria are characterized in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations, and HJB equations. We provide numerical examples of the mean-field behavior to illustrate both linear and McKean-Vlasov dynamics.

132 citations
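As a rough pointer to where the "additional quadratic term" comes from, the standard (non-mean-field) risk-sensitive formulation is sketched below; the normalization with risk-sensitivity parameter $\delta > 0$ and the assumption that the control does not enter the diffusion coefficient are illustrative conventions, not necessarily those of the paper. The logarithmic transform of the exponentiated integral cost produces a gradient-squared term in the HJB equation:

$$
J(u) = \delta \,\log \mathbb{E}\!\left[\exp\!\Big(\tfrac{1}{\delta}\int_0^T \ell(x_t,u_t)\,dt + \tfrac{1}{\delta}\, g(x_T)\Big)\right],
$$
$$
\partial_t v + \inf_{u}\big\{ b(x,u)\cdot\nabla v + \ell(x,u)\big\} + \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma\sigma^{\top} D^2 v\big) + \tfrac{1}{2\delta}\,\big|\sigma^{\top}\nabla v\big|^{2} = 0, \qquad v(T,x) = g(x).
$$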


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  261
2022  537
2021  369
2020  411
2019  348
2018  353