SciSpace - formally typeset
Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.
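As a concrete illustration of the topic, the Bellman optimality equation V(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')] can be solved by value iteration, i.e., by repeatedly applying the Bellman operator until a fixed point is reached. The sketch below uses a hypothetical two-state, two-action MDP; all numbers are made up for illustration:

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
# P[a, s, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1],   # action 0
               [0.2, 0.8]],
              [[0.5, 0.5],   # action 1
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: repeatedly apply the Bellman optimality operator
# (T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print(V)                  # fixed point of the Bellman operator
print(Q.argmax(axis=1))   # greedy (optimal) policy
```

Since the Bellman operator is a γ-contraction in the sup norm, the iteration converges geometrically from any starting point.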


Papers
Journal ArticleDOI
TL;DR: In this article, a class of risk-sensitive mean-field stochastic differential games is studied, and the authors show that the mean-field value of the exponentiated cost functional solves a Hamilton-Jacobi-Bellman-Fleming (HJBF) equation with an additional quadratic term.

54 citations

Proceedings Article
01 Jan 2020
TL;DR: A new Variational Policy Gradient Theorem for RL with general utilities is derived, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov decision problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. We prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, even though the optimization problem is nonconvex. We also establish a convergence rate of order $O(1/t)$ by exploiting the hidden convexity of the problem, and prove that it converges exponentially when the problem admits hidden strong convexity. Our analysis applies to the standard RL problem with cumulative rewards as a special case, in which case our result improves the available convergence rate.
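The object the general utility above acts on, the discounted state-action occupancy measure d(s,a) = (1-γ) Σ_t γ^t Pr(s_t=s, a_t=a), can be computed in closed form for a fixed policy in a small tabular MDP. The sketch below is illustrative only; the MDP, policy, and numbers are hypothetical and not taken from the paper:

```python
import numpy as np

# Toy MDP: P[a, s, s'] = transition probability; pi[s, a] = fixed policy.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
pi = np.array([[0.7, 0.3],
               [0.4, 0.6]])
gamma = 0.9
mu0 = np.array([1.0, 0.0])    # initial state distribution

# State transition matrix under pi: P_pi[s, s'] = sum_a pi(a|s) P(s'|s, a)
P_pi = np.einsum("sa,ast->st", pi, P)

# Discounted state occupancy d(s) = (1-gamma) * sum_t gamma^t Pr(s_t = s),
# obtained in closed form from (I - gamma * P_pi^T) d = (1-gamma) * mu0.
d_s = (1 - gamma) * np.linalg.solve(np.eye(2) - gamma * P_pi.T, mu0)
d_sa = d_s[:, None] * pi      # state-action occupancy d(s, a)

print(d_sa)                   # sums to 1 over (s, a)
```

A cumulative-reward objective is the linear utility Σ_{s,a} d(s,a) R(s,a) / (1-γ); the paper's setting replaces this linear functional with a general concave one, which is what breaks the Bellman equation.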

54 citations

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem.
Abstract: We consider a class of finite horizon optimal control problems with unbounded data for nonlinear systems, which includes the Linear-Quadratic (LQ) problem. We give comparison results between the value function and viscosity sub- and supersolutions of the Bellman equation, and prove uniqueness for this equation among locally Lipschitz functions bounded below. As an application we show that an optimal control for the LQ problem is nearly optimal for a large class of small unbounded nonlinear and non-quadratic perturbations of the same problem.
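For intuition about the LQ special case, the discrete-time finite-horizon analogue admits an explicit dynamic-programming solution via the backward Riccati recursion, with quadratic value function V_t(x) = x^T S_t x. The sketch below uses hypothetical system matrices and is a discrete-time illustration, not the continuous-time viscosity-solution framework of the paper:

```python
import numpy as np

# Discrete-time finite-horizon LQ as a concrete instance of the Bellman
# (dynamic programming) recursion; all matrices below are hypothetical.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                  # state cost  x^T Q x
Rc = np.array([[0.5]])         # control cost u^T R u
QT = np.eye(2)                 # terminal cost
T = 50

# Backward Riccati recursion:
# S_t = Q + A^T S_{t+1} A - A^T S_{t+1} B (R + B^T S_{t+1} B)^{-1} B^T S_{t+1} A
S = QT
gains = []
for _ in range(T):
    K = np.linalg.solve(Rc + B.T @ S @ B, B.T @ S @ A)  # u_t = -K_t x_t
    S = Q + A.T @ S @ A - A.T @ S @ B @ K
    gains.append(K)
gains.reverse()                # gains[t] is the optimal feedback at step t

print(gains[0])                # first-step optimal feedback gain
```

The comparison results in the paper are what justify calling such a quadratic candidate the value function: it is the unique locally Lipschitz, bounded-below solution of the Bellman equation in their class.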

54 citations

Journal ArticleDOI
TL;DR: In this article, the authors compare two calmness conditions that are widely used in the literature on bilevel programming and on mathematical programs with equilibrium constraints; their results suggest that partial calmness is considerably more restrictive than calmness of the perturbed generalized equation.
Abstract: In this article, we compare two different calmness conditions which are widely used in the literature on bilevel programming and on mathematical programs with equilibrium constraints. In order to do so, we consider convex bilevel programming as a kind of intersection between both research areas. The so-called partial calmness concept is based on the function value approach for describing the lower level solution set. Alternatively, calmness in the sense of multifunctions may be considered for perturbations of the generalized equation representing the same lower level solution set. Both concepts allow one to derive first-order necessary optimality conditions via tools of generalized differentiation introduced by Mordukhovich. They are very different, however, concerning their range of applicability and the form of optimality conditions obtained. The results of this article seem to suggest that partial calmness is considerably more restrictive than calmness of the perturbed generalized equation.

54 citations

BookDOI
01 Jan 1994
TL;DR: This volume presents a theory of differential games with applications in worst-case controller design, covering zero-sum differential games (including pursuit-evasion games and numerical schemes), mathematical programming techniques, stochastic games, and applications.
Abstract (contents):
I. Zero-sum differential games: Theory and applications in worst-case controller design
- A Theory of Differential Games
- H∞-Optimal Control of Singularly Perturbed Systems with Sampled-State Measurements
- New Results on Nonlinear H∞ Control Via Measurement Feedback
- Reentry Trajectory Optimization under Atmospheric Uncertainty as a Differential Game
II. Zero-sum differential games: Pursuit-evasion games and numerical schemes
- Fully Discrete Schemes for the Value Function of Pursuit-Evasion Games
- Zero Sum Differential Games with Stopping Times: Some Results about its Numerical Resolution
- Singular Paths in Differential Games with Simple Motion
- The Circular Wall Pursuit
III. Mathematical programming techniques
- Decomposition of Multi-Player Linear Programs
- Convergent Stepsizes for Constrained Min-Max Algorithms
- Algorithms for the Solution of a Large-Scale Single-Controller Stochastic Game
IV. Stochastic games: Differential, sequential and Markov games
- Stochastic Games with Average Cost Constraints
- Stationary Equilibria for Nonzero-Sum Average Payoff Ergodic Stochastic Games and General State Space
- Overtaking Equilibria for Switching Regulator and Tracking Games
- Monotonicity of Optimal Policies in a Zero Sum Game: A Flow Control Model
V. Applications
- Capital Accumulation Subject to Pollution Control: A Differential Game with a Feedback Nash Equilibrium
- Coastal States and Distant Water Fleets Under Extended Jurisdiction: The Search for Optimal Incentive Schemes
- Stabilizing Management and Structural Development of Open-Access Fisheries
- The Non-Uniqueness of Markovian Strategy Equilibrium: The Case of Continuous Time Models for Non-Renewable Resources
- An Evolutionary Game Theory for Differential Equation Models with Reference to Ecosystem Management
- On Barter Contracts in Electricity Exchange
- Preventing Minority Disenfranchisement Through Dynamic Bayesian Reapportionment of Legislative Voting Power
- Learning by Doing and Technology Sharing in Asymmetric Duopolies

54 citations


Network Information
Related Topics (5)
- Optimal control: 68K papers, 1.2M citations (87% related)
- Bounded function: 77.2K papers, 1.3M citations (85% related)
- Markov chain: 51.9K papers, 1.3M citations (85% related)
- Linear system: 59.5K papers, 1.4M citations (84% related)
- Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:
Year | Papers
2023 | 261
2022 | 537
2021 | 369
2020 | 411
2019 | 348
2018 | 353