Topic
Bellman equation
About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.
Papers
TL;DR: The standard envelope theorems apply to choice sets with convex and topological structure, providing sufficient conditions for the value function to be differentiable in a parameter and characterizing its derivative.
Abstract: The standard envelope theorems apply to choice sets with convex and topological structure, providing sufficient conditions for the value function to be differentiable in a parameter and characterizing its derivative. This paper studies optimization with arbitrary choice sets and shows that the traditional envelope formula holds at any differentiability point of the value function. We also provide conditions for the value function to be, variously, absolutely continuous, left- and right-differentiable, or fully differentiable. These results are applied to mechanism design, convex programming, continuous optimization problems, saddle-point problems, problems with parameterized constraints, and optimal stopping problems.
1,183 citations
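For concreteness, here is a minimal statement of the classical envelope formula that the paper generalizes; the notation (objective f, parameter t, maximizer x*(t)) is an illustrative assumption, not the paper's:

```latex
% Parameterized maximization problem and its value function:
%   V(t) = sup over x in X of f(x, t), attained at x*(t).
% Envelope formula: wherever V is differentiable,
\[
V(t) = \sup_{x \in X} f(x, t), \qquad
V'(t) = \frac{\partial f}{\partial t}\bigl(x^{*}(t), t\bigr).
\]
```

The paper's point is that this formula holds at any differentiability point of V even when the choice set X carries no convex or topological structure.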
TL;DR: The constrained optimal control law is shown to have the largest region of asymptotic stability (RAS), and the result is a nearly optimal constrained state-feedback controller tuned a priori off-line.
1,045 citations
TL;DR: In this article, a stochastic differential formulation of recursive utility is given, with sufficient conditions for existence, uniqueness, time consistency, monotonicity, continuity, risk aversion, concavity, and other properties.
Abstract: A stochastic differential formulation of recursive utility is given, with sufficient conditions for existence, uniqueness, time consistency, monotonicity, continuity, risk aversion, concavity, and other properties. In the setting of Brownian information, recursive and intertemporal expected utility functions are observationally distinguishable. However, one cannot distinguish among a number of non-expected-utility theories of one-shot choice under uncertainty after they are suitably integrated into an intertemporal framework. In a "smooth" Markov setting, the stochastic differential utility model produces a generalization of the Hamilton-Jacobi-Bellman characterization of optimality. A companion paper explores the implications for asset prices. Copyright 1992 by The Econometric Society.
1,040 citations
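For orientation, a sketch of the recursion that defines stochastic differential utility; the aggregator notation is the standard one for this model and is an assumption here, not quoted from the abstract:

```latex
% Continuation utility V_t is defined recursively through an
% aggregator f(c, v) of current consumption and continuation value:
\[
V_t = \mathbb{E}_t\!\left[ \int_t^T f(c_s, V_s)\, ds \right].
\]
% Additive expected utility is the special case f(c, v) = u(c) - \beta v;
% richer aggregators disentangle risk aversion from intertemporal
% substitution, which the additive model ties together.
```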
TL;DR: An online algorithm based on policy iteration learns the continuous-time optimal control solution with infinite-horizon cost for nonlinear systems with known dynamics, finding in real time suitable approximations of both the optimal cost and the optimal control policy while guaranteeing closed-loop stability.
1,012 citations
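The paper's algorithm targets nonlinear systems with neural-network approximators; as a minimal sketch of the underlying policy-iteration loop, here is the linear special case (Kleinman's algorithm for the continuous-time LQR). The system matrices and the initial stabilizing gain below are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch: policy iteration for the continuous-time LQR
# (Kleinman's algorithm). Assumed example system, not from the paper.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 2.0]])  # unstable open loop
B = np.array([[0.0], [1.0]])
Q = np.eye(2)   # state cost
R = np.eye(1)   # control cost

K = np.array([[0.0, 4.0]])  # initial stabilizing gain (admissible policy)
for _ in range(20):
    Acl = A - B @ K
    # Policy evaluation: solve Acl^T P + P Acl = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K = np.linalg.solve(R, B.T @ P)

# Converges to the algebraic Riccati solution, i.e. the HJB solution here
print(np.allclose(P, solve_continuous_are(A, B, Q, R)))  # True
```

Each pass evaluates the current policy via a Lyapunov equation and then improves the gain; starting from a stabilizing policy, the iterates converge to the Riccati (HJB) solution.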
01 Aug 2008
TL;DR: It is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control.
Abstract: Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. Two standard neural networks (NN) are used: a critic NN approximates the value function, whereas an action network approximates the optimal control policy. This approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems, specifically for the DT linear quadratic regulator (LQR), where the action is linear, the value is quadratic in the states, and the NNs have zero approximation error. For the LQR, HDP may therefore be implemented without knowing the system A matrix by using two NNs; this fact is not generally appreciated in the folklore of HDP for the DT LQR, where typically only one critic NN is used.
919 citations
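As a sketch of the value-iteration recursion that HDP approximates, here is the DT LQR special case discussed in the abstract. Unlike the paper's model-free two-network implementation, this uses the model (A, B) directly; the matrices are illustrative assumptions:

```python
# Minimal sketch: value iteration (the recursion HDP approximates)
# for the DT LQR. Assumed example system, not from the paper.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # discretized double integrator
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = np.zeros((2, 2))  # value iteration may start from V_0 = 0
for _ in range(2000):
    # Exact greedy action, mirroring the paper's assumption that the
    # value/action updates are solved exactly at each iteration:
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Riccati difference equation:
    #   V_{k+1}(x) = min_u { x'Qx + u'Ru + V_k(Ax + Bu) }
    P = Q + K.T @ R @ K + (A - B @ K).T @ P @ (A - B @ K)

# Converges to the discrete ARE solution, the HJB fixed point for the LQR
print(np.allclose(P, solve_discrete_are(A, B, Q, R), atol=1e-6))  # True
```

Starting from V_0 = 0, the iterates increase monotonically to the solution of the discrete algebraic Riccati equation, which is the HJB equation for this problem.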