Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.
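For reference, the equation the topic is named after: for a discounted infinite-horizon Markov decision process with reward function r, transition kernel P, and discount factor γ ∈ (0, 1), the optimal value function satisfies the standard textbook form of the Bellman equation (generic notation, not tied to any paper below):

```latex
V(s) \;=\; \max_{a \in A(s)} \Big[\, r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \,\Big].
```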


Papers
Posted Content
TL;DR: In this paper, a new on-line scheme is presented to design the optimal coordination control for the consensus problem of multi-agent differential games by fuzzy adaptive dynamic programming (FADP), which brings together game theory, the generalized fuzzy hyperbolic model (GFHM) and adaptive dynamic programming.
Abstract: In this paper, a new on-line scheme is presented to design the optimal coordination control for the consensus problem of multi-agent differential games by fuzzy adaptive dynamic programming (FADP), which brings together game theory, the generalized fuzzy hyperbolic model (GFHM) and adaptive dynamic programming. In general, the optimal coordination control for multi-agent differential games is the solution of the coupled Hamilton-Jacobi (HJ) equations. Here, for the first time, GFHMs are used to approximate the solutions (value functions) of the coupled HJ equations, based on the policy iteration (PI) algorithm. Namely, for each agent, a GFHM is used to capture the mapping between the local consensus error and the local value function. Since our scheme uses a single-network architecture for each agent (which eliminates the action network model compared with the dual-network architecture), it is a more reasonable architecture for multi-agent systems. Furthermore, the approximate solution is utilized to obtain the optimal coordination controls. Finally, we give the stability analysis for our scheme, and prove that the weight estimation error and the local consensus error are uniformly ultimately bounded. Further, the control node trajectory is proven to be cooperative uniformly ultimately bounded.
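For context, a standard single-agent specialization of the equations the abstract refers to: with dynamics ẋ = f(x) + g(x)u and an infinite-horizon cost with state penalty Q(x) and control penalty uᵀRu, the Hamilton-Jacobi-Bellman equation for the value function V reads as below; in the paper's multi-agent game each agent faces a coupled version in which the other agents' policies also appear. The specific Q, R, f, g here are generic placeholders, not the paper's.

```latex
0 \;=\; \min_{u} \Big[\, Q(x) + u^{\mathsf T} R\, u \;+\; \nabla V(x)^{\mathsf T} \big( f(x) + g(x)\, u \big) \,\Big].
```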

47 citations

01 Jan 2013
TL;DR: A hybrid ADP with a clustering approach using both a deterministic lookahead policy and a value function approximation is developed that significantly reduces the computational time, while still improving the ADP algorithm by exploiting the structure of the disruption transition functions.
Abstract: We consider a dynamic shortest path problem with stochastic disruptions in the network. We use both historical information and real-time information about the network for the dynamic routing decisions. We model the problem as a discrete-time finite-horizon Markov Decision Process (MDP). For networks with many levels of disruption, the MDP faces the curses of dimensionality. We first apply an Approximate Dynamic Programming (ADP) algorithm with a standard value function approximation. Then, we improve the ADP algorithm by exploiting the structure of the disruption transition functions. We develop a hybrid ADP with a clustering approach using both a deterministic lookahead policy and a value function approximation. We develop a test bed of networks to evaluate the quality of the solutions. The hybrid ADP algorithm with the clustering approach significantly reduces the computational time, while still maintaining good solution quality.
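To make the ingredients concrete, here is a minimal sketch of ADP with a one-step lookahead policy and a linear value function approximation on a toy stochastic shortest path instance. It is illustrative only: the graph, the disruption model, and every name in it are invented for the example and are not the paper's algorithm or test bed.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 6                      # nodes 0..5; node 5 is the destination
base = np.full((n_nodes, n_nodes), np.inf)   # inf = no arc
arcs = [(0, 1, 2.0), (0, 2, 4.0), (1, 3, 3.0), (2, 3, 1.0),
        (1, 4, 6.0), (3, 5, 2.0), (4, 5, 1.0)]
for i, j, c in arcs:
    base[i, j] = c               # base travel time of arc (i, j)

def sample_costs():
    """Real-time arc costs: base times inflated by random disruptions."""
    disrupt = 1.0 + rng.exponential(0.5, size=base.shape)
    return base * disrupt

def features(node):
    """One-hot features; a linear VFA over them degenerates to a lookup table."""
    phi = np.zeros(n_nodes)
    phi[node] = 1.0
    return phi

theta = np.zeros(n_nodes)        # VFA weights: V(node) ~ theta @ features(node)
alpha = 0.1                      # stochastic-approximation step size

for episode in range(500):
    node = 0
    while node != 5:
        costs = sample_costs()
        nbrs = [j for j in range(n_nodes) if np.isfinite(costs[node, j])]
        # one-step lookahead: realized arc cost plus approximate cost-to-go
        q = [costs[node, j] + theta @ features(j) for j in nbrs]
        k = int(np.argmin(q))
        # update the value estimate of the current node toward the lookahead value
        theta += alpha * (q[k] - theta @ features(node)) * features(node)
        node = nbrs[k]

print("approximate cost-to-go per node:", np.round(theta, 2))
```

Because the basis is one-hot, the approximation here is exact per node; the paper's setting calls for richer approximations and a clustering-based lookahead, which this sketch does not attempt.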

47 citations

Journal ArticleDOI
TL;DR: In this article, the authors study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem.
Abstract: In this paper we study the bilevel dynamic problem, which is a hierarchy of two dynamic optimization problems, where the constraint region of the upper level problem is determined implicitly by the solutions to the lower level optimal control problem. To obtain optimality conditions, we reformulate the bilevel dynamic problem as a single level optimal control problem that involves the value function of the lower-level problem. Sensitivity analysis of the lower-level problem with respect to the perturbation in the upper-level decision variable is given and first-order necessary optimality conditions are derived by using nonsmooth analysis. A constraint qualification of calmness type and a sufficient condition for the calmness are also given.
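Schematically, the reformulation the abstract describes works as follows, shown here in a static finite-dimensional setting for brevity (the paper carries it out for optimal control problems). Writing V(x) for the lower-level value function turns the bilevel problem into a single-level one:

```latex
V(x) \;=\; \min_{y \in Y(x)} f(x,y),
\qquad
\begin{aligned}
\min_{x,\,y}\ \ & F(x,y)\\
\text{s.t.}\ \ & y \in Y(x), \quad f(x,y) - V(x) \le 0.
\end{aligned}
```

The constraint f(x,y) ≤ V(x) forces y to be lower-level optimal, and the value function V is typically nonsmooth, which is why nonsmooth analysis and a calmness-type constraint qualification enter the optimality conditions.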

47 citations

01 Jan 2000
TL;DR: Alternative numerical methods for approximating solutions to continuous-state dynamic programming (DP) problems are compared, including discrete policy iteration (DPI) and parametric policy iteration (PPI) against parametric methods applied to the Euler equation, for several test problems with closed-form solutions.
Abstract: We compare alternative numerical methods for approximating solutions to continuous-state dynamic programming (DP) problems. We distinguish two approaches: discrete approximation and parametric approximation. In the former, the continuous state space is discretized into a finite number of points N, and the resulting finite-state DP problem is solved numerically. In the latter, a function associated with the DP problem such as the value function, the policy function, or some other related function is approximated by a smooth function of K unknown parameters. Values of the parameters are chosen so that the parametric function approximates the true function as closely as possible. We focus on approximations that are linear in parameters, i.e., where the parametric approximation is a linear combination of K basis functions. We also focus on methods that approximate the value function V as the solution to the Bellman equation associated with the DP problem. In finite-state DP problems the method of policy iteration is an effective iterative method for solving the Bellman equation that converges to V in a finite number of steps. Each iteration involves a policy valuation step that computes the value function Vα corresponding to a trial policy α. We show how policy iteration can be extended to continuous-state DP problems. For discrete approximation, we refer to the resulting algorithm as discrete policy iteration (DPI). Each policy valuation step requires the solution of a system of linear equations with N variables. For parametric approximation, we refer to the resulting algorithm as parametric policy iteration (PPI). Each policy valuation step requires the solution of a linear regression with K unknown parameters. The advantage of PPI is that it is generally much faster than DPI, particularly when V can be well-approximated with small K. The disadvantage is that the PPI algorithm may either fail to converge or may converge to an incorrect solution. We compare DPI and PPI to parametric methods applied to the Euler equation for several test problems with closed-form solutions. We also compare the performance of these methods in several “real” applications, including a life-cycle consumption problem, an inventory investment problem, and a problem of optimal pricing, advertising, and exit decisions for newly introduced products.
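The contrast between the two policy-evaluation steps is easy to see in code. Below is a minimal sketch on a random finite MDP, with a polynomial basis standing in for PPI's basis functions; the MDP, the basis choice, and all sizes are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch of the two policy-valuation steps on a random finite
# MDP (invented for this example). DPI solves an N x N linear system for the
# exact V_alpha; PPI fits K basis-function weights by least squares.
rng = np.random.default_rng(1)
N, A, K, gamma = 50, 3, 5, 0.95

P = rng.random((A, N, N))
P /= P.sum(axis=2, keepdims=True)        # row-stochastic transition matrices
r = rng.random((A, N))                   # rewards r(a, s)
alpha = rng.integers(0, A, size=N)       # a trial policy: one action per state

P_a = P[alpha, np.arange(N), :]          # N x N transitions under alpha
r_a = r[alpha, np.arange(N)]             # N-vector of rewards under alpha

# DPI valuation step: solve (I - gamma * P_a) V = r_a, a system in N unknowns.
V_dpi = np.linalg.solve(np.eye(N) - gamma * P_a, r_a)

# PPI valuation step: V ~ Phi @ theta with K basis functions, chosen so the
# Bellman equation holds in the least-squares sense:
# (Phi - gamma * P_a @ Phi) theta = r_a.
s = np.linspace(0.0, 1.0, N)             # embed the states on [0, 1]
Phi = np.vander(s, K, increasing=True)   # N x K polynomial basis matrix
theta, *_ = np.linalg.lstsq(Phi - gamma * P_a @ Phi, r_a, rcond=None)
V_ppi = Phi @ theta

print("max |V_dpi - V_ppi| =", np.abs(V_dpi - V_ppi).max())
```

Each DPI step is a dense N-variable solve, while each PPI step is a K-parameter regression; as the abstract notes, that is where PPI's speed advantage comes from when K is much smaller than N, at the price of possible non-convergence.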

47 citations

Journal ArticleDOI
TL;DR: In this article, it is shown that if the solution of the variational problem is smooth enough, the qualitative effects of parameter perturbations on the entire optimal arcs can be represented by a generalized Slutsky-type matrix, which holds in integral form and is symmetric negative semidefinite.
Abstract: We consider an autonomous variational calculus problem with a fixed vector of initial stocks, fixed initial and terminal time values, a free vector of terminal stocks, and a time-independent vector of parameters. It is shown that if the solution of the variational problem is smooth enough, the qualitative effects of parameter perturbations on the entire optimal arcs can be represented by a generalized Slutsky-type matrix, which holds in integral form and is symmetric negative semidefinite. Sufficient conditions for the optimal value function to be convex in the parameters are also given.
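Read schematically, the stated property says that the comparative-statics effects, aggregated over the planning horizon, behave like a classical Slutsky matrix. One generic way to write "holds in integral form and is symmetric negative semidefinite" is shown below; this rendering is an assumption about the notation, not the paper's exact statement, with S(t) collecting the effects of the parameter perturbations on the optimal arcs at time t:

```latex
S \;=\; \int_{t_0}^{t_1} S(t)\,\mathrm{d}t, \qquad
S = S^{\mathsf T}, \qquad
v^{\mathsf T} S\, v \;\le\; 0 \quad \text{for all } v.
```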

47 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353