
Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications on this topic have been published, receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: It is illustrated that policy evaluation, when done by the projected equation/TD approach, may lead to policy oscillation, but when done by aggregation it does not, which implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability.
Abstract: We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.

265 citations
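
As a concrete illustration of the matrix-inversion approach surveyed above, the sketch below implements LSTD(0) in Python by solving a sampled projected Bellman equation A w = b in closed form. The feature map, discount factor, transition samples, and ridge term are illustrative placeholders rather than details from the paper; the small regularization only echoes the near-singularity issue the survey discusses.

```python
import numpy as np

def lstd0(transitions, phi, gamma, n_features, reg=1e-6):
    """Least-squares temporal difference, LSTD(0), for policy evaluation.

    Solves the sampled projected Bellman equation A w = b so that
    phi(s) @ w approximates the value function of the evaluated policy.

    transitions : iterable of (s, r, s_next) sampled under the policy
    phi         : feature map, state -> np.ndarray of length n_features
    reg         : small ridge term guarding against a nearly singular A
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + reg * np.eye(n_features), b)
```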

Journal ArticleDOI
TL;DR: In this paper, the value function of Mayer's problem arising in optimal control is investigated, and lower semicontinuous solutions of the associated Hamilton-Jacobi-Bellman equation are defined in three (equivalent) ways.
Abstract: The value function of Mayer’s problem arising in optimal control is investigated, and lower semicontinuous solutions of the associated Hamilton–Jacobi–Bellman equation are defined in three (equivalent) ways. Under quite weak assumptions about the control system, the value function is the unique solution. Moreover, it is stable with respect to perturbations of the control system and the cost. It coincides with the viscosity solution whenever it is continuous.

263 citations
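
For reference, in its classical form the Hamilton–Jacobi–Bellman equation associated with a Mayer problem (minimize a terminal cost g(x(T)) over controls u(·) with values in U, subject to dynamics ẋ = f(t, x, u)) reads as sketched below; the paper's contribution concerns solutions of this equation that are only lower semicontinuous rather than continuous.

```latex
% Standard HJB equation for the Mayer problem (terminal cost only):
\partial_t V(t,x) + \min_{u \in U} \nabla_x V(t,x) \cdot f(t,x,u) = 0,
\qquad V(T,x) = g(x).
```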

Proceedings Article
04 Aug 2001
TL;DR: This technique uses an MDP whose dynamics are represented in a variant of the situation calculus allowing for stochastic actions, and it produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition the state space according to distinctions made by the value function and policy.
Abstract: We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics are represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition the state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.

262 citations
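
For contrast with the symbolic approach described above, the sketch below shows classical value iteration over an explicitly enumerated state and action space, which is exactly the enumeration that decision-theoretic regression avoids. The transition and reward arrays are illustrative placeholders.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Classical value iteration with explicit state enumeration.

    P : array of shape (A, S, S); P[a, s, s_next] is a transition probability
    R : array of shape (A, S); expected immediate reward for action a in state s
    Returns the optimal value function V and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * E[V(s_next)]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```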

Book ChapterDOI
14 Dec 1994
TL;DR: In this article, a general framework for hybrid control problems is proposed, which encompasses several types of hybrid phenomena considered in the literature, and a specific control problem is studied in this framework, leading to an existence result for optimal controls.
Abstract: We propose a very general framework for hybrid control problems that encompasses several types of hybrid phenomena considered in the literature. A specific control problem is studied in this framework, leading to an existence result for optimal controls. The "value function" associated with this problem is expected to satisfy a set of "generalized quasi-variational inequalities".

262 citations

Journal ArticleDOI
TL;DR: In this article, the value function of the stochastic control problem is shown to be a smooth solution of the associated Hamilton–Jacobi–Bellman (HJB) equation, and the optimal policy is shown to exist and to be given in feedback form via the optimality conditions in the HJB equation.

256 citations
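
For reference, a standard finite-horizon stochastic control problem with dynamics dX = b(X, u) dt + σ(X, u) dW, running cost f, and terminal cost g leads to the HJB equation sketched below; the specific model treated in the paper may differ, but the feedback construction follows the same pattern, with the optimal policy attaining the minimum pointwise.

```latex
% Stochastic HJB equation (finite horizon, minimization form):
\partial_t V(t,x)
  + \min_{u \in U} \Big\{ b(x,u) \cdot \nabla_x V(t,x)
  + \tfrac{1}{2} \operatorname{tr}\!\big( \sigma(x,u)\sigma(x,u)^{\top} \nabla_x^2 V(t,x) \big)
  + f(x,u) \Big\} = 0,
\qquad V(T,x) = g(x).
% The optimal feedback u^*(t,x) is a minimizer of the bracketed expression.
```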


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353