
Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Proceedings Article
22 Jul 2012
TL;DR: This work introduces a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs are modeled as Gaussian distributions over the state space and updated with an extended Kalman filter.
Abstract: We introduce a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs can be modeled using Gaussian distributions over the state space. Our method enables fast solutions to sequential decision making under uncertainty for a variety of problems involving noisy or incomplete observations and stochastic actions. We present an efficient approach to compute locally-valid approximations to the value function over continuous spaces in time polynomial, O(n^4), in the dimension n of the state space. To directly tackle the intractability of solving general POMDPs, we leverage the assumption that beliefs are Gaussian distributions over the state space, approximate the belief update using an extended Kalman filter (EKF), and represent the value function by a function that is quadratic in the mean and linear in the variance of the belief. Our approach iterates towards a linear control policy over the state space that is locally-optimal with respect to a user-defined cost function, and is approximately valid in the vicinity of a nominal trajectory through belief space. We demonstrate the scalability and potential of our approach on problems inspired by robot navigation under uncertainty for state spaces of up to 128 dimensions.

44 citations
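The Gaussian-belief assumption in the abstract above reduces the belief update to a single extended Kalman filter step. A minimal sketch of that step, assuming generic model callables f, h, F, H and noise covariances Q, R (all names illustrative, not from the paper):

```python
import numpy as np

def ekf_belief_update(mean, cov, u, z, f, h, F, H, Q, R):
    """One EKF step: propagate the Gaussian belief N(mean, cov) through the
    dynamics f under control u, then correct with observation z.
    f, h are dynamics/observation models; F, H return their Jacobians;
    Q, R are process/observation noise covariances (illustrative names)."""
    # Predict: push the mean through the dynamics, linearize for the covariance.
    mean_pred = f(mean, u)
    Fk = F(mean, u)
    cov_pred = Fk @ cov @ Fk.T + Q
    # Correct: Kalman gain from the linearized observation model.
    Hk = H(mean_pred)
    S = Hk @ cov_pred @ Hk.T + R
    K = cov_pred @ Hk.T @ np.linalg.inv(S)
    mean_new = mean_pred + K @ (z - h(mean_pred))
    cov_new = (np.eye(len(mean)) - K @ Hk) @ cov_pred
    return mean_new, cov_new
```

The paper's value function is then evaluated on the (mean, cov) pair produced by this update rather than on the raw state.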

Journal ArticleDOI
TL;DR: The method is based on a set-oriented discretization of the state space in combination with a new algorithm for the computation of shortest paths in weighted directed hypergraphs; the convergence of the scheme is proved as the discretization parameter goes to zero.
Abstract: We propose a new numerical method for the computation of the optimal value function of perturbed control systems and associated globally stabilizing optimal feedback controllers. The method is based on a set-oriented discretization of the state space in combination with a new algorithm for the computation of shortest paths in weighted directed hypergraphs. Using the concept of multivalued game, we prove the convergence of the scheme as the discretization parameter goes to zero.

44 citations
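The shortest-path view in the abstract above can be sketched on an ordinary weighted digraph (the paper's hypergraph formulation, which captures perturbations, is more general): after discretizing the state space into cells, a backward Dijkstra from the target set yields the approximate optimal value function. All names here are illustrative:

```python
import heapq

def value_from_shortest_paths(cells, incoming, targets):
    """Approximate optimal value function on a discretized state space.
    incoming[c] is a list of (cost, prev) pairs for edges prev -> c;
    targets is the set of cells containing the goal.  Dijkstra run
    backwards from the targets gives value[c] = minimal cost-to-go."""
    value = {c: float("inf") for c in cells}
    heap = [(0.0, t) for t in targets]
    for t in targets:
        value[t] = 0.0
    while heap:
        v, c = heapq.heappop(heap)
        if v > value[c]:
            continue  # stale heap entry
        for cost, prev in incoming[c]:
            if v + cost < value[prev]:
                value[prev] = v + cost
                heapq.heappush(heap, (v + cost, prev))
    return value
```

The resulting value function directly induces a feedback controller: in each cell, pick the edge achieving the minimum in the Bellman recursion.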

Journal ArticleDOI
TL;DR: In this article, a numerical scheme is proposed for solving a dynamic programming equation with Malliavin weights arising from the time-discretization of backward stochastic differential equations with the integration-by-parts representation of the Z-component.
Abstract: We design a numerical scheme for solving a dynamic programming equation with Malliavin weights arising from the time-discretization of backward stochastic differential equations with the integration-by-parts representation of the Z-component by [Ma-Zhang 2002]. When the sequence of conditional expectations is computed using empirical least-squares regressions, we establish, under general conditions, tight error bounds as the time-average of local regression errors only (up to logarithmic factors). We compute the algorithm complexity by a suitable optimization of the parameters, depending on the dimension and the smoothness of value functions, in the limit as the number of grid times goes to infinity. The estimates take into account the regularity of the terminal function.

44 citations
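The empirical least-squares regressions mentioned above estimate conditional expectations from simulated samples. A minimal sketch with a one-dimensional polynomial basis (the basis choice and function names are illustrative assumptions, not the paper's scheme):

```python
import numpy as np

def regress_conditional_expectation(X, Y, degree=3):
    """Estimate x -> E[Y | X = x] by empirical least-squares regression
    on the monomial basis 1, x, ..., x^degree.  This is the generic
    regression Monte Carlo step; the paper's analysis bounds how these
    local regression errors accumulate across time steps."""
    A = np.vander(X, degree + 1, increasing=True)  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return lambda x: np.vander(np.atleast_1d(x), degree + 1, increasing=True) @ coeffs
```

In a backward scheme, this regression is applied once per grid time, replacing the exact conditional expectation in the dynamic programming recursion.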

Journal ArticleDOI
TL;DR: This article derives error bounds for a class of monotone approximation schemes, which under some assumptions include finite difference schemes, and bounds on the error induced when the original Lévy measure is replaced by a finite measure with compact support.
Abstract: We derive error estimates for approximate (viscosity) solutions of Bellman equations associated to controlled jump-diffusion processes, which are fully nonlinear integro-partial differential equations. Two main results are obtained: (i) error bounds for a class of monotone approximation schemes, which under some assumptions includes finite difference schemes, and (ii) bounds on the error induced when the original Lévy measure is replaced by a finite measure with compact support, an approximation process that is commonly used when designing numerical schemes for integro-partial differential equations. Our proofs use and extend techniques introduced by Krylov and Barles-Jakobsen.

44 citations

Proceedings ArticleDOI
01 Apr 2007
TL;DR: This paper presents a modified dual of the standard linear program that guarantees a globally normalized state visit distribution, and derives novel dual forms of dynamic programming (policy evaluation, policy iteration, and value iteration), together with dual formulations of temporal difference learning that yield new forms of Sarsa and Q-learning.
Abstract: We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well developed techniques for representing, approximating and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average reward and discounted reward case in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value function based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.

44 citations
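The dual view above replaces value functions with occupancy (state-action visit) measures. A minimal sketch of the normalized dual linear program for a small discounted MDP, solved here with scipy.optimize.linprog (the function name and setup are illustrative assumptions, not the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def dual_lp_occupancy(P, r, d0, gamma):
    """Solve the dual LP of discounted dynamic programming: maximize
    sum_{s,a} mu(s,a) r(s,a) over occupancy measures mu subject to the
    flow constraints
        sum_a mu(s',a) = (1-gamma) d0(s') + gamma sum_{s,a} P(s'|s,a) mu(s,a).
    P has shape (S, A, S), r shape (S, A), d0 is the initial distribution.
    The (1-gamma) d0 normalization keeps mu summing to 1."""
    S, A = r.shape
    c = -r.reshape(S * A)  # linprog minimizes, so negate rewards
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - gamma * P[s, a, sp]
    b_eq = (1 - gamma) * d0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
    return res.x.reshape(S, A)
```

An optimal policy is then recovered from mu by normalizing over actions: pi(a|s) proportional to mu(s, a).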


Network Information
Related Topics (5)
Optimal control
68K papers, 1.2M citations
87% related
Bounded function
77.2K papers, 1.3M citations
85% related
Markov chain
51.9K papers, 1.3M citations
85% related
Linear system
59.5K papers, 1.4M citations
84% related
Optimization problem
96.4K papers, 2.1M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353