
Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Proceedings Article
22 Jul 2012
TL;DR: This work introduces a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs are modeled as Gaussian distributions over the state space and updated with an extended Kalman filter.
Abstract: We introduce a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs can be modeled using Gaussian distributions over the state space. Our method enables fast solutions to sequential decision making under uncertainty for a variety of problems involving noisy or incomplete observations and stochastic actions. We present an efficient approach to compute locally-valid approximations to the value function over continuous spaces in time polynomial, O(n^4), in the dimension n of the state space. To directly tackle the intractability of solving general POMDPs, we leverage the assumption that beliefs are Gaussian distributions over the state space, approximate the belief update using an extended Kalman filter (EKF), and represent the value function by a function that is quadratic in the mean and linear in the variance of the belief. Our approach iterates towards a linear control policy over the state space that is locally-optimal with respect to a user-defined cost function, and is approximately valid in the vicinity of a nominal trajectory through belief space. We demonstrate the scalability and potential of our approach on problems inspired by robot navigation under uncertainty for state spaces of up to 128 dimensions.

44 citations
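The Gaussian-belief assumption in the abstract above reduces the belief update to a single extended Kalman filter step. A minimal sketch of that step, assuming generic model callables f, h, F, H and noise covariances Q, R (all names illustrative, not from the paper):

```python
import numpy as np

def ekf_belief_update(mean, cov, u, z, f, h, F, H, Q, R):
    """One EKF step: propagate the Gaussian belief N(mean, cov) through the
    dynamics f under control u, then correct with observation z.
    f, h are dynamics/observation models; F, H return their Jacobians;
    Q, R are process/observation noise covariances (illustrative names)."""
    # Predict: push the mean through the dynamics, linearize for the covariance.
    mean_pred = f(mean, u)
    Fk = F(mean, u)
    cov_pred = Fk @ cov @ Fk.T + Q
    # Correct: Kalman gain from the linearized observation model.
    Hk = H(mean_pred)
    S = Hk @ cov_pred @ Hk.T + R
    K = cov_pred @ Hk.T @ np.linalg.inv(S)
    mean_new = mean_pred + K @ (z - h(mean_pred))
    cov_new = (np.eye(len(mean)) - K @ Hk) @ cov_pred
    return mean_new, cov_new
```

The paper's value function is then evaluated on the (mean, cov) pair produced by this update rather than on the raw state.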

Journal ArticleDOI
TL;DR: The method is based on a set-oriented discretization of the state space in combination with a new algorithm for the computation of shortest paths in weighted directed hypergraphs; the convergence of the scheme is proved as the discretization parameter goes to zero.
Abstract: We propose a new numerical method for the computation of the optimal value function of perturbed control systems and associated globally stabilizing optimal feedback controllers. The method is based on a set-oriented discretization of the state space in combination with a new algorithm for the computation of shortest paths in weighted directed hypergraphs. Using the concept of multivalued game, we prove the convergence of the scheme as the discretization parameter goes to zero.

44 citations
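The shortest-path view in the abstract above can be sketched on an ordinary weighted digraph (the paper's hypergraph formulation, which captures perturbations, is more general): after discretizing the state space into cells, a backward Dijkstra from the target set yields the approximate optimal value function. All names here are illustrative:

```python
import heapq

def value_from_shortest_paths(cells, incoming, targets):
    """Approximate optimal value function on a discretized state space.
    incoming[c] is a list of (cost, prev) pairs for edges prev -> c;
    targets is the set of cells containing the goal.  Dijkstra run
    backwards from the targets gives value[c] = minimal cost-to-go."""
    value = {c: float("inf") for c in cells}
    heap = [(0.0, t) for t in targets]
    for t in targets:
        value[t] = 0.0
    while heap:
        v, c = heapq.heappop(heap)
        if v > value[c]:
            continue  # stale heap entry
        for cost, prev in incoming[c]:
            if v + cost < value[prev]:
                value[prev] = v + cost
                heapq.heappush(heap, (v + cost, prev))
    return value
```

The resulting value function directly induces a feedback controller: in each cell, pick the edge achieving the minimum in the Bellman recursion.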

Journal ArticleDOI
TL;DR: In this article, a numerical scheme is proposed for solving a dynamic programming equation with Malliavin weights arising from the time-discretization of backward stochastic differential equations with the integration-by-parts representation of the Z-component.
Abstract: We design a numerical scheme for solving a dynamic programming equation with Malliavin weights arising from the time-discretization of backward stochastic differential equations with the integration-by-parts representation of the Z-component by [Ma-Zhang 2002]. When the sequence of conditional expectations is computed using empirical least-squares regressions, we establish, under general conditions, tight error bounds as the time-average of local regression errors only (up to logarithmic factors). We compute the algorithm complexity by a suitable optimization of the parameters, depending on the dimension and the smoothness of value functions, in the limit as the number of grid times goes to infinity. The estimates take into account the regularity of the terminal function.

44 citations
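The empirical least-squares regressions mentioned above estimate conditional expectations from simulated samples. A minimal sketch with a one-dimensional polynomial basis (the basis choice and function names are illustrative assumptions, not the paper's scheme):

```python
import numpy as np

def regress_conditional_expectation(X, Y, degree=3):
    """Estimate x -> E[Y | X = x] by empirical least-squares regression
    on the monomial basis 1, x, ..., x^degree.  This is the generic
    regression Monte Carlo step; the paper's analysis bounds how these
    local regression errors accumulate across time steps."""
    A = np.vander(X, degree + 1, increasing=True)  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return lambda x: np.vander(np.atleast_1d(x), degree + 1, increasing=True) @ coeffs
```

In a backward scheme, this regression is applied once per grid time, replacing the exact conditional expectation in the dynamic programming recursion.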

Journal ArticleDOI
TL;DR: This article derives error bounds for a class of monotone approximation schemes, which under some assumptions include finite difference schemes, and bounds on the error induced when the original Lévy measure is replaced by a finite measure with compact support.
Abstract: We derive error estimates for approximate (viscosity) solutions of Bellman equations associated to controlled jump-diffusion processes, which are fully nonlinear integro-partial differential equations. Two main results are obtained: (i) error bounds for a class of monotone approximation schemes, which under some assumptions includes finite difference schemes, and (ii) bounds on the error induced when the original Lévy measure is replaced by a finite measure with compact support, an approximation process that is commonly used when designing numerical schemes for integro-partial differential equations. Our proofs use and extend techniques introduced by Krylov and Barles-Jakobsen.

44 citations

Proceedings ArticleDOI
01 Apr 2007
TL;DR: This paper presents a modified dual of the standard linear program that guarantees a globally normalized state visit distribution, and derives novel dual forms of dynamic programming (policy evaluation, policy iteration, and value iteration), together with dual formulations of temporal difference learning that yield new forms of Sarsa and Q-learning.
Abstract: We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well developed techniques for representing, approximating and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average reward and discounted reward case in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value function based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.

44 citations
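The dual view above replaces value functions with occupancy (state-action visit) measures. A minimal sketch of the normalized dual linear program for a small discounted MDP, solved here with scipy.optimize.linprog (the function name and setup are illustrative assumptions, not the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def dual_lp_occupancy(P, r, d0, gamma):
    """Solve the dual LP of discounted dynamic programming: maximize
    sum_{s,a} mu(s,a) r(s,a) over occupancy measures mu subject to the
    flow constraints
        sum_a mu(s',a) = (1-gamma) d0(s') + gamma sum_{s,a} P(s'|s,a) mu(s,a).
    P has shape (S, A, S), r shape (S, A), d0 is the initial distribution.
    The (1-gamma) d0 normalization keeps mu summing to 1."""
    S, A = r.shape
    c = -r.reshape(S * A)  # linprog minimizes, so negate rewards
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - gamma * P[s, a, sp]
    b_eq = (1 - gamma) * d0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
    return res.x.reshape(S, A)
```

An optimal policy is then recovered from mu by normalizing over actions: pi(a|s) proportional to mu(s, a).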


Network Information
Related Topics (5)
Optimal control
68K papers, 1.2M citations
87% related
Bounded function
77.2K papers, 1.3M citations
85% related
Markov chain
51.9K papers, 1.3M citations
85% related
Linear system
59.5K papers, 1.4M citations
84% related
Optimization problem
96.4K papers, 2.1M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353