Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications have been published within this topic, receiving 135589 citations.


Papers
Proceedings Article
15 Jul 2020
TL;DR: One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues, placing policy gradient methods on a solid theoretical footing analogous to the global convergence guarantees of iterative value-function-based algorithms.
Abstract: Policy gradient (PG) methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution (say with a sufficiently rich policy class); how they cope with approximation error due to using a restricted class of parametric policies; or their finite sample behavior. Such characterizations are important not only to compare these methods to their approximate value function counterparts (where such issues are relatively well understood, at least in the worst case), but also to help with more principled approaches to algorithm design. This work provides provable characterizations of computational, approximation, and sample size issues with regard to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: 1) "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy, and 2) restricted policy classes, which may not contain the optimal policy and where we provide agnostic learning results. In the tabular setting, our main results are: 1) a convergence rate to the global optimum for direct parameterization and projected gradient ascent; 2) asymptotic convergence to the global optimum for softmax policy parameterization and PG, and a convergence rate with additional entropy regularization; and 3) dimension-free convergence to the global optimum for softmax policy parameterization and the Natural Policy Gradient (NPG) method with exact gradients. In the function approximation setting, we further analyze NPG with exact as well as inexact gradients under certain smoothness assumptions on the policy parameterization and establish rates of convergence in terms of the quality of the initial state distribution. One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues. Overall, these results place PG methods on a solid theoretical footing, analogous to the global convergence guarantees of iterative value-function-based algorithms.

198 citations
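
As a companion to the abstract above, here is a minimal sketch of exact policy-gradient ascent with a softmax ("tabular") policy parameterization on a discounted MDP; the toy two-state MDP, step size, and iteration count are illustrative assumptions and not the paper's experimental setup.

```python
# Minimal sketch (assumptions: the toy 2-state MDP, step size, and iteration
# count are illustrative, not the paper's setup): exact policy-gradient ascent
# with a softmax "tabular" policy parameterization on a discounted MDP.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] = transition probability, r[s, a] = expected reward (toy values).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
rho = np.array([0.5, 0.5])               # initial state distribution
theta = np.zeros((n_states, n_actions))  # softmax logits

def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def q_and_v(pi):
    # Solve the Bellman equation for V^pi exactly, then form Q^pi.
    P_pi = np.einsum('sap,sa->sp', P, pi)
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return r + gamma * P @ V, V

for _ in range(5000):
    pi = softmax_policy(theta)
    Q, V = q_and_v(pi)
    # Unnormalized discounted state visitation d(s) under pi, starting from rho.
    P_pi = np.einsum('sap,sa->sp', P, pi)
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, rho)
    # Policy gradient theorem for the softmax parameterization:
    # dJ/dtheta[s, a] = d(s) * pi(a|s) * (Q(s, a) - V(s)).
    theta += 0.1 * d[:, None] * pi * (Q - V[:, None])

print("learned policy:\n", softmax_policy(theta).round(3))
```

With exact gradients and a small fixed step size, the softmax policy drifts toward the greedy action in each state, which is the asymptotic global-convergence behavior the paper analyzes in the tabular case.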

Proceedings Article
05 Jul 2008
TL;DR: It is shown that linear value-function approximation is equivalent to a form of linear model approximation, and a relationship between the model-approximation error and the Bellman error is derived, which can guide feature selection for model improvement and/or value- function improvement.
Abstract: We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.

198 citations
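
The sketch below illustrates the quantities discussed above on an assumed toy problem: it computes the linear fixed-point value function for a fixed policy and then evaluates its Bellman error, the quantity the paper relates to linear model-approximation error. The transition matrix, rewards, and two-feature basis are arbitrary placeholders.

```python
# Minimal sketch (the transition matrix, rewards, and features are assumed
# placeholders): compute the linear fixed-point value function for a fixed
# policy and inspect its Bellman error, the quantity the paper relates to
# linear model-approximation error.
import numpy as np

n, gamma = 5, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n), size=n)    # P[s, s']: policy-induced transitions
R = rng.normal(size=n)                   # expected reward per state
Phi = np.stack([np.ones(n), np.arange(n, dtype=float)], axis=1)  # two features

# Linear fixed point (uniform state weighting): Phi w = Proj(R + gamma P Phi w).
A = Phi.T @ (Phi - gamma * P @ Phi)
b = Phi.T @ R
w = np.linalg.solve(A, b)
V_hat = Phi @ w

# Bellman error of the approximate value function, state by state.
bellman_error = R + gamma * P @ V_hat - V_hat
print("per-state Bellman error:", bellman_error.round(4))
```

Inspecting which states carry large Bellman error is one way such an analysis can guide feature selection: adding a feature that captures the residual reduces both the model-approximation error and the Bellman error.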

Journal Article
TL;DR: An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real-time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem.
Abstract: The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning, in real time, the solution to the Hamilton-Jacobi-Isaacs (HJI) equation that appears in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are utilized to find suitable approximations of the optimal value function and the saddle-point feedback control and disturbance policies. Novel weight updating laws are given to tune the critic, actor, and disturbance NNs simultaneously, using data generated in real time along the system trajectories. Taking NN approximation errors into account, we provide a stability analysis of the proposed algorithm with a Lyapunov approach. Moreover, the need for the system input dynamics in the proposed algorithm is relaxed by using an NN identification scheme. Finally, simulation examples show the effectiveness of the proposed algorithm.

197 citations
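
As a rough illustration of the zero-sum (Isaacs) Bellman equation underlying the H∞ problem above, the sketch below runs grid-based value iteration for a scalar discrete-time min-max problem. It is not the paper's neural-network APLA scheme; the dynamics, attenuation level, and grids are assumed for illustration only.

```python
# Minimal sketch (not the paper's NN-based APLA; the scalar dynamics,
# attenuation level gamma_att, and grids are assumed): value iteration for the
# discrete-time zero-sum (Isaacs) Bellman equation behind H-infinity control,
#   V(x) = min_u max_w [ x^2 + u^2 - gamma_att^2 w^2 + V(a x + b u + c w) ],
# solved on a bounded 1-D grid with linear interpolation.
import numpy as np

a, b, c = 0.9, 1.0, 0.5                  # x_{k+1} = a x + b u + c w
gamma_att = 2.0                          # disturbance attenuation level
xs = np.linspace(-2.0, 2.0, 81)
us = np.linspace(-1.0, 1.0, 21)
ws = np.linspace(-1.0, 1.0, 21)
V = np.zeros_like(xs)

for _ in range(300):
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        # Next states for every (u, w) pair, clipped to stay on the grid.
        xn = np.clip(a * x + b * us[:, None] + c * ws[None, :], xs[0], xs[-1])
        stage = x**2 + us[:, None]**2 - gamma_att**2 * ws[None, :]**2
        total = stage + np.interp(xn, xs, V)
        V_new[i] = total.max(axis=1).min()   # max over w, then min over u
    V = V_new

print("approximate game value at x = 1:", round(float(np.interp(1.0, xs, V)), 4))
```

The paper's contribution is to approximate this value function and the saddle-point control and disturbance policies with neural networks tuned online, rather than tabulating them on a grid as in this toy sketch.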

Journal Article
TL;DR: General results on the rate of convergence of a certain class of monotone approximation schemes for stationary Hamilton-Jacobi-Bellman equations with variable coefficients are obtained by systematically using a tricky idea of N.V. Krylov.
Abstract: By systematically using a tricky idea of N.V. Krylov, we obtain general results on the rate of convergence of a certain class of monotone approximation schemes for stationary Hamilton-Jacobi-Bellman equations with variable coefficients. This result applies in particular to control schemes based on the dynamic programming principle and to finite difference schemes, although in the latter case we are not able to treat the most general setting. General results have been obtained earlier by Krylov for finite difference schemes in the stationary case with constant coefficients and in the time-dependent case with variable coefficients, using control theory and probabilistic methods. In this paper we are able to handle variable coefficients by a purely analytical method. In our opinion this approach is far simpler and, for the cases we can treat, it yields a better rate of convergence than Krylov obtains in the variable coefficients case.

197 citations
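
A minimal sketch of the kind of monotone approximation scheme discussed above: an upwind finite-difference discretization of a one-dimensional stationary HJB equation with variable coefficients, solved by Gauss-Seidel-style sweeps. The coefficients, control set, and boundary conditions below are illustrative assumptions, not those analyzed in the paper.

```python
# Minimal sketch (the 1-D model problem, coefficients, and control set are
# assumed, not those analyzed in the paper): a monotone upwind finite-difference
# scheme for a stationary HJB equation
#   sup_alpha { c u - a(alpha, x) u'' - b(alpha, x) u' - f(alpha, x) } = 0
# on (0, 1) with u(0) = u(1) = 0, solved by Gauss-Seidel-style sweeps.
import numpy as np

N, c = 21, 1.0
x = np.linspace(0.0, 1.0, N)
h = x[1] - x[0]
alphas = np.linspace(-1.0, 1.0, 11)      # finite control set
u = np.zeros(N)                          # boundary values stay fixed at 0

def coeffs(alpha, xi):
    a = 0.1 + 0.05 * xi                  # variable diffusion coefficient
    b = alpha                            # controlled drift
    f = 1.0 + 0.5 * np.sin(2 * np.pi * xi)
    return a, b, f

for _ in range(2000):
    for i in range(1, N - 1):
        best = np.inf
        for alpha in alphas:
            a, b, f = coeffs(alpha, x[i])
            bp, bm = max(b, 0.0), max(-b, 0.0)   # upwind split of the drift
            diag = c + 2 * a / h**2 + abs(b) / h
            off = (a / h**2 + bp / h) * u[i + 1] + (a / h**2 + bm / h) * u[i - 1]
            best = min(best, (off + f) / diag)   # the sup in the HJB becomes a min here
        u[i] = best

print("u(0.5) ≈", round(u[N // 2], 4))
```

Monotonicity here comes from the upwind differencing, which keeps the coefficients multiplying the neighboring values nonnegative; that structural property is what convergence-rate analyses of this class of schemes rely on.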

Journal Article
TL;DR: This paper examines methods for adapting the basis function during the learning process, in the context of evaluating the value function under a fixed control policy, using the Bellman approximation error as an optimization criterion.
Abstract: Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis function is important for the success of the algorithm. In the present paper we examine methods for adapting the basis function during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis function while simultaneously adapting the (non-linear) basis function parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.

194 citations
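
To make the idea of Bellman-error-driven basis adaptation concrete, the sketch below adapts the centers of Gaussian basis functions on an assumed toy chain MDP by numerical gradient descent on the Bellman residual, re-fitting the linear weights at each step. It stands in for the paper's gradient-based variant only in spirit; the chain MDP, basis widths, and step size are all assumptions, and the Cross Entropy variant is not reproduced.

```python
# Minimal sketch (the chain MDP, Gaussian basis, widths, and step size are all
# assumptions; the paper's gradient-based and Cross Entropy variants are not
# reproduced here): adapt basis-function centers by gradient descent on the
# Bellman approximation error under a fixed policy, re-fitting the linear
# weights at every step.
import numpy as np

n, gamma = 20, 0.95
states = np.arange(n, dtype=float)
# Fixed-policy chain: move right w.p. 0.7, stay w.p. 0.3; reward only at the end.
P = np.zeros((n, n))
for s in range(n - 1):
    P[s, s + 1], P[s, s] = 0.7, 0.3
P[n - 1, n - 1] = 1.0
R = np.zeros(n)
R[n - 1] = 1.0

def features(centers, width=2.0):
    return np.exp(-((states[:, None] - centers[None, :]) ** 2) / (2 * width**2))

def bellman_error(centers):
    Phi = features(centers)
    # Best linear weights for this basis: least-squares Bellman residual fit.
    A = Phi - gamma * P @ Phi
    w, *_ = np.linalg.lstsq(A, R, rcond=None)
    resid = R + gamma * P @ (Phi @ w) - Phi @ w
    return float(resid @ resid)

centers = np.array([2.0, 8.0, 14.0])     # initial centers of three basis functions
for _ in range(200):
    # Numerical gradient of the Bellman error with respect to the centers.
    grad = np.array([(bellman_error(centers + 1e-4 * e)
                      - bellman_error(centers - 1e-4 * e)) / 2e-4
                     for e in np.eye(len(centers))])
    centers -= 2.0 * grad                # gradient step on the basis parameters

print("adapted centers:", centers.round(2),
      " Bellman error:", round(bellman_error(centers), 6))
```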


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353