Topic

Bellman equation

About: The Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.
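For orientation, the object the topic is named after is the Bellman optimality equation of a discounted Markov decision process, stated here in its standard textbook form (not taken from any particular paper below):

\[
V^*(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big],
\]

where \(V^*\) is the optimal value function, \(R\) the reward function, \(P\) the transition kernel and \(\gamma \in (0,1)\) the discount factor.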


Papers
Journal ArticleDOI
TL;DR: In this paper, the optimal portfolio strategy, the value function and the efficient frontier of the mean-variance problem are expressed explicitly in terms of the solution of the backward stochastic Riccati equation (the standard problem formulation is recalled after this entry).

42 citations
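For reference, the continuous-time mean-variance portfolio selection problem mentioned above is usually posed in the following standard form; the paper's specific market model and its backward stochastic Riccati equation are not reproduced here:

\[
\min_{\pi}\ \operatorname{Var}\big(X_T^{\pi}\big) \quad \text{subject to} \quad \mathbb{E}\big[X_T^{\pi}\big] = z,
\]

where \(X_T^{\pi}\) is the terminal wealth under portfolio strategy \(\pi\) and \(z\) is a prescribed expected terminal wealth; the efficient frontier traces the minimal variance as \(z\) varies.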

Journal ArticleDOI
TL;DR: This paper analyzes the stability of the closed-loop system without relying on Lyapunov stability theory, and also proposes a policy iteration (PI) approach to approximate the value function, with a convergence proof (a generic policy-iteration sketch follows this entry).

42 citations
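A minimal sketch of tabular policy iteration for a finite MDP, as a generic illustration of the PI scheme mentioned above; the paper's own approximate setting is not reproduced, and the toy MDP interface below is an assumption:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Tabular policy iteration on a toy finite MDP (illustrative only).

    P: list of |S| x |S| transition matrices, one per action.
    R: |S| x |A| reward matrix.
    """
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the current V.
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
```

Each iteration evaluates the current policy exactly by solving a linear system, then improves it greedily; the loop terminates because the policy improves monotonically over a finite set of policies.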

Journal ArticleDOI
TL;DR: In this paper, the fully nonlinear stochastic Hamilton-Jacobi-Bellman (HJB) equation was studied for the optimal stochastic control problem of stochastic differential equations with random coefficients (the generic form of the HJB equation is recalled after this entry).
Abstract: In this paper we study the fully nonlinear stochastic Hamilton-Jacobi-Bellman (HJB) equation for the optimal stochastic control problem of stochastic differential equations with random coefficients...

42 citations
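For orientation, the HJB equation of a standard finite-horizon stochastic control problem with deterministic coefficients reads as follows; the stochastic HJB equation studied above generalizes this to random coefficients:

\[
\partial_t V(t,x) + \sup_{a}\Big\{ b(x,a)^{\top} \nabla_x V(t,x) + \tfrac{1}{2}\operatorname{tr}\big(\sigma\sigma^{\top}(x,a)\, \nabla_x^2 V(t,x)\big) + f(x,a) \Big\} = 0, \qquad V(T,x) = g(x),
\]

where \(b\) and \(\sigma\) are the drift and diffusion coefficients and \(f\), \(g\) are the running and terminal rewards.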

Posted Content
TL;DR: A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least-squares policy iteration (LSPI).
Abstract: This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least-squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the large-scale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) is presented that automatically learns both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two hand-coded basis functions (RBF and polynomial state encodings). (A minimal sketch of the Laplacian-eigenvector basis construction follows this entry.)

42 citations
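A minimal sketch of the basis-construction idea described above: use the low-order eigenvectors of the graph Laplacian of a state-space graph as basis functions for value-function approximation. The plain least-squares fit below stands in for the full RPI/LSPI machinery, and the adjacency matrix W and sampled values are assumed inputs:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

def laplacian_basis(W, k):
    """Return the k smoothest eigenvectors of the normalized graph Laplacian.

    W: dense symmetric adjacency matrix of the state-space graph (assumed input).
    """
    L = laplacian(W, normed=True)
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues (and eigenvectors) in ascending order
    return eigvecs[:, :k]

def fit_values(Phi, v_samples):
    """Least-squares projection of sampled state values onto the basis Phi."""
    w, *_ = np.linalg.lstsq(Phi, v_samples, rcond=None)
    return Phi @ w
```

The eigenvectors associated with the smallest eigenvalues are the smoothest functions on the graph, so a small number of them captures the large-scale structure of the state space.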

Book ChapterDOI
16 Jul 2001
TL;DR: A bound is provided on the average reward of the policy obtained by solving the Bellman equations, which depends on the relationship between the discount factor and the mixing time of the Markov chain (a standard limiting relationship behind this practice is recalled after this entry).
Abstract: In many reinforcement learning problems, it is appropriate to optimize the average reward. In practice, this is often done by solving the Bellman equations using a discount factor close to 1. In this paper, we provide a bound on the average reward of the policy obtained by solving the Bellman equations which depends on the relationship between the discount factor and the mixing time of the Markov chain. We extend this result to the direct policy gradient of Baxter and Bartlett, in which a discount parameter is used to find a biased estimate of the gradient of the average reward with respect to the parameters of a policy. We show that this biased gradient is an exact gradient of a related discounted problem and provide a bound on the optima found by following these biased gradients of the average reward. Further, we show that the exact Hessian in this related discounted problem is an approximate Hessian of the average reward, with equality in the limit as the discount factor tends to 1. We then provide an algorithm to estimate the Hessian from a sample path of the underlying Markov chain, which converges with probability 1.

42 citations
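The practice discussed above rests on a standard relationship between the discounted and average-reward criteria: for a unichain MDP and a fixed policy \(\pi\) with average reward \(\rho^{\pi}\) and discounted value function \(V_{\gamma}^{\pi}\),

\[
\lim_{\gamma \to 1} (1-\gamma)\, V_{\gamma}^{\pi}(s) = \rho^{\pi} \quad \text{for every state } s,
\]

so solving the Bellman equations with \(\gamma\) close to 1 approximates the average-reward problem; the bound above quantifies the quality of this approximation in terms of the mixing time.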


Network Information
Related Topics (5)

Topic                   Papers    Citations    Related
Optimal control         68K       1.2M         87%
Bounded function        77.2K     1.3M         85%
Markov chain            51.9K     1.3M         85%
Linear system           59.5K     1.4M         84%
Optimization problem    96.4K     2.1M         83%
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353