Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.
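For orientation, the topic's namesake equation in its standard discounted-MDP form (generic notation, not drawn from any particular paper below):

```latex
V^*(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
```

Here V* is the optimal value function, r the reward, P the transition kernel, and γ ∈ [0, 1) the discount factor.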


Papers
DOI
01 Jan 2011
TL;DR: It is proved that the regularization-based Approximate Value/Policy Iteration algorithms introduced in this thesis enjoy an oracle-like property and may be used to achieve adaptivity: the performance is almost as good as the performance of the unknown best parameters.
Abstract: This thesis studies reinforcement learning and planning problems that are modeled by a discounted Markov Decision Process (MDP) with a large state space and a finite action space. We follow the value-based approach, in which a function approximator is used to estimate the optimal value function. The choice of function approximator, however, is nontrivial, as it depends on both the number of data samples and the MDP itself. The goal of this work is to introduce flexible and statistically efficient algorithms that find close-to-optimal policies for these problems without much prior information about them. The recurring theme of this thesis is the application of the regularization technique to design value function estimators that choose their estimates from rich function spaces. We introduce regularization-based Approximate Value/Policy Iteration algorithms, analyze their statistical properties, and provide upper bounds on the performance loss of the resulting policy compared to the optimal one. The error bounds show the dependence of the performance loss on the number of samples, the capacity of the function space to which the estimated value function belongs, and some intrinsic properties of the MDP itself. Remarkably, the dependence on the number of samples in the task of policy evaluation is minimax optimal. We also address the problem of automatic parameter tuning of reinforcement learning/planning algorithms and introduce a complexity regularization-based model selection algorithm. We prove that the algorithm enjoys an oracle-like property and that it may be used to achieve adaptivity: the performance is almost as good as the performance of the unknown best parameters. Our two other contributions are used to analyze the aforementioned algorithms. First, we analyze the rate of convergence of the estimation error in regularized least-squares regression when the data are exponentially β-mixing. We prove that, up to a logarithmic factor, the convergence rate is the same as the optimal minimax rate available for the i.i.d. case. Second, we address the question of how the errors at each iteration of approximate policy/value iteration influence the quality of the resulting policy. We provide results that highlight some new aspects of these algorithms.
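As a rough illustration of the thesis's recurring theme (and only that: this is a generic ridge-penalized fitted value iteration sketch, not the thesis's actual algorithms; the feature matrices and data below are synthetic and all names are made up):

```python
import numpy as np

def ridge_fitted_value_iteration(phi, rewards, phi_next, gamma=0.95,
                                 lam=1e-2, n_iters=50):
    """Fit V(s) ~ phi(s) @ w to the Bellman targets r + gamma * V(s')
    with an L2 (ridge) penalty lam * ||w||^2, iterating the backup.

    phi      : (n, d) features of sampled states
    rewards  : (n,)   sampled rewards
    phi_next : (n, d) features of sampled next states
    """
    n, d = phi.shape
    w = np.zeros(d)
    # Ridge-regularized normal-equation matrix, precomputed once.
    A = phi.T @ phi + lam * np.eye(d)
    for _ in range(n_iters):
        targets = rewards + gamma * (phi_next @ w)   # Bellman backup
        w = np.linalg.solve(A, phi.T @ targets)      # penalized least squares
    return w

# Tiny synthetic example: 200 random transitions with 5 features.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
rewards = rng.normal(size=200)
print(ridge_fitted_value_iteration(phi, rewards, phi_next))
```

The ridge penalty lam plays the role of the regularizer whose strength the thesis's model selection procedure would tune automatically.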

43 citations

Journal ArticleDOI
TL;DR: In this paper, a stochastic optimization model for hydropower generation reservoirs was proposed, in which the transition probability matrix was calculated based on copula functions, and the value function of the last period was calculated by stepwise iteration.
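The TL;DR sketches a classic backward stochastic dynamic programming recursion. A minimal generic version for finite states and actions is below; the copula-based estimation of the transition matrices is out of scope here, and every name, shape, and number is an assumption for illustration:

```python
import numpy as np

def backward_stochastic_dp(reward, P, n_periods):
    """Backward recursion for a finite-state, finite-horizon stochastic DP:
    V_t(s) = max_a [ reward[a, s] + sum_{s'} P[a, s, s'] * V_{t+1}(s') ],
    starting from a terminal (last-period) value of zero.

    reward : (n_actions, n_states) immediate rewards
    P      : (n_actions, n_states, n_states) transition matrices
    """
    n_actions, n_states = reward.shape
    V = np.zeros(n_states)                      # last-period value function
    policy = np.zeros((n_periods, n_states), dtype=int)
    for t in reversed(range(n_periods)):
        Q = reward + P @ V                      # (n_actions, n_states)
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return V, policy

# Illustrative: 2 release decisions, 3 storage states, 3 periods.
rng = np.random.default_rng(1)
reward = rng.uniform(size=(2, 3))
P = rng.dirichlet(np.ones(3), size=(2, 3))      # each row sums to 1
V0, pol = backward_stochastic_dp(reward, P, n_periods=3)
```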

43 citations

Proceedings ArticleDOI
28 Jul 2002
TL;DR: It is argued that this architecture for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques, is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
Abstract: A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
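A minimal sketch of the linear value function approximation the abstract starts from: V is constrained to a weighted combination of basis functions, and suitable weights are found by least squares against sampled Bellman backups. Here, echoing the weakly coupled setting, the (hypothetical) basis columns are subtask value functions evaluated at sampled states:

```python
import numpy as np

def fit_basis_weights(basis_values, bellman_targets):
    """Least-squares weights w for V(s) ~ sum_j w[j] * basis_values[s, j].

    basis_values    : (n_samples, n_basis) basis functions at sampled states
    bellman_targets : (n_samples,) sampled Bellman backup values
    """
    w, *_ = np.linalg.lstsq(basis_values, bellman_targets, rcond=None)
    return w

# Two subtask value functions over five sampled states (made-up numbers).
subtask_V = np.array([[1.0, 0.2],
                      [0.8, 0.5],
                      [0.3, 0.9],
                      [0.0, 1.0],
                      [0.6, 0.6]])
targets = np.array([1.1, 1.0, 0.9, 0.8, 1.0])
print(fit_basis_weights(subtask_V, targets))
```

The paper's contribution goes further, making the combination piecewise linear via greedy decision-tree splits rather than a single global weight vector.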

43 citations

Journal ArticleDOI
TL;DR: In this article, a portfolio optimization problem on an infinite time horizon is considered, where the risky asset price obeys a logarithmic Brownian motion, and the interest rate varies according to an ergodic Markov diffusion process.
Abstract: A portfolio optimization problem on an infinite time horizon is considered. The risky asset price obeys a logarithmic Brownian motion, and the interest rate varies according to an ergodic Markov diffusion process. Moreover, the interest rate fluctuation is correlated with the risky asset price fluctuation. The goal is to choose optimal investment and consumption policies to maximize the infinite-horizon expected discounted log utility of consumption. A dynamic programming principle is used to derive the dynamic programming equation (DPE). Explicit solutions for the optimal consumption and investment control policies are obtained. In addition, for a special case, an explicit formula for the value function is given.
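Schematically, the dynamic programming equation for such an infinite-horizon discounted consumption problem is a Hamilton-Jacobi-Bellman equation of the form below, where δ is the discount rate, c consumption, π the risky-asset fraction, and L^{π,c} the controlled generator of the wealth and interest-rate dynamics (a generic paraphrase, not the paper's exact equation):

```latex
\delta \, V(x) \;=\; \sup_{\pi,\; c \ge 0} \Big[ \log c \;+\; \mathcal{L}^{\pi, c} V(x) \Big]
```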

43 citations

Journal ArticleDOI
TL;DR: In this paper, the authors studied the complexity of the contraction fixed point problem and showed that in the worst case the minimal number of function evaluations and arithmetic operations required to compute an ε-approximation to a fixed point V* ∈ B_d increases exponentially in d. They showed that the curse of dimensionality disappears if the domain of Γ has additional special structure.
Abstract: This paper analyzes the complexity of the contraction fixed point problem: compute an ε-approximation to the fixed point V* = Γ(V*) of a contraction mapping Γ that maps a Banach space B_d of continuous functions of d variables into itself. We focus on quasi-linear contractions where Γ is a nonlinear functional of a finite number of conditional expectation operators. This class includes contractive Fredholm integral equations that arise in asset pricing applications and the contractive Bellman equation from dynamic programming. In the absence of further restrictions on the domain of Γ, the quasi-linear fixed point problem is subject to the curse of dimensionality, i.e., in the worst case the minimal number of function evaluations and arithmetic operations required to compute an ε-approximation to a fixed point V* ∈ B_d increases exponentially in d. We show that the curse of dimensionality disappears if the domain of Γ has additional special structure. We identify a particular type of special structure for which the problem is strongly tractable even in the worst case, i.e., the number of function evaluations and arithmetic operations needed to compute an ε-approximation of V* is bounded by Cε^{-p}, where C and p are constants independent of d. We present examples of economic problems that have this type of special structure, including a class of rational expectations asset pricing problems for which the optimal exponent p = 1 is nearly achieved.
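The successive-approximation scheme behind such complexity counts is plain fixed-point iteration V_{k+1} = Γ(V_k). For a β-contraction, the Banach fixed-point theorem gives ||V_{k+1} - V*|| ≤ (β/(1-β)) ||V_{k+1} - V_k||, which turns into a checkable ε-stopping rule. A minimal sketch for a finite-state Bellman operator (the random example MDP is purely illustrative):

```python
import numpy as np

def fixed_point_iterate(gamma_op, V0, beta, eps=1e-6, max_iters=10_000):
    """Iterate V <- Gamma(V) until the contraction bound certifies an
    eps-approximation: ||V_new - V|| <= eps * (1 - beta) / beta implies
    ||V_new - V*|| <= eps for a beta-contraction Gamma (sup norm)."""
    V = V0
    for _ in range(max_iters):
        V_new = gamma_op(V)
        if np.max(np.abs(V_new - V)) <= eps * (1.0 - beta) / beta:
            return V_new
        V = V_new
    return V

# Example Gamma: Bellman operator of a random MDP (2 actions, 4 states).
rng = np.random.default_rng(2)
beta = 0.9
R = rng.uniform(size=(2, 4))
P = rng.dirichlet(np.ones(4), size=(2, 4))     # (2, 4, 4), rows sum to 1
bellman = lambda V: (R + beta * (P @ V)).max(axis=0)
print(fixed_point_iterate(bellman, np.zeros(4), beta))
```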

43 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353