Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.
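For orientation, the topic's namesake equation in its standard discounted-MDP form (generic notation, not drawn from any particular paper below):

```latex
V^*(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
```

Here V* is the optimal value function, r the reward, P the transition kernel, and γ ∈ [0, 1) the discount factor.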


Papers
DOI
01 Jan 2011
TL;DR: It is proved that the regularization-based Approximate Value/Policy Iteration algorithms introduced in this thesis enjoy an oracle-like property and may be used to achieve adaptivity: the performance is almost as good as the performance of the unknown best parameters.
Abstract: This thesis studies reinforcement learning and planning problems that are modeled by a discounted Markov Decision Process (MDP) with a large state space and a finite action space. We follow the value-based approach, in which a function approximator is used to estimate the optimal value function. The choice of function approximator, however, is nontrivial, as it depends on both the number of data samples and the MDP itself. The goal of this work is to introduce flexible and statistically efficient algorithms that find close-to-optimal policies for these problems without much prior information about them. The recurring theme of this thesis is the application of the regularization technique to design value function estimators that choose their estimates from rich function spaces. We introduce regularization-based Approximate Value/Policy Iteration algorithms, analyze their statistical properties, and provide upper bounds on the performance loss of the resulting policy compared to the optimal one. The error bounds show the dependence of the performance loss on the number of samples, the capacity of the function space to which the estimated value function belongs, and some intrinsic properties of the MDP itself. Remarkably, the dependence on the number of samples in the task of policy evaluation is minimax optimal. We also address the problem of automatic parameter tuning of reinforcement learning/planning algorithms and introduce a complexity regularization-based model selection algorithm. We prove that the algorithm enjoys an oracle-like property and that it may be used to achieve adaptivity: the performance is almost as good as the performance of the unknown best parameters. Our two other contributions are used to analyze the aforementioned algorithms. First, we analyze the rate of convergence of the estimation error in regularized least-squares regression when the data are exponentially β-mixing. We prove that, up to a logarithmic factor, the convergence rate is the same as the optimal minimax rate available for the i.i.d. case. Second, we address the question of how the errors at each iteration of approximate policy/value iteration influence the quality of the resulting policy. We provide results that highlight some new aspects of these algorithms.
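As a rough illustration of the thesis's recurring theme (and only that: this is a generic ridge-penalized fitted value iteration sketch, not the thesis's actual algorithms; the feature matrices and data below are synthetic and all names are made up):

```python
import numpy as np

def ridge_fitted_value_iteration(phi, rewards, phi_next, gamma=0.95,
                                 lam=1e-2, n_iters=50):
    """Fit V(s) ~ phi(s) @ w to the Bellman targets r + gamma * V(s')
    with an L2 (ridge) penalty lam * ||w||^2, iterating the backup.

    phi      : (n, d) features of sampled states
    rewards  : (n,)   sampled rewards
    phi_next : (n, d) features of sampled next states
    """
    n, d = phi.shape
    w = np.zeros(d)
    # Ridge-regularized normal-equation matrix, precomputed once.
    A = phi.T @ phi + lam * np.eye(d)
    for _ in range(n_iters):
        targets = rewards + gamma * (phi_next @ w)   # Bellman backup
        w = np.linalg.solve(A, phi.T @ targets)      # penalized least squares
    return w

# Tiny synthetic example: 200 random transitions with 5 features.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
rewards = rng.normal(size=200)
print(ridge_fitted_value_iteration(phi, rewards, phi_next))
```

The ridge penalty lam plays the role of the regularizer whose strength the thesis's model selection procedure would tune automatically.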

43 citations

Journal ArticleDOI
TL;DR: In this paper, a stochastic optimization model for hydropower generation reservoirs was proposed, in which the transition probability matrix was calculated based on copula functions, and the value function of the last period was calculated by stepwise iteration.
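The TL;DR sketches a classic backward stochastic dynamic programming recursion. A minimal generic version for finite states and actions is below; the copula-based estimation of the transition matrices is out of scope here, and every name, shape, and number is an assumption for illustration:

```python
import numpy as np

def backward_stochastic_dp(reward, P, n_periods):
    """Backward recursion for a finite-state, finite-horizon stochastic DP:
    V_t(s) = max_a [ reward[a, s] + sum_{s'} P[a, s, s'] * V_{t+1}(s') ],
    starting from a terminal (last-period) value of zero.

    reward : (n_actions, n_states) immediate rewards
    P      : (n_actions, n_states, n_states) transition matrices
    """
    n_actions, n_states = reward.shape
    V = np.zeros(n_states)                      # last-period value function
    policy = np.zeros((n_periods, n_states), dtype=int)
    for t in reversed(range(n_periods)):
        Q = reward + P @ V                      # (n_actions, n_states)
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return V, policy

# Illustrative: 2 release decisions, 3 storage states, 3 periods.
rng = np.random.default_rng(1)
reward = rng.uniform(size=(2, 3))
P = rng.dirichlet(np.ones(3), size=(2, 3))      # each row sums to 1
V0, pol = backward_stochastic_dp(reward, P, n_periods=3)
```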

43 citations

Proceedings ArticleDOI
28 Jul 2002
TL;DR: It is argued that this architecture for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques, is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
Abstract: A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
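A minimal sketch of the linear value function approximation the abstract starts from: V is constrained to a weighted combination of basis functions, and suitable weights are found by least squares against sampled Bellman backups. Here, echoing the weakly coupled setting, the (hypothetical) basis columns are subtask value functions evaluated at sampled states:

```python
import numpy as np

def fit_basis_weights(basis_values, bellman_targets):
    """Least-squares weights w for V(s) ~ sum_j w[j] * basis_values[s, j].

    basis_values    : (n_samples, n_basis) basis functions at sampled states
    bellman_targets : (n_samples,) sampled Bellman backup values
    """
    w, *_ = np.linalg.lstsq(basis_values, bellman_targets, rcond=None)
    return w

# Two subtask value functions over five sampled states (made-up numbers).
subtask_V = np.array([[1.0, 0.2],
                      [0.8, 0.5],
                      [0.3, 0.9],
                      [0.0, 1.0],
                      [0.6, 0.6]])
targets = np.array([1.1, 1.0, 0.9, 0.8, 1.0])
print(fit_basis_weights(subtask_V, targets))
```

The paper's contribution goes further, making the combination piecewise linear via greedy decision-tree splits rather than a single global weight vector.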

43 citations

Journal ArticleDOI
TL;DR: In this article, a portfolio optimization problem on an infinite time horizon is considered, where the risky asset price obeys a logarithmic Brownian motion, and the interest rate varies according to an ergodic Markov diffusion process.
Abstract: A portfolio optimization problem on an infinite time horizon is considered. The risky asset price obeys a logarithmic Brownian motion, and the interest rate varies according to an ergodic Markov diffusion process. Moreover, the interest rate fluctuation is correlated with the risky asset price fluctuation. The goal is to choose optimal investment and consumption policies to maximize the infinite-horizon expected discounted log utility of consumption. A dynamic programming principle is used to derive the dynamic programming equation (DPE). Explicit solutions for the optimal consumption and investment control policies are obtained. In addition, for a special case, an explicit formula for the value function is given.
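Schematically, the dynamic programming equation for such an infinite-horizon discounted consumption problem is a Hamilton-Jacobi-Bellman equation of the form below, where δ is the discount rate, c consumption, π the risky-asset fraction, and L^{π,c} the controlled generator of the wealth and interest-rate dynamics (a generic paraphrase, not the paper's exact equation):

```latex
\delta \, V(x) \;=\; \sup_{\pi,\; c \ge 0} \Big[ \log c \;+\; \mathcal{L}^{\pi, c} V(x) \Big]
```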

43 citations

Journal ArticleDOI
TL;DR: In this paper, the authors studied the complexity of the contraction fixed point problem and showed that in the worst case the minimal number of function evaluations and arithmetic operations required to compute an ε-approximation to a fixed point V* ∈ B_d increases exponentially in d. They showed that the curse of dimensionality disappears if the domain of Γ has additional special structure.
Abstract: This paper analyzes the complexity of the contraction fixed point problem: compute an ε-approximation to the fixed point V* = Γ(V*) of a contraction mapping Γ that maps a Banach space B_d of continuous functions of d variables into itself. We focus on quasi-linear contractions where Γ is a nonlinear functional of a finite number of conditional expectation operators. This class includes contractive Fredholm integral equations that arise in asset pricing applications and the contractive Bellman equation from dynamic programming. In the absence of further restrictions on the domain of Γ, the quasi-linear fixed point problem is subject to the curse of dimensionality, i.e., in the worst case the minimal number of function evaluations and arithmetic operations required to compute an ε-approximation to a fixed point V* ∈ B_d increases exponentially in d. We show that the curse of dimensionality disappears if the domain of Γ has additional special structure. We identify a particular type of special structure for which the problem is strongly tractable even in the worst case, i.e., the number of function evaluations and arithmetic operations needed to compute an ε-approximation of V* is bounded by Cε^{-p}, where C and p are constants independent of d. We present examples of economic problems that have this type of special structure, including a class of rational expectations asset pricing problems for which the optimal exponent p = 1 is nearly achieved.
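The successive-approximation scheme behind such complexity counts is plain fixed-point iteration V_{k+1} = Γ(V_k). For a β-contraction, the Banach fixed-point theorem gives ||V_{k+1} - V*|| ≤ (β/(1-β)) ||V_{k+1} - V_k||, which turns into a checkable ε-stopping rule. A minimal sketch for a finite-state Bellman operator (the random example MDP is purely illustrative):

```python
import numpy as np

def fixed_point_iterate(gamma_op, V0, beta, eps=1e-6, max_iters=10_000):
    """Iterate V <- Gamma(V) until the contraction bound certifies an
    eps-approximation: ||V_new - V|| <= eps * (1 - beta) / beta implies
    ||V_new - V*|| <= eps for a beta-contraction Gamma (sup norm)."""
    V = V0
    for _ in range(max_iters):
        V_new = gamma_op(V)
        if np.max(np.abs(V_new - V)) <= eps * (1.0 - beta) / beta:
            return V_new
        V = V_new
    return V

# Example Gamma: Bellman operator of a random MDP (2 actions, 4 states).
rng = np.random.default_rng(2)
beta = 0.9
R = rng.uniform(size=(2, 4))
P = rng.dirichlet(np.ones(4), size=(2, 4))     # (2, 4, 4), rows sum to 1
bellman = lambda V: (R + beta * (P @ V)).max(axis=0)
print(fixed_point_iterate(bellman, np.zeros(4), beta))
```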

43 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353