Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications have been published on this topic, receiving 135589 citations.
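For reference (an editorial addition, in standard notation rather than that of any particular paper below), the Bellman optimality equation for an infinite-horizon discounted Markov decision process reads:

```latex
V^{*}(s) \;=\; \max_{a \in A(s)} \Big\{ r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big\},
\qquad 0 \le \gamma < 1 .
```

Here V* is the optimal value function, r the one-step reward, P the transition kernel, and γ the discount factor.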


Papers
Journal ArticleDOI
TL;DR: In this article, the authors show that direct integration of the optimal risk in a stopping problem for Brownian motion yields the value function of the monotone follower stochastic control problem and provide an explicit construction of its optimal process.
Abstract: This idea is employed to show that direct integration of the optimal risk in a stopping problem for Brownian motion yields the value function of the so-called monotone follower stochastic control problem and provides an explicit construction of its optimal process. Ideas from the theory of balayage for continuous semimartingales are employed in order to find novel and useful representations for the value functions of these problems.
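Read as a formula (our sketch of the connection stated above, with hypothetical notation; the precise conditions are in the paper): if u(y) denotes the optimal risk of the associated stopping problem, the claim is that the value function V of the monotone follower problem is recovered by direct integration,

```latex
V(x) \;=\; V(0) + \int_{0}^{x} u(y)\, dy ,
\qquad \text{equivalently} \qquad V'(x) = u(x).
```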

100 citations

Dissertation
01 Jan 1998
TL;DR: This thesis provides an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems, as applied to the approximation of (1) infinite-horizon discounted rewards and (2) average and differential rewards.
Abstract: In principle, a wide variety of sequential decision problems--ranging from dynamic resource allocation in telecommunication networks to financial risk management--can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning--a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of basis functions, the algorithm updates weights during simulation of the system such that the weighted combination of basis functions ultimately approximates a value function. We provide an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems as applied to the approximation of (1) infinite horizon discounted rewards and (2) average and differential rewards. As a special case of temporal-difference learning in a context involving control, we propose variants of the algorithm that generate approximate solutions to optimal stopping problems. We analyze algorithms designed for several problem classes: (1) optimal stopping of a stationary mixing process with an infinite horizon and discounted rewards; (2) optimal stopping of an independent increments process with an infinite horizon and discounted rewards; (3) optimal stopping with a finite horizon and discounted rewards; (4) a zero-sum two-player stopping game with an infinite horizon and discounted rewards. We also present a computational case study involving a complex optimal stopping problem that is representative of those arising in the financial derivatives industry. In addition to algorithms for tuning basis function weights, we study an approach to basis function generation. In particular, we explore the use of "scenarios" that are representative of the range of possible events in a system. Each scenario is used to construct a basis function that maps states to future rewards contingent on the future realization of the scenario. We derive, in the context of autonomous systems, a bound on the number of "representative scenarios" that suffices for uniformly accurate approximation of the value function. The bound exhibits a dependence on a measure of "complexity" of the system that can often grow at a rate much slower than the state space size. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
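As a rough illustration of the algorithm described in this abstract (a minimal sketch, not the thesis's own code), here is TD(0) with a linear combination of basis functions for an autonomous Markov chain; the toy chain, one-hot features, and step size below are placeholder assumptions.

```python
import numpy as np

def td0_linear(sample_transition, phi, num_features, gamma=0.95,
               steps=10_000, alpha=0.01, s0=0):
    """TD(0) with linear function approximation: V(s) is approximated by phi(s) @ w.

    sample_transition(s) -> (next_state, reward) simulates one step of the
    autonomous (uncontrolled) system; phi(s) returns the basis-function vector.
    """
    w = np.zeros(num_features)
    s = s0
    for _ in range(steps):
        s_next, r = sample_transition(s)
        # Temporal-difference error under the discounted-reward criterion.
        delta = r + gamma * phi(s_next) @ w - phi(s) @ w
        # Stochastic update of the basis-function weights along the simulated trajectory.
        w += alpha * delta * phi(s)
        s = s_next
    return w

# Toy usage: a 5-state random walk with one-hot features (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    def sample_transition(s):
        s_next = (s + rng.choice([-1, 1])) % n
        return s_next, 1.0 if s_next == 0 else 0.0
    phi = lambda s: np.eye(n)[s]
    print(td0_linear(sample_transition, phi, num_features=n))
```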

99 citations

Journal ArticleDOI
TL;DR: A numerical scheme for solving the Multi step-forward Dynamic Programming (MDP) equation arising from the time-discretization of backward stochastic differential equations, where the generator is assumed to be locally Lipschitz, which includes some cases of quadratic drivers.
Abstract: We design a numerical scheme for solving the Multi step-forward Dynamic Programming (MDP) equation arising from the time-discretization of backward stochastic differential equations. The generator is assumed to be locally Lipschitz, which includes some cases of quadratic drivers. When the large sequence of conditional expectations is computed using empirical least-squares regressions, under general conditions we establish an error upper bound given by the average, rather than the sum, of the local regression errors only, suggesting that our error estimate is tight. Despite the nested regression problems, the interdependency errors are shown to be at most of the order of the statistical regression errors (up to a logarithmic factor). Finally, we optimize the algorithm parameters, depending on the dimension and on the smoothness of the value functions, in the limit as the time-mesh size goes to zero, and compute the complexity needed to achieve a given accuracy. Numerical experiments illustrating the theoretical convergence estimates are presented.
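To make the regression ingredient concrete, here is a heavily simplified sketch in the spirit of such regression-Monte-Carlo schemes (not the authors' algorithm), assuming scalar forward paths, a driver with no Z-component, and a plain polynomial basis; all names and parameter choices below are illustrative.

```python
import numpy as np

def regress_conditional_expectation(x, y, degree=3):
    """Empirical least-squares approximation of E[Y | X = x] from simulated samples.

    This is the basic regression step used at every time index by
    regression-Monte-Carlo schemes for discretized BSDEs: project the
    "future" quantity Y onto a finite basis (here, polynomials in x).
    """
    basis = np.vander(x, degree + 1, increasing=True)   # rows [1, x, x^2, ...]
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return lambda xq: np.vander(np.atleast_1d(xq), degree + 1, increasing=True) @ coeffs

def mdp_lsmc(X, terminal, driver, dt, degree=3):
    """Multistep-forward dynamic-programming recursion solved with empirical
    regressions: Y_i is the regression of g(X_N) plus the accumulated driver
    terms onto a basis of X_i (simplified, no Z-component).

    X: array of shape (N+1, M) holding M simulated forward paths.
    terminal: terminal condition g(x); driver: simplified generator f(i, x, y).
    """
    N = X.shape[0] - 1
    payoff = terminal(X[-1]).astype(float)   # running sum: g(X_N) + sum_j f(...) * dt
    Y_hat = payoff.copy()                    # current approximation of Y_{i+1} on the paths
    for i in range(N - 1, -1, -1):
        payoff = payoff + driver(i, X[i], Y_hat) * dt
        Y_hat = regress_conditional_expectation(X[i], payoff, degree)(X[i])
    return Y_hat                             # approximation of Y_0 along the paths

# Toy usage: Brownian forward paths, linear driver, call-like terminal condition.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M, dt = 20, 10_000, 0.05
    steps = np.vstack([np.zeros((1, M)), np.sqrt(dt) * rng.standard_normal((N, M))])
    X = np.cumsum(steps, axis=0)
    Y0 = mdp_lsmc(X, terminal=lambda x: np.maximum(x, 0.0),
                  driver=lambda i, x, y: -0.1 * y, dt=dt)
    print(float(Y0.mean()))
```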

99 citations

Book ChapterDOI
25 Mar 2002
TL;DR: It is proved that the closed-form state-feedback solution to finite-time optimal control with performance criteria based on quadratic or linear norms is a time-varying piecewise affine feedback control law.

Abstract: In this paper we study the solution to optimal control problems for discrete-time linear hybrid systems. First, we prove that the closed-form state-feedback solution to finite-time optimal control with performance criteria based on quadratic or linear norms is a time-varying piecewise affine feedback control law. Then, we give insight into the structure of the optimal state-feedback solution and of the value function. Finally, we briefly describe how the optimal control law can be computed by means of multiparametric programming.
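Schematically, and in our own notation rather than the paper's, a time-varying piecewise affine state-feedback law has the form

```latex
u_k^{*}(x) \;=\; F_k^{i}\, x + g_k^{i}
\qquad \text{if } x \in \mathcal{P}_k^{i}, \quad i = 1, \dots, N_k ,
```

where, at each time step k, the polyhedral regions P_k^i partition the set of states for which the finite-time problem is feasible.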

99 citations

Proceedings ArticleDOI
04 Dec 2001
TL;DR: Two algorithms are presented that efficiently perform the online evaluation of the explicit optimal control law, in terms of both storage demands and computational complexity.

Abstract: For discrete-time linear time-invariant systems with constraints on inputs and outputs, the constrained finite-time optimal controller can be obtained explicitly as a piecewise-affine function of the initial state via multi-parametric programming. By exploiting the properties of the value function, we present two algorithms that efficiently perform the online evaluation of the explicit optimal control law, in terms of both storage demands and computational complexity. The algorithms are particularly effective when used for model predictive control (MPC), where an open-loop constrained finite-time optimal control problem has to be solved at each sampling time.
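For contrast with the efficient search structures these algorithms provide, the baseline online evaluation of an explicit law is a sequential scan over the stored critical regions; a minimal sketch (the region data and tolerance below are placeholder assumptions, not the paper's data structures):

```python
import numpy as np

def evaluate_explicit_mpc(x, regions, tol=1e-9):
    """Naive online evaluation of an explicit piecewise-affine control law.

    regions: list of tuples (H, k, F, g), where {z : H @ z <= k} is a stored
    polyhedral critical region and u = F @ x + g is the optimal affine law on
    that region. Real implementations replace this linear scan with the more
    storage- and computation-efficient search exploited in the paper.
    """
    for H, k, F, g in regions:
        if np.all(H @ x <= k + tol):      # current state lies in this region
            return F @ x + g
    raise ValueError("state outside the explicitly solved feasible set")

# Toy usage with a single made-up region covering the box |x_i| <= 1.
if __name__ == "__main__":
    H = np.vstack([np.eye(2), -np.eye(2)])
    k = np.ones(4)
    F = np.array([[-0.5, -0.2]])
    g = np.zeros(1)
    print(evaluate_explicit_mpc(np.array([0.3, -0.4]), [(H, k, F, g)]))
```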

98 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353