Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications have been published on this topic, receiving 135589 citations.
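For reference (an editorial addition, in standard notation rather than that of any particular paper below), the Bellman optimality equation for an infinite-horizon discounted Markov decision process reads:

```latex
V^{*}(s) \;=\; \max_{a \in A(s)} \Big\{ r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big\},
\qquad 0 \le \gamma < 1 .
```

Here V* is the optimal value function, r the one-step reward, P the transition kernel, and γ the discount factor.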


Papers
Journal ArticleDOI
TL;DR: In this article, the authors show that direct integration of the optimal risk in a stopping problem for Brownian motion yields the value function of the monotone follower stochastic control problem and provide an explicit construction of its optimal process.
Abstract: This idea is employed to show that direct integration of the optimal risk in a stopping problem for Brownian motion yields the value function of the so-called monotone follower stochastic control problem and provides an explicit construction of its optimal process. Ideas from the theory of balayage for continuous semimartingales are employed in order to find novel and useful representations for the value functions of these problems.
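Read as a formula (our sketch of the connection stated above, with hypothetical notation; the precise conditions are in the paper): if u(y) denotes the optimal risk of the associated stopping problem, the claim is that the value function V of the monotone follower problem is recovered by direct integration,

```latex
V(x) \;=\; V(0) + \int_{0}^{x} u(y)\, dy ,
\qquad \text{equivalently} \qquad V'(x) = u(x).
```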

100 citations

Dissertation
01 Jan 1998
TL;DR: This thesis provides an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems, as applied to the approximation of (1) infinite-horizon discounted rewards and (2) average and differential rewards.
Abstract: In principle, a wide variety of sequential decision problems--ranging from dynamic resource allocation in telecommunication networks to financial risk management--can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning--a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of basis functions, the algorithm updates weights during simulation of the system such that the weighted combination of basis functions ultimately approximates a value function. We provide an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems as applied to the approximation of (1) infinite horizon discounted rewards and (2) average and differential rewards. As a special case of temporal-difference learning in a context involving control, we propose variants of the algorithm that generate approximate solutions to optimal stopping problems. We analyze algorithms designed for several problem classes: (1) optimal stopping of a stationary mixing process with an infinite horizon and discounted rewards; (2) optimal stopping of an independent increments process with an infinite horizon and discounted rewards; (3) optimal stopping with a finite horizon and discounted rewards; (4) a zero-sum two-player stopping game with an infinite horizon and discounted rewards. We also present a computational case study involving a complex optimal stopping problem that is representative of those arising in the financial derivatives industry. In addition to algorithms for tuning basis function weights, we study an approach to basis function generation. In particular, we explore the use of "scenarios" that are representative of the range of possible events in a system. Each scenario is used to construct a basis function that maps states to future rewards contingent on the future realization of the scenario. We derive, in the context of autonomous systems, a bound on the number of "representative scenarios" that suffices for uniformly accurate approximation of the value function. The bound exhibits a dependence on a measure of "complexity" of the system that can often grow at a rate much slower than the state space size. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
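As a rough illustration of the algorithm described in this abstract (a minimal sketch, not the thesis's own code), here is TD(0) with a linear combination of basis functions for an autonomous Markov chain; the toy chain, one-hot features, and step size below are placeholder assumptions.

```python
import numpy as np

def td0_linear(sample_transition, phi, num_features, gamma=0.95,
               steps=10_000, alpha=0.01, s0=0):
    """TD(0) with linear function approximation: V(s) is approximated by phi(s) @ w.

    sample_transition(s) -> (next_state, reward) simulates one step of the
    autonomous (uncontrolled) system; phi(s) returns the basis-function vector.
    """
    w = np.zeros(num_features)
    s = s0
    for _ in range(steps):
        s_next, r = sample_transition(s)
        # Temporal-difference error under the discounted-reward criterion.
        delta = r + gamma * phi(s_next) @ w - phi(s) @ w
        # Stochastic update of the basis-function weights along the simulated trajectory.
        w += alpha * delta * phi(s)
        s = s_next
    return w

# Toy usage: a 5-state random walk with one-hot features (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    def sample_transition(s):
        s_next = (s + rng.choice([-1, 1])) % n
        return s_next, 1.0 if s_next == 0 else 0.0
    phi = lambda s: np.eye(n)[s]
    print(td0_linear(sample_transition, phi, num_features=n))
```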

99 citations

Journal ArticleDOI
TL;DR: A numerical scheme for solving the Multi step-forward Dynamic Programming (MDP) equation arising from the time-discretization of backward stochastic differential equations, where the generator is assumed to be locally Lipschitz, which includes some cases of quadratic drivers.
Abstract: We design a numerical scheme for solving the Multi step-forward Dynamic Programming (MDP) equation arising from the time-discretization of backward stochastic differential equations. The generator is assumed to be locally Lipschitz, which includes some cases of quadratic drivers. When the large sequence of conditional expectations is computed using empirical least-squares regressions, under general conditions we establish an error upper bound given by the average, rather than the sum, of the local regression errors only, suggesting that our error estimate is tight. Despite the nested regression problems, the interdependency errors are shown to be at most of the order of the statistical regression errors (up to a logarithmic factor). Finally, we optimize the algorithm parameters, depending on the dimension and on the smoothness of the value functions, in the limit as the time-mesh size goes to zero, and compute the complexity needed to achieve a given accuracy. Numerical experiments illustrating the theoretical convergence estimates are presented.
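To make the regression ingredient concrete, here is a heavily simplified sketch in the spirit of such regression-Monte-Carlo schemes (not the authors' algorithm), assuming scalar forward paths, a driver with no Z-component, and a plain polynomial basis; all names and parameter choices below are illustrative.

```python
import numpy as np

def regress_conditional_expectation(x, y, degree=3):
    """Empirical least-squares approximation of E[Y | X = x] from simulated samples.

    This is the basic regression step used at every time index by
    regression-Monte-Carlo schemes for discretized BSDEs: project the
    "future" quantity Y onto a finite basis (here, polynomials in x).
    """
    basis = np.vander(x, degree + 1, increasing=True)   # rows [1, x, x^2, ...]
    coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return lambda xq: np.vander(np.atleast_1d(xq), degree + 1, increasing=True) @ coeffs

def mdp_lsmc(X, terminal, driver, dt, degree=3):
    """Multistep-forward dynamic-programming recursion solved with empirical
    regressions: Y_i is the regression of g(X_N) plus the accumulated driver
    terms onto a basis of X_i (simplified, no Z-component).

    X: array of shape (N+1, M) holding M simulated forward paths.
    terminal: terminal condition g(x); driver: simplified generator f(i, x, y).
    """
    N = X.shape[0] - 1
    payoff = terminal(X[-1]).astype(float)   # running sum: g(X_N) + sum_j f(...) * dt
    Y_hat = payoff.copy()                    # current approximation of Y_{i+1} on the paths
    for i in range(N - 1, -1, -1):
        payoff = payoff + driver(i, X[i], Y_hat) * dt
        Y_hat = regress_conditional_expectation(X[i], payoff, degree)(X[i])
    return Y_hat                             # approximation of Y_0 along the paths

# Toy usage: Brownian forward paths, linear driver, call-like terminal condition.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M, dt = 20, 10_000, 0.05
    steps = np.vstack([np.zeros((1, M)), np.sqrt(dt) * rng.standard_normal((N, M))])
    X = np.cumsum(steps, axis=0)
    Y0 = mdp_lsmc(X, terminal=lambda x: np.maximum(x, 0.0),
                  driver=lambda i, x, y: -0.1 * y, dt=dt)
    print(float(Y0.mean()))
```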

99 citations

Book ChapterDOI
25 Mar 2002
TL;DR: It is proved that the closed-form state-feedback solution to finite-time optimal control with performance criteria based on quadratic or linear norms is a time-varying piecewise affine feedback control law.

Abstract: In this paper we study the solution to optimal control problems for discrete-time linear hybrid systems. First, we prove that the closed-form state-feedback solution to finite-time optimal control with performance criteria based on quadratic or linear norms is a time-varying piecewise affine feedback control law. Then, we give insight into the structure of the optimal state-feedback solution and of the value function. Finally, we briefly describe how the optimal control law can be computed by means of multiparametric programming.
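Schematically, and in our own notation rather than the paper's, a time-varying piecewise affine state-feedback law has the form

```latex
u_k^{*}(x) \;=\; F_k^{i}\, x + g_k^{i}
\qquad \text{if } x \in \mathcal{P}_k^{i}, \quad i = 1, \dots, N_k ,
```

where, at each time step k, the polyhedral regions P_k^i partition the set of states for which the finite-time problem is feasible.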

99 citations

Proceedings ArticleDOI
04 Dec 2001
TL;DR: Two algorithms are presented that efficiently perform the online evaluation of the explicit optimal control law, in terms of both storage demands and computational complexity.

Abstract: For discrete-time linear time-invariant systems with constraints on inputs and outputs, the constrained finite-time optimal controller can be obtained explicitly as a piecewise-affine function of the initial state via multi-parametric programming. By exploiting the properties of the value function, we present two algorithms that efficiently perform the online evaluation of the explicit optimal control law, in terms of both storage demands and computational complexity. The algorithms are particularly effective when used for model predictive control (MPC), where an open-loop constrained finite-time optimal control problem has to be solved at each sampling time.
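For contrast with the efficient search structures these algorithms provide, the baseline online evaluation of an explicit law is a sequential scan over the stored critical regions; a minimal sketch (the region data and tolerance below are placeholder assumptions, not the paper's data structures):

```python
import numpy as np

def evaluate_explicit_mpc(x, regions, tol=1e-9):
    """Naive online evaluation of an explicit piecewise-affine control law.

    regions: list of tuples (H, k, F, g), where {z : H @ z <= k} is a stored
    polyhedral critical region and u = F @ x + g is the optimal affine law on
    that region. Real implementations replace this linear scan with the more
    storage- and computation-efficient search exploited in the paper.
    """
    for H, k, F, g in regions:
        if np.all(H @ x <= k + tol):      # current state lies in this region
            return F @ x + g
    raise ValueError("state outside the explicitly solved feasible set")

# Toy usage with a single made-up region covering the box |x_i| <= 1.
if __name__ == "__main__":
    H = np.vstack([np.eye(2), -np.eye(2)])
    k = np.ones(4)
    F = np.array([[-0.5, -0.2]])
    g = np.zeros(1)
    print(evaluate_explicit_mpc(np.array([0.3, -0.4]), [(H, k, F, g)]))
```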

98 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353