Journal ArticleDOI

Stochastic optimal control via forward and backward stochastic differential equations and importance sampling

01 Jan 2018-Automatica (Pergamon)-Vol. 87, Iss: 87, pp 159-165
TL;DR: This scheme is shown to be capable of learning the optimal control without requiring an initial guess; to enhance its efficiency when treating more complex nonlinear systems, an iterative algorithm based on Girsanov's theorem on the change of measure is derived.
About: This article was published in Automatica on 2018-01-01 and is currently open access. It has received 59 citations to date. The article focuses on the topics: Stochastic partial differential equation & Stochastic control.
Citations
Proceedings Article
22 Jun 2019
TL;DR: A new methodology for decision-making under uncertainty using recent advancements in the areas of nonlinear stochastic optimal control theory, applied mathematics, and machine learning is proposed.
Abstract: In this paper we propose a new methodology for decision-making under uncertainty using recent advancements in the areas of nonlinear stochastic optimal control theory, applied mathematics, and machine learning. Grounded on the fundamental relation between certain nonlinear partial differential equations and forward-backward stochastic differential equations, we develop a control framework that is scalable and applicable to general classes of stochastic systems and decision-making problem formulations in robotics and autonomy. The proposed deep neural network architectures for stochastic control consist of recurrent and fully connected layers. The performance and scalability of the aforementioned algorithm are investigated in three non-linear systems in simulation with and without control constraints. We conclude with a discussion on future directions and their implications to robotics.

41 citations


Cites background or methods from "Stochastic optimal control via forw..."

  • ...Exarchos and Theodorou [14] developed an importance sampling based iterative scheme by approximating the conditional expectation at every time step using linear regression (see also [15] and [16])....


  • ...The resulting algorithms overcome limitations of previous work in [24] by exploiting Girsanov’s theorem as in [14] to enable efficient exploration and by utilizing the benefits of recurrent neural networks in learning temporal dependencies....


  • ...We refer the readers to proof of Theorem 1 in [14] for the full derivation of change of measure for FBSDEs....


  • ...This was the basis of the approximation scheme and corresponding algorithm introduced in [14]....


  • ...This problem was addressed in [14] through application of Girsanov’s theorem, which allows for the modification of the drift terms in the FBSDE system thereby facilitating efficient exploration through controlled forward dynamics....

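The Girsanov-based drift modification described in these excerpts can be illustrated in miniature: simulate under a shifted drift and correct each sample with the discrete Radon–Nikodym weight. The sketch below estimates a rare-event probability for scalar Brownian motion; the threshold `a` and the constant shift `u` are illustrative assumptions, and this shows only the reweighting mechanics, not the paper's FBSDE scheme.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
T, N, M = 1.0, 100, 200_000
dt = T / N
a = 3.0        # rare-event threshold (illustrative)
u = a / T      # constant drift shift defining the new measure (illustrative)

# Target: p = P(W_T > a) for a standard Brownian motion W.
exact = 0.5 * math.erfc(a / math.sqrt(2.0 * T))

# Simulate under the shifted drift u (endpoint X_T = u*T + sum of increments),
# then reweight by the discrete Radon-Nikodym derivative
# dP/dQ = exp(-u * sum(dW) - 0.5 * u**2 * T).
dW = rng.normal(0.0, math.sqrt(dt), (M, N))
X_T = u * T + dW.sum(axis=1)
weights = np.exp(-u * dW.sum(axis=1) - 0.5 * u**2 * T)
is_est = np.mean((X_T > a) * weights)
print(f"importance-sampling estimate: {is_est:.3e}  (exact: {exact:.3e})")
```

With the drift shifted toward the rare set, most paths land where the indicator is nonzero, and the weights restore unbiasedness; a plain Monte Carlo estimate of the same probability would need far more samples for comparable accuracy.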

Journal ArticleDOI
TL;DR: A probabilistic representation of the solution to the nonlinear Hamilton–Jacobi–Bellman equation is obtained, expressed in the form of a system of decoupled FBSDEs, which can be solved by employing linear regression techniques.

32 citations

Posted Content
02 Sep 2020
TL;DR: This paper introduces a new formulation for stochastic optimal control and stochastic dynamic optimization that ensures safety with respect to state and control constraints, and is designed for safe trajectory optimization.
Abstract: This paper introduces a new formulation for stochastic optimal control and stochastic dynamic optimization that ensures safety with respect to state and control constraints. The proposed methodology brings together concepts such as Forward-Backward Stochastic Differential Equations, Stochastic Barrier Functions, Differentiable Convex Optimization and Deep Learning. Using the aforementioned concepts, a Neural Network architecture is designed for safe trajectory optimization in which learning can be performed in an end-to-end fashion. Simulations are performed on three systems to show the efficacy of the proposed methodology.

22 citations


Cites methods from "Stochastic optimal control via forw..."

  • ...Popular solution methods in literature include iLQG [12], Path-Integral Control [13], and the Forward-Backward Stochastic Differential Equations (FBSDEs) framework [14], which tackle the HJB through locally optimal solutions....


Journal ArticleDOI
TL;DR: A sampling-based algorithm designed to solve various classes of stochastic differential games is presented; in light of the nonlinear version of the Feynman–Kac lemma, probabilistic representations of solutions to the nonlinear Hamilton–Jacobi–Isaacs equations that arise for each class are obtained.
Abstract: The aim of this work is to present a sampling-based algorithm designed to solve various classes of stochastic differential games. The foundation of the proposed approach lies in the formulation of the game solution in terms of a decoupled pair of forward and backward stochastic differential equations (FBSDEs). In light of the nonlinear version of the Feynman–Kac lemma, probabilistic representations of solutions to the nonlinear Hamilton–Jacobi–Isaacs equations that arise for each class are obtained. These representations are in form of decoupled systems of FBSDEs, which may be solved numerically.

18 citations


Cites background or methods from "Stochastic optimal control via forw..."

  • ...In this paper, we employ a scheme proposed in previous work by the authors [17], which capitalizes on the regularity present whenever systems of FBSDEs are linked to PDEs....


  • ...In previous work by the authors [17], a scheme involving a drift term modification has been constructed through Girsanov’s theorem on the change of measure [30,48]....


  • ...The formal proof which involves Girsanov’s theorem on the change of measure can be found in [17]....


Journal ArticleDOI
12 Jun 2019-Chaos
TL;DR: An adaptive importance sampling scheme for the simulation of rare events when the underlying dynamics is given by diffusion is proposed, based on a Gibbs variational principle that is used to determine the optimal change of measure.
Abstract: We propose an adaptive importance sampling scheme for the simulation of rare events when the underlying dynamics is given by diffusion. The scheme is based on a Gibbs variational principle that is used to determine the optimal (i.e., zero-variance) change of measure and exploits the fact that the latter can be rephrased as a stochastic optimal control problem. The control problem can be solved by a stochastic approximation algorithm, using the Feynman–Kac representation of the associated dynamic programming equations, and we discuss numerical aspects for high-dimensional problems along with simple toy examples.

14 citations

References
Book
01 Jan 1987
TL;DR: In this book, the authors develop Brownian motion and stochastic calculus, including a characterization of continuous local martingales as stochastic integrals with respect to Brownian motion, the strong Markov property, the Girsanov theorem, and a generalized version of the Ito rule.
Abstract: 1 Martingales, Stopping Times, and Filtrations.- 1.1. Stochastic Processes and ?-Fields.- 1.2. Stopping Times.- 1.3. Continuous-Time Martingales.- A. Fundamental inequalities.- B. Convergence results.- C. The optional sampling theorem.- 1.4. The Doob-Meyer Decomposition.- 1.5. Continuous, Square-Integrable Martingales.- 1.6. Solutions to Selected Problems.- 1.7. Notes.- 2 Brownian Motion.- 2.1. Introduction.- 2.2. First Construction of Brownian Motion.- A. The consistency theorem.- B. The Kolmogorov-?entsov theorem.- 2.3. Second Construction of Brownian Motion.- 2.4. The SpaceC[0, ?), Weak Convergence, and Wiener Measure.- A. Weak convergence.- B. Tightness.- C. Convergence of finite-dimensional distributions.- D. The invariance principle and the Wiener measure.- 2.5. The Markov Property.- A. Brownian motion in several dimensions.- B. Markov processes and Markov families.- C. Equivalent formulations of the Markov property.- 2.6. The Strong Markov Property and the Reflection Principle.- A. The reflection principle.- B. Strong Markov processes and families.- C. The strong Markov property for Brownian motion.- 2.7. Brownian Filtrations.- A. Right-continuity of the augmented filtration for a strong Markov process.- B. A "universal" filtration.- C. The Blumenthal zero-one law.- 2.8. Computations Based on Passage Times.- A. Brownian motion and its running maximum.- B. Brownian motion on a half-line.- C. Brownian motion on a finite interval.- D. Distributions involving last exit times.- 2.9. The Brownian Sample Paths.- A. Elementary properties.- B. The zero set and the quadratic variation.- C. Local maxima and points of increase.- D. Nowhere differentiability.- E. Law of the iterated logarithm.- F. Modulus of continuity.- 2.10. Solutions to Selected Problems.- 2.11. Notes.- 3 Stochastic Integration.- 3.1. Introduction.- 3.2. Construction of the Stochastic Integral.- A. Simple processes and approximations.- B. 
Construction and elementary properties of the integral.- C. A characterization of the integral.- D. Integration with respect to continuous, local martingales.- 3.3. The Change-of-Variable Formula.- A. The Ito rule.- B. Martingale characterization of Brownian motion.- C. Bessel processes, questions of recurrence.- D. Martingale moment inequalities.- E. Supplementary exercises.- 3.4. Representations of Continuous Martingales in Terms of Brownian Motion.- A. Continuous local martingales as stochastic integrals with respect to Brownian motion.- B. Continuous local martingales as time-changed Brownian motions.- C. A theorem of F. B. Knight.- D. Brownian martingales as stochastic integrals.- E. Brownian functionals as stochastic integrals.- 3.5. The Girsanov Theorem.- A. The basic result.- B. Proof and ramifications.- C. Brownian motion with drift.- D. The Novikov condition.- 3.6. Local Time and a Generalized Ito Rule for Brownian Motion.- A. Definition of local time and the Tanaka formula.- B. The Trotter existence theorem.- C. Reflected Brownian motion and the Skorohod equation.- D. A generalized Ito rule for convex functions.- E. The Engelbert-Schmidt zero-one law.- 3.7. Local Time for Continuous Semimartingales.- 3.8. Solutions to Selected Problems.- 3.9. Notes.- 4 Brownian Motion and Partial Differential Equations.- 4.1. Introduction.- 4.2. Harmonic Functions and the Dirichlet Problem.- A. The mean-value property.- B. The Dirichlet problem.- C. Conditions for regularity.- D. Integral formulas of Poisson.- E. Supplementary exercises.- 4.3. The One-Dimensional Heat Equation.- A. The Tychonoff uniqueness theorem.- B. Nonnegative solutions of the heat equation.- C. Boundary crossing probabilities for Brownian motion.- D. Mixed initial/boundary value problems.- 4.4. The Formulas of Feynman and Kac.- A. The multidimensional formula.- B. The one-dimensional formula.- 4.5. Solutions to selected problems.- 4.6. Notes.- 5 Stochastic Differential Equations.- 5.1. 
Introduction.- 5.2. Strong Solutions.- A. Definitions.- B. The Ito theory.- C. Comparison results and other refinements.- D. Approximations of stochastic differential equations.- E. Supplementary exercises.- 5.3. Weak Solutions.- A. Two notions of uniqueness.- B. Weak solutions by means of the Girsanov theorem.- C. A digression on regular conditional probabilities.- D. Results of Yamada and Watanabe on weak and strong solutions.- 5.4. The Martingale Problem of Stroock and Varadhan.- A. Some fundamental martingales.- B. Weak solutions and martingale problems.- C. Well-posedness and the strong Markov property.- D. Questions of existence.- E. Questions of uniqueness.- F. Supplementary exercises.- 5.5. A Study of the One-Dimensional Case.- A. The method of time change.- B. The method of removal of drift.- C. Feller's test for explosions.- D. Supplementary exercises.- 5.6. Linear Equations.- A. Gauss-Markov processes.- B. Brownian bridge.- C. The general, one-dimensional, linear equation.- D. Supplementary exercises.- 5.7. Connections with Partial Differential Equations.- A. The Dirichlet problem.- B. The Cauchy problem and a Feynman-Kac representation.- C. Supplementary exercises.- 5.8. Applications to Economics.- A. Portfolio and consumption processes.- B. Option pricing.- C. Optimal consumption and investment (general theory).- D. Optimal consumption and investment (constant coefficients).- 5.9. Solutions to Selected Problems.- 5.10. Notes.- 6 P. Levy's Theory of Brownian Local Time.- 6.1. Introduction.- 6.2. Alternate Representations of Brownian Local Time.- A. The process of passage times.- B. Poisson random measures.- C. Subordinators.- D. The process of passage times revisited.- E. The excursion and downcrossing representations of local time.- 6.3. Two Independent Reflected Brownian Motions.- A. The positive and negative parts of a Brownian motion.- B. The first formula of D. Williams.- C. The joint density of (W(t), L(t), ? +(t)).- 6.4. 
Elastic Brownian Motion.- A. The Feynman-Kac formulas for elastic Brownian motion.- B. The Ray-Knight description of local time.- C. The second formula of D. Williams.- 6.5. An Application: Transition Probabilities of Brownian Motion with Two-Valued Drift.- 6.6. Solutions to Selected Problems.- 6.7. Notes.

8,639 citations

Book
01 Jun 1992
TL;DR: This book develops the numerical solution of stochastic differential equations via time-discrete approximations, covering strong and weak Taylor approximations derived from stochastic Taylor expansions.
Abstract: 1 Probability and Statistics- 2 Probability and Stochastic Processes- 3 Ito Stochastic Calculus- 4 Stochastic Differential Equations- 5 Stochastic Taylor Expansions- 6 Modelling with Stochastic Differential Equations- 7 Applications of Stochastic Differential Equations- 8 Time Discrete Approximation of Deterministic Differential Equations- 9 Introduction to Stochastic Time Discrete Approximation- 10 Strong Taylor Approximations- 11 Explicit Strong Approximations- 12 Implicit Strong Approximations- 13 Selected Applications of Strong Approximations- 14 Weak Taylor Approximations- 15 Explicit and Implicit Weak Approximations- 16 Variance Reduction Methods- 17 Selected Applications of Weak Approximations- Solutions of Exercises- Bibliographical Notes

6,284 citations


"Stochastic optimal control via forw..." refers background or methods in this paper

  • ...The simplest discretized scheme for the forward process is the Euler scheme, which is also called Euler–Maruyama scheme (Kloeden & Platen, 1999): Xi+1 ≈ Xi + b(ti, Xi)∆ti +Σ(ti, Xi)∆Wi, (18) for i = 0, . . . ,N − 1 and X0 = x....


  • ...Several alternative, higher order schemes exist that can be selected in lieu of the Euler scheme (Kloeden & Platen, 1999)....

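The Euler–Maruyama step quoted above translates directly into code. The sketch below is a minimal illustration of scheme (18) for a scalar SDE; the Ornstein–Uhlenbeck drift and the parameter values are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, N, rng):
    """Simulate one path of dX = b(t, X) dt + sigma(t, X) dW on [0, T]
    with the Euler-Maruyama scheme: X[i+1] = X[i] + b*dt + sigma*dW."""
    dt = T / N
    X = np.empty(N + 1)
    X[0] = x0
    for i in range(N):
        t = i * dt
        dW = rng.normal(0.0, np.sqrt(dt))   # Brownian increment, var = dt
        X[i + 1] = X[i] + b(t, X[i]) * dt + sigma(t, X[i]) * dW
    return X

# Example: Ornstein-Uhlenbeck process dX = -X dt + 0.3 dW, X0 = 1.
rng = np.random.default_rng(42)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.3, 1.0, 1.0, 500, rng)
```

The scheme is only first-order accurate in the weak sense; as the second excerpt notes, higher-order schemes from Kloeden & Platen can be substituted where more accuracy per step is needed.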

Journal ArticleDOI
TL;DR: This book covers mathematical preliminaries, Ito integrals, the Ito formula and the martingale representation theorem, stochastic differential equations, diffusions, and applications to filtering, boundary value problems, optimal stopping, stochastic control, and mathematical finance.
Abstract: Some Mathematical Preliminaries.- Ito Integrals.- The Ito Formula and the Martingale Representation Theorem.- Stochastic Differential Equations.- The Filtering Problem.- Diffusions: Basic Properties.- Other Topics in Diffusion Theory.- Applications to Boundary Value Problems.- Application to Optimal Stopping.- Application to Stochastic Control.- Application to Mathematical Finance.

4,705 citations

Book
18 Dec 1992
TL;DR: In this book, an introduction to optimal stochastic control for continuous-time Markov processes and to the theory of viscosity solutions is given, as well as a concise introduction to two-controller, zero-sum differential games.
Abstract: This book is intended as an introduction to optimal stochastic control for continuous time Markov processes and to the theory of viscosity solutions. The authors approach stochastic control problems by the method of dynamic programming. The text provides an introduction to dynamic programming for deterministic optimal control problems, as well as to the corresponding theory of viscosity solutions. A new Chapter X gives an introduction to the role of stochastic optimal control in portfolio optimization and in pricing derivatives in incomplete markets. Chapter VI of the First Edition has been completely rewritten, to emphasize the relationships between logarithmic transformations and risk sensitivity. A new Chapter XI gives a concise introduction to two-controller, zero-sum differential games. Also covered are controlled Markov diffusions and viscosity solutions of Hamilton-Jacobi-Bellman equations. The authors have tried, through illustrative examples and selective material, to connect stochastic control theory with other mathematical areas (e.g. large deviations theory) and with applications to engineering, physics, management, and finance. In this Second Edition, new material on applications to mathematical finance has been added. Concise introductions to risk-sensitive control theory, nonlinear H-infinity control and differential games are also included.

3,885 citations


"Stochastic optimal control via forw..." refers methods in this paper

  • ...Specifically, by applying the stochastic version of Bellman’s principle of optimality, it is shown (Fleming & Soner, 2006; Yong & Zhou, 1999) that if the Value function is in C1,2([0, T ]×Rn), then it is a solution to the following terminal value problem of a second-order partial differential…...


  • ...Following the same procedure as in Section 2, and under Assumption 1, the resulting HJB PDE is (Fleming & Soner, 2006)
    v_t + (1/2) tr(v_xx Σ(t, x) Σ⊤(t, x)) + v_x⊤ b(t, x) + h(t, x, Σ⊤(t, x) v_x) = 0,  (t, x) ∈ [0, T) × G,
    v(T, x) = g(x),  x ∈ G,
    v(t, x) = ψ(t, x),  (t, x) ∈ [0, T) × ∂G,  (15) in…...


Journal ArticleDOI
TL;DR: In this paper, a new approach for approximating the value of American options by simulation is presented, using least squares to estimate the conditional expected payoff to the optionholder from continuation.
Abstract: This article presents a simple yet powerful new approach for approximating the value of American options by simulation. The key to this approach is the use of least squares to estimate the conditional expected payoff to the optionholder from continuation. This makes this approach readily applicable in path-dependent and multifactor situations where traditional finite difference techniques cannot be used. We illustrate this technique with several realistic examples including valuing an option when the underlying asset follows a jump-diffusion process and valuing an American swaption in a 20-factor string model of the term structure.

2,612 citations
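The least-squares regression of the conditional continuation value, which the Automatica paper adapts to its FBSDE scheme, can be sketched in the original option-pricing setting of this reference. Everything below (a Bermudan put under geometric Brownian motion, a cubic polynomial basis, the parameter values) is an illustrative assumption, not code from the reference.

```python
import numpy as np

# Minimal Longstaff-Schwartz sketch: at each exercise date, regress the
# discounted continuation value on a polynomial basis of the asset price,
# then exercise where the immediate payoff exceeds the fitted continuation.
rng = np.random.default_rng(1)
S0, K, r, sigma, T = 1.0, 1.0, 0.06, 0.2, 1.0   # illustrative parameters
N, M = 50, 100_000                               # exercise dates, paths
dt = T / N
disc = np.exp(-r * dt)

# Simulate geometric Brownian motion paths (exact one-step scheme).
Z = rng.normal(size=(M, N))
logret = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
S = np.hstack([np.full((M, 1), S0), S0 * np.exp(np.cumsum(logret, axis=1))])

payoff = lambda s: np.maximum(K - s, 0.0)
V = payoff(S[:, -1])                      # cashflow at maturity
for i in range(N - 1, 0, -1):
    V *= disc                             # discount cashflows back one step
    itm = payoff(S[:, i]) > 0             # regress on in-the-money paths only
    if itm.any():
        coef = np.polyfit(S[itm, i], V[itm], 3)      # cubic basis (assumption)
        cont = np.polyval(coef, S[itm, i])           # fitted continuation value
        exercise = payoff(S[itm, i]) > cont
        V[itm] = np.where(exercise, payoff(S[itm, i]), V[itm])
price = disc * V.mean()
print(f"Bermudan put value (LSM sketch): {price:.4f}")
```

Restricting the regression to in-the-money paths, as the article recommends, improves the fit where the exercise decision actually matters; the same regression-based approximation of a conditional expectation is what the FBSDE scheme reuses at every time step.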