# Neural Network-Based Solutions for Stochastic Optimal Control Using Path Integrals

TL;DR: A novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems and the potential of the algorithm is demonstrated through simulation results from a couple of benchmark problems.

Abstract: In this paper, an offline approximate dynamic programming approach using neural networks is proposed for solving a class of finite horizon stochastic optimal control problems. There are two approaches available in the literature, one based on stochastic maximum principle (SMP) formalism and the other based on solving the stochastic Hamilton–Jacobi–Bellman (HJB) equation. However, in the presence of noise, the SMP formalism becomes complex and results in having to solve a couple of backward stochastic differential equations. Hence, current solution methodologies typically ignore the noise effect. On the other hand, the inclusion of noise in the HJB framework is very straightforward. Furthermore, the stochastic HJB equation of a control-affine nonlinear stochastic system with a quadratic control cost function and an arbitrary state cost function can be formulated as a path integral (PI) problem. However, due to curse of dimensionality, it might not be possible to utilize the PI formulation for obtaining comprehensive solutions over the entire operating domain. A neural network structure called the adaptive critic design paradigm is used to effectively handle this difficulty. In this paper, a novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems. The potential of the algorithm is demonstrated through simulation results from a couple of benchmark problems.

##### Citations

More filters

••

[...]

TL;DR: In this paper, the authors explore questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties, using diffusion processes as a model of a Markov process with continuous sample paths.

Abstract: We explore in this chapter questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties. This endeavor is really a study of diffusion processes. Loosely speaking, the term diffusion is attributed to a Markov process which has continuous sample paths and can be characterized in terms of its infinitesimal generator.

2,022 citations

••

[...]

TL;DR: An adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation, and a multigradient recursive reinforcement learning scheme is proposed, which utilizes both the current gradient and the past gradients.

Abstract: In this paper, an adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation. Radial-basis-function (RBF) NNs, including critic NNs and action NNs, are employed to approximate the utility functions and system uncertainties, respectively. In the previous works, a gradient descent scheme is applied to update weight vectors, which may lead to local optimal problem. To circumvent this problem, a multigradient recursive (MGR) reinforcement learning scheme is proposed, which utilizes both the current gradient and the past gradients. As a consequence, the MGR scheme not only eliminates the local optimal problem but also guarantees faster convergence rate than the gradient descent scheme. Moreover, the constraint of actuator input saturation is considered. The closed-loop system stability is developed by using the Lyapunov stability theory, and it is proved that all the signals in the closed-loop system are semiglobal uniformly ultimately bounded (SGUUB). Finally, the effectiveness of the proposed approach is further validated via some simulation results.

76 citations

••

[...]

TL;DR: Simulation demonstrates that the optimized stochastic approach can achieve the desired control objective and can remove the assumption of persistence excitation, which is required for most RL based adaptive optimal control.

Abstract: In this work, a reinforcement learning (RL) based optimized control approach is developed by implementing tracking control for a class of stochastic nonlinear systems with unknown dynamic. The RL is constructed in identifier-actor-critic architecture, where the identifier aims for determining the stochastic system in mean square, the actor aims for executing the control action and the critic aims for evaluating the control performance. In almost all of the published RL-based optimal control, since both actor and critic updating laws are yielded on the basis of implementing gradient descent method to the square of Bellman residual error, these methods are very complex and are performed difficultly. By contrast, the proposed optimized control is obviously simple because the RL algorithm is derived based on the negative gradient of a simple positive function. Furthermore, the proposed approach can remove the assumption of persistence excitation, which is required for most RL based adaptive optimal control. Finally, based on the adaptive identifier, the system stability is proven by using the quadratic Lyapunov function rather than quartic Lyapunov function, which is usually required for most stochastic systems. Simulation further demonstrates that the optimized stochastic approach can achieve the desired control objective.

14 citations

••

[...]

TL;DR: In this paper, approximate optimal distributed control schemes for a class of nonlinear interconnected systems with strong interconnections are presented using continuous and event-sampled feedback information and the proposed event-based distributed control scheme for linear interconnected systems is discussed.

Abstract: In this paper, approximate optimal distributed control schemes for a class of nonlinear interconnected systems with strong interconnections are presented using continuous and event-sampled feedback information. The optimal control design is formulated as an $N$ -player nonzero-sum game where the control policies of the subsystems act as players. An approximate Nash equilibrium solution to the game, which is the solution to the coupled Hamilton–Jacobi equation, is obtained using the approximate dynamic programming-based approach. A critic neural network (NN) at each subsystem is utilized to approximate the Nash solution and novel event-sampling conditions, that are decentralized, are designed to asynchronously orchestrate the sampling and transmission of state vector at each subsystem. To ensure the local ultimate boundedness of the closed-loop system state and NN parameter estimation errors, a hybrid-learning scheme is introduced and the stability is guaranteed using Lyapunov-based stability analysis. Finally, implementation of the proposed event-based distributed control scheme for linear interconnected systems is discussed. For completeness, Zeno-free behavior of the event-sampled system is shown analytically and a numerical example is included to support the analytical results.

14 citations

### Cites background from "Neural Network-Based Solutions for ..."

[...]

##### References

More filters

••

[...]

TL;DR: A generalization of the sampling method introduced by Metropolis et al. as mentioned in this paper is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates.

Abstract: SUMMARY A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral J = X) p(x)dx = Ev(f), where p(x) is a probability density function, instead of obtaining independent samples XI, ..., Xv from p(x) and using the estimate J, = Zf(xi)/N, we instead obtain the sample from a distribution with density q(x) and use the estimate J2 = Y{f(xj)p(x1)}/{q(xj)N}. This may be advantageous if it is easier to sample from q(x) thanp(x), but it is a difficult method to use in a large number of dimensions, since the values of the weights w(xi) = p(x1)/q(xj) for reasonable values of N may all be extremely small, or a few may be extremely large. In estimating the probability of an event A, however, these difficulties may not be as serious since the only values of w(x) which are important are those for which x -A. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from p(x) or if p(x) is unknown, sample from some distribution q(y) and obtain the sample x values as some function of the corresponding y values. If we want samples from the conditional dis

13,481 citations

### "Neural Network-Based Solutions for ..." refers methods in this paper

[...]

•

[...]

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methododogy of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methododogy of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes, and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.

10,491 citations

### "Neural Network-Based Solutions for ..." refers methods in this paper

[...]

••

[...]

TL;DR: In this paper, the authors return to the possible solutions X t (ω) of the stochastic differential equation where W t is 1-dimensional "white noise" and where X t satisfies the integral equation in differential form.

Abstract: We now return to the possible solutions X t (ω) of the stochastic differential equation
(5.1)
where W t is 1-dimensional “white noise”. As discussed in Chapter III the Ito interpretation of (5.1) is that X t satisfies the stochastic integral equation
or in differential form
(5.2)
.

4,115 citations

[...]

01 Jan 2009

TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

Abstract: From the Publisher:
This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

4,050 citations

##### Related Papers (5)

[...]

[...]