Journal ArticleDOI

Neural Network-Based Solutions for Stochastic Optimal Control Using Path Integrals

TL;DR: A novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems, and the potential of the algorithm is demonstrated through simulation results from a couple of benchmark problems.
Abstract: In this paper, an offline approximate dynamic programming approach using neural networks is proposed for solving a class of finite-horizon stochastic optimal control problems. There are two approaches available in the literature, one based on the stochastic maximum principle (SMP) formalism and the other based on solving the stochastic Hamilton–Jacobi–Bellman (HJB) equation. However, in the presence of noise, the SMP formalism becomes complex and requires solving a pair of backward stochastic differential equations. Hence, current solution methodologies typically ignore the noise effect. On the other hand, the inclusion of noise in the HJB framework is very straightforward. Furthermore, the stochastic HJB equation of a control-affine nonlinear stochastic system with a quadratic control cost function and an arbitrary state cost function can be formulated as a path integral (PI) problem. However, due to the curse of dimensionality, it might not be possible to utilize the PI formulation to obtain comprehensive solutions over the entire operating domain. A neural network structure called the adaptive critic design paradigm is used to handle this difficulty effectively. In this paper, a novel adaptive critic approach using the PI formulation is proposed for solving stochastic optimal control problems. The potential of the algorithm is demonstrated through simulation results from a couple of benchmark problems.
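The PI formulation referenced above evaluates the stochastic value function as an expectation over uncontrolled sample paths, which a critic network can then be trained to approximate. Below is a minimal sketch of that Monte Carlo step, not the paper's implementation: the dynamics `f`, `g`, the state cost, and all parameter values are illustrative assumptions, and the terminal cost is omitted for brevity.

```python
import numpy as np

# Monte Carlo path-integral (PI) value estimate, assuming scalar
# control-affine dynamics dx = f(x) dt + g(x)(u dt + dw) with running
# state cost q(x). All names and values here are illustrative.

def f(x):                      # drift (assumed, globally stable example)
    return -x - x**3

def g(x):                      # control/noise coupling (assumed constant)
    return 1.0

def state_cost(x):             # arbitrary state cost q(x)
    return 0.5 * x**2

def pi_value(x0, T=1.0, dt=0.01, n_rollouts=2000, lam=1.0, rng=None):
    """Estimate V(x0, 0) = -lam * log E[exp(-S/lam)] over uncontrolled
    (u = 0) rollouts, where S is the accumulated path cost."""
    rng = rng or np.random.default_rng(0)
    x = np.full(n_rollouts, float(x0))
    S = np.zeros(n_rollouts)
    for _ in range(int(T / dt)):
        S += state_cost(x) * dt
        x += f(x) * dt + g(x) * np.sqrt(dt) * rng.standard_normal(n_rollouts)
    m = (-S / lam).max()                       # log-sum-exp for stability
    return -lam * (m + np.log(np.mean(np.exp(-S / lam - m))))

print(pi_value(0.5))
```

In the paper's adaptive critic scheme, values computed this way at sampled states would serve as training targets for the critic network, so the PI estimate need not be recomputed at every query point.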
Citations
Book ChapterDOI
01 Jan 1998
TL;DR: In this paper, the authors explore questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties, using diffusion processes as a model of a Markov process with continuous sample paths.
Abstract: We explore in this chapter questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties. This endeavor is really a study of diffusion processes. Loosely speaking, the term diffusion is attributed to a Markov process which has continuous sample paths and can be characterized in terms of its infinitesimal generator.
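As a concrete illustration of a diffusion with continuous sample paths, here is a minimal Euler–Maruyama simulation of an Ornstein–Uhlenbeck process; the process choice and parameter values are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Euler-Maruyama sketch for the SDE dX = theta*(mu - X) dt + sigma dW
# (Ornstein-Uhlenbeck), a standard example of a diffusion process with
# continuous sample paths. Parameter values are illustrative.

def euler_maruyama(x0=1.0, theta=2.0, mu=0.0, sigma=0.3,
                   T=1.0, dt=1e-3, rng=None):
    rng = rng or np.random.default_rng(0)
    n = int(T / dt)
    path = np.empty(n + 1)
    path[0] = x0
    for k in range(n):
        dw = np.sqrt(dt) * rng.standard_normal()   # Brownian increment
        path[k + 1] = path[k] + theta * (mu - path[k]) * dt + sigma * dw
    return path

path = euler_maruyama()
print(path[-1])
```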

2,446 citations

Journal ArticleDOI
TL;DR: An adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation, and a multigradient recursive reinforcement learning scheme is proposed, which utilizes both the current gradient and the past gradients.
Abstract: In this paper, an adaptive neural network (NN) control problem is investigated for discrete-time nonlinear systems with input saturation. Radial-basis-function (RBF) NNs, including critic NNs and action NNs, are employed to approximate the utility functions and system uncertainties, respectively. In previous works, a gradient descent scheme is applied to update the weight vectors, which may lead to convergence to local optima. To circumvent this problem, a multigradient recursive (MGR) reinforcement learning scheme is proposed, which utilizes both the current gradient and past gradients. As a consequence, the MGR scheme not only avoids the local-optimum problem but also guarantees a faster convergence rate than the gradient descent scheme. Moreover, the constraint of actuator input saturation is considered. Closed-loop system stability is established using Lyapunov stability theory, and it is proved that all signals in the closed-loop system are semiglobally uniformly ultimately bounded (SGUUB). Finally, the effectiveness of the proposed approach is further validated via simulation results.
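A hedged sketch of a "multigradient" style update follows: the step direction combines the current gradient with a window of past gradients rather than the current gradient alone. The exact MGR recursion in the cited paper may differ; the window-averaging rule and all names below are illustrative assumptions.

```python
import numpy as np
from collections import deque

# Illustrative "multigradient" weight update: each step uses the current
# gradient together with a window of recent past gradients. This is a
# stand-in for (not a reproduction of) the paper's MGR recursion.

class MGRUpdater:
    def __init__(self, dim, lr=0.05, window=5):
        self.lr = lr
        self.grads = deque(maxlen=window)   # recent gradients
        self.w = np.zeros(dim)

    def step(self, grad):
        self.grads.append(np.asarray(grad, dtype=float))
        combined = np.mean(self.grads, axis=0)  # current + past gradients
        self.w -= self.lr * combined
        return self.w

# usage on a toy quadratic loss L(w) = 0.5 * ||w - w_star||^2
upd = MGRUpdater(dim=2)
w_star = np.array([1.0, -2.0])
for _ in range(200):
    grad = upd.w - w_star       # gradient of the toy loss
    upd.step(grad)
print(upd.w)                    # approaches w_star
```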

146 citations

Journal ArticleDOI
TL;DR: Simulation demonstrates that the optimized stochastic approach can achieve the desired control objective and can remove the assumption of persistent excitation, which is required for most RL-based adaptive optimal control.

42 citations

Journal ArticleDOI
TL;DR: In this paper, approximate optimal distributed control schemes for a class of nonlinear interconnected systems with strong interconnections are presented using continuous and event-sampled feedback information, and the implementation of the proposed event-based distributed control scheme for linear interconnected systems is discussed.
Abstract: In this paper, approximate optimal distributed control schemes for a class of nonlinear interconnected systems with strong interconnections are presented using continuous and event-sampled feedback information. The optimal control design is formulated as an $N$-player nonzero-sum game in which the control policies of the subsystems act as players. An approximate Nash equilibrium solution to the game, which is the solution to the coupled Hamilton–Jacobi equation, is obtained using an approximate dynamic programming-based approach. A critic neural network (NN) at each subsystem is utilized to approximate the Nash solution, and novel decentralized event-sampling conditions are designed to asynchronously orchestrate the sampling and transmission of the state vector at each subsystem. To ensure local ultimate boundedness of the closed-loop system state and NN parameter estimation errors, a hybrid-learning scheme is introduced, and stability is guaranteed using Lyapunov-based analysis. Finally, implementation of the proposed event-based distributed control scheme for linear interconnected systems is discussed. For completeness, Zeno-free behavior of the event-sampled system is shown analytically, and a numerical example is included to support the analytical results.
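The following is a generic sketch of an event-sampled transmission rule of the kind described above: a subsystem retransmits its state only when the gap to the last transmitted state exceeds a state-dependent threshold. The paper's decentralized conditions are more elaborate; the relative-threshold test and the constants here are assumptions. The additive `eps` term is one common way to rule out Zeno-like chattering near the origin.

```python
import numpy as np

# Generic event-triggered transmission sketch: re-send the state only when
# the deviation from the last transmitted state exceeds a state-dependent
# threshold. sigma, eps, and the norm test are illustrative choices.

def run_event_sampling(states, sigma=0.1, eps=1e-3):
    """states: sequence of state vectors at the underlying sampling instants.
    Returns the indices at which an event (transmission) fires."""
    events = [0]
    x_sent = np.asarray(states[0], dtype=float)
    for k, x in enumerate(states[1:], start=1):
        x = np.asarray(x, dtype=float)
        if np.linalg.norm(x - x_sent) > sigma * np.linalg.norm(x) + eps:
            x_sent = x            # controller holds x_sent until next event
            events.append(k)
    return events

traj = [np.array([np.cos(0.05 * k), np.sin(0.05 * k)]) for k in range(200)]
print(run_event_sampling(traj))
```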

37 citations


Cites background from "Neural Network-Based Solutions for ..."

  • ...ming (ADP) [15]–[17]/reinforcement learning (RL) [18], [19] based approximate solutions are sought for optimal control of...


References
Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented, along with an exposition of the relevant theory, techniques of application, and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral $J = \int f(x)\,p(x)\,dx = E_p(f)$, where $p(x)$ is a probability density function, instead of obtaining independent samples $x_1, \ldots, x_N$ from $p(x)$ and using the estimate $J_1 = \sum f(x_i)/N$, we instead obtain the sample from a distribution with density $q(x)$ and use the estimate $J_2 = \sum \{f(x_i)p(x_i)\}/\{q(x_i)N\}$. This may be advantageous if it is easier to sample from $q(x)$ than $p(x)$, but it is a difficult method to use in a large number of dimensions, since the values of the weights $w(x_i) = p(x_i)/q(x_i)$ for reasonable values of $N$ may all be extremely small, or a few may be extremely large. In estimating the probability of an event $A$, however, these difficulties may not be as serious since the only values of $w(x)$ which are important are those for which $x \in A$. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from $p(x)$ or if $p(x)$ is unknown, sample from some distribution $q(y)$ and obtain the sample $x$ values as some function of the corresponding $y$ values. If we want samples from the conditional dis
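A minimal random-walk Metropolis–Hastings sampler, the method this reference generalizes, is sketched below; the target density and step size are illustrative. With a symmetric proposal, a move is accepted with probability $\min(1, p(x')/p(x))$, so $p$ is needed only up to a normalizing constant, which is what makes this family of samplers usable for trajectory distributions like the one in the citing paper's (21).

```python
import numpy as np

# Random-walk Metropolis-Hastings: propose x' from a symmetric Gaussian
# step around x, accept with probability min(1, p(x')/p(x)). Only an
# unnormalized log-density log_p is required.

def metropolis_hastings(log_p, x0, n_samples=10000, step=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    x = float(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + step * rng.standard_normal()   # symmetric proposal
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop                              # accept; else keep x
        samples[i] = x
    return samples

# target: unnormalized standard normal, log p(x) = -x^2 / 2 + const
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
print(samples.mean(), samples.std())   # roughly 0 and 1
```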

14,965 citations


"Neural Network-Based Solutions for ..." refers methods in this paper

  • ...In this paper, the Metropolis–Hastings sampling scheme [35], [36] is employed to sample trajectories as per the probability distribution given in (21)....


Book
01 May 1995
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.
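As a pointer to the kind of algorithm the book develops, here is a minimal value-iteration sketch for a small finite Markov decision problem; the three-state example data are made up for illustration.

```python
import numpy as np

# Value iteration for a finite MDP: iterate the Bellman optimality backup
# V <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ] to convergence.
# P[a] is the transition matrix under action a; R[s, a] the expected reward.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # value function, greedy policy
        V = V_new

# made-up 3-state, 2-action example
P = np.array([[[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
              [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]]])
R = np.array([[0.0, 1.0], [0.0, 0.5], [1.0, 0.0]])
V, policy = value_iteration(P, R)
print(V, policy)
```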

10,834 citations


"Neural Network-Based Solutions for ..." refers methods in this paper

  • ...Most RL techniques that are employed for solving optimal control problems are based on value iteration [8], policy iteration [8], or Q-learning [10]....


01 Jan 1989

4,916 citations


"Neural Network-Based Solutions for ..." refers methods in this paper

  • ...Most RL techniques that are employed for solving optimal control problems are based on value iteration [8], policy iteration [8], or Q-learning [10]....


  • ...Q-learning algorithms are model-free learning algorithms that do not require an explicit system model for solving the optimal control problem....

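To make the model-free point in the snippet above concrete, a minimal tabular Q-learning sketch follows: the update touches only observed transitions (s, a, r, s') and never the transition model. The toy chain environment and all parameters are illustrative assumptions.

```python
import numpy as np

# Tabular Q-learning. Model-free: the update below uses only sampled
# transitions from env_step, never the environment's transition matrix.

def q_learning(env_step, n_states, n_actions, episodes=500, max_steps=50,
               alpha=0.1, gamma=0.95, eps=0.2, rng=None):
    rng = rng or np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.uniform() < eps else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            target = r + gamma * Q[s_next].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            if done:
                break
    return Q

# toy 4-state chain: action 1 moves right, action 0 moves left;
# reward 1 on reaching the terminal state 3.
def chain_step(s, a):
    s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == 3), s_next == 3

print(q_learning(chain_step, n_states=4, n_actions=2))
```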

Journal ArticleDOI
01 Jan 1990
TL;DR: This paper first reviews basic backpropagation, a simple method which is now being widely used in areas like pattern recognition and fault diagnosis, and describes further extensions of this method, to deal with systems other than neural networks, systems involving simultaneous equations or true recurrent networks, and other practical issues which arise with this method.
Abstract: Basic backpropagation, which is a simple method now being widely used in areas like pattern recognition and fault diagnosis, is reviewed. The basic equations for backpropagation through time, and applications to areas like pattern recognition involving dynamic systems, system identification, and control are discussed. Further extensions of this method, to deal with systems other than neural networks, systems involving simultaneous equations, or true recurrent networks, and other practical issues arising with the method are described. Pseudocode is provided to clarify the algorithms. The chain rule for ordered derivatives, the theorem which underlies backpropagation, is briefly discussed. The focus is on designing a simpler version of backpropagation which can be translated into computer code and applied directly by neural network users.
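A minimal sketch of basic backpropagation for a one-hidden-layer network is given below, applying the chain rule for ordered derivatives layer by layer in reverse; the architecture, loss, and data are illustrative assumptions, not Werbos's pseudocode.

```python
import numpy as np

# One-hidden-layer network trained with handwritten backpropagation: the
# backward pass applies the chain rule layer by layer in reverse order.

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                      # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary targets

W1, b1 = 0.5 * rng.standard_normal((3, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.standard_normal((8, 1)), np.zeros(1)
lr = 0.5

for _ in range(500):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))         # sigmoid output
    # backward pass (cross-entropy loss): d(loss)/d(logits) = out - y
    d_out = (out - y) / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1.0 - h**2)                # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("train accuracy:", ((out > 0.5) == (y > 0.5)).mean())
```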

4,572 citations


"Neural Network-Based Solutions for ..." refers methods in this paper

  • ...For discrete-time systems, Werbos [5], [6] developed a family of ADP algorithms that effectively uses the actor–critic architecture for solving the optimal control problem....


01 Jan 2009
TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Abstract: From the Publisher: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

4,251 citations