Policy-Iteration-Based Learning for Nonlinear Player Game Systems With Constrained Inputs

doi:10.1109/TSMC.2019.2962629

Journal Article

Chaoxu Mu, +2 more

- 01 Oct 2021 -

IEEE Transactions on Systems, Man, and C...

- Vol. 51, Iss: 10, pp 6488-6502

TLDR

An adaptive learning algorithm based on the policy iteration technique is developed to approximate the Nash equilibrium from real-time data, solving the optimal control problem for nonlinear nonzero-sum differential games when no initial admissible policies are available.

Abstract:

This article investigates the optimal control problem for nonlinear nonzero-sum differential games under control constraints when no initial admissible policies are available. An adaptive learning algorithm based on the policy iteration technique is developed to approximate the Nash equilibrium using real-time data. A two-player continuous-time system is used to present this approximation mechanism, which is implemented as a critic–actor architecture for every player. The constraint is incorporated into the optimization by introducing a nonquadratic value function, and the associated constrained Hamilton–Jacobi equation is derived. A critic neural network (NN) and an actor NN are used to learn the value function and the optimal control policy, respectively, according to novel weight tuning laws. To maintain stability during the learning phase, two stable operators are designed for the two actors. The proposed algorithm is proved to converge as a Newton’s iteration, and the stability of the closed-loop system is ensured by Lyapunov analysis. Finally, two simulation examples with different constraint scenarios demonstrate the effectiveness of the proposed learning scheme.
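The abstract does not spell out the nonquadratic value function. As a rough illustration only, the sketch below uses a form that is common in the constrained-input optimal control literature: a scalar input penalty U(u) = 2 r ∫₀ᵘ λ atanh(v/λ) dv, whose minimizing control is a tanh-saturated policy. The bound `lam`, the weight `r`, the input gain `g`, and both function names are assumptions made for this example and are not taken from the article.

```python
import math


def nonquadratic_cost(u, lam, r=1.0):
    """Closed form of 2*r * integral_0^u lam*atanh(v/lam) dv for scalar |u| < lam.

    Assumed penalty form (hypothetical here); it behaves like r*u**2 for small u
    and grows steeply as |u| approaches the bound lam.
    """
    s = u / lam
    return 2.0 * r * lam * u * math.atanh(s) + r * lam ** 2 * math.log(1.0 - s ** 2)


def constrained_policy(grad_v, g, lam, r=1.0):
    """Minimizer of nonquadratic_cost(u) + grad_v * g * u over u.

    Setting the derivative 2*r*lam*atanh(u/lam) + g*grad_v to zero gives a
    tanh-saturated control that never exceeds lam in magnitude.
    """
    return -lam * math.tanh(g * grad_v / (2.0 * lam * r))


if __name__ == "__main__":
    lam = 0.5  # hypothetical input bound
    for grad_v in (0.1, 1.0, 10.0, 1000.0):
        u = constrained_policy(grad_v, g=1.0, lam=lam)
        print(f"grad_v={grad_v:8.1f}  u={u:+.4f}")
```

Under these assumptions the control saturates smoothly at the bound instead of being clipped after the fact, which is how a penalty of this kind can build the constraint into the optimization itself.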

Citations

Compensator-critic structure-based neuro-optimal control of modular robot manipulators with uncertain environmental contacts using non-zero-sum games

Optimal Synchronization Control of Heterogeneous Asymmetric Input-Constrained Unknown Nonlinear MASs via Reinforcement Learning

Event-triggered design for discrete-time nonlinear systems with control constraints

Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system

Learning‐based control for discrete‐time constrained nonzero‐sum games

References

Approximate dynamic programming for real-time control and neural modeling

Nonzero-sum differential games

Adaptive Neural Impedance Control of a Robotic Manipulator With Input Saturation

Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization

Online learning control by association and reinforcement

Related Papers (5)

Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations

Approximate Nash Solutions for Multiplayer Mixed-Zero-Sum Game With Reinforcement Learning

Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games

Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems

Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics