Journal ArticleDOI

Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems using Online Approximators

Abstract
In this paper, reinforcement learning state- and output-feedback-based adaptive critic controller designs are proposed using online approximators (OLAs) for general multi-input multi-output affine unknown nonlinear discrete-time systems in the presence of bounded disturbances. The proposed controller design has two entities: an action network designed to produce an optimal control signal, and a critic network that evaluates the performance of the action network. The critic estimates the cost-to-go function, which is tuned online using recursive equations derived from heuristic dynamic programming. Here, neural networks (NNs) are used for both the action and critic networks, although any OLAs, such as radial basis functions, splines, fuzzy logic, etc., can be utilized. For the output-feedback counterpart, an additional NN is designated as an observer to estimate the unavailable system states, and thus the separation principle is not required. The NN weight tuning laws for the controller schemes are also derived while ensuring uniform ultimate boundedness of the closed-loop system using Lyapunov theory. Finally, the effectiveness of the two controllers is tested in simulation on a pendulum balancing system and a two-link robotic arm system.
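The actor-critic tuning loop described in the abstract can be sketched in a few lines. The following is a minimal, illustrative Python sketch of HDP-style online tuning, assuming single-hidden-layer NNs with fixed random hidden-layer weights and tunable output weights; the names (`hdp_step`) and gains (`alpha_c`, `alpha_a`) are hypothetical and do not reproduce the paper's exact tuning laws:

```python
import numpy as np

# Illustrative HDP actor-critic sketch: critic NN estimates the cost-to-go J,
# action NN produces the control. Hidden weights are fixed at random values;
# only the output weights are tuned online (an assumption for simplicity).
rng = np.random.default_rng(0)
n_x, n_u, n_h = 2, 1, 10            # state, control, and hidden-layer sizes
V_c = rng.normal(size=(n_h, n_x))   # fixed critic hidden-layer weights
V_a = rng.normal(size=(n_h, n_x))   # fixed action hidden-layer weights
w_c = np.zeros(n_h)                 # tunable critic output weights
W_a = np.zeros((n_u, n_h))          # tunable action output weights
alpha_c, alpha_a, gamma = 0.05, 0.02, 0.95  # illustrative gains

def phi(V, x):
    """Hidden-layer feature vector."""
    return np.tanh(V @ x)

def hdp_step(x, x_next, r):
    """One online tuning step: the critic descends the temporal-difference
    (Bellman) residual; the action NN is adjusted using the critic's
    estimated cost-to-go as its training signal."""
    J = w_c @ phi(V_c, x)
    J_next = w_c @ phi(V_c, x_next)
    e_c = r + gamma * J_next - J             # HDP Bellman residual
    w_c[:] += alpha_c * e_c * phi(V_c, x)    # critic output-weight update
    W_a[:] -= alpha_a * J_next * phi(V_a, x) # action update via critic signal
    return W_a @ phi(V_a, x)                 # control applied at next step
```

In the paper's scheme the critic evaluates the action network rather than a fixed policy, so the two updates run concurrently at every sample instant; the sketch above mirrors that structure but omits the disturbance terms and the specific tuning laws derived via Lyapunov analysis.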


Citations
Journal ArticleDOI

Fuzzy Approximation-Based Adaptive Backstepping Optimal Control for a Class of Nonlinear Discrete-Time Systems With Dead-Zone

TL;DR: An adaptive fuzzy optimal control design is addressed for a class of unknown nonlinear discrete-time systems containing unknown functions and a nonsymmetric dead-zone; stability is proved based on the difference Lyapunov function method.
Journal ArticleDOI

Adaptive Fault-Tolerant Tracking Control for Discrete-Time Multiagent Systems via Reinforcement Learning Algorithm

TL;DR: This article investigates the adaptive fault-tolerant tracking control problem for a class of discrete-time multiagent systems via a reinforcement learning algorithm and proves that all signals of the closed-loop system are semiglobally uniformly ultimately bounded.
Journal ArticleDOI

Off-Policy Reinforcement Learning for H∞ Control Design

TL;DR: An off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved.
Journal ArticleDOI

Reinforcement Learning Output Feedback NN Control Using Deterministic Learning Technique

TL;DR: A novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems and a deterministic learning technique has been employed to guarantee that the partial persistent excitation condition of internal states is satisfied during tracking control to a periodic reference orbit.
Journal ArticleDOI

Adaptive Dynamic Programming for Control: A Survey and Recent Advances

TL;DR: This article reviews adaptive dynamic programming (ADP) with applications in control and elaborates on the use of ADP to solve game problems, mainly nonzero-sum games.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book

Dynamic Programming and Optimal Control

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Journal ArticleDOI

Technical Note: Q-Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
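The update Watkins analyzed can be written in a few lines. Below is a minimal tabular Q-learning sketch on a toy two-state chain (the environment and constants are illustrative, not from the paper); note that the random action choice satisfies the theorem's condition that all actions be repeatedly sampled in all states:

```python
import numpy as np

# Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
rng = np.random.default_rng(1)
n_s, n_a, gamma, alpha = 2, 2, 0.9, 0.1
Q = np.zeros((n_s, n_a))

def step(s, a):
    # Deterministic toy dynamics: action 1 moves to state 1, which pays
    # reward 1 on entry; action 0 moves to state 0, which pays nothing.
    s_next = 1 if a == 1 else 0
    r = 1.0 if s_next == 1 else 0.0
    return s_next, r

s = 0
for _ in range(2000):
    a = int(rng.integers(n_a))        # sample every action in every state
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

After training, the learned action-values favor action 1 in both states, matching the fixed point of the Bellman optimality equation for this chain.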
Book

Nonlinear and adaptive control design

TL;DR: In this book, the focus is on adaptive nonlinear control results introduced with the new recursive design methodology of adaptive backstepping, along with basic tools for nonadaptive backstepping design with state and output feedback.
Journal ArticleDOI

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton, 01 Aug 1988
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
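The simplest member of this family, TD(0), can be sketched on the random-walk prediction task Sutton uses as a running example. The chain size, step size, and variable names below are illustrative assumptions, not taken from the article:

```python
import numpy as np

# Tabular TD(0) prediction: V(s) += alpha * (r + gamma * V(s') - V(s)).
# Toy task: a 5-state random walk with terminal states at both ends;
# reaching the right terminal pays reward 1, everything else pays 0.
rng = np.random.default_rng(2)
n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)           # states 0 and 6 are terminal (V = 0)

for _ in range(500):                 # 500 episodes
    s = 3                            # each episode starts in the middle
    while s not in (0, n_states + 1):
        s_next = s + int(rng.choice([-1, 1]))
        r = 1.0 if s_next == n_states + 1 else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
```

The learned values increase from left to right toward the rewarding terminal, approximating the true probabilities of terminating on the right (1/6, 2/6, ..., 5/6), which illustrates the "learning to predict" setting of the article.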