Journal ArticleDOI

Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control

TLDR
It is proven that the algorithm is, in effect, a model-free iterative algorithm for solving the game algebraic Riccati equation (GARE) of the linear quadratic discrete-time zero-sum game.
Abstract
In this paper, the optimal strategies for discrete-time linear quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action-dependent value function Q(x,u,w) of the zero-sum game instead of the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used and are adaptively tuned in forward time using adaptive critic methods. The result is a model-free Q-learning approximate dynamic programming approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are given. It is proven that the algorithm is, in effect, a model-free iterative algorithm for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of the method is demonstrated by performing an H-infinity control autopilot design for an F-16 aircraft.
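As a rough illustration of the approach described in the abstract, the sketch below runs a Q-learning iteration for a discrete-time linear quadratic zero-sum game with dynamics x_{k+1} = A x_k + B u_k + E w_k. This is a minimal sketch, not the authors' implementation: the system matrices, cost weights, gamma, probing-noise level, and the batch least-squares estimator of the quadratic Q-function kernel H are all illustrative assumptions. The matrices A, B, E appear only to simulate data and are never used by the learner, in keeping with the model-free spirit of the paper.

```python
# Hedged sketch: Q-learning for a discrete-time LQ zero-sum game.
# A, B, E below are placeholders used only to generate data; the learner
# itself never touches them.
import numpy as np

rng = np.random.default_rng(0)

# --- simulation-only quantities (unknown to the learner) ---
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[1.0], [0.0]])

n, m, q = 2, 1, 1                            # state, control, disturbance dimensions
Qx, Ru, gamma = np.eye(n), np.eye(m), 5.0    # stage-cost weights and H-infinity attenuation level

def stage_cost(x, u, w):
    return x @ Qx @ x + u @ Ru @ u - gamma**2 * (w @ w)

def quad_features(z):
    """Upper-triangular quadratic basis so the symmetric kernel H is identifiable by least squares."""
    outer = np.outer(z, z)
    iu = np.triu_indices(len(z))
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)   # off-diagonal terms appear twice in z'Hz
    return scale * outer[iu]

N = n + m + q
H = np.eye(N)                 # initial Q-function kernel
K = np.zeros((m, n))          # control gain, u = -K x
L = np.zeros((q, n))          # disturbance gain, w = L x

for it in range(50):
    Phi, targets = [], []
    for _ in range(200):
        x = rng.standard_normal(n)
        u = -K @ x + 0.5 * rng.standard_normal(m)   # probing noise for persistent excitation
        w = L @ x + 0.5 * rng.standard_normal(q)
        x1 = A @ x + B @ u + E @ w
        u1, w1 = -K @ x1, L @ x1                    # greedy successor actions under current policies
        z, z1 = np.concatenate([x, u, w]), np.concatenate([x1, u1, w1])
        Phi.append(quad_features(z))
        targets.append(stage_cost(x, u, w) + z1 @ H @ z1)
    # least-squares update of the Q-function kernel H (value-iteration flavour)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    Hnew = np.zeros((N, N))
    Hnew[np.triu_indices(N)] = theta
    H = Hnew + np.triu(Hnew, 1).T                   # symmetrize
    # policy improvement from the blocks of H (saddle-point stationarity conditions)
    Hux, Huu, Huw = H[n:n+m, :n], H[n:n+m, n:n+m], H[n:n+m, n+m:]
    Hwx, Hwu, Hww = H[n+m:, :n], H[n+m:, n:n+m], H[n+m:, n+m:]
    gains = -np.linalg.solve(np.block([[Huu, Huw], [Hwu, Hww]]),
                             np.vstack([Hux, Hwx]))
    K, L = -gains[:m, :], gains[m:, :]

print("learned control gain K:\n", K)
print("learned disturbance gain L:\n", L)
```

The kernel H plays the role of the critic, and the two gain updates play the role of the two action networks; in the paper these are neural networks tuned online rather than a batch least-squares fit.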


Citations
Journal ArticleDOI

Reinforcement learning and adaptive dynamic programming for feedback control

TL;DR: This work describes mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming that give insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.
Journal ArticleDOI

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

TL;DR: It is shown that heuristic dynamic programming (HDP) converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control.
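At its core, the HDP scheme referenced above is a value iteration V_{i+1}(x) = min_u [ r(x,u) + V_i(f(x,u)) ] started from V_0 = 0. The snippet below is a minimal grid-based sketch of that recursion for an assumed scalar nonlinear system; the cited paper proves convergence with neural-network approximators rather than a grid, so the concrete dynamics, cost weights, and grid ranges here are illustrative only.

```python
# Hedged sketch of the HDP (value-iteration) recursion on a state grid,
# for an assumed scalar discrete-time nonlinear system.
import numpy as np

def f(x, u):                       # example dynamics (assumed, not from the cited paper)
    return 0.8 * np.sin(x) + u

xs = np.linspace(-2.0, 2.0, 201)   # state grid
us = np.linspace(-1.0, 1.0, 41)    # candidate controls
Q, R = 1.0, 0.5
V = np.zeros_like(xs)              # V_0 = 0, as in the HDP convergence proof

for i in range(200):
    X, U = np.meshgrid(xs, us, indexing="ij")
    Xnext = np.clip(f(X, U), xs[0], xs[-1])
    # evaluate r(x,u) + V_i(f(x,u)) for every grid pair and minimize over u
    targets = Q * X**2 + R * U**2 + np.interp(Xnext, xs, V)
    Vnew = targets.min(axis=1)
    if np.max(np.abs(Vnew - V)) < 1e-8:
        break
    V = Vnew

u_star = us[targets.argmin(axis=1)]   # greedy policy recovered from the last sweep
print("V(0) ~", np.interp(0.0, xs, V), "  u*(1.0) ~", np.interp(1.0, xs, u_star))
```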
Journal ArticleDOI

From model-based control to data-driven control: Survey, classification and perspective

TL;DR: This paper is a brief survey of the existing problems and challenges inherent in model-based control (MBC) theory, and it reviews and addresses some important issues in the analysis and design of data-driven control (DDC) methods.
Journal ArticleDOI

Adaptive Dynamic Programming: An Introduction

TL;DR: This article reviews recent research trends within the field of adaptive/approximate dynamic programming (ADP), including variations on the structure of ADP schemes and the development of ADP algorithms and applications; many recent papers have also provided convergence analyses for the algorithms developed.
Journal ArticleDOI

Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics

TL;DR: This paper presents a novel policy iteration approach for finding online adaptive optimal controllers for continuous-time linear systems with completely unknown system dynamics, using the approximate/adaptive dynamic programming technique to iteratively solve the algebraic Riccati equation from online measurements of the state and input.
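The data-driven scheme summarized above builds on Kleinman-type policy iteration for the continuous-time algebraic Riccati equation. As a point of reference, here is a minimal model-based sketch of that underlying recursion (Lyapunov-equation policy evaluation followed by a gain update); the example matrices are assumed, and the cited paper's contribution is precisely that the same updates can be computed from online state and input data without knowing A and B.

```python
# Hedged sketch: the Kleinman policy-iteration recursion that the cited
# data-driven method reproduces from measured data.  For brevity this
# sketch uses A and B directly.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # example system (assumed)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))                        # initial stabilizing gain (this A is already Hurwitz)
for i in range(10):
    Ak = A - B @ K
    # policy evaluation: Ak' P + P Ak + Q + K' R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # policy improvement
    K = np.linalg.solve(R, B.T @ P)

print("PI gain: ", K)
print("ARE gain:", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```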
References
Book

Dynamic Noncooperative Game Theory

TL;DR: In this book, the authors present a general formulation of non-cooperative games, covering N-person nonzero-sum games, pursuit-evasion games, and Stackelberg equilibria of infinite dynamic games.

Book

Neuro-dynamic programming

TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Journal ArticleDOI

Neuronlike adaptive elements that can solve difficult learning control problems

TL;DR: In this article, it is shown that a system consisting of two neuron-like adaptive elements can solve a difficult learning control problem in which the task is to balance a pole hinged to a movable cart by applying forces to the cart's base.
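For flavour, the snippet below sketches the two-element update rules this reference describes: an adaptive critic element (ACE) forming a temporal-difference prediction of reinforcement, and an associative search element (ASE) adjusting its action weights from that internal signal. It is a simplified sketch: eligibility traces from the original are omitted, the environment is a toy two-state stand-in rather than the cart-pole, and all constants are illustrative assumptions.

```python
# Hedged ASE/ACE-style sketch on a toy two-state task (not the cart-pole).
import numpy as np

rng = np.random.default_rng(1)
n_states = 2
v = np.zeros(n_states)            # ACE weights (state-value predictions)
w = np.zeros(n_states)            # ASE weights (action preferences)
alpha, beta, gamma = 0.5, 0.1, 0.95

s = 0
for t in range(2000):
    # ASE: noisy threshold unit chooses a binary action
    a = 1 if w[s] + 0.3 * rng.standard_normal() > 0 else -1
    # toy placeholder dynamics: action +1 is "correct" in state 0, -1 in state 1
    success = (a == 1) if s == 0 else (a == -1)
    r = 0.0 if success else -1.0
    s_next = rng.integers(n_states)
    # ACE: temporal-difference "internal reinforcement" signal
    delta = r + gamma * v[s_next] - v[s]
    v[s] += beta * delta
    # ASE: strengthen or weaken the taken action according to the internal signal
    w[s] += alpha * delta * a
    s = s_next

print("action preferences:", w)   # expected signs: w[0] > 0, w[1] < 0
```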