Home
/
Authors
/
Murad Abu-Khalaf

Author

Murad Abu-Khalaf

Other affiliations: University of Texas at Arlington, MathWorks

Bio: Murad Abu-Khalaf is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Optimal control & Nonlinear system. The author has an hindex of 18, co-authored 34 publications receiving 3401 citations. Previous affiliations of Murad Abu-Khalaf include University of Texas at Arlington & MathWorks.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach

[...]

Murad Abu-Khalaf¹, Frank L. Lewis¹•Institutions (1)

University of Texas at Arlington¹

01 May 2005-Automatica

TL;DR: It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS) and the result is a nearly optimal constrained state feedback controller that has been tuned a priori off-line.

...read moreread less

1,045 citations

Journal Article•DOI•

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

[...]

A. Al-Tamimi¹, Frank L. Lewis², Murad Abu-Khalaf³•Institutions (3)

Hashemite University¹, University of Texas at Arlington², MathWorks³

01 Aug 2008

TL;DR: It is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control.

...read moreread less

Abstract: Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. The following two standard neural networks (NN) are used: a critic NN is used to approximate the value function, whereas an action network is used to approximate the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact solution assumption holds for some classes of nonlinear systems and, specifically, in the specific case of the DT linear quadratic regulator (LQR), where the action is linear and the value quadratic in the states and NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is generally used.

...read moreread less

919 citations

Journal Article•DOI•

Brief paper: Adaptive optimal control for continuous-time linear systems based on policy iteration

[...]

Draguna Vrabie¹, Octavian Pastravanu, Murad Abu-Khalaf², Frank L. Lewis¹•Institutions (2)

University of Texas at Arlington¹, MathWorks²

01 Feb 2009-Automatica

TL;DR: This paper proposes a new scheme based on adaptive critics for finding online the state feedback, infinite horizon, optimal control solution of linear continuous-time systems using only partial knowledge regarding the system dynamics.

...read moreread less

716 citations

Journal Article•DOI•

Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control

[...]

A. Al-Tamimi¹, Frank L. Lewis¹, Murad Abu-Khalaf¹•Institutions (1)

University of Texas at Arlington¹

01 Mar 2007

TL;DR: It is proven that the algorithm ends up to be a model-free iterative algorithm to solve the (GARE) of the linear quadratic discrete-time zero-sum game.

...read moreread less

Abstract: In this paper, the optimal strategies for discrete-time linear system quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action dependent value function Q(x,u,w) of the zero-sum game instead of solving for the state dependent value function V(x) which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and actions spaces are continuous, two action networks and one critic network are used that are adaptively tuned in forward time using adaptive critic methods. The result is a Q-learning approximate dynamic programming model-free approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are shown. It is proven that the algorithm ends up to be a model-free iterative algorithm to solve the (GARE) of the linear quadratic discrete-time zero-sum game. The effectiveness of this method is shown by performing an H-infinity control autopilot design for an F-16 aircraft.

...read moreread less

441 citations

Journal Article•DOI•

Neurodynamic Programming and Zero-Sum Games for Constrained Control Systems

[...]

Murad Abu-Khalaf¹, Frank L. Lewis², Jie Huang³•Institutions (3)

MathWorks¹, University of Texas at Arlington², The Chinese University of Hong Kong³

01 Jul 2008-IEEE Transactions on Neural Networks

TL;DR: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in L2-gain optimal control, suboptimal Hinfin control, of nonlinear systems affine in input with the control policy having saturation constraints.

...read moreread less

Abstract: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in L2-gain optimal control, suboptimal Hinfin control, of nonlinear systems affine in input with the control policy having saturation constraints. The result is a closed-form representation, on a prescribed compact set chosen a priori, of the feedback strategies and the value function that solves the associated Hamilton-Jacobi-Isaacs (HJI) equation. The closed-loop stability, L2-gain disturbance attenuation of the neural network saturated control feedback strategy, and uniform convergence results are proven. Finally, this approach is applied to the rotational/translational actuator (RTAC) nonlinear benchmark problem under actuator saturation, offering guaranteed stability and disturbance attenuation.

...read moreread less

173 citations

1
2
3
4
…
5
6
7

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Reinforcement learning and adaptive dynamic programming for feedback control

[...]

Frank L. Lewis¹, Draguna Vrabie¹•Institutions (1)

University of Texas at Arlington¹

01 Sep 2009-IEEE Circuits and Systems Magazine

TL;DR: This work describes mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming that give insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.

...read moreread less

Abstract: Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.

...read moreread less

1,163 citations

Journal Article•DOI•

Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach

[...]

Murad Abu-Khalaf¹, Frank L. Lewis¹•Institutions (1)

University of Texas at Arlington¹

01 May 2005-Automatica

...read moreread less

1,045 citations

Journal Article•DOI•

Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem

[...]

Kyriakos G. Vamvoudakis¹, Frank L. Lewis¹•Institutions (1)

University of Texas at Arlington¹

01 May 2010-Automatica

TL;DR: An online algorithm based on policy iteration for learning the continuous-time optimal control solution with infinite horizon cost for nonlinear systems with known dynamics, which finds in real-time suitable approximations of both the optimal cost and the optimal control policy, while also guaranteeing closed-loop stability.

...read moreread less

1,012 citations

Journal Article•DOI•

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

[...]

A. Al-Tamimi¹, Frank L. Lewis², Murad Abu-Khalaf³•Institutions (3)

Hashemite University¹, University of Texas at Arlington², MathWorks³

01 Aug 2008

...read moreread less

919 citations

Journal Article•DOI•

Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers

[...]

Frank L. Lewis¹, Draguna Vrabie, Kyriakos G. Vamvoudakis²•Institutions (2)

University of Texas at Arlington¹, University of California, Santa Barbara²

16 Nov 2012-IEEE Control Systems Magazine

TL;DR: In this article, the authors describe the use of reinforcement learning to design feedback controllers for discrete and continuous-time dynamical systems that combine features of adaptive control and optimal control, which are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions.

...read moreread less

Abstract: This article describes the use of principles of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control. Adaptive control [1], [2] and optimal control [3] represent different philosophies for designing feedback controllers. Optimal controllers are normally designed of ine by solving Hamilton JacobiBellman (HJB) equations, for example, the Riccati equation, using complete knowledge of the system dynamics. Determining optimal control policies for nonlinear systems requires the offline solution of nonlinear HJB equations, which are often difficult or impossible to solve. By contrast, adaptive controllers learn online to control unknown systems using data measured in real time along the system trajectories. Adaptive controllers are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions. Indirect adaptive controllers use system identification techniques to first identify the system parameters and then use the obtained model to solve optimal design equations [1]. Adaptive controllers may satisfy certain inverse optimality conditions [4].

...read moreread less

841 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse