Author
Mohammad-Bagher Naghibi-Sistani
Bio: Mohammad-Bagher Naghibi-Sistani is an academic researcher at Ferdowsi University of Mashhad. The author has contributed to research on control theory and reinforcement learning, has an h-index of 11, and has co-authored 21 publications receiving 932 citations.
Papers
TL;DR: An integral reinforcement learning algorithm on an actor-critic structure is developed to learn online the solution to the Hamilton-Jacobi-Bellman equation for partially-unknown constrained-input systems and it is shown that using this technique, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law.
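In concurrent-learning schemes of this kind, the richness condition on recorded data typically reduces to a rank condition on the stack of stored regressor vectors. A minimal Python sketch of such a check (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def recorded_data_is_rich(regressors, tol=1e-9):
    """Concurrent-learning richness check: the stacked recorded
    regressor vectors must span the parameter space, i.e. the
    history stack must have full column rank."""
    H = np.vstack(regressors)               # one recorded regressor per row
    rank = np.linalg.matrix_rank(H, tol=tol)
    return bool(rank == H.shape[1])

# Three linearly independent regressors in R^3: the condition holds.
rich = recorded_data_is_rich([np.array([1.0, 0.0, 0.0]),
                              np.array([0.0, 1.0, 0.0]),
                              np.array([0.0, 0.0, 1.0])])

# Collinear regressors: the condition fails.
poor = recorded_data_is_rich([np.array([1.0, 2.0, 3.0]),
                              np.array([2.0, 4.0, 6.0]),
                              np.array([1.0, 2.0, 3.0])])
```

Unlike classical persistence of excitation, which must hold along the future trajectory, this condition can be verified at any time from data already stored.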
410 citations
TL;DR: This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems where two neural networks are tuned online and simultaneously to generate the optimal bounded control policy.
Abstract: This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weights estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used. That is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
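A toy sketch of the experience-replay idea for identifier weight adaptation, in which recorded samples are reused alongside the current one so that excitation-like conditions are easier to satisfy (the basis functions, target weights, and learning rate below are invented for illustration, not the paper's identifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical identifier model: output ≈ phi(x)^T W; learn W from data.
def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

W_true = np.array([[1.0, 0.0],
                   [0.0, -0.5],
                   [0.3, 0.2]])

# Record past (regressor, target) pairs -- the experience-replay stack.
stack = []
for _ in range(20):
    x = rng.uniform(-1, 1, size=2)
    stack.append((phi(x), phi(x) @ W_true))

W = np.zeros((3, 2))
lr = 0.2
for _ in range(2000):
    # Recorded samples are used simultaneously with the current sample,
    # which relaxes the persistence-of-excitation requirement.
    x = rng.uniform(-1, 1, size=2)
    batch = stack + [(phi(x), phi(x) @ W_true)]
    grad = sum(np.outer(p, p @ W - y) for p, y in batch) / len(batch)
    W -= lr * grad

err = np.linalg.norm(W - W_true)   # weights converge near the ideal values
```

The paper's actual update law is a continuous-time adaptive rule with exponential convergence guarantees; this gradient sketch only illustrates why replaying a sufficiently rich stack of recorded data drives the weight error to zero.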
371 citations
TL;DR: An output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed and a novel Bellman equation is developed that evaluates the value function related to a fixed policy by using only the input, output, and reference trajectory data from the augmented system.
Abstract: In this paper, an output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed. An augmented system composed of the system dynamics and the reference trajectory dynamics is constructed. The state of the augmented system is reconstructed from a limited number of past measurements of the input, output, and reference trajectory of the augmented system. A novel Bellman equation is developed that evaluates the value function of a fixed policy by using only the input, output, and reference trajectory data from the augmented system. By using approximate dynamic programming, a class of reinforcement learning methods, the LQT problem is solved online, using only measurements of the input, output, and reference trajectory of the augmented system, without requiring knowledge of the augmented system dynamics. We develop both policy iteration (PI) and value iteration (VI) algorithms that converge to an optimal controller while requiring only measured input, output, and reference trajectory data. The convergence of the proposed PI and VI algorithms is shown. A simulation example is used to verify the effectiveness of the proposed control scheme.
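The paper's algorithms are model-free and output-feedback, but the value-iteration recursion at their core can be illustrated on a scalar, model-based LQR problem (this simplified sketch omits the tracking, augmentation, and output-feedback machinery; the system parameters are illustrative):

```python
# Scalar discrete-time system x_{k+1} = a x_k + b u_k,
# cost  sum_k (q x_k^2 + r u_k^2).
a, b, q, r = 0.9, 1.0, 1.0, 1.0

# Value iteration on the quadratic value function V(x) = p x^2:
# p <- q + a^2 p - (a b p)^2 / (r + b^2 p),  the scalar Riccati update.
p = 0.0
for _ in range(200):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)

# At convergence, p satisfies the discrete algebraic Riccati equation.
residual = abs(p - (q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)))
gain = a * b * p / (r + b**2 * p)   # optimal feedback u = -gain * x
```

The model-free VI algorithm in the paper performs the same fixed-point iteration, but estimates the value update from measured data instead of using `a` and `b` explicitly.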
175 citations
TL;DR: A novel fully distributed controller is developed based on the backstepping technique and a neuro-adaptive update mechanism to ensure bipartite consensus of multiple fractional-order nonlinear systems with output constraints, and it is shown that all the closed-loop error signals are uniformly ultimately bounded.
58 citations
TL;DR: Finite-time bipartite synchronization of multi-agent systems is assessed here: a virtual affine variable is introduced, and a neural network, together with the minimal-learning-parameter principle, is employed to approximate composite uncertainties, including unknown functions in the system dynamics, unknown control coefficients, and control inputs.
44 citations
Cited by
TL;DR: The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
Abstract: I have developed "tennis elbow" from lugging this book around the past four weeks, but it is worth the pain, the effort, and the aspirin. It is also worth the (relatively speaking) bargain price. Including appendixes, this book contains 894 pages of text. The entire panorama of the neural sciences is surveyed and examined, and it is comprehensive in its scope, from genomes to social behaviors. The editors explicitly state that the book is designed as "an introductory text for students of biology, behavior, and medicine," but it is hard to imagine any audience, interested in any fragment of neuroscience at any level of sophistication, that would not enjoy this book. The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
7,563 citations
01 Jan 2014
TL;DR: This chapter is devoted to a more detailed examination of game theory; two game-theoretic scenarios are examined: simultaneous-move and multi-stage games.
Abstract: This chapter is devoted to a more detailed examination of game theory. Game theory, an important tool for analyzing strategic behavior, is concerned with how individuals make decisions when they recognize that their actions affect, and are affected by, the actions of other individuals or groups. Strategic behavior recognizes that the decision-making process is frequently mutually interdependent. Game theory is the study of strategic behavior involving the interaction of two or more individuals, teams, or firms, usually referred to as players. Two game-theoretic scenarios are examined in this chapter: simultaneous-move and multi-stage games. In simultaneous-move games the players effectively move at the same time. A normal-form game summarizes the players, possible strategies, and payoffs from alternative strategies in a simultaneous-move game. Simultaneous-move games may be either noncooperative or cooperative. In contrast to noncooperative games, players of cooperative games engage in collusive behavior. A Nash equilibrium, which is a solution to a problem in game theory, occurs when no player's payoff can be improved by unilaterally changing strategies. Simultaneous-move games may be either one-shot or repeated games. One-shot games are played only once. Repeated games are games that are played more than once. Infinitely repeated games are played over and over again without end. Finitely repeated games are played a limited number of times. Finitely repeated games have certain or uncertain ends.
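The pure-strategy Nash equilibrium concept described above can be checked mechanically for a small normal-form game by testing whether each player's strategy is a best response to the other's. A sketch using Prisoner's Dilemma payoffs (the payoff values are chosen for illustration):

```python
import numpy as np

# Payoff matrices for a two-player simultaneous-move game (Prisoner's
# Dilemma): entry [i, j] is a player's payoff when the row player picks
# strategy i and the column player picks strategy j (0 = cooperate,
# 1 = defect).
row_payoff = np.array([[-1, -3],
                       [ 0, -2]])
col_payoff = np.array([[-1,  0],
                       [-3, -2]])

def pure_nash_equilibria(A, B):
    """Return (i, j) pairs where neither player gains by deviating
    unilaterally, i.e. each strategy is a best response to the other."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                eqs.append((i, j))
    return eqs

eqs = pure_nash_equilibria(row_payoff, col_payoff)  # [(1, 1)]: mutual defection
```

Note that the unique equilibrium, mutual defection, yields lower payoffs for both players than mutual cooperation, which is exactly why this game distinguishes noncooperative from cooperative play.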
814 citations
TL;DR: Q-learning and the integral RL algorithm are discussed as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively, and a new direction of off-policy RL for both CT and DT systems is presented.
Abstract: This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal $\mathcal {H}_{2}$ and $\mathcal {H}_\infty $ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
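Q-learning, the core DT algorithm the review discusses, can be illustrated on a two-state toy MDP. This sketch is a generic tabular version with an invented environment, not the quadratic Q-function formulation used for LQR in the control literature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny deterministic MDP: states {0, 1}, actions {0: stay, 1: move}.
# Reward 1 for landing in state 1; discount factor 0.9.
gamma = 0.9

def step(s, a):
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

Q = np.zeros((2, 2))
alpha = 0.1
s = 0
for _ in range(20000):
    a = int(rng.integers(2))             # behavior policy: uniform random
    s_next, r = step(s, a)
    # Off-policy temporal-difference update toward the greedy target:
    # the learned policy differs from the exploratory behavior policy.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

greedy = Q.argmax(axis=1)   # learned policy: move from state 0, stay in 1
```

Because the update bootstraps from `max` over the next state's action values rather than the action actually taken, Q-learning is off-policy, which is the property the reviewed off-policy RL methods exploit for CT and DT control.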
536 citations
TL;DR: It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation, and it is proven that any of the iterative control laws can stabilize the nonlinear systems.
Abstract: This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems for the first time. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Two neural networks are used to approximate the performance index function and to compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
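For the linear-quadratic special case, the policy-iteration structure described in the abstract (evaluate the current policy, then improve it greedily, with a nonincreasing value function) can be sketched directly; the system matrices and initial stabilizing gain below are illustrative:

```python
import numpy as np

# Policy iteration for a discrete-time linear system x_{k+1} = A x + B u
# with cost sum_k (x'Qx + u'Ru); the value of policy u = -K x is x'P x.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[10.0, 10.0]])      # an initial stabilizing gain
traces = []
for _ in range(15):
    Ac = A - B @ K
    # Policy evaluation: solve the Lyapunov equation
    #   P = Q + K'RK + Ac' P Ac   for the current policy.
    M = np.kron(Ac.T, Ac.T)
    vecP = np.linalg.solve(np.eye(4) - M, (Q + K.T @ R @ K).reshape(-1))
    P = vecP.reshape(2, 2)
    traces.append(np.trace(P))    # value shrinks monotonically
    # Policy improvement: greedy gain for the evaluated value function.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

The nonincreasing sequence of value matrices mirrors the paper's result that the iterative performance index function converges monotonically, and every intermediate gain remains stabilizing.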
535 citations