Author
Mohammad-Bagher Naghibi-Sistani
Bio: Mohammad-Bagher Naghibi-Sistani is an academic researcher at Ferdowsi University of Mashhad. The author has contributed to research on control theory and reinforcement learning, has an h-index of 11, and has co-authored 21 publications receiving 932 citations.
Papers
TL;DR: An integral reinforcement learning algorithm on an actor-critic structure is developed to learn online the solution to the Hamilton-Jacobi-Bellman equation for partially-unknown constrained-input systems and it is shown that using this technique, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law.
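In concurrent-learning schemes of this kind, the richness condition on recorded data typically reduces to a rank condition on the stack of stored regressor vectors. A minimal Python sketch of such a check (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def recorded_data_is_rich(regressors, tol=1e-9):
    """Concurrent-learning richness check: the stacked recorded
    regressor vectors must span the parameter space, i.e. the
    history stack must have full column rank."""
    H = np.vstack(regressors)               # one recorded regressor per row
    rank = np.linalg.matrix_rank(H, tol=tol)
    return bool(rank == H.shape[1])

# Three linearly independent regressors in R^3: the condition holds.
rich = recorded_data_is_rich([np.array([1.0, 0.0, 0.0]),
                              np.array([0.0, 1.0, 0.0]),
                              np.array([0.0, 0.0, 1.0])])

# Collinear regressors: the condition fails.
poor = recorded_data_is_rich([np.array([1.0, 2.0, 3.0]),
                              np.array([2.0, 4.0, 6.0]),
                              np.array([1.0, 2.0, 3.0])])
```

Unlike classical persistence of excitation, which must hold along the future trajectory, this condition can be verified at any time from data already stored.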
410 citations
TL;DR: This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems where two neural networks are tuned online and simultaneously to generate the optimal bounded control policy.
Abstract: This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor-critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the identifier weights estimation error affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge to small neighborhoods of their ideal values exponentially fast. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used. That is, recorded past experiences are used simultaneously with current data for the adaptation of the identifier weights. Stability of the whole system consisting of the actor, critic, system state, and system identifier is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
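A toy sketch of the experience-replay idea for identifier weight adaptation, in which recorded samples are reused alongside the current one so that excitation-like conditions are easier to satisfy (the basis functions, target weights, and learning rate below are invented for illustration, not the paper's identifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical identifier model: output ≈ phi(x)^T W; learn W from data.
def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

W_true = np.array([[1.0, 0.0],
                   [0.0, -0.5],
                   [0.3, 0.2]])

# Record past (regressor, target) pairs -- the experience-replay stack.
stack = []
for _ in range(20):
    x = rng.uniform(-1, 1, size=2)
    stack.append((phi(x), phi(x) @ W_true))

W = np.zeros((3, 2))
lr = 0.2
for _ in range(2000):
    # Recorded samples are used simultaneously with the current sample,
    # which relaxes the persistence-of-excitation requirement.
    x = rng.uniform(-1, 1, size=2)
    batch = stack + [(phi(x), phi(x) @ W_true)]
    grad = sum(np.outer(p, p @ W - y) for p, y in batch) / len(batch)
    W -= lr * grad

err = np.linalg.norm(W - W_true)   # weights converge near the ideal values
```

The paper's actual update law is a continuous-time adaptive rule with exponential convergence guarantees; this gradient sketch only illustrates why replaying a sufficiently rich stack of recorded data drives the weight error to zero.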
371 citations
TL;DR: An output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed and a novel Bellman equation is developed that evaluates the value function related to a fixed policy by using only the input, output, and reference trajectory data from the augmented system.
Abstract: In this paper, an output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed. An augmented system composed of the system dynamics and the reference trajectory dynamics is constructed. The state of the augmented system is reconstructed from a limited number of past measurements of the input, output, and reference trajectory of the augmented system. A novel Bellman equation is developed that evaluates the value function of a fixed policy by using only the input, output, and reference trajectory data from the augmented system. By using approximate dynamic programming, a class of reinforcement learning methods, the LQT problem is solved online, using only measurements of the input, output, and reference trajectory of the augmented system, without requiring knowledge of the augmented system dynamics. We develop both policy iteration (PI) and value iteration (VI) algorithms that converge to an optimal controller while requiring only measured input, output, and reference trajectory data. The convergence of the proposed PI and VI algorithms is shown. A simulation example is used to verify the effectiveness of the proposed control scheme.
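The paper's algorithms are model-free and output-feedback, but the value-iteration recursion at their core can be illustrated on a scalar, model-based LQR problem (this simplified sketch omits the tracking, augmentation, and output-feedback machinery; the system parameters are illustrative):

```python
# Scalar discrete-time system x_{k+1} = a x_k + b u_k,
# cost  sum_k (q x_k^2 + r u_k^2).
a, b, q, r = 0.9, 1.0, 1.0, 1.0

# Value iteration on the quadratic value function V(x) = p x^2:
# p <- q + a^2 p - (a b p)^2 / (r + b^2 p),  the scalar Riccati update.
p = 0.0
for _ in range(200):
    p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)

# At convergence, p satisfies the discrete algebraic Riccati equation.
residual = abs(p - (q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)))
gain = a * b * p / (r + b**2 * p)   # optimal feedback u = -gain * x
```

The model-free VI algorithm in the paper performs the same fixed-point iteration, but estimates the value update from measured data instead of using `a` and `b` explicitly.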
175 citations
TL;DR: A novel fully distributed controller is developed based on the backstepping technique and a neuro-adaptive update mechanism to ensure bipartite consensus of multiple fractional-order nonlinear systems with output constraints, and it is shown that all the closed-loop error signals are uniformly ultimately bounded.
58 citations
TL;DR: Finite-time bipartite synchronization of multi-agent systems is assessed here: a virtual affine variable is introduced, and a neural network, together with the minimal-learning-parameter principle, is employed to approximate composite uncertainties, including unknown functions in the system dynamics, unknown control coefficients, and control inputs.
44 citations
Cited by
TL;DR: The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
Abstract: I have developed "tennis elbow" from lugging this book around the past four weeks, but it is worth the pain, the effort, and the aspirin. It is also worth the (relatively speaking) bargain price. Including appendixes, this book contains 894 pages of text. The entire panorama of the neural sciences is surveyed and examined, and it is comprehensive in its scope, from genomes to social behaviors. The editors explicitly state that the book is designed as "an introductory text for students of biology, behavior, and medicine," but it is hard to imagine any audience, interested in any fragment of neuroscience at any level of sophistication, that would not enjoy this book. The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
7,563 citations
01 Jan 2014
TL;DR: This chapter is devoted to a more detailed examination of game theory; two game-theoretic scenarios are examined: simultaneous-move and multi-stage games.
Abstract: This chapter is devoted to a more detailed examination of game theory. Game theory, an important tool for analyzing strategic behavior, is concerned with how individuals make decisions when they recognize that their actions affect, and are affected by, the actions of other individuals or groups. Strategic behavior recognizes that the decision-making process is frequently mutually interdependent. Game theory is the study of strategic behavior involving the interaction of two or more individuals, teams, or firms, usually referred to as players. Two game-theoretic scenarios are examined in this chapter: simultaneous-move and multi-stage games. In simultaneous-move games the players effectively move at the same time. A normal-form game summarizes the players, possible strategies, and payoffs from alternative strategies in a simultaneous-move game. Simultaneous-move games may be either noncooperative or cooperative. In contrast to noncooperative games, players of cooperative games engage in collusive behavior. A Nash equilibrium, which is a solution to a problem in game theory, occurs when no player's payoff can be improved by unilaterally changing strategies. Simultaneous-move games may be either one-shot or repeated games. One-shot games are played only once. Repeated games are games that are played more than once. Infinitely repeated games are played over and over again without end. Finitely repeated games are played a limited number of times. Finitely repeated games have certain or uncertain ends.
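The pure-strategy Nash equilibrium concept described above can be checked mechanically for a small normal-form game by testing whether each player's strategy is a best response to the other's. A sketch using Prisoner's Dilemma payoffs (the payoff values are chosen for illustration):

```python
import numpy as np

# Payoff matrices for a two-player simultaneous-move game (Prisoner's
# Dilemma): entry [i, j] is a player's payoff when the row player picks
# strategy i and the column player picks strategy j (0 = cooperate,
# 1 = defect).
row_payoff = np.array([[-1, -3],
                       [ 0, -2]])
col_payoff = np.array([[-1,  0],
                       [-3, -2]])

def pure_nash_equilibria(A, B):
    """Return (i, j) pairs where neither player gains by deviating
    unilaterally, i.e. each strategy is a best response to the other."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                eqs.append((i, j))
    return eqs

eqs = pure_nash_equilibria(row_payoff, col_payoff)  # [(1, 1)]: mutual defection
```

Note that the unique equilibrium, mutual defection, yields lower payoffs for both players than mutual cooperation, which is exactly why this game distinguishes noncooperative from cooperative play.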
814 citations
TL;DR: Q-learning and the integral RL algorithm are discussed as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively, and a new direction of off-policy RL for both CT and DT systems is presented.
Abstract: This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal $\mathcal {H}_{2}$ and $\mathcal {H}_\infty $ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
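Q-learning, the core DT algorithm the review discusses, can be illustrated on a two-state toy MDP. This sketch is a generic tabular version with an invented environment, not the quadratic Q-function formulation used for LQR in the control literature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny deterministic MDP: states {0, 1}, actions {0: stay, 1: move}.
# Reward 1 for landing in state 1; discount factor 0.9.
gamma = 0.9

def step(s, a):
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

Q = np.zeros((2, 2))
alpha = 0.1
s = 0
for _ in range(20000):
    a = int(rng.integers(2))             # behavior policy: uniform random
    s_next, r = step(s, a)
    # Off-policy temporal-difference update toward the greedy target:
    # the learned policy differs from the exploratory behavior policy.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

greedy = Q.argmax(axis=1)   # learned policy: move from state 0, stay in 1
```

Because the update bootstraps from `max` over the next state's action values rather than the action actually taken, Q-learning is off-policy, which is the property the reviewed off-policy RL methods exploit for CT and DT control.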
536 citations
TL;DR: It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation, and it is proven that any of the iterative control laws can stabilize the nonlinear systems.
Abstract: This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems for the first time. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Two neural networks are used to approximate the performance index function and to compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
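For the linear-quadratic special case, the policy-iteration structure described in the abstract (evaluate the current policy, then improve it greedily, with a nonincreasing value function) can be sketched directly; the system matrices and initial stabilizing gain below are illustrative:

```python
import numpy as np

# Policy iteration for a discrete-time linear system x_{k+1} = A x + B u
# with cost sum_k (x'Qx + u'Ru); the value of policy u = -K x is x'P x.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[10.0, 10.0]])      # an initial stabilizing gain
traces = []
for _ in range(15):
    Ac = A - B @ K
    # Policy evaluation: solve the Lyapunov equation
    #   P = Q + K'RK + Ac' P Ac   for the current policy.
    M = np.kron(Ac.T, Ac.T)
    vecP = np.linalg.solve(np.eye(4) - M, (Q + K.T @ R @ K).reshape(-1))
    P = vecP.reshape(2, 2)
    traces.append(np.trace(P))    # value shrinks monotonically
    # Policy improvement: greedy gain for the evaluated value function.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

The nonincreasing sequence of value matrices mirrors the paper's result that the iterative performance index function converges monotonically, and every intermediate gain remains stabilizing.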
535 citations