
Showing papers on "Dynamic programming published in 2022"


Journal ArticleDOI
TL;DR: In this article, an adaptive optimal control approach based on reinforcement learning and adaptive dynamic programming is proposed to learn the optimal regulator with an assured convergence rate for disturbed linear continuous-time systems.

59 citations


Journal ArticleDOI
TL;DR: In this article, an adaptive dynamic programming-based data-driven controller is proposed for hydraulic servo actuators (HSA) with unknown dynamics, requiring knowledge of neither the HSA dynamics nor the exosystem dynamics.
Abstract: Hydraulic servo actuators (HSA) are often used in industry for tasks that demand high power, high accuracy and dynamic motion. It is well known that the HSA is a highly complex nonlinear system whose parameters cannot be accurately determined due to various uncertainties, the inability to measure some parameters, and disturbances. This paper considers the control problem of the HSA with unknown dynamics, based on adaptive dynamic programming via output feedback. With practical implementation of the control algorithm in mind, a linear discrete-time model of the HSA is considered and an online learning data-driven controller is used, which relies on measured input and output data instead of unmeasurable states and unknown system parameters. Hence, the ADP-based data-driven controller in this paper requires knowledge of neither the HSA dynamics nor the exosystem dynamics. The convergence of the ADP-based control algorithm is also shown theoretically. Simulation results verify the feasibility and effectiveness of the proposed approach in solving the optimal control problem of the HSA.

58 citations


Journal ArticleDOI
01 Jan 2022-Energy
TL;DR: This work proposes a real-time dynamic optimal energy management (OEM) scheme built on a novel policy-based deep reinforcement learning (DRL) algorithm with continuous state and action spaces, which includes two phases: offline training and online operation.

53 citations


Journal ArticleDOI
22 Dec 2022
TL;DR: Wang et al. as mentioned in this paper surveyed the latest development of adaptive dynamic programming (ADP) based optimal control with communication constraints and summarized some applications of the ADP method in practical systems.
Abstract: Survey/review study: Adaptive Dynamic Programming for Networked Control Systems under Communication Constraints: A Survey of Trends and Techniques. Xueli Wang 1, Ying Sun 1,*, and Derui Ding 2. 1 Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China. 2 School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne 3122, Australia. * Correspondence: yingsun1991@163.com. Received: 22 September 2022; Accepted: 28 November 2022; Published: 22 December 2022. Abstract: Adaptive dynamic programming (ADP) has been widely used, benefiting from its forward-in-time recursive structure and the prospective conception of reinforcement learning. Furthermore, ADP-based control problems with communication constraints attract ever-increasing research attention in both theoretical analysis and engineering applications, due mainly to the wide participation of digital communications in industrial systems. This paper systematically surveys the latest developments in ADP-based optimal control with communication constraints. To this end, the dominant ADP-based methods are first investigated in terms of their structures and implementation. Then, technical challenges and corresponding approaches are comprehensively discussed, and existing results are reviewed according to constraint type. Furthermore, some applications of the ADP method in practical systems are summarized. Finally, future research topics on ADP-based control are highlighted.

51 citations


Journal ArticleDOI
01 Jan 2022-Energy
TL;DR: In this paper, a real-time dynamic optimal energy management (OEM) scheme based on a deep reinforcement learning (DRL) algorithm is proposed to help the EMS make optimal schedule decisions, and the case study demonstrates the effectiveness and computational efficiency of the proposed method.

39 citations


Journal ArticleDOI
TL;DR: In this paper, a Double Q-learning RL algorithm with state constraint and variable action space is adopted to determine the optimal energy management strategy for fuel cell/battery hybrid systems, using a total cost of ownership model that accounts for the degradation of power sources.
Abstract: Energy management strategy (EMS) is the key to the performance of a fuel cell/battery hybrid system. At present, reinforcement learning (RL) has been introduced into this field and has gradually become the focus of research. However, traditional EMSs take only energy consumption into consideration when optimizing the operation economy, and ignore the cost caused by power source degradation; this leads to poor operation economy in terms of Total Cost of Ownership (TCO). On the other hand, most studied RL algorithms suffer from overestimation and improper ways of restricting battery SOC, which also lead to relatively poor control performance. To solve these problems, this paper first establishes a TCO model including energy consumption, equivalent energy consumption and degradation of power sources, and then adopts a Double Q-learning RL algorithm with state constraint and variable action space to determine the optimal EMS. Finally, using a hardware-in-the-loop platform, the feasibility, superiority and generalization of the proposed EMS are demonstrated by comparison with optimal dynamic programming, a traditional RL EMS, and the equivalent consumption minimization strategy (ECMS) under both training and unknown operating conditions. Results show that the proposed strategy achieves high global optimality and excellent SOC control regardless of training or unknown conditions.

36 citations
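The Double Q-learning backup at the core of the EMS above can be sketched as follows. This is the generic single-step update (state and action names are illustrative), not the paper's constrained, variable-action-space implementation:

```python
import random
from collections import defaultdict

def double_q_update(Q1, Q2, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One Double Q-learning backup: one table picks the greedy next action,
    the other evaluates it, which counters Q-learning's overestimation bias."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda a2: Q1[(s_next, a2)])
        Q1[(s, a)] += alpha * (r + gamma * Q2[(s_next, a_star)] - Q1[(s, a)])
    else:
        a_star = max(actions, key=lambda a2: Q2[(s_next, a2)])
        Q2[(s, a)] += alpha * (r + gamma * Q1[(s_next, a_star)] - Q2[(s, a)])

# Q-tables default to 0 for unseen (state, action) pairs
Q1, Q2 = defaultdict(float), defaultdict(float)
double_q_update(Q1, Q2, 'soc_mid', 'fc_power_low', 1.0, 'soc_mid',
                ['fc_power_low', 'fc_power_high'])
```

Greedy actions would be taken from Q1 + Q2; the paper's SOC-dependent restriction of the action space is not modeled here.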


Journal ArticleDOI
01 Sep 2022
TL;DR: In this article, an event-triggered adaptive dynamic programming (ADP) algorithm is developed to solve the tracking control problem for partially unknown constrained uncertain systems, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated.
Abstract: An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. The integral reinforcement learning is employed to avoid the requirement of augmented drift dynamics. Second, the event-triggered ADP is adopted for its implementation, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error prove to be uniformly ultimately bounded, and the existence of a lower bound for the interexecution times is analyzed. Finally, simulation results demonstrate the effectiveness of the present event-triggered ADP method.

28 citations


Journal ArticleDOI
TL;DR: In this article, an event-triggered H∞ control method based on adaptive dynamic programming (ADP) with concurrent learning is proposed for unknown continuous-time nonlinear systems with control constraints, which reduces controller execution times and guarantees stability of the system.
Abstract: In this article, an event-triggered H∞ control method is proposed based on adaptive dynamic programming (ADP) with concurrent learning for unknown continuous-time nonlinear systems with control constraints. First, a system identification technique based on neural networks (NNs) is adopted to identify completely unknown systems. Second, a critic NN is employed to approximate the value function. A novel weight updating rule is developed based on the event-triggered control law and time-triggered disturbance law, which reduces controller execution times and guarantees the stability of the system. Subsequently, concurrent learning is applied to the weight updating rule to relax the demand for the traditional persistence of excitation condition that is difficult to implement online. Finally, the comparison between the time-triggered method and event-triggered method in simulation demonstrates the effectiveness of the developed constrained event-triggered ADP method.

26 citations


Journal ArticleDOI
01 Dec 2022
TL;DR: In this paper, an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework was proposed to solve the HJB equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems.
Abstract: In this article, we consider an iterative adaptive dynamic programming (ADP) algorithm within the Hamiltonian-driven framework to solve the Hamilton-Jacobi-Bellman (HJB) equation for the infinite-horizon optimal control problem in continuous time for nonlinear systems. First, a novel function, "min-Hamiltonian," is defined to capture the fundamental properties of the classical Hamiltonian. It is shown that both the HJB equation and the policy iteration (PI) algorithm can be formulated in terms of the min-Hamiltonian within the Hamiltonian-driven framework. Moreover, we develop an iterative ADP algorithm that takes into consideration the approximation errors during the policy evaluation step. We then derive a sufficient condition on the iterative value gradient to guarantee closed-loop stability of the equilibrium point as well as convergence to the optimal value. A model-free extension based on an off-policy reinforcement learning (RL) technique is also provided. Finally, numerical results illustrate the efficacy of the proposed framework.

23 citations
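In standard optimal-control notation (generic symbols, not necessarily the paper's), the min-Hamiltonian construction the article builds on can be written compactly:

```latex
% Dynamics \dot{x} = f(x) + g(x)u with running cost Q(x) + u^{\top} R u:
H(x, u, \nabla V) = \nabla V(x)^{\top} \bigl( f(x) + g(x)u \bigr) + Q(x) + u^{\top} R u,
\qquad
\mathcal{H}(x, \nabla V) \triangleq \min_{u} H(x, u, \nabla V).
% The HJB equation then reads \mathcal{H}(x, \nabla V^{*}) = 0, attained by the
% minimizer u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x).
```

Policy iteration alternates evaluating $\mathcal{H}$ for a fixed policy with re-minimizing over $u$; the paper's contribution is accounting for approximation errors in that evaluation step.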


Journal ArticleDOI
TL;DR: In this article, a self-learning parallel control method based on the adaptive dynamic programming (ADP) technique is developed for solving the optimal control problem of discrete-time time-varying nonlinear systems.
Abstract: In this article, a new self-learning parallel control method, which is based on the adaptive dynamic programming (ADP) technique, is developed for solving the optimal control problem of discrete-time time-varying nonlinear systems. It aims to obtain an approximate optimal control law sequence and simultaneously guarantees the convergence of the value function. Establishing the time-varying artificial system by neural networks over a certain time horizon, a control-sequence-improvement ADP algorithm is developed to obtain the control law sequence. For the first time, criteria for the parallel execution are presented, such that the value function is proven to converge to a finite neighborhood of the optimal performance index function. Finally, numerical results and analysis are presented to demonstrate the effectiveness of the parallel control method.

22 citations


Journal ArticleDOI
TL;DR: In this paper, a novel event-triggered control (ETC) approach based on the deterministic policy gradient adaptive dynamic programming (ADP) algorithm is proposed to address zero-sum game problems for discrete-time (DT) nonlinear systems.
Abstract: In order to address zero-sum game problems for discrete-time (DT) nonlinear systems, this article develops a novel event-triggered control (ETC) approach based on the deterministic policy gradient (PG) adaptive dynamic programming (ADP) algorithm. By adopting the input and output data, the proposed ETC method updates the control law and the disturbance law with a gradient descent algorithm. Compared with the conventional PG ADP-based control scheme, the present controller is updated aperiodically to reduce the computational and communication burden. Then, the actor-critic-disturbance framework is adopted to obtain the optimal control law and the worst disturbance law, which guarantee the input-to-state stability of the closed-loop system. Moreover, a novel neural network weight updating law which guarantees the uniform ultimate boundedness of weight estimation errors is provided based on the experience replay technique. Finally, the validity of the present method is verified by simulation of two DT nonlinear systems.

Journal ArticleDOI
TL;DR: In this article, an online adaptive optimal control algorithm based on adaptive dynamic programming is developed to solve the multiplayer nonzero-sum game (MP-NZSG) for discrete-time unknown nonlinear systems.
Abstract: In this article, an online adaptive optimal control algorithm based on adaptive dynamic programming is developed to solve the multiplayer nonzero-sum game (MP-NZSG) for discrete-time unknown nonlinear systems. First, a model-free coupled globalized dual-heuristic dynamic programming (GDHP) structure is designed to solve the MP-NZSG problem, in which there is no model network or identifier. Second, in order to relax the requirement of systems dynamics, an online adaptive learning algorithm is developed to solve the Hamilton-Jacobi equation using the system states of two adjacent time steps. Third, a series of critic networks and action networks are used to approximate value functions and optimal policies for all players. All the neural network (NN) weights are updated online based on real-time system states. Fourth, the uniformly ultimate boundedness analysis of the NN approximation errors is proved based on the Lyapunov approach. Finally, simulation results are given to demonstrate the effectiveness of the developed scheme.

Posted ContentDOI
17 Aug 2022-bioRxiv
TL;DR: The bidirectional WFA algorithm (BiWFA), the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA’s time complexity of O(ns), is presented.
Abstract:
Motivation: Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal score and n is the sequence length. Notwithstanding these bounds, WFA's O(s^2) memory requirements become computationally impractical for genome-scale alignments, leading to a need for further improvement.
Results: In this paper, we present the bidirectional WFA algorithm (BiWFA), the first gap-affine algorithm capable of computing optimal alignments in O(s) memory while retaining WFA's time complexity of O(ns). As a result, this work improves the lowest known memory bound O(n) to compute gap-affine alignments. In practice, our implementation never requires more than a few hundred MBs aligning noisy Oxford Nanopore Technologies reads up to 1 Mbp long while maintaining competitive execution times.
Availability: All code is publicly available at https://github.com/smarco/BiWFA-paper
Contact: santiagomsola@gmail.com
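For contrast with WFA's score-bound complexities, the classical quadratic-time DP that these methods improve on already admits linear memory when only the score is needed, by keeping two rows of the DP matrix. A minimal sketch (unit-cost edit distance, not the gap-affine model or the WFA algorithm itself):

```python
def edit_distance_linear_mem(a: str, b: str) -> int:
    """Classic DP distance with two rolling rows: O(len(a)*len(b)) time,
    O(min(len(a), len(b))) memory. WFA/BiWFA instead scale with the
    optimal score s, which is far smaller for similar sequences."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter string as the row dimension
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            # min over deletion, insertion, and (mis)match
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb))
        prev = curr
    return prev[-1]
```

Recovering the alignment itself (not just the score) in small memory is the harder part, which BiWFA solves by meeting wavefronts from both ends.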

Journal ArticleDOI
TL;DR: In this paper, a reinforcement learning-based energy-efficient speed planning strategy is proposed for autonomous electric vehicles, which learns an optimal control policy through a data-driven learning process and achieves near-optimal performance of 93.8% relative to the dynamic programming result.

Journal ArticleDOI
TL;DR: In this paper, a multi-dimensional approximate dynamic programming (ADP) algorithm was proposed for the real-time schedule of integrated heat and power system (IHPS) with battery and heat storage tank (HST).

Journal ArticleDOI
TL;DR: In this paper, a distributed adaptive dynamic programming approach based on the Bellman principle is proposed to achieve accurate current sharing and voltage regulation in a hybrid wind/solar system.
Abstract: Renewable energy is an advisable choice to reduce fuel consumption and CO2 emissions, and wind and solar energy are the most promising contributors to this goal. Although hybrid wind/solar systems have been widely studied, real-time current sharing based on their maximum capacities is rarely achieved on the timescale of seconds. Motivated by this, this paper proposes an accurate current sharing and voltage regulation approach for hybrid wind/solar systems, based on distributed adaptive dynamic programming. Firstly, the equivalent wind/solar model is built, an indispensable preprocessing step to achieve complementarity between wind and solar energy. Therein, the wind and solar sources output current according to their respective capacity ratios, which ensures the maximum utilization of the renewable energy sources. Furthermore, the current sharing and voltage regulation problem is recast as an optimal control problem, under which each source agent aims to obtain the optimal control variable and achieve accurate current sharing and voltage regulation. Moreover, an adaptive dynamic programming approach based on the Bellman principle is proposed, which achieves accurate current sharing and voltage regulation. Finally, simulation results are provided to illustrate the performance of the proposed approach.

Book ChapterDOI
01 Jan 2022
TL;DR: Deep Policy Dynamic Programming (DPDP) as discussed by the authors combines the strengths of learned neural heuristics with those of traditional dynamic programming algorithms, and prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
Abstract: Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP), the vehicle routing problem (VRP) and TSP with time windows (TSPTW) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming most other 'neural approaches' for solving TSPs, VRPs and TSPTWs with 100 nodes. Keywords: Dynamic Programming, Deep Learning, Vehicle Routing
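The core DPDP idea, restricting each DP layer to the states a policy ranks highest, can be sketched for TSP with a plain cost heuristic standing in for the learned neural policy (all names here are illustrative, not from the DPDP codebase):

```python
def beam_restricted_tsp(dist, beam_width=8, policy=None):
    """Held-Karp-style DP over (visited_set, last_city) states, but after each
    expansion only the beam_width most promising states are kept. DPDP ranks
    states with a neural edge-prediction policy; here a plain cost-so-far
    heuristic stands in for it."""
    n = len(dist)
    score = policy or (lambda cost, last: cost)  # stand-in for the learned policy
    layer = {(frozenset([0]), 0): 0.0}           # start the tour at city 0
    for _ in range(n - 1):
        nxt = {}
        for (visited, last), cost in layer.items():
            for city in range(n):
                if city in visited:
                    continue
                key = (visited | {city}, city)
                c = cost + dist[last][city]
                if key not in nxt or c < nxt[key]:
                    nxt[key] = c
        # restriction step: keep only the beam_width best-ranked states
        layer = dict(sorted(nxt.items(),
                            key=lambda kv: score(kv[1], kv[0][1]))[:beam_width])
    return min(cost + dist[last][0] for (visited, last), cost in layer.items())
```

With an unbounded beam this is exact Held-Karp; shrinking the beam trades optimality guarantees for tractable state counts, and a good policy keeps the optimal states inside the beam.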

Journal ArticleDOI
TL;DR: In this paper, a dynamic prioritized policy gradient adaptive dynamic programming (ADP) method is developed to solve the optimal control problem of nonaffine nonlinear discrete-time systems, along with a convergence analysis of the algorithm.
Abstract: With the industrialization of modern society, the pollution of water resources is becoming more and more serious. Although purifying urban sewage through wastewater treatment plants eases the burden on fragile ecosystems, the nonlinearities and uncertainties of the biochemical reactions are difficult to address. In this article, a dynamic prioritized policy gradient adaptive dynamic programming (ADP) method is developed to solve the optimal control problem of nonaffine nonlinear discrete-time systems, along with a convergence analysis of the algorithm. To the best of our knowledge, previous ADP research on wastewater treatment process control has invariably required system modeling. By introducing a dynamic prioritized replay buffer and neural networks, the proposed ADP controller can track the setpoints of the wastewater treatment plant and alleviate the effects of disturbance without system modeling. The test results verify that the devised control method outperforms the proportional-integral-derivative strategy with less oscillation when unknown interference occurs.

Journal ArticleDOI
TL;DR: In this article, the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations is studied, and a learning-based control algorithm is proposed to learn robust optimal controllers directly from real-time data.
Abstract: This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method proposed initially by Bellman in 1957 as a fundamental tool to solve dynamic programming problems. However, previous VI methods are all exclusively devoted to Markov decision processes and discrete-time dynamical systems. In this article, we aim to fill this gap by developing a new continuous-time VI method that will be applied to address the adaptive or nonadaptive optimal control problems for continuous-time systems described by differential equations. Like the traditional VI, the continuous-time VI algorithm retains the nice feature that there is no need to assume the knowledge of an initial admissible control policy. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with totally unknown dynamics. A learning-based control algorithm is proposed to show how to learn robust optimal controllers directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
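The discrete-time value iteration that the article extends to continuous time can be sketched on a finite MDP; note that, as the abstract emphasizes, no initial admissible policy is required, since iteration can start from the zero value function:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Bellman's value iteration for a finite MDP.
    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the immediate
    reward. Iterates the Bellman optimality operator from V = 0 until the
    sup-norm change falls below tol."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
```

Convergence is geometric at rate gamma; the continuous-time analogue in the paper replaces the one-step Bellman backup with a flow of the value function along the system dynamics.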

Journal ArticleDOI
01 Mar 2022
TL;DR: In this article, the authors study the McKean-Vlasov optimal control problem with common noise, which allows the law of the control process to appear in the state dynamics, under various formulations: strong and weak ones, Markovian or non-Markovian.
Abstract: We study the McKean–Vlasov optimal control problem with common noise, which allows the law of the control process to appear in the state dynamics, under various formulations: strong and weak ones, Markovian or non-Markovian. By interpreting the controls as probability measures on an appropriate canonical space with two filtrations, we then develop the classical measurable selection, conditioning and concatenation arguments in this new context, and establish the dynamic programming principle under general conditions.

Journal ArticleDOI
TL;DR: In this article, an energy management strategy based on a hierarchical power splitting structure and deep reinforcement learning (DRL) is proposed for a fuel cell hybrid electric vehicle equipped with a battery (BAT) and an ultracapacitor (UC).
Abstract: For a fuel cell hybrid electric vehicle equipped with a battery (BAT) and an ultracapacitor (UC), the dynamic topology is complex, and the different characteristics of the three power sources pose challenges for energy management with respect to fuel economy, power source lifespan, and the dynamic performance of the vehicle. In this paper, an energy management strategy (EMS) based on a hierarchical power splitting structure and deep reinforcement learning (DRL) is proposed. In the higher layer of the proposed EMS, the UC is employed to supply peak power and recover braking energy through an adaptive filter based on fuzzy control. Then, an integrated DRL and equivalent consumption minimization strategy framework is proposed to optimize the power allocation of the fuel cell (FC) and BAT in the lower layer, ensuring highly efficient operation of the FC and reducing hydrogen consumption. An action trimming based on a heuristic technique is also proposed to further restrain the adverse effect of sudden peak power on FC lifespan. The simulation results show the proposed EMS makes the output of the FC smoother and improves its working efficiency to alleviate the stress on the BAT, improving fuel economy by 14.8% compared with the Q-learning strategy under the WLTP driving cycle. Meanwhile, the results under UDDSHDV show the fuel economy of the proposed EMS can reach 89.7% of the dynamic programming (DP) benchmark.

Journal ArticleDOI
TL;DR: In this article, a constrained adaptive dynamic programming (CADP) algorithm is proposed to solve general nonlinear nonaffine optimal control problems with known dynamics, which can directly deal with problems with state constraints.

Journal ArticleDOI
13 Jun 2022-Energies
TL;DR: In this paper, an energy management strategy optimization method for fuel cell hybrid electric vehicles based on dynamic programming is proposed to improve the fuel economy and system durability of vehicles; the simulation results show that the equivalent 100 km hydrogen consumption of the strategy based on the dynamic programming optimization rules is reduced by 6.46% compared with that before the improvement.
Abstract: Fuel cell hybrid electric vehicles have attracted a large amount of attention in recent years owing to their advantages of zero emissions, high efficiency and low noise. To improve the fuel economy and system durability of vehicles, this paper proposes an energy management strategy optimization method for fuel cell hybrid electric vehicles based on dynamic programming. Rule-based and dynamic-programming-based strategies are developed on top of a fuel cell/battery hybrid system model. The rule-based strategy is improved with the power distribution scheme of the dynamic programming strategy to improve the fuel economy of the vehicle. Furthermore, a limit on the rate of change of the fuel cell system's output power is added to the rule-based strategy to avoid large load changes and improve the durability of the fuel cell. The simulation results show that the equivalent 100 km hydrogen consumption of the strategy based on the dynamic programming optimization rules is reduced by 6.46% compared with that before the improvement, and by limiting the rate of change of the fuel cell system's output power, the number of large load changes is reduced. Therefore, the strategy based on the dynamic programming optimization rules effectively improves the fuel economy and system durability of vehicles.
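DP-based EMS benchmarks of the kind used above typically run backward over a discretized battery state-of-charge grid, choosing at each step how much of the demand the battery covers. A toy sketch (the grid, cost model and names are illustrative, not the paper's formulation):

```python
def dp_power_split(demand, n_soc, batt_powers, fc_cost):
    """Backward DP over an integer SOC grid. At step t, discharging the
    battery by pb grid units leaves the fuel cell to cover demand[t] - pb;
    minimise total fuel cost over the drive cycle.
    Returns (cost-to-go from t=0 per SOC, per-step policy tables)."""
    INF = float('inf')
    cost = [0.0] * n_soc                       # terminal cost-to-go
    policy = []
    for t in reversed(range(len(demand))):
        new_cost, step = [INF] * n_soc, [None] * n_soc
        for soc in range(n_soc):
            for pb in batt_powers:             # battery contribution (grid units)
                nxt = soc - pb                 # SOC after this step
                if 0 <= nxt < n_soc and demand[t] - pb >= 0:
                    c = fc_cost(demand[t] - pb) + cost[nxt]
                    if c < new_cost[soc]:
                        new_cost[soc], step[soc] = c, pb
        cost, policy = new_cost, [step] + policy
    return cost, policy
```

Because the fuel-cost map is typically convex, the DP solution spreads demand across steps; the paper's rule improvement mimics exactly this power-distribution pattern.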

Journal ArticleDOI
TL;DR: In this article, an operation strategy considering economic feasibility and photovoltaic self-consumption rate (SCR) is presented for the energy management of office buildings under time-of-use (ToU) electricity prices.
Abstract: The optimal scheduling of energy storage systems is an effective way to improve the economy and stability of grid-connected photovoltaic-battery energy storage systems (PV-BESS). This study presents an operation strategy considering economic feasibility and photovoltaic self-consumption rate (SCR) for the energy management of office buildings under time-of-use (ToU) electricity prices. The strategy aims to optimize FiT revenue streams for the PV-BESS by scheduling the overall energy flow in real time based on a dynamic programming algorithm. The battery control strategy based on dynamic programming can control the energy flow in a flexible way to optimize net present value (NPV) over a typical year, taking factors such as dynamic electricity price, battery cycling aging, and demand response characteristics into account. An existing PV-BESS used for a middle-size office building in Beijing was taken as a case study to evaluate the optimization model. It is shown that the dispatch strategy can achieve superior performance in the cold region of China. Additionally, the indices affecting economic performance were identified to validate the feasibility of the proposed algorithm. The results show that once the unit cost of energy storage drops to about $100/kWh, the system can gain revenue under different electricity prices. Lastly, sensitivity analysis shows that electricity price is the parameter to which the system's economy is most sensitive.

Journal ArticleDOI
TL;DR: In this article, a distributed feedback-based optimization method based on the principles of approximate dynamic programming is proposed for the optimal management and energy-efficient operation of grid-connected buildings.

Journal ArticleDOI
TL;DR: In this paper, the slope-weighted energy-based rapid control analysis (SERCA) algorithm is proposed to estimate near-optimal powertrain control trajectories while effectively dealing with broadened battery state-of-charge (SOC) window utilization and smooth HEV driving requirements.

Journal ArticleDOI
TL;DR: Wang et al. developed an online adaptive learning technique to complete the robust tracking control design for nonlinear uncertain systems, using the ideas of adaptive dynamic programming (ADP) proposed for optimal control.

Journal ArticleDOI
TL;DR: In this paper, a real-time schedule model of a microgrid for maximizing battery energy storage (BES) utilization is proposed to optimize the system under stochastic environments.
Abstract: This paper proposes a real-time schedule model of a microgrid (MG) for maximizing battery energy storage (BES) utilization. To this end, a BES life model is linearized using piece-wise linearization and big-M method to assess the BES life loss (BLL) in a real-time manner. The cost-effective schedule model of the MG with multiple energy resources aims to maximize BES utilization while ensuring its sufficient lifespan. Corresponding to the optimization model, approximate dynamic programming (ADP) for maximizing BES utilization (ADP-MBU) in the real-time schedule is proposed to optimize the system under stochastic environments. In ADP-MBU, a new value function approximation method employing the BES cumulative life loss (BCLL) is developed to improve the optimality and applicability. The proposed ADP-MBU algorithm can achieve satisfactory approximate optimality while reflecting the variation of real-time BLL. Case studies validate the applicability of the proposed MG schedule model and the advantages of the proposed ADP-MBU algorithm.
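The value-function-approximation step that ADP schemes such as ADP-MBU build on can be sketched generically as fitted value iteration with a linear feature model updated by stochastic Bellman backups (all names and the toy problem are illustrative, not the paper's BES model):

```python
def fitted_value_iteration(samples, features, n_feat, actions, step, reward,
                           gamma=0.95, iters=50, lr=0.1):
    """Approximate DP: the value function is the linear form w . phi(s),
    fitted by repeated Bellman backups over sampled states. ADP methods
    specialise the feature map phi; ADP-MBU, for instance, folds battery
    cumulative life loss into the value approximation."""
    w = [0.0] * n_feat
    V = lambda s: sum(wi * fi for wi, fi in zip(w, features(s)))
    for _ in range(iters):
        for s in samples:
            # one-step Bellman target under the greedy action
            target = max(reward(s, a) + gamma * V(step(s, a)) for a in actions)
            err = target - V(s)
            phi = features(s)
            for k in range(n_feat):
                w[k] += lr * err * phi[k]   # stochastic gradient on the TD error
    return w
```

With a single constant feature and a constant reward r, the fit converges to r/(1 - gamma), the exact infinite-horizon value, which is a quick sanity check on the update.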

Journal ArticleDOI
01 Dec 2022
TL;DR: In this article, a novel data-based adaptive dynamic programming (ADP) method is presented to solve the optimal consensus tracking control problem for discrete-time (DT) multiagent systems (MASs) with multiple time delays.
Abstract: In this article, a novel data-based adaptive dynamic programming (ADP) method is presented to solve the optimal consensus tracking control problem for discrete-time (DT) multiagent systems (MASs) with multiple time delays. Necessary and sufficient conditions for the corresponding equivalent time-delay system are provided on the basis of causal transformations. Benefiting from the construction of the tracking error dynamics, the optimal tracking problem can be transformed into finding the Nash equilibrium of the graphical game, which can be done by solving the coupled Hamilton-Jacobi (HJ) equations. An error estimator is introduced to construct the tracking error of the MASs using only the input and output (I/O) data. Therefore, the designed data-based ADP algorithm can minimize the cost functions and ensure the consensus of the MASs without knowledge of the system dynamics. Finally, a numerical example is given to demonstrate the effectiveness of the proposed method.

Journal ArticleDOI
Huilin Hu1
TL;DR: In this article, the distributed formation control problem of multi-quadrotor unmanned aerial vehicles (UAVs) is addressed in the framework of event triggering, where an event-triggered adaptive dynamic programming method is developed to design the formation controller, and a critic-only network structure is adopted to approximate the optimal cost function.
Abstract: This paper is concerned with the distributed formation control problem of multi-quadrotor unmanned aerial vehicles (UAVs) in the framework of event triggering. First, for the position loop, an adaptive dynamic programming method based on event triggering is developed to design the formation controller. A critic-only network structure is adopted to approximate the optimal cost function. The merit of the proposed algorithm lies in that the event-triggering mechanism is incorporated into the neural network (NN) to reduce the calculations and actions of the multi-UAV system, which is significant for practical application. Moreover, a new weight update law based on gradient descent is proposed for the critic NN, which ensures that the solution converges to the optimal value online. Then, a finite-time attitude tracking controller is adopted for the attitude loop to achieve rapid attitude tracking. Finally, the efficiency of the proposed method is illustrated by numerical simulations and experimental verification.