
Showing papers on "Dynamic programming published in 2021"


Journal ArticleDOI
TL;DR: This article investigates the unmanned aerial vehicle (UAV)-assisted wireless powered Internet-of-Things system, where a UAV takes off from a data center, flies to each of the ground sensor nodes (SNs) in order to transfer energy and collect data from the SNs, and then returns to the data center.
Abstract: This article investigates the unmanned aerial vehicle (UAV)-assisted wireless powered Internet-of-Things system, where a UAV takes off from a data center, flies to each of the ground sensor nodes (SNs) to transfer energy to and collect data from the SNs, and then returns to the data center. For such a system, an optimization problem is formulated to minimize the average Age of Information (AoI) of the data collected from all ground SNs. Since the average AoI depends on the UAV's trajectory as well as on the time required for energy harvesting (EH) and data collection at each SN, these factors need to be optimized jointly. Moreover, instead of the traditional linear EH model, we employ a nonlinear model because the behavior of EH circuits is nonlinear by nature. To solve this nonconvex problem, we propose to decompose it into two subproblems: a joint energy transfer and data collection time allocation problem, and a UAV trajectory planning problem. For the first subproblem, we prove that it is convex and give an optimal solution using the Karush–Kuhn–Tucker (KKT) conditions. This solution is used as the input for the second subproblem, which we solve by designing a dynamic programming (DP) algorithm and an ant colony (AC) heuristic. The simulation results show that the DP-based algorithm obtains the minimal average AoI of the system, and the AC-based heuristic finds solutions with near-optimal average AoI. The results also reveal that the average AoI increases with the flying altitude of the UAV and linearly with the size of the data collected at each ground SN.
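The trajectory-planning subproblem is, at its core, a visiting-order problem over the SNs, which is what makes an exact DP feasible for moderate numbers of nodes. As an illustration only (the paper's actual objective couples the visiting order with the per-SN charging and collection times from the first subproblem), a Held-Karp-style bitmask DP for a minimum-cost tour looks like this:

```python
# Hypothetical sketch: node 0 is the data center, nodes 1..n-1 are the SNs,
# and dist[i][j] is a generic travel cost standing in for the AoI objective.
from itertools import combinations

def held_karp(dist):
    """Minimum-cost tour starting and ending at node 0 (exact bitmask DP)."""
    n = len(dist)
    # dp[(S, j)]: min cost to leave node 0, visit exactly the SNs in set S,
    # and currently be at SN j (S is a bitmask over nodes 1..n-1)
    dp = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                prev = S ^ (1 << j)
                dp[(S, j)] = min(dp[(prev, k)] + dist[k][j]
                                 for k in subset if k != j)
    full = (1 << n) - 2  # all SNs 1..n-1 visited
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))
```

The DP explores on the order of n^2 * 2^n state transitions, which is presumably why a lighter AC heuristic is also offered for larger instances.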

138 citations


Journal ArticleDOI
TL;DR: A self-adaptive differential evolution algorithm is developed for a single BPM scheduling problem with unequal release times and job sizes, and experimental results demonstrate that the proposed self-adaptive algorithm is more effective than other algorithms for this scheduling problem.
Abstract: Batch-processing machines (BPMs), found in many industrial systems, can process a number of jobs at a time. This article considers a single-BPM scheduling problem with unequal release times and job sizes. The goal is to assign jobs into batches without violating the machine capacity constraint and then to sequence the batches so as to minimize the makespan. A self-adaptive differential evolution algorithm is developed for the problem, in which both the mutation operators and the control parameter values are chosen adaptively based on their historical performance. The proposed algorithm is compared to CPLEX, to existing metaheuristics for this problem, and to conventional differential evolution algorithms through comprehensive experiments. The experimental results demonstrate that the proposed self-adaptive algorithm is more effective than the alternatives for this scheduling problem.
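A minimal sketch of the self-adaptive ingredient (the mechanics below are assumed; the authors' actual credit-assignment scheme is not specified in the abstract): each mutation operator is drawn with probability proportional to a recency-weighted success rate, updated whenever an offspring improves on its parent.

```python
import random

class OperatorSelector:
    """Pick mutation operators in proportion to their recent success."""
    def __init__(self, operators, memory=0.9):
        self.ops = list(operators)
        self.score = {op: 1.0 for op in self.ops}  # optimistic initialization
        self.memory = memory                        # decay of old evidence

    def pick(self):
        total = sum(self.score.values())
        r, acc = random.uniform(0, total), 0.0
        for op in self.ops:
            acc += self.score[op]
            if r <= acc:
                return op
        return self.ops[-1]

    def update(self, op, improved):
        # exponential recency-weighted success rate
        self.score[op] = (self.memory * self.score[op]
                          + (1 - self.memory) * (1.0 if improved else 0.0))
```

The same bookkeeping applies to the control parameters (scale factor, crossover rate), with successful values seeding the sampling distribution for later generations.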

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the problem of path following for underactuated unmanned surface vehicles (USVs) subject to state constraints and proposed a useful control algorithm by combining the backstepping technique, adaptive dynamic programming (ADP), and the event-triggered mechanism.
Abstract: This article investigates the problem of path following for underactuated unmanned surface vehicles (USVs) subject to state constraints. A useful control algorithm is proposed by combining the backstepping technique, adaptive dynamic programming (ADP), and the event-triggered mechanism. The presented approach consists of three modules: guidance law, dynamic controller, and event triggering. First, to deal with the "singularity" problem, the guidance-based path-following (GBPF) principle is introduced in the guidance law loop. In contrast to the traditional barrier Lyapunov function (BLF) method, this article converts the USV's constraint model to a class of nonlinear systems without state constraints by introducing a nonlinear mapping. The control signal generated by the dynamic controller module consists of a backstepping-based feedforward control signal and an ADP-based approximate optimal feedback control signal. Therefore, the presented scheme can guarantee the approximate optimal performance. To approximate the cost function and its partial derivative, a critic neural network (NN) is constructed. By considering the event-triggered condition, the dynamic controller is further improved. Compared with traditional time-triggered control methods, the proposed approach can greatly reduce communication and computational burdens. This article proves that the closed-loop system is stable, and the simulation results and experimental validation are given to illustrate the effectiveness of the proposed approach.
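The event-triggered part can be pictured with a simple rule: the controller is only re-evaluated when the deviation from the last-sampled state crosses a threshold. A schematic check (the relative-threshold form below is an assumption, not the paper's triggering condition):

```python
import numpy as np

def should_trigger(x, x_last, alpha=0.1):
    """True when the gap since the last event is large relative to the state."""
    e = np.linalg.norm(x - x_last)          # measurement gap since last event
    return e > alpha * np.linalg.norm(x)    # recompute control only then
```

Between events the actuator holds the last computed control, which is where the communication and computation savings come from.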

66 citations


Journal ArticleDOI
TL;DR: This work presents SDDP.jl, an open-source library for solving multistage stochastic programming problems using the Stochastic dual dynamic programming algorithm.
Abstract: We present SDDP.jl, an open-source library for solving multistage stochastic programming problems using the stochastic dual dynamic programming algorithm. SDDP.jl is built on JuMP, an algebraic modeling language embedded in Julia.

59 citations


Journal ArticleDOI
TL;DR: In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems and it is proved that the PIQL algorithm is equivalent to the Newton iteration method in the Banach space by using the Fréchet derivative.
Abstract: In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. This problem theoretically depends on solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the $Q$-function is introduced and a data-based policy iteration $Q$-learning (PIQL) algorithm is developed to learn the optimal $Q$-function from data collected from the real system. By writing the $Q$-function in a quadratic form, it is proved that the PIQL algorithm is equivalent to the Newton iteration method in the Banach space by using the Fréchet derivative. Then, the convergence of the PIQL algorithm can be guaranteed by Kantorovich's theorem. For the realization of the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than the system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
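To make the policy-improvement step concrete: once a quadratic kernel H of the Q-function has been identified (from data, in the paper), the saddle-point gains follow in closed form from the stationarity conditions dQ/du = dQ/dw = 0. A numpy sketch, with the block ordering z = [x; u; w] assumed:

```python
import numpy as np

def improve_policies(H, n, m, p):
    """H: symmetric (n+m+p)x(n+m+p) kernel of Q(x,u,w) = z' H z, z = [x; u; w].
    Returns gains K, L such that u = -K x and w = -L x at the saddle point,
    obtained by solving the block linear system from dQ/du = dQ/dw = 0."""
    Hux = H[n:n+m, :n];      Hwx = H[n+m:, :n]
    Huu = H[n:n+m, n:n+m];   Huw = H[n:n+m, n+m:]
    Hwu = H[n+m:, n:n+m];    Hww = H[n+m:, n+m:]
    A = np.block([[Huu, Huw], [Hwu, Hww]])
    b = np.vstack([Hux, Hwx])
    KL = np.linalg.solve(A, b)   # stacked [K; L]
    return KL[:m, :], KL[m:, :]
```

The learning part of PIQL amounts to estimating H by least squares from state-input-disturbance trajectories, after which the update above is purely algebraic.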

55 citations


Journal ArticleDOI
Xuekai Wang, Tao Tang, Shuai Su, Jiateng Yin, Ziyou Gao, Nan Lv
TL;DR: An integrated energy-efficient train operation method in which the driving strategy and the train timetable are jointly optimized is proposed; it can reduce the net energy consumption by up to 25.0% compared to the result without optimization, while a discrete differential dynamic programming variant reduces the computing time.
Abstract: Reducing traction energy and increasing the reuse of regenerative energy are the two main ways of saving energy in metro systems, and both are related to the driving strategy as well as the train timetable. To minimize the system-wide energy, this paper proposes an integrated energy-efficient train operation method in which the driving strategy and the train timetable are jointly optimized. Firstly, the models for calculating the traction energy and the reuse of regenerative energy are introduced together with the constraints of train operation. Then, the system-level optimization model is formulated by taking the net energy (i.e., the difference between the traction energy and the reused regenerative energy) as the objective function. Based on the space-time-speed network methodology, the optimization model is transformed into a discrete decision problem. Next, two algorithms are used to solve the problem: the dynamic programming algorithm is used to obtain the global optimal solution, and the discrete differential dynamic programming algorithm is applied to obtain an approximate optimal solution with reduced computing time. Finally, two numerical examples are conducted to illustrate the effectiveness of the proposed method for energy saving. The method can reduce the net energy consumption by up to 25.0% compared to the result without optimization and by up to 8.7% compared to the result obtained with the two-stage method.
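As an illustration of the discrete decision problem for the driving strategy alone, here is a toy stage-wise DP over a distance-speed grid, with a hypothetical stage-energy model and none of the paper's timetable or regenerative-energy coupling:

```python
import math

def drive_dp(n_seg, speeds, energy, feasible):
    """Toy driving-strategy DP.
    n_seg: number of track segments (stages);
    speeds: iterable of quantized speeds, must include 0 (start/stop at rest);
    energy(v, v2): assumed stage energy cost for a speed change v -> v2;
    feasible(v, v2): whether the change respects acceleration/braking limits."""
    INF = math.inf
    cost = {v: (0.0 if v == 0 else INF) for v in speeds}  # depart from rest
    for _ in range(n_seg):
        nxt = {v: INF for v in speeds}
        for v, c in cost.items():
            if c == INF:
                continue
            for v2 in speeds:
                if feasible(v, v2):
                    nxt[v2] = min(nxt[v2], c + energy(v, v2))
        cost = nxt
    return cost[0]  # arrive at the next station at rest
```

The paper's space-time-speed network adds arrival time to the state, which is what lets the timetable and the regenerative-energy overlap between trains enter the same DP.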

55 citations


Journal ArticleDOI
TL;DR: This article considers a Mayer-type optimal control problem for probabilistic Boolean control networks (PBCNs) whose selection probabilities are uncertain and follow Beta distributions, and deduces an equivalent formulation as a multistage decision problem.
Abstract: This article considers a Mayer-type optimal control problem of probabilistic Boolean control networks (PBCNs) with uncertainty in the selection probabilities, which follow Beta distributions. The expectation with respect to both the selection probabilities and the transitions of the state variables is set as the cost function, and an equivalent formulation as a multistage decision problem is deduced. Furthermore, the dynamic programming technique is applied to solve the problem, and a novel optimization algorithm is derived in the semitensor product framework. A numerical example of a biological model of an apoptosis protein demonstrates the effectiveness and feasibility of the proposed framework and algorithms.
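The multistage decision problem has the standard backward-DP structure. A generic finite-state sketch for a Mayer-type cost (only the terminal state is scored) might look like the following, with the PBCN specifics (2^n Boolean states, semitensor product algebra, Beta-averaged transition probabilities) abstracted into the inputs:

```python
def mayer_dp(T, states, controls, P, terminal_cost):
    """Backward DP for a Mayer problem on a finite state space.
    P[u][x][y]: transition probability x -> y under control u (here it would
    already be averaged over the Beta-distributed selection probabilities);
    terminal_cost(x): the only cost term in a Mayer-type problem."""
    V = {x: terminal_cost(x) for x in states}
    policy = []
    for _ in range(T):
        newV, pi = {}, {}
        for x in states:
            best = min(controls,
                       key=lambda u: sum(P[u][x][y] * V[y] for y in states))
            pi[x] = best
            newV[x] = sum(P[best][x][y] * V[y] for y in states)
        V, policy = newV, [pi] + policy
    return V, policy
```

For a PBCN with n Boolean variables the state space has 2^n elements, so this exact recursion is practical only for small networks, which matches the biological example in the paper.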

54 citations


Journal ArticleDOI
TL;DR: A tensor decomposition approach for the solution of high-dimensional, fully nonlinear Hamilton–Jacobi–Bellman equations arising in optimal feedback control of nonlinear dynamics is presented in this article.
Abstract: A tensor decomposition approach for the solution of high-dimensional, fully nonlinear Hamilton–Jacobi–Bellman equations arising in optimal feedback control of nonlinear dynamics is presented. The...

50 citations


Journal ArticleDOI
TL;DR: This article characterizes an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with Wasserstein penalty and shows that the contraction property of associated Bellman operators extends a single-stage out-of-sample performance guarantee to the corresponding multistage guarantee without any degradation in the confidence level.
Abstract: Standard stochastic control methods assume that the probability distribution of uncertain variables is available. Unfortunately, in practice, obtaining accurate distribution information is a challenging task. To resolve this issue, in this article we investigate the problem of designing a control policy that is robust against errors in the empirical distribution obtained from data. This problem can be formulated as a two-player zero-sum dynamic game problem, where the action space of the adversarial player is a Wasserstein ball centered at the empirical distribution. A dynamic programming solution is provided exploiting the reformulation techniques for Wasserstein distributionally robust optimization. We show that the contraction property of associated Bellman operators extends a single-stage out-of-sample performance guarantee, obtained using a measure concentration inequality, to the corresponding multistage guarantee without any degradation in the confidence level. Furthermore, we characterize an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with Wasserstein penalty.
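Schematically, the penalty-form distributionally robust Bellman recursion studied in this line of work reads as follows (notation assumed: \hat{\nu}_t is the empirical distribution of the disturbance at stage t, \lambda the Wasserstein penalty weight, and W the Wasserstein distance):

```latex
V_t(x) \;=\; \min_{u}\; \sup_{\mathbb{Q}}\;
\Big\{\, \mathbb{E}_{w \sim \mathbb{Q}}\big[\, c(x,u,w) + V_{t+1}\big(f(x,u,w)\big) \,\big]
\;-\; \lambda\, W\big(\mathbb{Q}, \hat{\nu}_t\big) \Big\}.
```

The inner supremum is what the reformulation techniques for Wasserstein distributionally robust optimization make tractable, and the contraction argument in the abstract is about the operator mapping V_{t+1} to V_t.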

48 citations


Journal ArticleDOI
TL;DR: First, it is proved that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems for constrained auxiliary subsystems; then, under the framework of approximate dynamic programming, a simultaneous policy iteration (SPI) algorithm is presented to solve the Hamilton–Jacobi–Bellman equations corresponding to the constrained auxiliary subsystems.
Abstract: In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton–Jacobi–Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor–critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.

46 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed two-layer optimization strategy achieves better performance than greedy and random strategies in terms of total user energy consumption, that trajectory conflicts can be eliminated effectively, and that the UAV trajectories satisfy the safety constraints.
Abstract: In this study, we introduce a multi-unmanned aerial vehicle (multi-UAV) enabled mobile edge computing (MEC) system, with UAVs acting as computing servers for the task offloading of ground users. The energy consumption of the ground users is minimized by jointly optimizing the UAV task scheduling, bit allocation, and UAV trajectories in a unified framework. To accomplish this goal, we propose a two-layer optimization strategy, where the upper layer optimizes the UAV task scheduling with a dynamic programming-based bidding optimization method, while the lower one solves for the bit allocation and UAV trajectories. In particular, the lower layer is decoupled into several subproblems to reduce the computational complexity, which can be easily solved using the alternating direction method of multipliers. However, the UAV trajectories optimized by solving the decoupled subproblems may lead to path conflicts. As such, we further propose a re-optimization strategy to eliminate such conflicts. Experimental results demonstrate that the proposed strategy achieves better performance than greedy and random strategies in terms of total user energy consumption, that the trajectory conflicts can be eliminated effectively, and that the UAV trajectories satisfy the safety constraints.

Posted Content
TL;DR: The correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures is proved, and the dynamic programming Bellman fixed point equation satisfied by the value function is established.
Abstract: We develop an exhaustive study of Markov decision processes (MDPs) under mean field interaction both on states and actions in the presence of common noise, and when optimization is performed over open-loop controls on an infinite horizon. Such a model, called CMKV-MDP for conditional McKean-Vlasov MDP, arises, and is obtained here rigorously with a rate of convergence, as the asymptotic problem of N cooperative agents controlled by a social planner/influencer that observes the environment noises but not necessarily the individual states of the agents. We highlight the crucial role of relaxed controls and of the randomization hypothesis for this class of models with respect to classical MDP theory. We prove the correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures, and establish the dynamic programming Bellman fixed point equation satisfied by the value function, as well as the existence of ε-optimal randomized feedback controls. The arguments of proof involve an original measurable optimal coupling for the Wasserstein distance. This provides a procedure for learning strategies in a large population of interacting collaborative agents. MSC Classification: 90C40, 49L20.

Journal ArticleDOI
TL;DR: A novel value iteration based off-policy adaptive dynamic programming (ADP) algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without the exact knowledge of system dynamics.
Abstract: This article studies the infinite-horizon adaptive optimal control of continuous-time linear periodic (CTLP) systems. A novel value iteration (VI) based off-policy adaptive dynamic programming (ADP) algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without the exact knowledge of system dynamics. Under mild conditions, the proofs on uniform convergence of the proposed algorithm to the optimal solutions are given for both the model-based and model-free cases. The VI-based ADP algorithm is able to find suboptimal controllers without assuming the knowledge of an initial stabilizing controller. Application to the optimal control of a triple inverted pendulum subjected to a periodically varying load demonstrates the feasibility and effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: This paper investigates the dynamic event-triggered fault-tolerant optimal control strategy for a class of output feedback nonlinear discrete-time systems subject to actuator faults and input saturations.
Abstract: This paper investigates the dynamic event-triggered fault-tolerant optimal control strategy for a class of output feedback nonlinear discrete-time systems subject to actuator faults and input saturations. To save the communication resources between the sensor and the controller, the so-called dynamic event-triggered mechanism is adopted to schedule the measurement signal. A neural network-based observer is first designed to provide both the system states and fault information. Then, with consideration of the actuator saturation phenomenon, the adaptive dynamic programming (ADP) algorithm is designed based on the estimates provided by the observer. To reduce the computational burden, the optimal control strategy is implemented via the single network adaptive critic architecture. Sufficient conditions are provided to guarantee the boundedness of the overall closed-loop systems. Finally, numerical simulations on a two-link flexible manipulator system are provided to verify the validity of the proposed control strategy.

Journal ArticleDOI
TL;DR: In this paper, a differential dynamic programming (DDP) algorithm for solving discrete-time finite-horizon optimal control problems with inequality constraints was proposed, which can handle nonlinear state and input inequality constraints without a discernible increase in its computational complexity relative to the unconstrained case.
Abstract: This brief introduces a novel differential dynamic programming (DDP) algorithm for solving discrete-time finite-horizon optimal control problems with inequality constraints. Two variants, namely feasible- and infeasible-IPDDP algorithms, are developed using a primal–dual interior-point methodology, and their local quadratic convergence properties are characterized. We show that the stationary points of the algorithms are the perturbed KKT points, and thus can be moved arbitrarily close to a locally optimal solution. Being free from the burden of active-set methods, the algorithms can handle nonlinear state and input inequality constraints without a discernible increase in computational complexity relative to the unconstrained case. The performance of the proposed algorithms is demonstrated using numerical experiments on three different problems: control-limited inverted pendulum, car-parking, and unicycle motion control and obstacle avoidance.
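For orientation, the unconstrained core that IPDDP builds on is the Riccati-style backward pass of DDP/iLQR; on a linear-quadratic problem it reduces to the recursion below. The primal-dual interior-point treatment of the inequality constraints, which is the paper's contribution, is deliberately not reproduced here.

```python
import numpy as np

def ddp_backward(A, B, Qx, Qu, N):
    """Backward pass for x' = A x + B u with stage cost x'Qx x + u'Qu u.
    Returns time-varying feedback gains K[t] with u_t = -K[t] x_t."""
    V = Qx.copy()                    # terminal value Hessian (assumed = Qx)
    gains = []
    for _ in range(N):
        Qxx = Qx + A.T @ V @ A       # second-order expansion of the Q-function
        Quu = Qu + B.T @ V @ B
        Qux = B.T @ V @ A
        K = np.linalg.solve(Quu, Qux)
        V = Qxx - Qux.T @ K          # Riccati-style value update
        gains.append(K)
    return gains[::-1]               # gains in forward time order
```

In IPDDP the same expansion is carried out on a perturbed KKT system, so the log-barrier multipliers ride along through the backward pass instead of an active set being tracked.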

Journal ArticleDOI
TL;DR: This article develops a novel event-triggered control (ETC) approach based on the deterministic policy gradient (PG) adaptive dynamic programming (ADP) algorithm which updates the control law and the disturbance law with a gradient descent algorithm.
Abstract: In order to address zero-sum game problems for discrete-time (DT) nonlinear systems, this article develops a novel event-triggered control (ETC) approach based on the deterministic policy gradient (PG) adaptive dynamic programming (ADP) algorithm. By adopting the input and output data, the proposed ETC method updates the control law and the disturbance law with a gradient descent algorithm. Compared with the conventional PG ADP-based control scheme, the present controller is updated aperiodically to reduce the computational and communication burden. Then, the actor-critic-disturbance framework is adopted to obtain the optimal control law and the worst disturbance law, which guarantee the input-to-state stability of the closed-loop system. Moreover, a novel neural network weight updating law which guarantees the uniform ultimate boundedness of weight estimation errors is provided based on the experience replay technique. Finally, the validity of the present method is verified by simulation of two DT nonlinear systems.

Journal ArticleDOI
TL;DR: This paper reports the utility factor-weighted energy consumption using a rule-based strategy under a real-world representative drive cycle and compares results from both rule-based and optimization-based strategies.
Abstract: Reducing energy consumption is a key focus of hybrid electric vehicle (HEV) development. The popular vehicle dynamic model used in many energy management optimization studies does not capture the vehicle dynamics that the in-vehicle measurement system does. However, feedback from the measurement system is what the vehicle controller actually uses to manage energy consumption; therefore, optimization using the model alone does not represent what the controller sees in the vehicle. This paper reports the utility factor-weighted energy consumption using a rule-based strategy under a real-world representative drive cycle. In addition, vehicle test data were used to perform the optimization approach. By comparing results from both rule-based and optimization-based strategies, areas for further improving the rule-based strategy are discussed. Furthermore, recent developments in OBD raise a concern about an increase in energy consumption, and this paper also investigates the energy consumption increase with extensive OBD usage.

Journal ArticleDOI
TL;DR: In this article, an event-triggered adaptive dynamic programming (ADP) algorithm is developed to solve the tracking control problem for partially unknown constrained uncertain systems, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated.
Abstract: An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. Integral reinforcement learning is employed to avoid the requirement of the augmented drift dynamics. Second, the event-triggered ADP is adopted for its implementation, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error are proved to be uniformly ultimately bounded, and the existence of a lower bound for the interexecution times is analyzed. Finally, simulation results demonstrate the effectiveness of the present event-triggered ADP method.

Journal ArticleDOI
TL;DR: In this paper, neural networks are employed to approximate the solution of the Hamilton-Jacobi-Isaacs equation under the frame of adaptive dynamic programming, based on the standard gradient attenuation algorithm and adaptive critic design.
Abstract: We aim to optimize the tracking control of a robot so as to improve its robustness under unknown nonlinear perturbations. First, an auxiliary system is introduced, and optimal control of the auxiliary system can be seen as approximate optimal control of the robot. Then, neural networks (NNs) are employed to approximate the solution of the Hamilton–Jacobi–Isaacs equation under the framework of adaptive dynamic programming. Next, based on the standard gradient attenuation algorithm and adaptive critic design, the NNs are trained according to the designed updating law, relaxing the requirement of an initial stabilizing control. In light of Lyapunov stability theory, all the error signals can be proved to be uniformly ultimately bounded. A series of simulation studies are carried out to show the effectiveness of the proposed control.

Journal ArticleDOI
TL;DR: A unified deep learning method that solves dynamic economic models by casting them into nonlinear regression equations for three fundamental objects of economic dynamics – lifetime reward functions, Bellman equations and Euler equations is introduced.
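The "Bellman equation as regression" idea can be sketched in a few lines of PyTorch (illustrative only; the network architecture, discount factor, and sampling scheme below are placeholders, not the paper's):

```python
import torch
import torch.nn as nn

value = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(value.parameters(), lr=1e-3)
beta = 0.96  # discount factor (assumed)

def bellman_residual_step(x, reward, x_next):
    """One SGD step on E[(V(x) - (r + beta * V(x')))^2] over sampled states.
    x, reward, x_next: float tensors of shape (batch, 1)."""
    target = reward + beta * value(x_next).detach()  # fixed regression target
    loss = ((value(x) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The same recipe, with different residuals, covers the other two objects mentioned in the TL;DR: lifetime reward functions (Monte Carlo regression) and Euler equations (first-order-condition residuals).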

Journal ArticleDOI
TL;DR: A self-adaptive statistical approach based on a proper management of any admissible battery energy variation is developed to significantly improve the calculation times required for HEV architectures while still attaining the best possible accuracy in terms of CO2 emissions as well as total cost of ownership (TCO).

Journal ArticleDOI
TL;DR: An ADP algorithm for the optimal control problem with unknown nonlinear dynamic model is developed by using the basis function approximation method and Newton–Leibniz formula, which can update the control strategy online by utilizing input and output information of the system.
Abstract: In this article, a novel adaptive dynamic programming (ADP) approach is proposed for the optimal control problem of nonlinear continuous control systems with unknown dynamics. First, an alternating iteration algorithm based on Hamilton–Jacobi–Bellman equation is proposed for the optimal control of known nonlinear control systems. Then, the convergence results of the alternating iteration algorithm are obtained by using mathematical induction and monotone bounded convergence theorem. Moreover, the global asymptotic stability of the nonlinear closed-loop system is proved. Second, based on the scheme of alternating iteration algorithm, an ADP algorithm for the optimal control problem with unknown nonlinear dynamic model is developed by using the basis function approximation method and Newton–Leibniz formula, which can update the control strategy online by utilizing input and output information of the system. In addition, the convergence analysis of the proposed ADP algorithm is derived. Finally, the feasibility of the established results is verified by two examples, and the ADP method is applied to the optimal tracking fuel control problem of turbofan engines.

Journal ArticleDOI
TL;DR: In this article, the adaptive control problem for continuous-time nonlinear systems described by differential equations is studied and a learning-based control algorithm is proposed to learn robust optimal controllers directly from real-time data.
Abstract: This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method, proposed initially by Bellman in 1957, as a fundamental tool to solve dynamic programming problems. However, previous VI methods are all exclusively devoted to Markov decision processes and discrete-time dynamical systems. In this article, we aim to fill this gap by developing a new continuous-time VI method that is applied to address the adaptive and nonadaptive optimal control problems for continuous-time systems described by differential equations. Like the traditional VI, the continuous-time VI algorithm retains the nice feature that there is no need to assume the knowledge of an initial admissible control policy. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with totally unknown dynamics. A learning-based control algorithm is proposed to show how to learn robust optimal controllers directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
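For contrast with the continuous-time extension, the classical discrete VI that the article takes as its starting point is the simple fixed-point iteration below; note that it starts from an arbitrary value function, with no admissible initial policy required:

```python
def value_iteration(states, actions, P, r, gamma=0.95, tol=1e-8):
    """Classical VI on a finite MDP. P[s][a][s2]: transition probability;
    r(s, a): reward function; gamma: discount factor."""
    V = {s: 0.0 for s in states}  # arbitrary initialization
    while True:
        V_new = {s: max(r(s, a) + gamma * sum(P[s][a][s2] * V[s2]
                                              for s2 in states)
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
```

The article's contribution is, roughly, the analogue of this operator iteration for systems governed by differential equations, together with a data-driven realization when the dynamics are unknown.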

Journal ArticleDOI
TL;DR: This article investigates the zero-sum game-based secure control problem for cyber–physical systems (CPS) under the actuator false data injection attacks and derives the optimal defending policy and the attack policy from the dynamic programming approach.
Abstract: This article investigates the zero-sum game-based secure control problem for cyber–physical systems (CPS) under the actuator false data injection attacks. The physical process is described as a linear time-invariant discrete-time model. Both the process noise and the measurement noise are addressed in the design process. An optimal Kalman filter is given to estimate the system states. The adversary and the defender are modeled as two players. Under the zero-sum game framework, an optimal infinite-horizon quadratic cost function is defined. Employing the dynamic programming approach, the optimal defending policy and the attack policy are derived. The convergence of the cost function is proved. Moreover, the critical attack probability is derived, beyond which the cost cannot be bounded. Finally, simulation results are provided to validate the proposed secure scheme.
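The estimation layer of the loop is a standard Kalman filter; a generic predict/update step (textbook form, with assumed notation) is:

```python
import numpy as np

def kalman_step(x, P, u, y, A, B, C, Q, R):
    """One predict/update cycle for x' = A x + B u + w, y = C x + v,
    with process noise covariance Q and measurement noise covariance R."""
    # predict
    x = A @ x + B @ u
    P = A @ P @ A.T + Q
    # update with measurement y
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x = x + K @ (y - C @ x)
    P = (np.eye(len(x)) - K @ C) @ P
    return x, P
```

The game-theoretic part of the paper then sits on top of these estimates: the defender's and the attacker's policies are computed by dynamic programming on the zero-sum quadratic cost.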

Journal ArticleDOI
TL;DR: An algorithm to calculate the time-optimal energy management and gearshift strategies for the Formula 1 race car is presented, which combines convex optimization, dynamic programming and Pontryagin’s minimum principle in an iterative scheme to solve the arising mixed-integer optimization problem.

Journal ArticleDOI
27 Jul 2021
TL;DR: In this article, the adaptive optimal control problem for a wheel-legged robot in the absence of an accurate dynamic model is studied and a learning-based solution is derived from input-state data collected along the trajectories of the robot.
Abstract: This letter studies the adaptive optimal control problem for a wheel-legged robot in the absence of an accurate dynamic model. A crucial strategy is to exploit recent advances in reinforcement learning (RL) and adaptive dynamic programming (ADP) to derive a learning-based solution to adaptive optimal control. It is shown that suboptimal controllers can be learned directly from input-state data collected along the trajectories of the robot. Rigorous proofs for the convergence of the novel data-driven value iteration (VI) algorithm and the stability of the closed-loop robot system are provided. Experiments are conducted to demonstrate the efficiency of the novel adaptive suboptimal controller derived from the data-driven VI algorithm in balancing the wheel-legged robot to the equilibrium.

Journal ArticleDOI
TL;DR: In this article, a mathematical model of a magnetic-wheeled mobile robot used for wall climbing in some special industrial sites is established, and to realize the precise motion control of the robot, an intelligent discrete algorithm for trajectory tracking control is presented.
Abstract: In this article, a mathematical model of a magnetic-wheeled mobile robot (MWMR) used for wall climbing in special industrial sites is established, and, to realize precise motion control of the robot, an intelligent discrete algorithm for trajectory tracking control of the MWMR is presented. The robot is subject to nonholonomic constraints when moving on the wall. The discrete mathematical model of the MWMR is established from the kinematics and dynamics, where the dynamics are described by second-order Lagrange equations. The core of the trajectory tracking control algorithm is an improved dual-heuristic dynamic programming (DHP) scheme whose actor-critic structure adopts random vector functional link neural networks (RVFL NNs). Moreover, the tracking control algorithm is supplemented by a PD controller and a supervisory element to generate the overall control signals. The improvement consists of optimizing the input-layer weights of the RVFL NNs with a genetic algorithm to enhance the approximation performance of DHP. Simulations are performed to test the trajectory tracking control algorithm for this wall-climbing robot, and the results are compared with those of a neural network tracking control algorithm. Comparative analysis verifies the effectiveness and superiority of the proposed method.
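RVFL networks are the less familiar ingredient here: the input-layer weights are random and fixed, there are direct links from the input to the output, and only the output weights are trained, typically by least squares. A minimal numpy sketch (the paper instead tunes the random input weights with a genetic algorithm):

```python
import numpy as np

def rvfl_fit(X, Y, hidden=50, rng=np.random.default_rng(0)):
    """Fit an RVFL regressor: X (samples x features), Y (samples x outputs)."""
    W = rng.normal(size=(X.shape[1], hidden))    # fixed random input weights
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                       # random nonlinear features
    H = np.hstack([H, X])                        # direct input-output links
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None) # learned output weights
    return W, b, beta
```

Because only the output weights are learned, training is fast and convex, which is what makes RVFL NNs attractive inside an actor-critic loop.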

Journal ArticleDOI
TL;DR: In this article, a novel formulation of the value function is presented for the optimal tracking problem (TP) of nonlinear discrete-time systems, and the optimal control policy can be deduced without considering the reference control input.

Journal ArticleDOI
TL;DR: A novel technique, called the bisection algorithm (BA), is introduced, which is fully implemented in C++ and extends dynamic programming approaches to the problem, and is shown to be significantly simpler, faster, and more robust than recently proposed algorithms.
Abstract: The time-optimal trajectory planning problem involves minimizing the time required to follow a path defined in space, subject to kinematic and dynamic constraints. Here, we introduce a novel technique, called the bisection algorithm (BA), which is fully implemented in C++ and extends dynamic programming approaches to the problem. These approaches, which rely on dividing the global problem into a series of simpler subproblems, become increasingly advantageous compared to direct transcription methods as the number of problem constraints increases. In contrast to nearly all other dynamic programming approaches, BA does not rely on finding a maximum-velocity curve or explicitly finding acceleration switching points during the trajectory planning process. Additionally, only one forward and one backward integration are used, during which all constraints are imposed. This approach is made feasible through careful control of the numerical integration process and the use of a bisection algorithm to resolve constraint violations during integration. BA is shown to be significantly simpler, faster, and more robust than recently proposed algorithms: a direct comparison is made for a series of paths to be followed by a serial manipulator, subject to kinematic constraints. The wide applicability of BA is then established by solving the time-optimal problem for a parallel manipulator following a complex path, subject to both kinematic and dynamic constraints.
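The eponymous device, stripped of the trajectory-planning context, is bisection on a monotone feasibility predicate; in the paper the predicate would encode whether an integration step at a trial velocity violates any kinematic or dynamic constraint (the stand-in below is illustrative):

```python
def bisect_max(feasible, lo, hi, tol=1e-9):
    """Largest v in [lo, hi] with feasible(v) True, assuming that
    feasibility is monotone (feasible below a threshold, infeasible above)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid   # mid is admissible: the threshold is above it
        else:
            hi = mid   # mid violates a constraint: the threshold is below it
    return lo
```

Resolving constraint violations this way during the single forward and single backward integration is what lets the method avoid computing a maximum-velocity curve or explicit switching points.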

Posted Content
TL;DR: Deep Policy Dynamic Programming (DPDP) as mentioned in this paper prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
Abstract: Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms can find optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming other 'neural approaches' for solving TSPs and VRPs with 100 nodes.
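The restriction mechanism can be sketched as a beam-limited bitmask DP: exact DP transitions, but only the states ranked best by a learned edge-scoring policy survive each expansion. In the sketch below, the scoring function is a stand-in for the neural network:

```python
def restricted_dp(dist, score, beam=100):
    """Beam-restricted TSP DP starting and ending at node 0.
    dist[i][j]: edge cost; score(mask, i): assumed policy value, higher means
    the partial tour (visited set `mask`, last node i) looks more promising."""
    n = len(dist)
    frontier = {(1, 0): 0.0}            # (visited mask, last node) -> cost
    for _ in range(n - 1):
        nxt = {}
        for (mask, i), c in frontier.items():
            for j in range(n):
                if not mask & (1 << j):
                    key = (mask | (1 << j), j)
                    cand = c + dist[i][j]
                    if cand < nxt.get(key, float("inf")):
                        nxt[key] = cand
        # keep only the `beam` states the policy ranks best
        frontier = dict(sorted(nxt.items(),
                               key=lambda kv: kv[1] - score(*kv[0]))[:beam])
    return min(c + dist[i][0] for (_, i), c in frontier.items())
```

With beam = infinity this is exact Held-Karp; shrinking the beam trades optimality guarantees for scalability, with the learned policy deciding what is kept.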