
Showing papers on "Dynamic programming published in 2021"


Journal ArticleDOI
TL;DR: This article investigates the unmanned aerial vehicle (UAV)-assisted wireless powered Internet-of-Things system, where a UAV takes off from a data center, flies to each of the ground sensor nodes (SNs) in order to transfer energy and collect data from the SNs, and then returns to the data center.
Abstract: This article investigates the unmanned aerial vehicle (UAV)-assisted wireless powered Internet-of-Things system, where a UAV takes off from a data center, flies to each of the ground sensor nodes (SNs) to transfer energy to and collect data from the SNs, and then returns to the data center. For such a system, an optimization problem is formulated to minimize the average Age of Information (AoI) of the data collected from all ground SNs. Since the average AoI depends on the UAV's trajectory as well as on the time required for energy harvesting (EH) and data collection at each SN, these factors need to be optimized jointly. Moreover, instead of the traditional linear EH model, we employ a nonlinear model because the behavior of EH circuits is nonlinear by nature. To solve this nonconvex problem, we propose to decompose it into two subproblems: a joint energy transfer and data collection time allocation problem, and a UAV trajectory planning problem. For the first subproblem, we prove that it is convex and give an optimal solution using the Karush–Kuhn–Tucker (KKT) conditions. This solution is used as the input for the second subproblem, which we solve by designing a dynamic programming (DP) algorithm and an ant colony (AC) heuristic. The simulation results show that the DP-based algorithm obtains the minimal average AoI of the system, and the AC-based heuristic finds solutions with near-optimal average AoI. The results also reveal that the average AoI increases with the flying altitude of the UAV and linearly with the size of the data collected at each ground SN.
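The trajectory-planning subproblem is, at its core, a visiting-order problem over the SNs, which is what makes an exact DP feasible for moderate numbers of nodes. As an illustration only (the paper's actual objective couples the visiting order with the per-SN charging and collection times from the first subproblem), a Held-Karp-style bitmask DP for a minimum-cost tour looks like this:

```python
# Hypothetical sketch: node 0 is the data center, nodes 1..n-1 are the SNs,
# and dist[i][j] is a generic travel cost standing in for the AoI objective.
from itertools import combinations

def held_karp(dist):
    """Minimum-cost tour starting and ending at node 0 (exact bitmask DP)."""
    n = len(dist)
    # dp[(S, j)]: min cost to leave node 0, visit exactly the SNs in set S,
    # and currently be at SN j (S is a bitmask over nodes 1..n-1)
    dp = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                prev = S ^ (1 << j)
                dp[(S, j)] = min(dp[(prev, k)] + dist[k][j]
                                 for k in subset if k != j)
    full = (1 << n) - 2  # all SNs 1..n-1 visited
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))
```

The DP explores on the order of n^2 * 2^n state transitions, which is presumably why a lighter AC heuristic is also offered for larger instances.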

138 citations


Journal ArticleDOI
TL;DR: A self-adaptive differential evolution algorithm is developed for a single BPM scheduling problem with unequal release times and job sizes, and experimental results demonstrate that the proposed self-adaptive algorithm is more effective than other algorithms for this scheduling problem.
Abstract: Batch-processing machines (BPMs), found in many industrial systems, can process a number of jobs at a time. This article considers a single-BPM scheduling problem with unequal release times and job sizes. The goal is to assign jobs into batches without violating the machine capacity constraint and then to sequence the batches so as to minimize the makespan. A self-adaptive differential evolution algorithm is developed for the problem, in which both the mutation operators and the control parameter values are chosen adaptively based on their historical performance. The proposed algorithm is compared to CPLEX, to existing metaheuristics for this problem, and to conventional differential evolution algorithms through comprehensive experiments. The experimental results demonstrate that the proposed self-adaptive algorithm is more effective than the alternatives for this scheduling problem.
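A minimal sketch of the self-adaptive ingredient (the mechanics below are assumed; the authors' actual credit-assignment scheme is not specified in the abstract): each mutation operator is drawn with probability proportional to a recency-weighted success rate, updated whenever an offspring improves on its parent.

```python
import random

class OperatorSelector:
    """Pick mutation operators in proportion to their recent success."""
    def __init__(self, operators, memory=0.9):
        self.ops = list(operators)
        self.score = {op: 1.0 for op in self.ops}  # optimistic initialization
        self.memory = memory                        # decay of old evidence

    def pick(self):
        total = sum(self.score.values())
        r, acc = random.uniform(0, total), 0.0
        for op in self.ops:
            acc += self.score[op]
            if r <= acc:
                return op
        return self.ops[-1]

    def update(self, op, improved):
        # exponential recency-weighted success rate
        self.score[op] = (self.memory * self.score[op]
                          + (1 - self.memory) * (1.0 if improved else 0.0))
```

The same bookkeeping applies to the control parameters (scale factor, crossover rate), with successful values seeding the sampling distribution for later generations.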

137 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigated the problem of path following for underactuated unmanned surface vehicles (USVs) subject to state constraints and proposed a useful control algorithm by combining the backstepping technique, adaptive dynamic programming (ADP), and the event-triggered mechanism.
Abstract: This article investigates the problem of path following for underactuated unmanned surface vehicles (USVs) subject to state constraints. A useful control algorithm is proposed by combining the backstepping technique, adaptive dynamic programming (ADP), and the event-triggered mechanism. The presented approach consists of three modules: guidance law, dynamic controller, and event triggering. First, to deal with the "singularity" problem, the guidance-based path-following (GBPF) principle is introduced in the guidance law loop. In contrast to the traditional barrier Lyapunov function (BLF) method, this article converts the USV's constraint model to a class of nonlinear systems without state constraints by introducing a nonlinear mapping. The control signal generated by the dynamic controller module consists of a backstepping-based feedforward control signal and an ADP-based approximate optimal feedback control signal. Therefore, the presented scheme can guarantee the approximate optimal performance. To approximate the cost function and its partial derivative, a critic neural network (NN) is constructed. By considering the event-triggered condition, the dynamic controller is further improved. Compared with traditional time-triggered control methods, the proposed approach can greatly reduce communication and computational burdens. This article proves that the closed-loop system is stable, and the simulation results and experimental validation are given to illustrate the effectiveness of the proposed approach.
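The event-triggered part can be pictured with a simple rule: the controller is only re-evaluated when the deviation from the last-sampled state crosses a threshold. A schematic check (the relative-threshold form below is an assumption, not the paper's triggering condition):

```python
import numpy as np

def should_trigger(x, x_last, alpha=0.1):
    """True when the gap since the last event is large relative to the state."""
    e = np.linalg.norm(x - x_last)          # measurement gap since last event
    return e > alpha * np.linalg.norm(x)    # recompute control only then
```

Between events the actuator holds the last computed control, which is where the communication and computation savings come from.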

66 citations


Journal ArticleDOI
TL;DR: This work presents SDDP.jl, an open-source library for solving multistage stochastic programming problems using the Stochastic dual dynamic programming algorithm.
Abstract: We present SDDP.jl, an open-source library for solving multistage stochastic programming problems using the stochastic dual dynamic programming algorithm. SDDP.jl is built on JuMP, an algebraic modeling language embedded in Julia.

59 citations


Journal ArticleDOI
TL;DR: In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems and it is proved that the PIQL algorithm is equivalent to the Newton iteration method in the Banach space by using the Fréchet derivative.
Abstract: In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. This problem theoretically depends on solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the $Q$-function is introduced and a data-based policy iteration $Q$-learning (PIQL) algorithm is developed to learn the optimal $Q$-function from data collected from the real system. By writing the $Q$-function in a quadratic form, it is proved that the PIQL algorithm is equivalent to the Newton iteration method in the Banach space by using the Fréchet derivative. Then, the convergence of the PIQL algorithm can be guaranteed by Kantorovich's theorem. For the realization of the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than the system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
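To make the policy-improvement step concrete: once a quadratic kernel H of the Q-function has been identified (from data, in the paper), the saddle-point gains follow in closed form from the stationarity conditions dQ/du = dQ/dw = 0. A numpy sketch, with the block ordering z = [x; u; w] assumed:

```python
import numpy as np

def improve_policies(H, n, m, p):
    """H: symmetric (n+m+p)x(n+m+p) kernel of Q(x,u,w) = z' H z, z = [x; u; w].
    Returns gains K, L such that u = -K x and w = -L x at the saddle point,
    obtained by solving the block linear system from dQ/du = dQ/dw = 0."""
    Hux = H[n:n+m, :n];      Hwx = H[n+m:, :n]
    Huu = H[n:n+m, n:n+m];   Huw = H[n:n+m, n+m:]
    Hwu = H[n+m:, n:n+m];    Hww = H[n+m:, n+m:]
    A = np.block([[Huu, Huw], [Hwu, Hww]])
    b = np.vstack([Hux, Hwx])
    KL = np.linalg.solve(A, b)   # stacked [K; L]
    return KL[:m, :], KL[m:, :]
```

The learning part of PIQL amounts to estimating H by least squares from state-input-disturbance trajectories, after which the update above is purely algebraic.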

55 citations


Journal ArticleDOI
Xuekai Wang, Tao Tang, Shuai Su, Jiateng Yin, Ziyou Gao, Nan Lv
TL;DR: An integrated energy-efficient train operation method in which the driving strategy and the train timetable are jointly optimized is proposed; it can reduce the net energy consumption by up to 25.0% compared to the result without optimization, while a discrete differential dynamic programming variant reduces the computing time.
Abstract: Reducing traction energy and increasing the reuse of regenerative energy are the two main ways of saving energy in metro systems, and both are related to the driving strategy as well as the train timetable. To minimize the system-wide energy, this paper proposes an integrated energy-efficient train operation method in which the driving strategy and the train timetable are jointly optimized. Firstly, the models for calculating the traction energy and the reuse of regenerative energy are introduced together with the constraints of train operation. Then, the system-level optimization model is formulated by taking the net energy (i.e., the difference between the traction energy and the reused regenerative energy) as the objective function. Based on the space-time-speed network methodology, the optimization model is transformed into a discrete decision problem. Next, two algorithms are used to solve the problem: the dynamic programming algorithm is used to obtain the global optimal solution, and the discrete differential dynamic programming algorithm is applied to obtain an approximate optimal solution with reduced computing time. Finally, two numerical examples are conducted to illustrate the effectiveness of the proposed method for energy saving. The method can reduce the net energy consumption by up to 25.0% compared to the result without optimization and by up to 8.7% compared to the result obtained with the two-stage method.
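As an illustration of the discrete decision problem for the driving strategy alone, here is a toy stage-wise DP over a distance-speed grid, with a hypothetical stage-energy model and none of the paper's timetable or regenerative-energy coupling:

```python
import math

def drive_dp(n_seg, speeds, energy, feasible):
    """Toy driving-strategy DP.
    n_seg: number of track segments (stages);
    speeds: iterable of quantized speeds, must include 0 (start/stop at rest);
    energy(v, v2): assumed stage energy cost for a speed change v -> v2;
    feasible(v, v2): whether the change respects acceleration/braking limits."""
    INF = math.inf
    cost = {v: (0.0 if v == 0 else INF) for v in speeds}  # depart from rest
    for _ in range(n_seg):
        nxt = {v: INF for v in speeds}
        for v, c in cost.items():
            if c == INF:
                continue
            for v2 in speeds:
                if feasible(v, v2):
                    nxt[v2] = min(nxt[v2], c + energy(v, v2))
        cost = nxt
    return cost[0]  # arrive at the next station at rest
```

The paper's space-time-speed network adds arrival time to the state, which is what lets the timetable and the regenerative-energy overlap between trains enter the same DP.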

55 citations


Journal ArticleDOI
TL;DR: This article considers a Mayer-type optimal control problem for probabilistic Boolean control networks (PBCNs) whose selection probabilities are uncertain and follow Beta distributions, and deduces an equivalent formulation as a multistage decision problem.
Abstract: This article considers a Mayer-type optimal control problem of probabilistic Boolean control networks (PBCNs) with uncertainty in the selection probabilities, which follow Beta distributions. The expectation with respect to both the selection probabilities and the transitions of the state variables is set as the cost function, and an equivalent formulation as a multistage decision problem is deduced. Furthermore, the dynamic programming technique is applied to solve the problem, and a novel optimization algorithm is derived in the semitensor product framework. A numerical example of a biological model of an apoptosis protein demonstrates the effectiveness and feasibility of the proposed framework and algorithms.
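The multistage decision problem has the standard backward-DP structure. A generic finite-state sketch for a Mayer-type cost (only the terminal state is scored) might look like the following, with the PBCN specifics (2^n Boolean states, semitensor product algebra, Beta-averaged transition probabilities) abstracted into the inputs:

```python
def mayer_dp(T, states, controls, P, terminal_cost):
    """Backward DP for a Mayer problem on a finite state space.
    P[u][x][y]: transition probability x -> y under control u (here it would
    already be averaged over the Beta-distributed selection probabilities);
    terminal_cost(x): the only cost term in a Mayer-type problem."""
    V = {x: terminal_cost(x) for x in states}
    policy = []
    for _ in range(T):
        newV, pi = {}, {}
        for x in states:
            best = min(controls,
                       key=lambda u: sum(P[u][x][y] * V[y] for y in states))
            pi[x] = best
            newV[x] = sum(P[best][x][y] * V[y] for y in states)
        V, policy = newV, [pi] + policy
    return V, policy
```

For a PBCN with n Boolean variables the state space has 2^n elements, so this exact recursion is practical only for small networks, which matches the biological example in the paper.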

54 citations


Journal ArticleDOI
TL;DR: A tensor decomposition approach for the solution of high-dimensional, fully nonlinear Hamilton–Jacobi–Bellman equations arising in optimal feedback control of nonlinear dynamics is presented in this article.
Abstract: A tensor decomposition approach for the solution of high-dimensional, fully nonlinear Hamilton–Jacobi–Bellman equations arising in optimal feedback control of nonlinear dynamics is presented. The...

50 citations


Journal ArticleDOI
TL;DR: This article characterizes an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with Wasserstein penalty and shows that the contraction property of associated Bellman operators extends a single-stage out-of-sample performance guarantee to the corresponding multistage guarantee without any degradation in the confidence level.
Abstract: Standard stochastic control methods assume that the probability distribution of uncertain variables is available. Unfortunately, in practice, obtaining accurate distribution information is a challenging task. To resolve this issue, in this article we investigate the problem of designing a control policy that is robust against errors in the empirical distribution obtained from data. This problem can be formulated as a two-player zero-sum dynamic game problem, where the action space of the adversarial player is a Wasserstein ball centered at the empirical distribution. A dynamic programming solution is provided exploiting the reformulation techniques for Wasserstein distributionally robust optimization. We show that the contraction property of associated Bellman operators extends a single-stage out-of-sample performance guarantee, obtained using a measure concentration inequality, to the corresponding multistage guarantee without any degradation in the confidence level. Furthermore, we characterize an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with Wasserstein penalty.
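Schematically, the penalty-form distributionally robust Bellman recursion studied in this line of work reads as follows (notation assumed: \hat{\nu}_t is the empirical distribution of the disturbance at stage t, \lambda the Wasserstein penalty weight, and W the Wasserstein distance):

```latex
V_t(x) \;=\; \min_{u}\; \sup_{\mathbb{Q}}\;
\Big\{\, \mathbb{E}_{w \sim \mathbb{Q}}\big[\, c(x,u,w) + V_{t+1}\big(f(x,u,w)\big) \,\big]
\;-\; \lambda\, W\big(\mathbb{Q}, \hat{\nu}_t\big) \Big\}.
```

The inner supremum is what the reformulation techniques for Wasserstein distributionally robust optimization make tractable, and the contraction argument in the abstract is about the operator mapping V_{t+1} to V_t.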

48 citations


Journal ArticleDOI
TL;DR: First, it is proved that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems for constrained auxiliary subsystems; then, under the framework of approximate dynamic programming, a simultaneous policy iteration (SPI) algorithm is presented to solve the Hamilton–Jacobi–Bellman equations corresponding to the constrained auxiliary subsystems.
Abstract: In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton–Jacobi–Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor–critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.

46 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed two-layer optimization strategy achieves better performance than greedy and random strategies in terms of total user energy consumption, that trajectory conflicts can be eliminated effectively, and that the UAV trajectories satisfy the safety constraints.
Abstract: In this study, we introduce a multi-unmanned aerial vehicle (multi-UAV) enabled mobile edge computing (MEC) system, with UAVs acting as computing servers for the task offloading of ground users. The energy consumption of the ground users is minimized by jointly optimizing the UAV task scheduling, bit allocation, and UAV trajectories in a unified framework. To accomplish this goal, we propose a two-layer optimization strategy, where the upper layer optimizes the UAV task scheduling with a dynamic programming-based bidding optimization method, while the lower one solves for the bit allocation and UAV trajectories. In particular, the lower layer is decoupled into several subproblems to reduce the computational complexity, which can be easily solved using the alternating direction method of multipliers. However, the UAV trajectories optimized by solving the decoupled subproblems may lead to path conflicts. As such, we further propose a re-optimization strategy to eliminate such conflicts. Experimental results demonstrate that the proposed strategy achieves better performance than greedy and random strategies in terms of total user energy consumption, that the trajectory conflicts can be eliminated effectively, and that the UAV trajectories satisfy the safety constraints.

Posted Content
TL;DR: The correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures is proved, and the dynamic programming Bellman fixed point equation satisfied by the value function is established.
Abstract: We develop an exhaustive study of Markov decision processes (MDPs) under mean field interaction both on states and actions in the presence of common noise, and when optimization is performed over open-loop controls on an infinite horizon. Such a model, called CMKV-MDP for conditional McKean-Vlasov MDP, arises, and is obtained here rigorously with a rate of convergence, as the asymptotic problem of N cooperative agents controlled by a social planner/influencer that observes the environment noises but not necessarily the individual states of the agents. We highlight the crucial role of relaxed controls and of the randomization hypothesis for this class of models with respect to classical MDP theory. We prove the correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures, and establish the dynamic programming Bellman fixed point equation satisfied by the value function, as well as the existence of ε-optimal randomized feedback controls. The arguments of proof involve an original measurable optimal coupling for the Wasserstein distance. This provides a procedure for learning strategies in a large population of interacting collaborative agents. MSC Classification: 90C40, 49L20.

Journal ArticleDOI
TL;DR: A novel value iteration based off-policy adaptive dynamic programming (ADP) algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without the exact knowledge of system dynamics.
Abstract: This article studies the infinite-horizon adaptive optimal control of continuous-time linear periodic (CTLP) systems. A novel value iteration (VI) based off-policy adaptive dynamic programming (ADP) algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without the exact knowledge of system dynamics. Under mild conditions, the proofs on uniform convergence of the proposed algorithm to the optimal solutions are given for both the model-based and model-free cases. The VI-based ADP algorithm is able to find suboptimal controllers without assuming the knowledge of an initial stabilizing controller. Application to the optimal control of a triple inverted pendulum subjected to a periodically varying load demonstrates the feasibility and effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: This paper investigates the dynamic event-triggered fault-tolerant optimal control strategy for a class of output feedback nonlinear discrete-time systems subject to actuator faults and input saturations.
Abstract: This paper investigates the dynamic event-triggered fault-tolerant optimal control strategy for a class of output feedback nonlinear discrete-time systems subject to actuator faults and input saturations. To save the communication resources between the sensor and the controller, the so-called dynamic event-triggered mechanism is adopted to schedule the measurement signal. A neural network-based observer is first designed to provide both the system states and fault information. Then, with consideration of the actuator saturation phenomenon, the adaptive dynamic programming (ADP) algorithm is designed based on the estimates provided by the observer. To reduce the computational burden, the optimal control strategy is implemented via the single network adaptive critic architecture. Sufficient conditions are provided to guarantee the boundedness of the overall closed-loop systems. Finally, numerical simulations on a two-link flexible manipulator system are provided to verify the validity of the proposed control strategy.

Journal ArticleDOI
TL;DR: In this paper, a differential dynamic programming (DDP) algorithm for solving discrete-time finite-horizon optimal control problems with inequality constraints was proposed, which can handle nonlinear state and input inequality constraints without a discernible increase in its computational complexity relative to the unconstrained case.
Abstract: This brief introduces a novel differential dynamic programming (DDP) algorithm for solving discrete-time finite-horizon optimal control problems with inequality constraints. Two variants, namely feasible- and infeasible-IPDDP algorithms, are developed using a primal–dual interior-point methodology, and their local quadratic convergence properties are characterized. We show that the stationary points of the algorithms are the perturbed KKT points, and thus can be moved arbitrarily close to a locally optimal solution. Being free from the burden of active-set methods, the algorithms can handle nonlinear state and input inequality constraints without a discernible increase in computational complexity relative to the unconstrained case. The performance of the proposed algorithms is demonstrated using numerical experiments on three different problems: control-limited inverted pendulum, car-parking, and unicycle motion control and obstacle avoidance.
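For orientation, the unconstrained core that IPDDP builds on is the Riccati-style backward pass of DDP/iLQR; on a linear-quadratic problem it reduces to the recursion below. The primal-dual interior-point treatment of the inequality constraints, which is the paper's contribution, is deliberately not reproduced here.

```python
import numpy as np

def ddp_backward(A, B, Qx, Qu, N):
    """Backward pass for x' = A x + B u with stage cost x'Qx x + u'Qu u.
    Returns time-varying feedback gains K[t] with u_t = -K[t] x_t."""
    V = Qx.copy()                    # terminal value Hessian (assumed = Qx)
    gains = []
    for _ in range(N):
        Qxx = Qx + A.T @ V @ A       # second-order expansion of the Q-function
        Quu = Qu + B.T @ V @ B
        Qux = B.T @ V @ A
        K = np.linalg.solve(Quu, Qux)
        V = Qxx - Qux.T @ K          # Riccati-style value update
        gains.append(K)
    return gains[::-1]               # gains in forward time order
```

In IPDDP the same expansion is carried out on a perturbed KKT system, so the log-barrier multipliers ride along through the backward pass instead of an active set being tracked.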

Journal ArticleDOI
TL;DR: This article develops a novel event-triggered control (ETC) approach based on the deterministic policy gradient (PG) adaptive dynamic programming (ADP) algorithm which updates the control law and the disturbance law with a gradient descent algorithm.
Abstract: In order to address zero-sum game problems for discrete-time (DT) nonlinear systems, this article develops a novel event-triggered control (ETC) approach based on the deterministic policy gradient (PG) adaptive dynamic programming (ADP) algorithm. By adopting the input and output data, the proposed ETC method updates the control law and the disturbance law with a gradient descent algorithm. Compared with the conventional PG ADP-based control scheme, the present controller is updated aperiodically to reduce the computational and communication burden. Then, the actor-critic-disturbance framework is adopted to obtain the optimal control law and the worst disturbance law, which guarantee the input-to-state stability of the closed-loop system. Moreover, a novel neural network weight updating law which guarantees the uniform ultimate boundedness of weight estimation errors is provided based on the experience replay technique. Finally, the validity of the present method is verified by simulation of two DT nonlinear systems.

Journal ArticleDOI
TL;DR: This paper reports the utility factor-weighted energy consumption using a rule-based strategy under a real-world representative drive cycle and compares results from both rule-based and optimization-based strategies.
Abstract: Reducing energy consumption is a key focus of hybrid electric vehicle (HEV) development. The popular vehicle dynamic model used in many energy management optimization studies does not capture the vehicle dynamics that the in-vehicle measurement system does. However, feedback from the measurement system is what the vehicle controller actually uses to manage energy consumption; therefore, optimization using the model alone does not represent what the controller sees in the vehicle. This paper reports the utility factor-weighted energy consumption using a rule-based strategy under a real-world representative drive cycle. In addition, vehicle test data were used to perform the optimization approach. By comparing results from both rule-based and optimization-based strategies, areas for further improving the rule-based strategy are discussed. Furthermore, recent developments in OBD raise a concern about an increase in energy consumption, and this paper also investigates the energy consumption increase with extensive OBD usage.

Journal ArticleDOI
TL;DR: In this article, an event-triggered adaptive dynamic programming (ADP) algorithm is developed to solve the tracking control problem for partially unknown constrained uncertain systems, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated.
Abstract: An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. Integral reinforcement learning is employed to avoid the requirement of the augmented drift dynamics. Second, the event-triggered ADP is adopted for its implementation, where the learning of neural network weights not only relaxes the initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error are proved to be uniformly ultimately bounded, and the existence of a lower bound for the interexecution times is analyzed. Finally, simulation results demonstrate the effectiveness of the present event-triggered ADP method.

Journal ArticleDOI
TL;DR: In this paper, neural networks are employed to approximate the solution of the Hamilton-Jacobi-Isaacs equation under the frame of adaptive dynamic programming, based on the standard gradient attenuation algorithm and adaptive critic design.
Abstract: We aim to optimize the tracking control of a robot so as to improve its robustness under unknown nonlinear perturbations. First, an auxiliary system is introduced, and optimal control of the auxiliary system can be seen as approximate optimal control of the robot. Then, neural networks (NNs) are employed to approximate the solution of the Hamilton–Jacobi–Isaacs equation under the framework of adaptive dynamic programming. Next, based on the standard gradient attenuation algorithm and adaptive critic design, the NNs are trained according to the designed updating law, relaxing the requirement of an initial stabilizing control. In light of Lyapunov stability theory, all the error signals can be proved to be uniformly ultimately bounded. A series of simulation studies are carried out to show the effectiveness of the proposed control.

Journal ArticleDOI
TL;DR: A unified deep learning method that solves dynamic economic models by casting them into nonlinear regression equations for three fundamental objects of economic dynamics – lifetime reward functions, Bellman equations and Euler equations is introduced.
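The "Bellman equation as regression" idea can be sketched in a few lines of PyTorch (illustrative only; the network architecture, discount factor, and sampling scheme below are placeholders, not the paper's):

```python
import torch
import torch.nn as nn

value = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(value.parameters(), lr=1e-3)
beta = 0.96  # discount factor (assumed)

def bellman_residual_step(x, reward, x_next):
    """One SGD step on E[(V(x) - (r + beta * V(x')))^2] over sampled states.
    x, reward, x_next: float tensors of shape (batch, 1)."""
    target = reward + beta * value(x_next).detach()  # fixed regression target
    loss = ((value(x) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The same recipe, with different residuals, covers the other two objects mentioned in the TL;DR: lifetime reward functions (Monte Carlo regression) and Euler equations (first-order-condition residuals).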

Journal ArticleDOI
TL;DR: A self-adaptive statistical approach based on a proper management of any admissible battery energy variation is developed to significantly improve the calculation times required for HEV architectures while still attaining the best possible accuracy in terms of CO2 emissions as well as total cost of ownership (TCO).

Journal ArticleDOI
TL;DR: An ADP algorithm for the optimal control problem with unknown nonlinear dynamic model is developed by using the basis function approximation method and Newton–Leibniz formula, which can update the control strategy online by utilizing input and output information of the system.
Abstract: In this article, a novel adaptive dynamic programming (ADP) approach is proposed for the optimal control problem of nonlinear continuous control systems with unknown dynamics. First, an alternating iteration algorithm based on Hamilton–Jacobi–Bellman equation is proposed for the optimal control of known nonlinear control systems. Then, the convergence results of the alternating iteration algorithm are obtained by using mathematical induction and monotone bounded convergence theorem. Moreover, the global asymptotic stability of the nonlinear closed-loop system is proved. Second, based on the scheme of alternating iteration algorithm, an ADP algorithm for the optimal control problem with unknown nonlinear dynamic model is developed by using the basis function approximation method and Newton–Leibniz formula, which can update the control strategy online by utilizing input and output information of the system. In addition, the convergence analysis of the proposed ADP algorithm is derived. Finally, the feasibility of the established results is verified by two examples, and the ADP method is applied to the optimal tracking fuel control problem of turbofan engines.

Journal ArticleDOI
TL;DR: In this article, the adaptive control problem for continuous-time nonlinear systems described by differential equations is studied and a learning-based control algorithm is proposed to learn robust optimal controllers directly from real-time data.
Abstract: This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method, proposed initially by Bellman in 1957, as a fundamental tool to solve dynamic programming problems. However, previous VI methods are all exclusively devoted to Markov decision processes and discrete-time dynamical systems. In this article, we aim to fill this gap by developing a new continuous-time VI method that is applied to address the adaptive and nonadaptive optimal control problems for continuous-time systems described by differential equations. Like the traditional VI, the continuous-time VI algorithm retains the nice feature that there is no need to assume the knowledge of an initial admissible control policy. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with totally unknown dynamics. A learning-based control algorithm is proposed to show how to learn robust optimal controllers directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
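For contrast with the continuous-time extension, the classical discrete VI that the article takes as its starting point is the simple fixed-point iteration below; note that it starts from an arbitrary value function, with no admissible initial policy required:

```python
def value_iteration(states, actions, P, r, gamma=0.95, tol=1e-8):
    """Classical VI on a finite MDP. P[s][a][s2]: transition probability;
    r(s, a): reward function; gamma: discount factor."""
    V = {s: 0.0 for s in states}  # arbitrary initialization
    while True:
        V_new = {s: max(r(s, a) + gamma * sum(P[s][a][s2] * V[s2]
                                              for s2 in states)
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
```

The article's contribution is, roughly, the analogue of this operator iteration for systems governed by differential equations, together with a data-driven realization when the dynamics are unknown.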

Journal ArticleDOI
TL;DR: This article investigates the zero-sum game-based secure control problem for cyber–physical systems (CPS) under the actuator false data injection attacks and derives the optimal defending policy and the attack policy from the dynamic programming approach.
Abstract: This article investigates the zero-sum game-based secure control problem for cyber–physical systems (CPS) under the actuator false data injection attacks. The physical process is described as a linear time-invariant discrete-time model. Both the process noise and the measurement noise are addressed in the design process. An optimal Kalman filter is given to estimate the system states. The adversary and the defender are modeled as two players. Under the zero-sum game framework, an optimal infinite-horizon quadratic cost function is defined. Employing the dynamic programming approach, the optimal defending policy and the attack policy are derived. The convergence of the cost function is proved. Moreover, the critical attack probability is derived, beyond which the cost cannot be bounded. Finally, simulation results are provided to validate the proposed secure scheme.
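The estimation layer of the loop is a standard Kalman filter; a generic predict/update step (textbook form, with assumed notation) is:

```python
import numpy as np

def kalman_step(x, P, u, y, A, B, C, Q, R):
    """One predict/update cycle for x' = A x + B u + w, y = C x + v,
    with process noise covariance Q and measurement noise covariance R."""
    # predict
    x = A @ x + B @ u
    P = A @ P @ A.T + Q
    # update with measurement y
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x = x + K @ (y - C @ x)
    P = (np.eye(len(x)) - K @ C) @ P
    return x, P
```

The game-theoretic part of the paper then sits on top of these estimates: the defender's and the attacker's policies are computed by dynamic programming on the zero-sum quadratic cost.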

Journal ArticleDOI
TL;DR: An algorithm to calculate the time-optimal energy management and gearshift strategies for the Formula 1 race car is presented, which combines convex optimization, dynamic programming and Pontryagin’s minimum principle in an iterative scheme to solve the arising mixed-integer optimization problem.

Journal ArticleDOI
27 Jul 2021
TL;DR: In this article, the adaptive optimal control problem for a wheel-legged robot in the absence of an accurate dynamic model is studied and a learning-based solution is derived from input-state data collected along the trajectories of the robot.
Abstract: This letter studies the adaptive optimal control problem for a wheel-legged robot in the absence of an accurate dynamic model. A crucial strategy is to exploit recent advances in reinforcement learning (RL) and adaptive dynamic programming (ADP) to derive a learning-based solution to adaptive optimal control. It is shown that suboptimal controllers can be learned directly from input-state data collected along the trajectories of the robot. Rigorous proofs for the convergence of the novel data-driven value iteration (VI) algorithm and the stability of the closed-loop robot system are provided. Experiments are conducted to demonstrate the efficiency of the novel adaptive suboptimal controller derived from the data-driven VI algorithm in balancing the wheel-legged robot to the equilibrium.

Journal ArticleDOI
TL;DR: In this article, a mathematical model of a magnetic-wheeled mobile robot used for wall climbing in some special industrial sites is established, and to realize the precise motion control of the robot, an intelligent discrete algorithm for trajectory tracking control is presented.
Abstract: In this article, a mathematical model of a magnetic-wheeled mobile robot (MWMR) used for wall climbing in special industrial sites is established, and, to realize precise motion control of the robot, an intelligent discrete algorithm for trajectory tracking control of the MWMR is presented. The robot is subject to nonholonomic constraints when moving on the wall. The discrete mathematical model of the MWMR is established from the kinematics and dynamics, where the dynamics are described by second-order Lagrange equations. The core of the trajectory tracking control algorithm is an improved dual-heuristic dynamic programming (DHP) scheme whose actor-critic structure adopts random vector functional link neural networks (RVFL NNs). Moreover, the tracking control algorithm is supplemented by a PD controller and a supervisory element to generate the overall control signals. The improvement consists of optimizing the input-layer weights of the RVFL NNs with a genetic algorithm to enhance the approximation performance of DHP. Simulations are performed to test the trajectory tracking control algorithm for this wall-climbing robot, and the results are compared with those of a neural network tracking control algorithm. Comparative analysis verifies the effectiveness and superiority of the proposed method.
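RVFL networks are the less familiar ingredient here: the input-layer weights are random and fixed, there are direct links from the input to the output, and only the output weights are trained, typically by least squares. A minimal numpy sketch (the paper instead tunes the random input weights with a genetic algorithm):

```python
import numpy as np

def rvfl_fit(X, Y, hidden=50, rng=np.random.default_rng(0)):
    """Fit an RVFL regressor: X (samples x features), Y (samples x outputs)."""
    W = rng.normal(size=(X.shape[1], hidden))    # fixed random input weights
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                       # random nonlinear features
    H = np.hstack([H, X])                        # direct input-output links
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None) # learned output weights
    return W, b, beta
```

Because only the output weights are learned, training is fast and convex, which is what makes RVFL NNs attractive inside an actor-critic loop.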

Journal ArticleDOI
TL;DR: In this article, a novel formulation of the value function is presented for the optimal tracking problem (TP) of nonlinear discrete-time systems, and the optimal control policy can be deduced without considering the reference control input.

Journal ArticleDOI
TL;DR: A novel technique, called the bisection algorithm (BA), is introduced, which is fully implemented in C++ and extends dynamic programming approaches to the problem, and is shown to be significantly simpler, faster, and more robust than recently proposed algorithms.
Abstract: The time-optimal trajectory planning problem involves minimizing the time required to follow a path defined in space, subject to kinematic and dynamic constraints. Here, we introduce a novel technique, called the bisection algorithm (BA), which is fully implemented in C++ and extends dynamic programming approaches to the problem. These approaches, which rely on dividing the global problem into a series of simpler subproblems, become increasingly advantageous compared to direct transcription methods as the number of problem constraints increases. In contrast to nearly all other dynamic programming approaches, BA does not rely on finding a maximum-velocity curve or explicitly finding acceleration switching points during the trajectory planning process. Additionally, only one forward and one backward integration are used, during which all constraints are imposed. This approach is made feasible through careful control of the numerical integration process and the use of a bisection algorithm to resolve constraint violations during integration. BA is shown to be significantly simpler, faster, and more robust than recently proposed algorithms: a direct comparison is made for a series of paths to be followed by a serial manipulator, subject to kinematic constraints. The wide applicability of BA is then established by solving the time-optimal problem for a parallel manipulator following a complex path, subject to both kinematic and dynamic constraints.
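The eponymous device, stripped of the trajectory-planning context, is bisection on a monotone feasibility predicate; in the paper the predicate would encode whether an integration step at a trial velocity violates any kinematic or dynamic constraint (the stand-in below is illustrative):

```python
def bisect_max(feasible, lo, hi, tol=1e-9):
    """Largest v in [lo, hi] with feasible(v) True, assuming that
    feasibility is monotone (feasible below a threshold, infeasible above)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid   # mid is admissible: the threshold is above it
        else:
            hi = mid   # mid violates a constraint: the threshold is below it
    return lo
```

Resolving constraint violations this way during the single forward and single backward integration is what lets the method avoid computing a maximum-velocity curve or explicit switching points.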

Posted Content
TL;DR: Deep Policy Dynamic Programming (DPDP) as mentioned in this paper prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
Abstract: Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms can find optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming other 'neural approaches' for solving TSPs and VRPs with 100 nodes.
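The restriction mechanism can be sketched as a beam-limited bitmask DP: exact DP transitions, but only the states ranked best by a learned edge-scoring policy survive each expansion. In the sketch below, the scoring function is a stand-in for the neural network:

```python
def restricted_dp(dist, score, beam=100):
    """Beam-restricted TSP DP starting and ending at node 0.
    dist[i][j]: edge cost; score(mask, i): assumed policy value, higher means
    the partial tour (visited set `mask`, last node i) looks more promising."""
    n = len(dist)
    frontier = {(1, 0): 0.0}            # (visited mask, last node) -> cost
    for _ in range(n - 1):
        nxt = {}
        for (mask, i), c in frontier.items():
            for j in range(n):
                if not mask & (1 << j):
                    key = (mask | (1 << j), j)
                    cand = c + dist[i][j]
                    if cand < nxt.get(key, float("inf")):
                        nxt[key] = cand
        # keep only the `beam` states the policy ranks best
        frontier = dict(sorted(nxt.items(),
                               key=lambda kv: kv[1] - score(*kv[0]))[:beam])
    return min(c + dist[i][0] for (_, i), c in frontier.items())
```

With beam = infinity this is exact Held-Karp; shrinking the beam trades optimality guarantees for scalability, with the learned policy deciding what is kept.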