
Showing papers on "Dynamic programming published in 2020"


Proceedings ArticleDOI
31 May 2020
TL;DR: This work introduces Crocoddyl, an open-source framework tailored for efficient multi-contact optimal control, and proposes a novel optimal control algorithm called Feasibility-driven Differential Dynamic Programming (FDDP), which exhibits a stronger globalization strategy than classical DDP algorithms.
Abstract: We introduce Crocoddyl (Contact RObot COntrol by Differential DYnamic Library), an open-source framework tailored for efficient multi-contact optimal control. Crocoddyl efficiently computes the state trajectory and the control policy for a given predefined sequence of contacts. Its efficiency is due to the use of sparse analytical derivatives, exploitation of the problem structure, and data sharing. It employs differential geometry to properly describe the state of any geometrical system, e.g. floating-base systems. Additionally, we propose a novel optimal control algorithm called Feasibility-driven Differential Dynamic Programming (FDDP). Our method does not add extra decision variables, which often increase the computation time per iteration due to factorization. FDDP exhibits a stronger globalization strategy than classical Differential Dynamic Programming (DDP) algorithms. Concretely, we propose two modifications to the classical DDP algorithm. First, the backward pass accepts infeasible state-control trajectories. Second, the rollout keeps the gaps open during the early "exploratory" iterations (as expected in multiple-shooting methods with only equality constraints). We showcase the performance of our framework using different tasks. With our method, we can compute highly-dynamic maneuvers (e.g. jumping, front-flip) within a few milliseconds.
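
The backward/forward structure that FDDP modifies is easiest to see on a toy problem. The sketch below is a plain DDP-style Riccati sweep and rollout for a one-dimensional double integrator with quadratic costs; it is a generic illustration with invented dynamics and costs, not Crocoddyl's API, and the FDDP-specific behavior is only noted in a comment.

```python
# Generic DDP-style sketch on a 1-D double integrator with quadratic costs.
# Illustrative only -- not Crocoddyl's API. For pure regulation about the
# origin the feedforward terms stay zero, but they are kept to show the
# structure that FDDP modifies (FDDP additionally accepts infeasible
# trajectories and keeps "gaps" between shooting nodes open early on).
import numpy as np

dt, T = 0.05, 60
A = np.array([[1.0, dt], [0.0, 1.0]])    # discretized double integrator
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                  # running state cost
R = np.array([[1e-2]])                   # running control cost
Qf = np.diag([100.0, 10.0])              # terminal cost

def backward_pass():
    """Riccati-like sweep: returns feedback gains K and feedforwards k."""
    Vxx, Vx = Qf, np.zeros(2)
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):
        Qxx = Q + A.T @ Vxx @ A
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        Qu = B.T @ Vx
        K[t] = -np.linalg.solve(Quu, Qux)
        k[t] = -np.linalg.solve(Quu, Qu)
        Vx = A.T @ Vx + Qux.T @ k[t]
        Vxx = Qxx + Qux.T @ K[t]
    return K, k

def forward_pass(x0, K, k, alpha=1.0):
    """Roll out the locally linear policy u = K x + alpha * k."""
    x, xs = x0, [x0]
    for t in range(T):
        u = K[t] @ x + alpha * k[t]
        x = A @ x + (B @ u).ravel()
        xs.append(x)
    return np.array(xs)

K, k = backward_pass()
xs = forward_pass(np.array([1.0, 0.0]), K, k)
print("final state:", xs[-1])            # should be driven near the origin
```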

127 citations


Journal ArticleDOI
TL;DR: The proposed control approach significantly improves the controller’s robustness in the face of uncertain signal timing, without requiring prior knowledge of the distribution of the random variable.
Abstract: This article focuses on the speed planning problem for connected and automated vehicles (CAVs) communicating with traffic lights. The uncertainty of traffic signal timing for signalized intersections on the road is considered. The eco-driving problem is formulated as a data-driven chance-constrained robust optimization problem. Effective red-light duration (ERD) is defined as a random variable describing the feasible passing time through the signalized intersections. Usually, the true probability distribution for ERD is unknown. Consequently, a data-driven approach is adopted to formulate chance constraints based on empirical sample data. This incorporates robustness into the eco-driving control problem with respect to uncertain signal timing. Dynamic programming (DP) is employed to solve the optimization problem. The simulation results demonstrate that the proposed method can generate optimal speed reference trajectories with 40% less vehicle fuel consumption, while maintaining the arrival time at a similar level compared to a modified intelligent driver model (IDM). The proposed control approach significantly improves the controller’s robustness in the face of uncertain signal timing, without requiring the distribution of the random variable to be known a priori.
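
At the heart of the data-driven chance constraint is the replacement of the unknown ERD distribution with its empirical counterpart. A minimal sketch, assuming a synthetic sample of ERD observations (the gamma distribution and all numbers are invented for illustration):

```python
# Sketch of the data-driven chance-constraint idea: replace the unknown
# ERD distribution with an empirical quantile of observed samples. The
# gamma-distributed samples and all numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
erd_samples = rng.gamma(shape=4.0, scale=8.0, size=500)  # observed ERDs (s)
epsilon = 0.05                                           # allowed violation prob.

# Robust bound: plan to pass no earlier than the (1 - epsilon)-quantile,
# so the chance constraint holds on the empirical distribution.
t_safe = np.quantile(erd_samples, 1.0 - epsilon)
print(f"earliest robust passing time: {t_safe:.1f} s after the light turns red")

# A DP speed planner would then treat t_safe as a hard time-window
# constraint while optimizing the velocity profile toward the stop line.
```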

110 citations


Journal ArticleDOI
TL;DR: It is proved that semiglobal uniform ultimate boundedness can be guaranteed for the states and NN weight errors with the ADP-based ETOC; performance is shown to respect a predetermined upper bound, and the existence of a lower bound on the interexecution time is proven.
Abstract: This paper studies the problem of event-triggered optimal control (ETOC) for continuous-time nonlinear systems and proposes a novel event-triggering condition that enables designing ETOC methods directly based on the solution of the Hamilton–Jacobi–Bellman (HJB) equation. We provide formal performance guarantees by proving a predetermined upper bound on the performance. Moreover, we prove the existence of a lower bound on the interexecution time. For implementation purposes, an adaptive dynamic programming (ADP) method is developed to realize the ETOC using a critic neural network (NN) to approximate the value function of the HJB equation. Subsequently, we prove that semiglobal uniform ultimate boundedness can be guaranteed for the states and NN weight errors with the ADP-based ETOC. Simulation results demonstrate the effectiveness of the developed ADP-based ETOC method.
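
The mechanics of event triggering are independent of the specific HJB machinery: the control is recomputed only when a measure of staleness crosses a threshold. Below is a minimal sketch with an invented linear system and an illustrative norm-based trigger; the paper's condition is derived from the HJB solution and differs in form.

```python
# Generic event-triggered feedback sketch: the control is refreshed only
# when the gap between the last-sampled state and the current state grows
# past a state-dependent threshold. Trigger rule and system are invented.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
K = np.array([[2.0, 1.5]])          # stabilizing feedback gain
dt, steps, sigma = 0.01, 3000, 0.1  # sigma scales the trigger threshold

x = np.array([1.0, 0.0])
x_k = x.copy()                      # state held since the last event
events = 0
for _ in range(steps):
    gap = np.linalg.norm(x_k - x)   # staleness of the held state
    if gap > sigma * np.linalg.norm(x):     # illustrative trigger rule
        x_k, events = x.copy(), events + 1  # sample state, refresh control
    u = -K @ x_k                            # zero-order-hold control
    x = x + dt * (A @ x + (B @ u).ravel())  # Euler step
print(f"{events} control updates over {steps} simulation steps")
```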

107 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that DPED outperforms the classical ensembles on all datasets in terms of both accuracy and size of the ensemble and verify the reliability, stability, and effectiveness of the proposed DPED algorithm.
Abstract: In recent years, classifier ensemble techniques have drawn the attention of many researchers in the machine learning community. The ultimate goal of this line of research is to improve the accuracy of the ensemble compared to the individual classifiers. In this paper, a novel algorithm for building ensembles called the dynamic programming-based ensemble design algorithm (DPED) is introduced and studied in detail. The underlying theory behind DPED is based on cooperative game theory in the first phase and a dynamic programming approach in the second phase. The main objective of DPED is to reduce the size of the ensemble while encouraging extra diversity in order to improve accuracy. The performance of the DPED algorithm is compared empirically with the classical ensemble model and with a well-known algorithm called “the most diverse.” The experiments were carried out with 13 datasets from UCI and three ensemble models. Each ensemble model is constructed from 15 different base classifiers. The experimental results demonstrate that DPED outperforms the classical ensembles on all datasets in terms of both accuracy and size of the ensemble. Regarding the comparison with the most diverse algorithm, the number of classifiers selected by DPED across all datasets and all domains is less than or equal to the number selected by the most diverse algorithm. An experiment on the blog spam dataset, for instance, shows that DPED achieves an accuracy of 96.47% compared to the 93.87% obtained by the most diverse algorithm with a 40% training split. Finally, the experimental results verify the reliability, stability, and effectiveness of the proposed DPED algorithm.
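
The abstract does not spell out the recursion, so the following is a hypothetical sketch of how a prefix DP can prune an ensemble: choose exactly m of n classifiers to maximize a separable utility (accuracy minus a diversity penalty). The utility, data, and size limit are invented placeholders; DPED's actual game-theoretic scoring is richer.

```python
# Hypothetical DP-based ensemble pruning sketch in the spirit of DPED.
# dp[i][k] = best total utility using the first i classifiers with k chosen.
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 15, 5, 0.5
acc = rng.uniform(0.6, 0.9, n)              # per-classifier accuracy
sim = rng.uniform(0.0, 0.4, (n, n))         # pairwise similarity
util = acc - lam * sim.mean(axis=1)         # separable utility per classifier

NEG = -1e18
dp = np.full((n + 1, m + 1), NEG)
dp[0][0] = 0.0
take = np.zeros((n + 1, m + 1), dtype=bool)
for i in range(1, n + 1):
    for k in range(m + 1):
        dp[i][k] = dp[i - 1][k]                        # skip classifier i-1
        if k > 0 and dp[i - 1][k - 1] + util[i - 1] > dp[i][k]:
            dp[i][k] = dp[i - 1][k - 1] + util[i - 1]  # take classifier i-1
            take[i][k] = True

# Trace back the selected ensemble members.
sel, k = [], m
for i in range(n, 0, -1):
    if take[i][k]:
        sel.append(i - 1)
        k -= 1
print(sorted(sel), f"utility={dp[n][m]:.3f}")
```

With a separable utility the table degenerates to picking the top-m classifiers, but the same recursion accommodates budget or ordering constraints unchanged, which is presumably where a DP formulation earns its keep.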

96 citations


Journal ArticleDOI
TL;DR: To address uncertain renewable energy in the day-ahead optimal dispatch of energy and reserve, a multi-stage stochastic programming model is established in this paper to minimize the expected total costs and to deal with the “curse of dimensionality” of stochastic programming.
Abstract: To address the uncertain renewable energy in the day-ahead optimal dispatch of energy and reserve, a multi-stage stochastic programming model is established in this paper to minimize the expected total costs. The uncertainties over the multiple stages are characterized by a scenario tree, and the optimal dispatch scheme is cast as a decision tree, which guarantees the flexibility to decide reasonable generation outputs and adequate reserves accounting for different realizations of renewable energy. Most importantly, to deal with the “curse of dimensionality” of stochastic programming, stochastic dual dynamic programming (SDDP) is employed, which decomposes the original problem into several sub-problems according to the stages. Specifically, the SDDP algorithm performs forward passes and backward passes repeatedly until the convergence criterion is satisfied. At each iteration, the original problem is approximated by a piecewise-linear function. In addition, an improved convergence criterion is adopted to narrow the optimization gaps. The results on the IEEE 118-bus system and a real-life provincial power grid show the effectiveness of the proposed model and method.
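
The forward/backward pattern of SDDP can be sketched on a toy one-reservoir dispatch problem: forward passes sample inflows and simulate the current policy, and backward passes add piecewise-linear cuts to the future-cost function at the visited states. Everything below (prices, inflows, the enumeration-based stage solver, finite-difference cut slopes) is an invented simplification meant only to show the algorithm's shape, not a production SDDP implementation.

```python
# Structural SDDP sketch on a toy reservoir: cuts[t] approximates the
# expected future cost V_t(s) from below by max over affine functions.
import numpy as np

T, S_MAX, DEMAND = 4, 10.0, 4.0
PRICE = [5.0, 10.0, 20.0, 40.0]               # cost of unserved demand
INFLOWS, PROBS = np.array([0.0, 2.0, 5.0]), np.array([0.3, 0.4, 0.3])
RELEASES = np.linspace(0.0, 6.0, 25)
cuts = [[] for _ in range(T + 1)]             # cuts[t]: list of (a, b)

def future(t, s):                             # max over cuts, 0 if none yet
    return max((a + b * s for a, b in cuts[t]), default=0.0)

def stage_value(t, s, w):
    """Best immediate + approximated future cost given inflow w."""
    best = np.inf
    for r in RELEASES:
        if r > s + w:
            continue
        s_next = min(S_MAX, s + w - r)
        best = min(best, PRICE[t] * max(0.0, DEMAND - r) + future(t + 1, s_next))
    return best

rng = np.random.default_rng(0)
for _ in range(30):
    # Forward pass: simulate one inflow scenario, record visited storages.
    s, visited = 5.0, []
    for t in range(T):
        visited.append(s)
        w = rng.choice(INFLOWS, p=PROBS)
        r = min(RELEASES, key=lambda r: np.inf if r > s + w else
                PRICE[t] * max(0.0, DEMAND - r)
                + future(t + 1, min(S_MAX, s + w - r)))
        s = min(S_MAX, s + w - r)
    # Backward pass: add an expectation cut at each visited storage.
    for t in range(T - 1, -1, -1):
        s_t, eps = visited[t], 1e-3
        vals = np.array([stage_value(t, s_t, w) for w in INFLOWS])
        slopes = np.array([(stage_value(t, s_t + eps, w) - v) / eps
                           for w, v in zip(INFLOWS, vals)])
        v_bar, b = PROBS @ vals, PROBS @ slopes
        cuts[t].append((v_bar - b * s_t, b))  # affine cut a + b*s

print(f"approx. expected cost from s0=5: {future(0, 5.0):.2f}")
```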

95 citations


Journal ArticleDOI
TL;DR: An event-triggered approach is developed based on ADP, which samples the states and updates the weights of NNs at the same time when the event-triggering condition is violated, such that the computational complexity is reduced.
Abstract: In this paper, the zero-sum game problem is considered for partially unknown continuous-time nonlinear systems, and an event-triggered adaptive dynamic programming (ADP) method is developed to solve the problem. First, an identifier neural network (NN) and a critic NN are applied to approximate the drift system dynamics and the optimal value function, respectively. Subsequently, an event-triggered approach is developed based on ADP, which samples the states and updates the weights of NNs at the same time when the event-triggering condition is violated, such that the computational complexity is reduced. It is proved that the states and the error of NN weights are uniformly ultimately bounded. Finally, the effectiveness of the developed ADP-based event-triggered method is verified through simulation studies.

92 citations


Journal ArticleDOI
TL;DR: It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function.
Abstract: This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, due to the introduction of the control input into the feedback system, optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using Lyapunov theory, and the tracking error and NN weight errors are shown to be uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input under the circumstance that there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.

82 citations


Journal ArticleDOI
TL;DR: The proposed robust optimal control algorithm tunes the parameters of a critic-only neural network by an event-triggering condition and runs in a plug-and-play framework without system functions, where fewer transmissions and less computation are required as all the measurements are received simultaneously.
Abstract: In this paper, a novel event-sampled robust optimal controller is proposed for a class of continuous-time constrained-input nonlinear systems with unknown dynamics. In order to solve the robust optimal control problem, an online data-driven identifier is established to construct the system dynamics, and an event-sampled critic-only adaptive dynamic programming method is developed to replace the conventional time-driven actor–critic structure. The designed online identification method runs during the solving process and is not required as an a priori step before the solution, which simplifies the architecture and reduces the computational load. The proposed robust optimal control algorithm tunes the parameters of the critic-only neural network (NN) by an event-triggering condition and runs in a plug-and-play framework without system functions, where fewer transmissions and less computation are required as all the measurements are received simultaneously. Based on the novel design, the stability of the system and the convergence of the critic NN are demonstrated by Lyapunov theory, where the state is asymptotically stable and the weight error is guaranteed to be uniformly ultimately bounded. Finally, applications to a basic nonlinear system and the complex rotational–translational actuator problem demonstrate the effectiveness of the proposed method.

73 citations


Posted Content
TL;DR: This work proposes a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems, and experimentally shows that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
Abstract: Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge in combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown promise for designing good heuristics dedicated to solving NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution given a large enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, directing how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between the two techniques. We experimentally show that our solver is efficient at solving two challenging problems: the traveling salesman problem with time windows and the 4-moments portfolio optimization problem. The results obtained show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.

70 citations


Journal ArticleDOI
01 Jan 2020-Energy
TL;DR: Simulation and experimental results highlight that the proposed strategy can lead to less fuel consumption, compared to traditional equivalent consumption minimization strategy, thereby proving its feasibility.

69 citations


Journal ArticleDOI
TL;DR: Through an event-triggered approach, the constrained near-optimal control problem for a class of nonlinear discrete-time systems is investigated and solved by a heuristic dynamic programming (HDP) technique, and a nonquadratic performance index is introduced.
Abstract: In this paper, through an event-triggered approach, the constrained near-optimal control problem for a class of nonlinear discrete-time systems is investigated and solved by a heuristic dynamic programming (HDP) technique. The proposed method can reduce the amount of computation remarkably without deteriorating the system stability. In order to handle the control constraints and reduce the computational burden, a nonquadratic performance index is introduced. Then, a stability analysis of the event-triggered system with control constraints and an event-triggered constrained controller design algorithm are given. Three neural networks are used in the HDP scheme, which are designed to identify the unknown nonlinear system, approximate the value function, and approximate the control law, respectively. In the model neural network, an effective method is developed to initialize its weights. Finally, two examples are included to demonstrate the present method.

Journal ArticleDOI
TL;DR: A Q-learning-based in-vehicle learning system that is free of physical models and can robustly converge to an optimal energy control solution is presented, and a new initialization strategy, which combines optimal learning with a properly selected penalty function, is introduced.
Abstract: Energy optimization for plug-in hybrid electric vehicles (PHEVs) is a challenging problem due to the system complexity and the many physical and operational constraints in PHEVs. In this paper, we present a Q-learning-based in-vehicle learning system that is free of physical models and can robustly converge to an optimal energy control solution. The proposed machine learning algorithms combine neuro-dynamic programming (NDP) with future trip information to effectively estimate the expected future energy cost (expected cost-to-go) for a given vehicle state and control actions. The convergence of these learning algorithms was demonstrated on both fixed and randomly selected drive cycles. Based on the characteristics of these learning algorithms, we propose a two-stage deployment solution for PHEV power management applications. Furthermore, we introduce a new initialization strategy, which combines optimal learning with a properly selected penalty function. This initialization scheme can reduce the learning convergence time by 70%, which is a significant improvement for in-vehicle implementation efficiency. Finally, we develop a neural network (NN) for predicting battery state-of-charge (SoC), rendering the proposed power management controller completely free of physical models.
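
A tabular sketch of the underlying idea, learning a cost-to-go over (SoC, demand) bins with a penalty-shaped initialization, is below. The discretization, the toy transition model, and the penalty values are all invented for illustration; the paper's NDP learner and trip-information features are not reproduced.

```python
# Tabular Q-learning sketch for a toy power-split problem (minimization
# form: Q stores cost-to-go, so the greedy action is argmin).
import numpy as np

N_SOC, N_DEM, N_ACT = 20, 10, 5
alpha, gamma, eps = 0.1, 0.98, 0.1
rng = np.random.default_rng(0)

# Penalty-shaped initialization: discourage battery-heavy actions at low
# SoC up front, mirroring the paper's reported convergence speed-up.
Q = np.zeros((N_SOC, N_DEM, N_ACT))
Q[:3, :, N_ACT // 2:] = 50.0

def step(soc, dem, a):
    """Toy transition: returns fuel cost and the next (SoC, demand) bins."""
    batt_share = a / (N_ACT - 1)                 # 0 = all engine, 1 = all battery
    fuel = (1.0 - batt_share) * (dem + 1)        # engine fuel proxy
    soc2 = int(np.clip(soc - 2 * batt_share + 1, 0, N_SOC - 1))
    return fuel, soc2, rng.integers(N_DEM)

soc, dem = N_SOC // 2, rng.integers(N_DEM)
for _ in range(100_000):
    a = rng.integers(N_ACT) if rng.random() < eps else int(Q[soc, dem].argmin())
    cost, soc2, dem2 = step(soc, dem, a)
    target = cost + gamma * Q[soc2, dem2].min()  # bootstrapped cost-to-go
    Q[soc, dem, a] += alpha * (target - Q[soc, dem, a])
    soc, dem = soc2, dem2

print("greedy action at mid SoC, high demand:", int(Q[N_SOC // 2, -1].argmin()))
```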

Journal ArticleDOI
TL;DR: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite horizon discrete-time nonlinear systems by considering the constraint of the impulsive interval.
Abstract: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration, the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method is developed which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and the detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.

Journal ArticleDOI
TL;DR: A state-based sequential network reconfiguration strategy by using a Markov decision process (MDP) model with the objective of minimizing renewable distributed generation curtailment and load shedding under operational constraints is developed.
Abstract: Growing penetration of renewable distributed generation, a major concern nowadays, has played a critical role in distribution system operation. This paper develops a state-based sequential network reconfiguration strategy using a Markov decision process (MDP) model with the objective of minimizing renewable distributed generation curtailment and load shedding under operational constraints. Available power outputs of distributed generators and the system topology at each decision time are represented as Markov states, which transition to other Markov states at the next decision time according to the uncertainties of renewable distributed generation. For each Markov state at each decision time, a recursive optimization model with a current cost and a future cost is developed to select state-based actions, including system reconfiguration, load shedding, and distributed generation curtailment. To address the curse of dimensionality caused by the enormous numbers of states and actions in the proposed model, an approximate dynamic programming (ADP) approach, including post-decision states and a forward dynamic algorithm, is used to solve the proposed MDP-based model. The IEEE 33-bus system and IEEE 123-bus system are used to validate the proposed model.
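
The post-decision-state device mentioned above is worth a minimal sketch: the value function is learned around the state reached after the action but before the random information arrives, so the decision step becomes a deterministic optimization. The toy problem below (a 1-D "system level" to regulate, with invented costs and noise) merely stands in for the reconfiguration model.

```python
# Generic post-decision-state ADP sketch; problem details are invented.
import numpy as np

LEVELS = np.arange(11)                  # discretized system state
ACTIONS = np.arange(-2, 3)              # shed/curtail adjustments
V = np.zeros(len(LEVELS))               # value around post-decision states
alpha, gamma = 0.05, 0.95
rng = np.random.default_rng(0)

def cost(s, a):
    return abs(a) + 0.5 * abs(s - 5)    # action effort + deviation penalty

s = 5
for _ in range(50_000):
    # Decision step: deterministic optimization over post-decision states.
    best_a = min(ACTIONS, key=lambda a: cost(s, a)
                 + gamma * V[np.clip(s + a, 0, 10)])
    s_post = int(np.clip(s + best_a, 0, 10))     # post-decision state
    # Information step: exogenous randomness arrives after the decision.
    w = rng.integers(-1, 2)
    s_next = int(np.clip(s_post + w, 0, 10))
    # Bootstrapped update of the post-decision value function.
    target = min(cost(s_next, a) + gamma * V[np.clip(s_next + a, 0, 10)]
                 for a in ACTIONS)
    V[s_post] += alpha * (target - V[s_post])
    s = s_next

print("learned post-decision values:", np.round(V, 2))
```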

Journal ArticleDOI
TL;DR: This article presents a nonmodel-based controller design for vehicle dynamic systems to improve lateral stability, where output tracking control and adaptive dynamic programming approaches are employed to track the desired yaw rate and, at the same time, mitigate the sideslip angle, roll angle, and roll rate of the vehicle.
Abstract: This article presents a nonmodel-based controller design for vehicle dynamic systems to improve lateral stability, where output tracking control and adaptive dynamic programming approaches are employed to track the desired yaw rate and, at the same time, mitigate the sideslip angle, roll angle, and roll rate of the vehicle. Moreover, unlike some existing optimization methods in control allocation, the proposed control strategies, which distribute tire forces by learning, use only the information of the states, input, and reference signal instead of knowledge of the vehicle system. The iterative process repeatedly uses the state and input information to calculate the feedback gain. It can significantly reduce the learning time and computational burden. The effectiveness of the proposed controller design method is shown by CarSim simulations.

Journal ArticleDOI
TL;DR: In this paper, the authors designed the UAV trajectory to minimize the total energy consumption while satisfying the requested timeout (RT) requirement and energy budget, which is accomplished via jointly optimizing the path and UAV's velocities along subsequent hops.
Abstract: In this paper, we design the UAV trajectory to minimize the total energy consumption while satisfying the requested timeout (RT) requirement and energy budget, which is accomplished by jointly optimizing the path and the UAV's velocities along subsequent hops. The corresponding optimization problem is difficult to solve due to its non-convexity and combinatorial nature. To overcome this difficulty, we solve the original problem via two consecutive steps. First, we propose two algorithms, namely heuristic search and dynamic programming (DP), to obtain a feasible set of paths without violating the GU's RT requirements, based on the traveling salesman problem with time windows (TSPTW). They are then compared with exhaustive search and the traveling salesman problem (TSP), used as reference methods. While the exhaustive algorithm achieves the best performance at a high computation cost, the heuristic algorithm exhibits poorer performance with low complexity. As a result, DP is proposed as a practical trade-off between the exhaustive and heuristic algorithms. Specifically, the DP algorithm attains near-optimal performance at a much lower complexity. Second, for the given feasible paths, we formulate an energy minimization problem via a joint optimization of the UAV's velocities along subsequent hops. Finally, numerical results are presented to demonstrate the effectiveness of our proposed algorithms. The results show that the DP-based algorithm approaches the exhaustive search's performance with a significantly reduced complexity. It is also shown that the proposed solutions outperform the state-of-the-art benchmarks in terms of both energy consumption and outage performance.
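
The TSPTW recursion that such a DP path search can build on is the classic Held–Karp table indexed by (visited subset, last node), with arrivals clipped to each node's window. The sketch below uses made-up distances and windows and omits the energy terms.

```python
# Held-Karp-style DP over visit subsets with time windows.
import math

dist = [[0, 4, 6, 5],
        [4, 0, 3, 7],
        [6, 3, 0, 2],
        [5, 7, 2, 0]]
window = [(0, 99), (2, 10), (5, 14), (8, 20)]   # (open, close) per node
n = len(dist)

# dp[(mask, j)] = earliest feasible arrival at j having visited set `mask`
dp = {(1, 0): 0.0}                              # start at node 0 at t = 0
for mask in range(1, 1 << n):
    for j in range(n):
        if (mask, j) not in dp:
            continue
        t = dp[(mask, j)]
        for k in range(n):
            if mask & (1 << k):
                continue
            arrive = max(t + dist[j][k], window[k][0])   # wait if early
            if arrive <= window[k][1]:                   # deadline respected
                key = (mask | (1 << k), k)
                if arrive < dp.get(key, math.inf):
                    dp[key] = arrive

full = (1 << n) - 1
best = min((dp[(full, j)] for j in range(n) if (full, j) in dp), default=None)
print("earliest completion of all visits:", best)
```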

Journal ArticleDOI
TL;DR: The proposed real-time method is effective in reducing the net electricity purchase cost compared to other existing energy management methods; a rule-based controller is introduced underneath the optimization layer at a finer time resolution, at the power electronics converter control level.
Abstract: A computationally proficient real-time energy management method with stochastic optimization is presented for a residential photovoltaic (PV)-storage hybrid system comprising solar PV generation and a battery energy storage (BES). Existing offline energy management approaches for day-ahead scheduling of BES suffer from energy loss in real time due to the stochastic nature of load and solar generation. On the other hand, typical online algorithms do not offer optimal solutions for minimizing the owners' electricity purchase costs. To overcome these limitations, we propose an integrated energy management framework consisting of an offline optimization model concurrent with a real-time rule-based controller. The optimization is performed in a receding horizon with load and solar generation forecast profiles produced by a deep-learning-based long short-term memory (LSTM) method, to reduce the daily electricity purchase costs. The optimization model is formulated as a multistage stochastic program, where we use the stochastic dual dynamic programming algorithm in the receding horizon to update the optimal set point for BES dispatch at a fixed interval. To prevent loss of energy during optimal solution update intervals, we introduce a rule-based controller underneath the optimization layer at a finer time resolution, at the power electronics converter control level. The proposed framework is evaluated using a real-time controller-hardware-in-the-loop test platform in an OPAL-RT simulator. The proposed real-time method is effective in reducing the net electricity purchase cost compared to other existing energy management methods.

Proceedings Article
01 Jan 2020
TL;DR: A new Variational Policy Gradient Theorem for RL with general utilities is derived, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov decision problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. We prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, even though the optimization problem is nonconvex. We also establish its rate of convergence of the order O(1/t) by exploiting the hidden convexity of the problem, and prove that it converges exponentially when the problem admits hidden strong convexity. Our analysis applies to the standard RL problem with cumulative rewards as a special case, in which case our result improves the available convergence rate.
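
For orientation, here is vanilla REINFORCE on a toy two-state MDP, i.e. the cumulative-reward special case that the Variational Policy Gradient Theorem recovers. The general-utility case replaces the return with a Fenchel-dual saddle-point estimate; that machinery is not reproduced here, and all MDP details below are invented.

```python
# REINFORCE with a tabular softmax policy on a made-up two-state MDP.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))                 # softmax logits: state x action
gamma, lr, H = 0.95, 0.1, 30

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def step(s, a):                          # toy dynamics and rewards
    r = 1.0 if (s == 0 and a == 1) or (s == 1 and a == 0) else 0.0
    return r, (s + a) % 2

for _ in range(3000):
    s, traj = 0, []
    for _ in range(H):                   # sample one trajectory
        a = rng.choice(2, p=policy(s))
        r, s2 = step(s, a)
        traj.append((s, a, r))
        s = s2
    grad, G = np.zeros_like(theta), 0.0
    for s, a, r in reversed(traj):       # returns-to-go, then score terms
        G = r + gamma * G
        g = -policy(s)
        g[a] += 1.0                      # gradient of log softmax
        grad[s] += G * g
    theta += lr * grad                   # ascent on expected return

print("learned policy:", np.round(np.vstack([policy(0), policy(1)]), 2))
```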

Journal ArticleDOI
TL;DR: The results show that compared to the scenario with 100% HVs, ramp-merging can be smoother in mixed traffic environment and traffic throughput can be further increased by 10–15%.
Abstract: The rapid conceptual development and commercialization of the connected automated vehicle (CAV) has led to the problem of mixed traffic, i.e., traffic mixed with CAVs and conventional human-operated vehicles (HVs). This paper studies cooperative decision-making for mixed traffic (CDMMT). Using discrete optimization, a CDMMT mechanism is developed to facilitate ramp merging and to properly capture the cooperative and non-cooperative behaviors in mixed traffic. The CDMMT mechanism can be described as a bi-level optimization program in which state-constrained optimal-control-based trajectory design problems are embedded in a sequencing problem. A bi-level dynamic programming-based solution approach is developed to efficiently solve the problem. The proposed modeling mechanism and solution approach are generic to deterministic decisions and can guarantee system-efficient solutions. A micro-simulation environment is built for model validation and analysis of mixed traffic. The results show that, compared to the scenario with 100% HVs, ramp merging can be smoother in a mixed traffic environment. At high CAV penetration, the section throughput increases by about 18%. With the proposed CDMMT mechanism, traffic throughput can be further increased by 10–15%. The proposed methods form the basis of traffic analysis and cooperative control at ramp-merging sections under a mixed traffic environment.

Journal ArticleDOI
TL;DR: It is shown that the reinforcement learning-based strategy can obtain global optimality in the optimal control problem with an infinite horizon, which can also be obtained by stochastic dynamic programming.
Abstract: The energy management strategy is an important factor in determining the fuel economy of hybrid electric vehicles; thus, much research is required on how to distribute the required power to the engines and motors of hybrid vehicles. Recently, various studies have been conducted on optimally controlling the hybrid electric vehicle based on reinforcement learning. In fact, the fundamental control approach of reinforcement learning shares many control frameworks with control approaches using deterministic dynamic programming or stochastic dynamic programming. In this study, we compare the reinforcement learning-based strategy with these dynamic programming-based control approaches. For optimal control of the hybrid electric vehicle, each control method was compared in terms of fuel efficiency through simulations over various driving cycles. Based on our simulations, we showed that the reinforcement learning-based strategy can obtain global optimality in the optimal control problem with an infinite horizon, which can also be obtained by stochastic dynamic programming. We also showed that the reinforcement learning-based strategy can provide a solution close to the optimal one obtained by deterministic dynamic programming, while the reinforcement learning-based strategy is more appropriate for a time-variant controller with boundary value constraints. In addition, we verified the convergence characteristics of the reinforcement learning-based control strategy when transfer learning was performed through value initialization using stochastic dynamic programming.

Journal ArticleDOI
01 Oct 2020-Energy
TL;DR: A series of numerical simulation results validates that the performance of the global optimal strategy can be exploited online to attain satisfactory cost reduction compared with the equivalent consumption minimization strategy, with the assistance of an estimated real-time co-state and a slacked reference.

Journal ArticleDOI
TL;DR: A model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems is obtained via reinforcement learning with dynamic output feedback; the proposed VI method does not require an initially stabilizing policy.
Abstract: In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input–output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
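
For reference, the classical model-based recursion that the paper's model-free, output-feedback scheme emulates is Kleinman's policy iteration: solve a Lyapunov equation for the current gain, then improve the gain. A minimal sketch (system matrices invented; the paper's data-driven Bellman equations replace the explicit use of A and B):

```python
# Model-based Kleinman policy iteration for continuous-time LQR, checked
# against the algebraic Riccati solution. Not the paper's model-free method.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.zeros((1, 2))          # A is Hurwitz here, so K0 = 0 is stabilizing
for _ in range(15):
    Ak = A - B @ K
    # Policy evaluation: solve Ak^T P + P Ak = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, B.T @ P)     # policy improvement
print("PI gain:     ", K)
print("Riccati gain:", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```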

Journal ArticleDOI
TL;DR: A three-stage hybrid method is developed to satisfy this practical requirement, where the domain knowledge is used to build a virtual load curve balancing the load features and electricity contracts of multiple power grids; the results indicate that the hybrid method can achieve satisfactory scheduling results in different cases.

Journal ArticleDOI
TL;DR: This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations, and shows that the use of DNNs can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise.

Journal ArticleDOI
TL;DR: A data-based control method based on an adaptive dynamic programming (ADP) algorithm is proposed for a class of discrete-time (DT) systems with multiple delays, built on a novel state equation composed only of input and output data.
Abstract: In this paper, a data-based control method based on an adaptive dynamic programming (ADP) algorithm is proposed for a class of discrete-time (DT) systems in the case of multiple delays. The data-based ADP method is implemented by virtue of the measured input and output data. The condition for the existence of the corresponding equivalent multiple-delay system is derived according to the characteristics of the time-delay system. A novel data-based state equation is developed that is composed only of input and output data, which is very meaningful in practical applications. By using the data-based ADP method, the output feedback control problem is solved only by measuring the input and output of the system with multiple delays. The convergence proofs of the designed policy iteration and value iteration algorithms are given, respectively. A simulation example is presented to demonstrate the validity of the proposed data-based ADP method.

Journal ArticleDOI
TL;DR: In this paper, a dynamic predictive traffic signal control framework for isolated intersections is proposed in a cross-sectional VII environment, which has the ability to predict vehicle arrivals and use this to optimize traffic signals.
Abstract: With the development of modern wireless communication technology, especially vehicle infrastructure integration (VII) technology, vehicle information such as identification, location, and speed can be readily obtained at an upstream cross-section. This information can be used to support traffic signal timing optimization in real time. A dynamic predictive traffic signal control framework for isolated intersections is proposed in a cross-sectional VII environment, which has the ability to predict vehicle arrivals and use this to optimize traffic signals. The proposed dynamic predictive control framework includes a dynamic platoon dispersion model (DPDM), which uses vehicle speed data from the cross-sectional VII environment, as opposed to traditional vehicle passing/existing data, to predict the arriving flow distribution at the downstream stop line. Then, a dynamic programming algorithm based on the exhaustive optimization of phases (EOP) is proposed, working in a rolling optimization (RO) scheme with a 2 s time horizon. The signal timings are continuously optimized by taking the minimization of intersection delay as the optimization objective and setting the green time duration of each phase as a constraint. Finally, the proposed dynamic predictive control framework is tested in a simulated cross-sectional VII environment, and a case study is carried out based on a real road network. The results show that the proposed framework can reduce the average delay and queue length by up to 33% and 35%, respectively, compared with traditional fully actuated control.

31 Jul 2020
TL;DR: This paper proposes a method to automate the tuning of convex optimization control policies by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters.
Abstract: Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.
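
A toy stand-in for this tuning loop: adjust a policy parameter (here an LQR state-weight) by a numerical gradient of a simulated cost. The paper instead differentiates analytically through the convex solver; this sketch, with an invented system and metric, only mirrors the outer loop's shape.

```python
# Finite-difference tuning of an LQR weight against a simulated metric.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
R = np.array([[1.0]])

def metric(log_q, n_roll=20, T=50):
    """True closed-loop cost of the policy induced by weight exp(log_q),
    on noisy rollouts of a cost the designer actually cares about."""
    rng = np.random.default_rng(0)      # common random numbers for clean FD
    Q = np.diag([np.exp(log_q), 1e-3])
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    total = 0.0
    for _ in range(n_roll):
        x = rng.normal(size=2)
        for _ in range(T):
            u = -K @ x
            total += 10 * x[0] ** 2 + float(u @ u)   # designer's metric
            x = A @ x + (B @ u).ravel() + 0.01 * rng.normal(size=2)
    return total / n_roll

log_q, lr, h = 0.0, 0.05, 1e-2
for _ in range(25):
    g = (metric(log_q + h) - metric(log_q - h)) / (2 * h)  # FD gradient
    log_q -= lr * g
print(f"tuned state weight: {np.exp(log_q):.3f}")
```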

Journal ArticleDOI
TL;DR: An approximate dynamic programming (DP) approach is designed for the energy-efficient subway train scheduling problem with time-dependent demand, where the concepts of states, policies, state transitions, and the reward function are introduced.
Abstract: Owing to environmental concerns, energy-efficient subway train scheduling is necessary in subway operation management. This paper designs an approximate dynamic programming (DP) approach for the energy-efficient subway train scheduling problem with time-dependent demand. A train traffic model is proposed with dynamic equations for the evolution of train headways, train passenger loads, and the energy consumption along the subway line. Because the number of onboard passengers changes dynamically with time, the total train energy usage is modeled as the sum of the energy consumed by the traction system and the auxiliary facilities. A nonlinear DP problem is formulated to generate a near-optimal timetable that realizes the trade-off among the utilization of trains, passenger waiting time, service levels, and energy consumption. To overcome the curse of dimensionality in this optimization problem, we construct an approximate DP framework, where the concepts of states, policies, state transitions, and the reward function are introduced. This algorithm converges to a good solution in a short time compared with the genetic algorithm and the differential evolution algorithm. Finally, numerical experiments are given to demonstrate the effectiveness of the proposed model and algorithm.

Journal ArticleDOI
TL;DR: A general (universally applicable) dynamic programming formulation is introduced, its well-posedness is established, and new existence results for optimal policies in decentralized stochastic control are obtained.
Abstract: For sequential stochastic control problems with standard Borel measurement and control action spaces, we introduce a general (universally applicable) dynamic programming formulation, establish its well-posedness, and obtain new existence results for optimal policies in decentralized stochastic control.

Journal ArticleDOI
TL;DR: An event-triggered H∞ control method based on adaptive dynamic programming (ADP) with concurrent learning is proposed for unknown continuous-time nonlinear systems with control constraints, with a neural-network-based system identification technique used to identify the completely unknown systems.
Abstract: In this article, an event-triggered H∞ control method is proposed based on adaptive dynamic programming (ADP) with concurrent learning for unknown continuous-time nonlinear systems with control constraints. First, a system identification technique based on neural networks (NNs) is adopted to identify completely unknown systems. Second, a critic NN is employed to approximate the value function. A novel weight updating rule is developed based on the event-triggered control law and time-triggered disturbance law, which reduces controller execution times and guarantees the stability of the system. Subsequently, concurrent learning is applied to the weight updating rule to relax the demand for the traditional persistence of excitation condition that is difficult to implement online. Finally, the comparison between the time-triggered method and event-triggered method in simulation demonstrates the effectiveness of the developed constrained event-triggered ADP method.