
Showing papers on "Dynamic programming published in 2020"


Proceedings ArticleDOI
31 May 2020
TL;DR: This work introduces Crocoddyl, an open-source framework tailored for efficient multi-contact optimal control, and proposes a novel optimal control algorithm called Feasibility-driven Differential Dynamic Programming (FDDP), which exhibits a stronger globalization strategy than classical DDP algorithms.
Abstract: We introduce Crocoddyl (Contact RObot COntrol by Differential DYnamic Library), an open-source framework tailored for efficient multi-contact optimal control. Crocoddyl efficiently computes the state trajectory and the control policy for a given predefined sequence of contacts. Its efficiency is due to the use of sparse analytical derivatives, exploitation of the problem structure, and data sharing. It employs differential geometry to properly describe the state of any geometrical system, e.g. floating-base systems. Additionally, we propose a novel optimal control algorithm called Feasibility-driven Differential Dynamic Programming (FDDP). Our method does not add extra decision variables, which often increase the computation time per iteration due to factorization. FDDP exhibits a stronger globalization strategy than classical Differential Dynamic Programming (DDP) algorithms. Concretely, we propose two modifications to the classical DDP algorithm. First, the backward pass accepts infeasible state-control trajectories. Second, the rollout keeps the gaps open during the early "exploratory" iterations (as expected in multiple-shooting methods with only equality constraints). We showcase the performance of our framework using different tasks. With our method, we can compute highly-dynamic maneuvers (e.g. jumping, front-flip) within a few milliseconds.
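
The backward/forward structure that FDDP modifies is easiest to see on a toy problem. The sketch below is a plain DDP-style Riccati sweep and rollout for a one-dimensional double integrator with quadratic costs; it is a generic illustration with invented dynamics and costs, not Crocoddyl's API, and the FDDP-specific behavior is only noted in a comment.

```python
# Generic DDP-style sketch on a 1-D double integrator with quadratic costs.
# Illustrative only -- not Crocoddyl's API. For pure regulation about the
# origin the feedforward terms stay zero, but they are kept to show the
# structure that FDDP modifies (FDDP additionally accepts infeasible
# trajectories and keeps "gaps" between shooting nodes open early on).
import numpy as np

dt, T = 0.05, 60
A = np.array([[1.0, dt], [0.0, 1.0]])    # discretized double integrator
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                  # running state cost
R = np.array([[1e-2]])                   # running control cost
Qf = np.diag([100.0, 10.0])              # terminal cost

def backward_pass():
    """Riccati-like sweep: returns feedback gains K and feedforwards k."""
    Vxx, Vx = Qf, np.zeros(2)
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):
        Qxx = Q + A.T @ Vxx @ A
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        Qu = B.T @ Vx
        K[t] = -np.linalg.solve(Quu, Qux)
        k[t] = -np.linalg.solve(Quu, Qu)
        Vx = A.T @ Vx + Qux.T @ k[t]
        Vxx = Qxx + Qux.T @ K[t]
    return K, k

def forward_pass(x0, K, k, alpha=1.0):
    """Roll out the locally linear policy u = K x + alpha * k."""
    x, xs = x0, [x0]
    for t in range(T):
        u = K[t] @ x + alpha * k[t]
        x = A @ x + (B @ u).ravel()
        xs.append(x)
    return np.array(xs)

K, k = backward_pass()
xs = forward_pass(np.array([1.0, 0.0]), K, k)
print("final state:", xs[-1])            # should be driven near the origin
```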

127 citations


Journal ArticleDOI
TL;DR: The proposed control approach significantly improves the controller’s robustness in the face of uncertain signal timing, without requiring prior knowledge of the distribution of the random variable.
Abstract: This article focuses on the speed planning problem for connected and automated vehicles (CAVs) communicating with traffic lights. The uncertainty of traffic signal timing for signalized intersections on the road is considered. The eco-driving problem is formulated as a data-driven chance-constrained robust optimization problem. Effective red-light duration (ERD) is defined as a random variable describing the feasible passing time through the signalized intersections. Usually, the true probability distribution for ERD is unknown. Consequently, a data-driven approach is adopted to formulate chance constraints based on empirical sample data. This incorporates robustness into the eco-driving control problem with respect to uncertain signal timing. Dynamic programming (DP) is employed to solve the optimization problem. The simulation results demonstrate that the proposed method can generate optimal speed reference trajectories with 40% less vehicle fuel consumption, while maintaining the arrival time at a similar level compared to a modified intelligent driver model (IDM). The proposed control approach significantly improves the controller’s robustness in the face of uncertain signal timing, without requiring the distribution of the random variable to be known a priori.
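
At the heart of the data-driven chance constraint is the replacement of the unknown ERD distribution with its empirical counterpart. A minimal sketch, assuming a synthetic sample of ERD observations (the gamma distribution and all numbers are invented for illustration):

```python
# Sketch of the data-driven chance-constraint idea: replace the unknown
# ERD distribution with an empirical quantile of observed samples. The
# gamma-distributed samples and all numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
erd_samples = rng.gamma(shape=4.0, scale=8.0, size=500)  # observed ERDs (s)
epsilon = 0.05                                           # allowed violation prob.

# Robust bound: plan to pass no earlier than the (1 - epsilon)-quantile,
# so the chance constraint holds on the empirical distribution.
t_safe = np.quantile(erd_samples, 1.0 - epsilon)
print(f"earliest robust passing time: {t_safe:.1f} s after the light turns red")

# A DP speed planner would then treat t_safe as a hard time-window
# constraint while optimizing the velocity profile toward the stop line.
```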

110 citations


Journal ArticleDOI
TL;DR: It is proved that semiglobal uniform ultimate boundedness can be guaranteed for the states and NN weight errors with the ADP-based ETOC; performance is shown to respect a predetermined upper bound, and the existence of a lower bound on the interexecution time is proven.
Abstract: This paper studies the problem of event-triggered optimal control (ETOC) for continuous-time nonlinear systems and proposes a novel event-triggering condition that enables designing ETOC methods directly based on the solution of the Hamilton–Jacobi–Bellman (HJB) equation. We provide formal performance guarantees by proving a predetermined upper bound on the performance. Moreover, we prove the existence of a lower bound on the interexecution time. For implementation purposes, an adaptive dynamic programming (ADP) method is developed to realize the ETOC using a critic neural network (NN) to approximate the value function of the HJB equation. Subsequently, we prove that semiglobal uniform ultimate boundedness can be guaranteed for the states and NN weight errors with the ADP-based ETOC. Simulation results demonstrate the effectiveness of the developed ADP-based ETOC method.
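
The mechanics of event triggering are independent of the specific HJB machinery: the control is recomputed only when a measure of staleness crosses a threshold. Below is a minimal sketch with an invented linear system and an illustrative norm-based trigger; the paper's condition is derived from the HJB solution and differs in form.

```python
# Generic event-triggered feedback sketch: the control is refreshed only
# when the gap between the last-sampled state and the current state grows
# past a state-dependent threshold. Trigger rule and system are invented.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
K = np.array([[2.0, 1.5]])          # stabilizing feedback gain
dt, steps, sigma = 0.01, 3000, 0.1  # sigma scales the trigger threshold

x = np.array([1.0, 0.0])
x_k = x.copy()                      # state held since the last event
events = 0
for _ in range(steps):
    gap = np.linalg.norm(x_k - x)   # staleness of the held state
    if gap > sigma * np.linalg.norm(x):     # illustrative trigger rule
        x_k, events = x.copy(), events + 1  # sample state, refresh control
    u = -K @ x_k                            # zero-order-hold control
    x = x + dt * (A @ x + (B @ u).ravel())  # Euler step
print(f"{events} control updates over {steps} simulation steps")
```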

107 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that DPED outperforms the classical ensembles on all datasets in terms of both accuracy and size of the ensemble and verify the reliability, stability, and effectiveness of the proposed DPED algorithm.
Abstract: In recent years, classifier ensemble techniques have drawn the attention of many researchers in the machine learning community. The ultimate goal of this line of research is to improve the accuracy of the ensemble compared to the individual classifiers. In this paper, a novel algorithm for building ensembles called the dynamic programming-based ensemble design algorithm (DPED) is introduced and studied in detail. The underlying theory behind DPED is based on cooperative game theory in the first phase and a dynamic programming approach in the second phase. The main objective of DPED is to reduce the size of the ensemble while encouraging extra diversity in order to improve accuracy. The performance of the DPED algorithm is compared empirically with the classical ensemble model and with a well-known algorithm called “the most diverse.” The experiments were carried out with 13 datasets from UCI and three ensemble models. Each ensemble model is constructed from 15 different base classifiers. The experimental results demonstrate that DPED outperforms the classical ensembles on all datasets in terms of both accuracy and size of the ensemble. Regarding the comparison with the most diverse algorithm, the number of classifiers selected by DPED across all datasets and all domains is less than or equal to the number selected by the most diverse algorithm. An experiment on the blog spam dataset, for instance, shows that DPED achieves an accuracy of 96.47% compared to the 93.87% obtained by the most diverse algorithm with a 40% training split. Finally, the experimental results verify the reliability, stability, and effectiveness of the proposed DPED algorithm.
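
The abstract does not spell out the recursion, so the following is a hypothetical sketch of how a prefix DP can prune an ensemble: choose exactly m of n classifiers to maximize a separable utility (accuracy minus a diversity penalty). The utility, data, and size limit are invented placeholders; DPED's actual game-theoretic scoring is richer.

```python
# Hypothetical DP-based ensemble pruning sketch in the spirit of DPED.
# dp[i][k] = best total utility using the first i classifiers with k chosen.
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 15, 5, 0.5
acc = rng.uniform(0.6, 0.9, n)              # per-classifier accuracy
sim = rng.uniform(0.0, 0.4, (n, n))         # pairwise similarity
util = acc - lam * sim.mean(axis=1)         # separable utility per classifier

NEG = -1e18
dp = np.full((n + 1, m + 1), NEG)
dp[0][0] = 0.0
take = np.zeros((n + 1, m + 1), dtype=bool)
for i in range(1, n + 1):
    for k in range(m + 1):
        dp[i][k] = dp[i - 1][k]                        # skip classifier i-1
        if k > 0 and dp[i - 1][k - 1] + util[i - 1] > dp[i][k]:
            dp[i][k] = dp[i - 1][k - 1] + util[i - 1]  # take classifier i-1
            take[i][k] = True

# Trace back the selected ensemble members.
sel, k = [], m
for i in range(n, 0, -1):
    if take[i][k]:
        sel.append(i - 1)
        k -= 1
print(sorted(sel), f"utility={dp[n][m]:.3f}")
```

With a separable utility the table degenerates to picking the top-m classifiers, but the same recursion accommodates budget or ordering constraints unchanged, which is presumably where a DP formulation earns its keep.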

96 citations


Journal ArticleDOI
TL;DR: To address uncertain renewable energy in the day-ahead optimal dispatch of energy and reserve, a multi-stage stochastic programming model is established in this paper to minimize the expected total costs and to deal with the “curse of dimensionality” of stochastic programming.
Abstract: To address the uncertain renewable energy in the day-ahead optimal dispatch of energy and reserve, a multi-stage stochastic programming model is established in this paper to minimize the expected total costs. The uncertainties over the multiple stages are characterized by a scenario tree, and the optimal dispatch scheme is cast as a decision tree, which guarantees the flexibility to decide reasonable generation outputs and adequate reserves accounting for different realizations of renewable energy. Most importantly, to deal with the “curse of dimensionality” of stochastic programming, stochastic dual dynamic programming (SDDP) is employed, which decomposes the original problem into several sub-problems according to the stages. Specifically, the SDDP algorithm performs forward passes and backward passes repeatedly until the convergence criterion is satisfied. At each iteration, the original problem is approximated by a piecewise-linear function. In addition, an improved convergence criterion is adopted to narrow the optimization gaps. The results on the IEEE 118-bus system and a real-life provincial power grid show the effectiveness of the proposed model and method.
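
The forward/backward pattern of SDDP can be sketched on a toy one-reservoir dispatch problem: forward passes sample inflows and simulate the current policy, and backward passes add piecewise-linear cuts to the future-cost function at the visited states. Everything below (prices, inflows, the enumeration-based stage solver, finite-difference cut slopes) is an invented simplification meant only to show the algorithm's shape, not a production SDDP implementation.

```python
# Structural SDDP sketch on a toy reservoir: cuts[t] approximates the
# expected future cost V_t(s) from below by max over affine functions.
import numpy as np

T, S_MAX, DEMAND = 4, 10.0, 4.0
PRICE = [5.0, 10.0, 20.0, 40.0]               # cost of unserved demand
INFLOWS, PROBS = np.array([0.0, 2.0, 5.0]), np.array([0.3, 0.4, 0.3])
RELEASES = np.linspace(0.0, 6.0, 25)
cuts = [[] for _ in range(T + 1)]             # cuts[t]: list of (a, b)

def future(t, s):                             # max over cuts, 0 if none yet
    return max((a + b * s for a, b in cuts[t]), default=0.0)

def stage_value(t, s, w):
    """Best immediate + approximated future cost given inflow w."""
    best = np.inf
    for r in RELEASES:
        if r > s + w:
            continue
        s_next = min(S_MAX, s + w - r)
        best = min(best, PRICE[t] * max(0.0, DEMAND - r) + future(t + 1, s_next))
    return best

rng = np.random.default_rng(0)
for _ in range(30):
    # Forward pass: simulate one inflow scenario, record visited storages.
    s, visited = 5.0, []
    for t in range(T):
        visited.append(s)
        w = rng.choice(INFLOWS, p=PROBS)
        r = min(RELEASES, key=lambda r: np.inf if r > s + w else
                PRICE[t] * max(0.0, DEMAND - r)
                + future(t + 1, min(S_MAX, s + w - r)))
        s = min(S_MAX, s + w - r)
    # Backward pass: add an expectation cut at each visited storage.
    for t in range(T - 1, -1, -1):
        s_t, eps = visited[t], 1e-3
        vals = np.array([stage_value(t, s_t, w) for w in INFLOWS])
        slopes = np.array([(stage_value(t, s_t + eps, w) - v) / eps
                           for w, v in zip(INFLOWS, vals)])
        v_bar, b = PROBS @ vals, PROBS @ slopes
        cuts[t].append((v_bar - b * s_t, b))  # affine cut a + b*s

print(f"approx. expected cost from s0=5: {future(0, 5.0):.2f}")
```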

95 citations


Journal ArticleDOI
TL;DR: An event-triggered approach is developed based on ADP, which samples the states and updates the weights of NNs at the same time when the event-triggering condition is violated, such that the computational complexity is reduced.
Abstract: In this paper, the zero-sum game problem is considered for partially unknown continuous-time nonlinear systems, and an event-triggered adaptive dynamic programming (ADP) method is developed to solve the problem. First, an identifier neural network (NN) and a critic NN are applied to approximate the drift system dynamics and the optimal value function, respectively. Subsequently, an event-triggered approach is developed based on ADP, which samples the states and updates the weights of NNs at the same time when the event-triggering condition is violated, such that the computational complexity is reduced. It is proved that the states and the error of NN weights are uniformly ultimately bounded. Finally, the effectiveness of the developed ADP-based event-triggered method is verified through simulation studies.

92 citations


Journal ArticleDOI
TL;DR: It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function.
Abstract: This paper studies the problem of optimal parallel tracking control for continuous-time general nonlinear systems. Unlike existing optimal state feedback control, the control input of the optimal parallel control is introduced into the feedback system. However, due to the introduction of the control input into the feedback system, optimal state feedback control methods cannot be applied directly. To address this problem, an augmented system and an augmented performance index function are first proposed. Thus, the general nonlinear system is transformed into an affine nonlinear system. The difference between the optimal parallel control and the optimal state feedback control is analyzed theoretically. It is proven that the optimal parallel control with the augmented performance index function can be seen as the suboptimal state feedback control with the traditional performance index function. Moreover, an adaptive dynamic programming (ADP) technique is utilized to implement the optimal parallel tracking control using a critic neural network (NN) to approximate the value function online. The stability analysis of the closed-loop system is performed using Lyapunov theory, and the tracking error and NN weight errors are shown to be uniformly ultimately bounded (UUB). Also, the optimal parallel controller guarantees the continuity of the control input under the circumstance that there are finite jump discontinuities in the reference signals. Finally, the effectiveness of the developed optimal parallel control method is verified in two cases.

82 citations


Journal ArticleDOI
TL;DR: The proposed robust optimal control algorithm tunes the parameters of a critic-only neural network by an event-triggering condition and runs in a plug-and-play framework without system functions, where fewer transmissions and less computation are required as all the measurements are received simultaneously.
Abstract: In this paper, a novel event-sampled robust optimal controller is proposed for a class of continuous-time constrained-input nonlinear systems with unknown dynamics. In order to solve the robust optimal control problem, an online data-driven identifier is established to construct the system dynamics, and an event-sampled critic-only adaptive dynamic programming method is developed to replace the conventional time-driven actor–critic structure. The designed online identification method runs during the solving process and is not required as an a priori step before the solution, which simplifies the architecture and reduces the computational load. The proposed robust optimal control algorithm tunes the parameters of the critic-only neural network (NN) by an event-triggering condition and runs in a plug-and-play framework without system functions, where fewer transmissions and less computation are required as all the measurements are received simultaneously. Based on the novel design, the stability of the system and the convergence of the critic NN are demonstrated by Lyapunov theory, where the state is asymptotically stable and the weight error is guaranteed to be uniformly ultimately bounded. Finally, applications to a basic nonlinear system and the complex rotational–translational actuator problem demonstrate the effectiveness of the proposed method.

73 citations


Posted Content
TL;DR: This work proposes a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems, and experimentally shows that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.
Abstract: Combinatorial optimization has found applications in numerous fields, from aerospace to transportation planning and economics. The goal is to find an optimal solution among a finite set of possibilities. The well-known challenge in combinatorial optimization is the state-space explosion problem: the number of possibilities grows exponentially with the problem size, which makes solving intractable for large problems. In recent years, deep reinforcement learning (DRL) has shown promise for designing good heuristics dedicated to solving NP-hard combinatorial optimization problems. However, current approaches have two shortcomings: (1) they mainly focus on the standard travelling salesman problem and cannot be easily extended to other problems, and (2) they only provide an approximate solution with no systematic way to improve it or to prove optimality. In another context, constraint programming (CP) is a generic tool for solving combinatorial optimization problems. Based on a complete search procedure, it will always find the optimal solution given a large enough execution time. A critical design choice that makes CP non-trivial to use in practice is the branching decision, directing how the search space is explored. In this work, we propose a general and hybrid approach, based on DRL and CP, for solving combinatorial optimization problems. The core of our approach is a dynamic programming formulation that acts as a bridge between the two techniques. We experimentally show that our solver is efficient at solving two challenging problems: the traveling salesman problem with time windows and the 4-moments portfolio optimization problem. The results obtained show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers.

70 citations


Journal ArticleDOI
01 Jan 2020-Energy
TL;DR: Simulation and experimental results highlight that the proposed strategy can lead to less fuel consumption, compared to traditional equivalent consumption minimization strategy, thereby proving its feasibility.

69 citations


Journal ArticleDOI
TL;DR: Through an event-triggered approach, the constrained near-optimal control problem for a class of nonlinear discrete-time systems is investigated and solved by a heuristic dynamic programming (HDP) technique, and a nonquadratic performance index is introduced.
Abstract: In this paper, through an event-triggered approach, the constrained near-optimal control problem for a class of nonlinear discrete-time systems is investigated and solved by a heuristic dynamic programming (HDP) technique. The proposed method can reduce the amount of computation remarkably without deteriorating the system stability. In order to handle the control constraints and reduce the computational burden, a nonquadratic performance index is introduced. Then, a stability analysis of the event-triggered system with control constraints and an event-triggered constrained controller design algorithm are given. Three neural networks are used in the HDP scheme, which are designed to identify the unknown nonlinear system, approximate the value function, and approximate the control law, respectively. In the model neural network, an effective method is developed to initialize its weights. Finally, two examples are included to demonstrate the present method.

Journal ArticleDOI
TL;DR: A Q-learning-based in-vehicle learning system that is free of physical models and can robustly converge to an optimal energy control solution is presented, and a new initialization strategy, which combines optimal learning with a properly selected penalty function, is introduced.
Abstract: Energy optimization for plug-in hybrid electric vehicles (PHEVs) is a challenging problem due to the system complexity and the many physical and operational constraints in PHEVs. In this paper, we present a Q-learning-based in-vehicle learning system that is free of physical models and can robustly converge to an optimal energy control solution. The proposed machine learning algorithms combine neuro-dynamic programming (NDP) with future trip information to effectively estimate the expected future energy cost (expected cost-to-go) for a given vehicle state and control actions. The convergence of these learning algorithms was demonstrated on both fixed and randomly selected drive cycles. Based on the characteristics of these learning algorithms, we propose a two-stage deployment solution for PHEV power management applications. Furthermore, we introduce a new initialization strategy, which combines optimal learning with a properly selected penalty function. This initialization scheme can reduce the learning convergence time by 70%, which is a significant improvement for in-vehicle implementation efficiency. Finally, we develop a neural network (NN) for predicting battery state-of-charge (SoC), rendering the proposed power management controller completely free of physical models.
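
A tabular sketch of the underlying idea, learning a cost-to-go over (SoC, demand) bins with a penalty-shaped initialization, is below. The discretization, the toy transition model, and the penalty values are all invented for illustration; the paper's NDP learner and trip-information features are not reproduced.

```python
# Tabular Q-learning sketch for a toy power-split problem (minimization
# form: Q stores cost-to-go, so the greedy action is argmin).
import numpy as np

N_SOC, N_DEM, N_ACT = 20, 10, 5
alpha, gamma, eps = 0.1, 0.98, 0.1
rng = np.random.default_rng(0)

# Penalty-shaped initialization: discourage battery-heavy actions at low
# SoC up front, mirroring the paper's reported convergence speed-up.
Q = np.zeros((N_SOC, N_DEM, N_ACT))
Q[:3, :, N_ACT // 2:] = 50.0

def step(soc, dem, a):
    """Toy transition: returns fuel cost and the next (SoC, demand) bins."""
    batt_share = a / (N_ACT - 1)                 # 0 = all engine, 1 = all battery
    fuel = (1.0 - batt_share) * (dem + 1)        # engine fuel proxy
    soc2 = int(np.clip(soc - 2 * batt_share + 1, 0, N_SOC - 1))
    return fuel, soc2, rng.integers(N_DEM)

soc, dem = N_SOC // 2, rng.integers(N_DEM)
for _ in range(100_000):
    a = rng.integers(N_ACT) if rng.random() < eps else int(Q[soc, dem].argmin())
    cost, soc2, dem2 = step(soc, dem, a)
    target = cost + gamma * Q[soc2, dem2].min()  # bootstrapped cost-to-go
    Q[soc, dem, a] += alpha * (target - Q[soc, dem, a])
    soc, dem = soc2, dem2

print("greedy action at mid SoC, high demand:", int(Q[N_SOC // 2, -1].argmin()))
```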

Journal ArticleDOI
TL;DR: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite horizon discrete-time nonlinear systems by considering the constraint of the impulsive interval.
Abstract: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal impulsive control problems for infinite-horizon discrete-time nonlinear systems. Considering the constraint on the impulsive interval, in each iteration, the iterative impulsive value function under each possible impulsive interval is obtained, and then the iterative value function and iterative control law are derived. A new convergence analysis method is developed which proves that the iterative value function converges to the optimum as the iteration index increases to infinity. The properties of the iterative control law are analyzed, and the detailed implementation of the optimal impulsive control law is presented. Finally, two simulation examples with comparisons are given to show the effectiveness of the developed method.

Journal ArticleDOI
TL;DR: A state-based sequential network reconfiguration strategy by using a Markov decision process (MDP) model with the objective of minimizing renewable distributed generation curtailment and load shedding under operational constraints is developed.
Abstract: Growing penetration of renewable distributed generation, a major concern nowadays, has played a critical role in distribution system operation. This paper develops a state-based sequential network reconfiguration strategy using a Markov decision process (MDP) model with the objective of minimizing renewable distributed generation curtailment and load shedding under operational constraints. Available power outputs of distributed generators and the system topology at each decision time are represented as Markov states, which transition to other Markov states at the next decision time according to the uncertainties of renewable distributed generation. For each Markov state at each decision time, a recursive optimization model with a current cost and a future cost is developed to select state-based actions, including system reconfiguration, load shedding, and distributed generation curtailment. To address the curse of dimensionality caused by the enormous numbers of states and actions in the proposed model, an approximate dynamic programming (ADP) approach, including post-decision states and a forward dynamic algorithm, is used to solve the proposed MDP-based model. The IEEE 33-bus system and IEEE 123-bus system are used to validate the proposed model.
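
The post-decision-state device mentioned above is worth a minimal sketch: the value function is learned around the state reached after the action but before the random information arrives, so the decision step becomes a deterministic optimization. The toy problem below (a 1-D "system level" to regulate, with invented costs and noise) merely stands in for the reconfiguration model.

```python
# Generic post-decision-state ADP sketch; problem details are invented.
import numpy as np

LEVELS = np.arange(11)                  # discretized system state
ACTIONS = np.arange(-2, 3)              # shed/curtail adjustments
V = np.zeros(len(LEVELS))               # value around post-decision states
alpha, gamma = 0.05, 0.95
rng = np.random.default_rng(0)

def cost(s, a):
    return abs(a) + 0.5 * abs(s - 5)    # action effort + deviation penalty

s = 5
for _ in range(50_000):
    # Decision step: deterministic optimization over post-decision states.
    best_a = min(ACTIONS, key=lambda a: cost(s, a)
                 + gamma * V[np.clip(s + a, 0, 10)])
    s_post = int(np.clip(s + best_a, 0, 10))     # post-decision state
    # Information step: exogenous randomness arrives after the decision.
    w = rng.integers(-1, 2)
    s_next = int(np.clip(s_post + w, 0, 10))
    # Bootstrapped update of the post-decision value function.
    target = min(cost(s_next, a) + gamma * V[np.clip(s_next + a, 0, 10)]
                 for a in ACTIONS)
    V[s_post] += alpha * (target - V[s_post])
    s = s_next

print("learned post-decision values:", np.round(V, 2))
```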

Journal ArticleDOI
TL;DR: This article presents a nonmodel-based controller design for vehicle dynamic systems to improve lateral stability, where output tracking control and adaptive dynamic programming approaches are employed to track the desired yaw rate and, at the same time, mitigate the sideslip angle, roll angle, and roll rate of the vehicle.
Abstract: This article presents a nonmodel-based controller design for vehicle dynamic systems to improve lateral stability, where output tracking control and adaptive dynamic programming approaches are employed to track the desired yaw rate and, at the same time, mitigate the sideslip angle, roll angle, and roll rate of the vehicle. Moreover, unlike some existing optimization methods in control allocation, the proposed control strategies, which distribute tire forces by learning, use only the information of the states, input, and reference signal instead of knowledge of the vehicle system. The iterative process repeatedly uses the state and input information to calculate the feedback gain. It can significantly reduce the learning time and computational burden. The effectiveness of the proposed controller design method is shown by CarSim simulations.

Journal ArticleDOI
TL;DR: In this paper, the authors designed the UAV trajectory to minimize the total energy consumption while satisfying the requested timeout (RT) requirement and energy budget, which is accomplished via jointly optimizing the path and UAV's velocities along subsequent hops.
Abstract: In this paper, we design the UAV trajectory to minimize the total energy consumption while satisfying the requested timeout (RT) requirement and energy budget, which is accomplished by jointly optimizing the path and the UAV's velocities along subsequent hops. The corresponding optimization problem is difficult to solve due to its non-convexity and combinatorial nature. To overcome this difficulty, we solve the original problem via two consecutive steps. First, we propose two algorithms, namely heuristic search and dynamic programming (DP), to obtain a feasible set of paths without violating the GU's RT requirements, based on the traveling salesman problem with time windows (TSPTW). They are then compared with exhaustive search and the traveling salesman problem (TSP), used as reference methods. While the exhaustive algorithm achieves the best performance at a high computation cost, the heuristic algorithm exhibits poorer performance with low complexity. As a result, DP is proposed as a practical trade-off between the exhaustive and heuristic algorithms. Specifically, the DP algorithm attains near-optimal performance at a much lower complexity. Second, for the given feasible paths, we formulate an energy minimization problem via a joint optimization of the UAV's velocities along subsequent hops. Finally, numerical results are presented to demonstrate the effectiveness of our proposed algorithms. The results show that the DP-based algorithm approaches the exhaustive search's performance with a significantly reduced complexity. It is also shown that the proposed solutions outperform the state-of-the-art benchmarks in terms of both energy consumption and outage performance.
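
The TSPTW recursion that such a DP path search can build on is the classic Held–Karp table indexed by (visited subset, last node), with arrivals clipped to each node's window. The sketch below uses made-up distances and windows and omits the energy terms.

```python
# Held-Karp-style DP over visit subsets with time windows.
import math

dist = [[0, 4, 6, 5],
        [4, 0, 3, 7],
        [6, 3, 0, 2],
        [5, 7, 2, 0]]
window = [(0, 99), (2, 10), (5, 14), (8, 20)]   # (open, close) per node
n = len(dist)

# dp[(mask, j)] = earliest feasible arrival at j having visited set `mask`
dp = {(1, 0): 0.0}                              # start at node 0 at t = 0
for mask in range(1, 1 << n):
    for j in range(n):
        if (mask, j) not in dp:
            continue
        t = dp[(mask, j)]
        for k in range(n):
            if mask & (1 << k):
                continue
            arrive = max(t + dist[j][k], window[k][0])   # wait if early
            if arrive <= window[k][1]:                   # deadline respected
                key = (mask | (1 << k), k)
                if arrive < dp.get(key, math.inf):
                    dp[key] = arrive

full = (1 << n) - 1
best = min((dp[(full, j)] for j in range(n) if (full, j) in dp), default=None)
print("earliest completion of all visits:", best)
```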

Journal ArticleDOI
TL;DR: The proposed real-time method is effective in reducing the net electricity purchase cost compared to other existing energy management methods; a rule-based controller is introduced underneath the optimization layer at a finer time resolution, at the power electronics converter control level.
Abstract: A computationally proficient real-time energy management method with stochastic optimization is presented for a residential photovoltaic (PV)-storage hybrid system comprising solar PV generation and a battery energy storage (BES). Existing offline energy management approaches for day-ahead scheduling of BES suffer from energy loss in real time due to the stochastic nature of load and solar generation. On the other hand, typical online algorithms do not offer optimal solutions for minimizing the owners' electricity purchase costs. To overcome these limitations, we propose an integrated energy management framework consisting of an offline optimization model concurrent with a real-time rule-based controller. The optimization is performed in a receding horizon with load and solar generation forecast profiles produced by a deep-learning-based long short-term memory (LSTM) method, to reduce the daily electricity purchase costs. The optimization model is formulated as a multistage stochastic program, where we use the stochastic dual dynamic programming algorithm in the receding horizon to update the optimal set point for BES dispatch at a fixed interval. To prevent loss of energy during optimal solution update intervals, we introduce a rule-based controller underneath the optimization layer at a finer time resolution, at the power electronics converter control level. The proposed framework is evaluated using a real-time controller-hardware-in-the-loop test platform in an OPAL-RT simulator. The proposed real-time method is effective in reducing the net electricity purchase cost compared to other existing energy management methods.

Proceedings Article
01 Jan 2020
TL;DR: A new Variational Policy Gradient Theorem for RL with general utilities is derived, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov decision problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. We prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, even though the optimization problem is nonconvex. We also establish its rate of convergence of the order O(1/t) by exploiting the hidden convexity of the problem, and prove that it converges exponentially when the problem admits hidden strong convexity. Our analysis applies to the standard RL problem with cumulative rewards as a special case, in which case our result improves the available convergence rate.
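
For orientation, here is vanilla REINFORCE on a toy two-state MDP, i.e. the cumulative-reward special case that the Variational Policy Gradient Theorem recovers. The general-utility case replaces the return with a Fenchel-dual saddle-point estimate; that machinery is not reproduced here, and all MDP details below are invented.

```python
# REINFORCE with a tabular softmax policy on a made-up two-state MDP.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))                 # softmax logits: state x action
gamma, lr, H = 0.95, 0.1, 30

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def step(s, a):                          # toy dynamics and rewards
    r = 1.0 if (s == 0 and a == 1) or (s == 1 and a == 0) else 0.0
    return r, (s + a) % 2

for _ in range(3000):
    s, traj = 0, []
    for _ in range(H):                   # sample one trajectory
        a = rng.choice(2, p=policy(s))
        r, s2 = step(s, a)
        traj.append((s, a, r))
        s = s2
    grad, G = np.zeros_like(theta), 0.0
    for s, a, r in reversed(traj):       # returns-to-go, then score terms
        G = r + gamma * G
        g = -policy(s)
        g[a] += 1.0                      # gradient of log softmax
        grad[s] += G * g
    theta += lr * grad                   # ascent on expected return

print("learned policy:", np.round(np.vstack([policy(0), policy(1)]), 2))
```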

Journal ArticleDOI
TL;DR: The results show that compared to the scenario with 100% HVs, ramp-merging can be smoother in mixed traffic environment and traffic throughput can be further increased by 10–15%.
Abstract: The rapid conceptual development and commercialization of the connected automated vehicle (CAV) has led to the problem of mixed traffic, i.e., traffic mixed with CAVs and conventional human-operated vehicles (HVs). This paper studies cooperative decision-making for mixed traffic (CDMMT). Using discrete optimization, a CDMMT mechanism is developed to facilitate ramp merging and to properly capture the cooperative and non-cooperative behaviors in mixed traffic. The CDMMT mechanism can be described as a bi-level optimization program in which state-constrained optimal-control-based trajectory design problems are embedded in a sequencing problem. A bi-level dynamic programming-based solution approach is developed to efficiently solve the problem. The proposed modeling mechanism and solution approach are generic to deterministic decisions and can guarantee system-efficient solutions. A micro-simulation environment is built for model validation and analysis of mixed traffic. The results show that, compared to the scenario with 100% HVs, ramp merging can be smoother in a mixed traffic environment. At high CAV penetration, the section throughput increases by about 18%. With the proposed CDMMT mechanism, traffic throughput can be further increased by 10–15%. The proposed methods form the basis of traffic analysis and cooperative control at ramp-merging sections under a mixed traffic environment.

Journal ArticleDOI
TL;DR: It is shown that the reinforcement learning-based strategy can obtain global optimality in the optimal control problem with an infinite horizon, which can also be obtained by stochastic dynamic programming.
Abstract: The energy management strategy is an important factor in determining the fuel economy of hybrid electric vehicles; thus, much research is required on how to distribute the required power to the engines and motors of hybrid vehicles. Recently, various studies have been conducted on optimally controlling the hybrid electric vehicle based on reinforcement learning. In fact, the fundamental control approach of reinforcement learning shares many control frameworks with control approaches using deterministic dynamic programming or stochastic dynamic programming. In this study, we compare the reinforcement learning-based strategy with these dynamic programming-based control approaches. For optimal control of the hybrid electric vehicle, each control method was compared in terms of fuel efficiency through simulations over various driving cycles. Based on our simulations, we showed that the reinforcement learning-based strategy can obtain global optimality in the optimal control problem with an infinite horizon, which can also be obtained by stochastic dynamic programming. We also showed that the reinforcement learning-based strategy can provide a solution close to the optimal one obtained by deterministic dynamic programming, while the reinforcement learning-based strategy is more appropriate for a time-variant controller with boundary value constraints. In addition, we verified the convergence characteristics of the reinforcement learning-based control strategy when transfer learning was performed through value initialization using stochastic dynamic programming.

Journal ArticleDOI
01 Oct 2020-Energy
TL;DR: A series of numerical simulation results validates that the performance of the global optimal strategy can be exploited online to attain satisfactory cost reduction compared with the equivalent consumption minimization strategy, with the assistance of an estimated real-time co-state and a slacked reference.

Journal ArticleDOI
TL;DR: A model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems is obtained via reinforcement learning with dynamic output feedback; the proposed VI method does not require an initially stabilizing policy.
Abstract: In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input–output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
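
For reference, the classical model-based recursion that the paper's model-free, output-feedback scheme emulates is Kleinman's policy iteration: solve a Lyapunov equation for the current gain, then improve the gain. A minimal sketch (system matrices invented; the paper's data-driven Bellman equations replace the explicit use of A and B):

```python
# Model-based Kleinman policy iteration for continuous-time LQR, checked
# against the algebraic Riccati solution. Not the paper's model-free method.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.zeros((1, 2))          # A is Hurwitz here, so K0 = 0 is stabilizing
for _ in range(15):
    Ak = A - B @ K
    # Policy evaluation: solve Ak^T P + P Ak = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, B.T @ P)     # policy improvement
print("PI gain:     ", K)
print("Riccati gain:", np.linalg.solve(R, B.T @ solve_continuous_are(A, B, Q, R)))
```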

Journal ArticleDOI
TL;DR: A three-stage hybrid method is developed to satisfy this practical requirement, where the domain knowledge is used to build a virtual load curve balancing the load features and electricity contracts of multiple power grids; the results indicate that the hybrid method can achieve satisfactory scheduling results in different cases.

Journal ArticleDOI
TL;DR: This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations, and shows that the use of DNNs can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise.

Journal ArticleDOI
TL;DR: A data-based control method based on an adaptive dynamic programming (ADP) algorithm is proposed for a class of discrete-time (DT) systems with multiple delays, built on a novel state equation composed only of input and output data.
Abstract: In this paper, a data-based control method based on an adaptive dynamic programming (ADP) algorithm is proposed for a class of discrete-time (DT) systems in the case of multiple delays. The data-based ADP method is implemented by virtue of the measured input and output data. The condition for the existence of the corresponding equivalent multiple-delay system is derived according to the characteristics of the time-delay system. A novel data-based state equation is developed that is composed only of input and output data, which is very meaningful in practical applications. By using the data-based ADP method, the output feedback control problem is solved only by measuring the input and output of the system with multiple delays. The convergence proofs of the designed policy iteration and value iteration algorithms are given, respectively. A simulation example is presented to demonstrate the validity of the proposed data-based ADP method.

Journal ArticleDOI
TL;DR: In this paper, a dynamic predictive traffic signal control framework for isolated intersections is proposed in a cross-sectional VII environment, which has the ability to predict vehicle arrivals and use this to optimize traffic signals.
Abstract: With the development of modern wireless communication technology, especially vehicle infrastructure integration (VII) technology, vehicle information such as identification, location, and speed can be readily obtained at an upstream cross-section. This information can be used to support traffic signal timing optimization in real time. A dynamic predictive traffic signal control framework for isolated intersections is proposed in a cross-sectional VII environment, which has the ability to predict vehicle arrivals and use this to optimize traffic signals. The proposed dynamic predictive control framework includes a dynamic platoon dispersion model (DPDM), which uses vehicle speed data from the cross-sectional VII environment, as opposed to traditional vehicle passing/existing data, to predict the arriving flow distribution at the downstream stop line. Then, a dynamic programming algorithm based on the exhaustive optimization of phases (EOP) is proposed, working in a rolling optimization (RO) scheme with a 2 s time horizon. The signal timings are continuously optimized by taking the minimization of intersection delay as the optimization objective and setting the green time duration of each phase as a constraint. Finally, the proposed dynamic predictive control framework is tested in a simulated cross-sectional VII environment, and a case study is carried out based on a real road network. The results show that the proposed framework can reduce the average delay and queue length by up to 33% and 35%, respectively, compared with traditional fully actuated control.

31 Jul 2020
TL;DR: This paper proposes a method to automate the tuning of convex optimization control policies by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters.
Abstract: Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.
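
A toy stand-in for this tuning loop: adjust a policy parameter (here an LQR state-weight) by a numerical gradient of a simulated cost. The paper instead differentiates analytically through the convex solver; this sketch, with an invented system and metric, only mirrors the outer loop's shape.

```python
# Finite-difference tuning of an LQR weight against a simulated metric.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
R = np.array([[1.0]])

def metric(log_q, n_roll=20, T=50):
    """True closed-loop cost of the policy induced by weight exp(log_q),
    on noisy rollouts of a cost the designer actually cares about."""
    rng = np.random.default_rng(0)      # common random numbers for clean FD
    Q = np.diag([np.exp(log_q), 1e-3])
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    total = 0.0
    for _ in range(n_roll):
        x = rng.normal(size=2)
        for _ in range(T):
            u = -K @ x
            total += 10 * x[0] ** 2 + float(u @ u)   # designer's metric
            x = A @ x + (B @ u).ravel() + 0.01 * rng.normal(size=2)
    return total / n_roll

log_q, lr, h = 0.0, 0.05, 1e-2
for _ in range(25):
    g = (metric(log_q + h) - metric(log_q - h)) / (2 * h)  # FD gradient
    log_q -= lr * g
print(f"tuned state weight: {np.exp(log_q):.3f}")
```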

Journal ArticleDOI
TL;DR: An approximate dynamic programming (DP) approach is designed for the energy-efficient subway train scheduling problem with time-dependent demand, where the concepts of states, policies, state transitions, and the reward function are introduced.
Abstract: Owing to environmental concerns, energy-efficient subway train scheduling is necessary in subway operation management. This paper designs an approximate dynamic programming (DP) approach for the energy-efficient subway train scheduling problem with time-dependent demand. A train traffic model is proposed with dynamic equations for the evolution of train headways, train passenger loads, and the energy consumption along the subway line. Because the number of onboard passengers changes dynamically with time, the total train energy usage is modeled as the sum of the energy consumed by the traction system and the auxiliary facilities. A nonlinear DP problem is formulated to generate a near-optimal timetable that realizes the trade-off among the utilization of trains, passenger waiting time, service levels, and energy consumption. To overcome the curse of dimensionality in this optimization problem, we construct an approximate DP framework, where the concepts of states, policies, state transitions, and the reward function are introduced. This algorithm converges to a good solution in a short time compared with the genetic algorithm and the differential evolution algorithm. Finally, numerical experiments are given to demonstrate the effectiveness of the proposed model and algorithm.

Journal ArticleDOI
TL;DR: A general (universally applicable) dynamic programming formulation is introduced, its well-posedness is established, and new existence results for optimal policies in decentralized stochastic control are obtained.
Abstract: For sequential stochastic control problems with standard Borel measurement and control action spaces, we introduce a general (universally applicable) dynamic programming formulation, establish its well-posedness, and obtain new existence results for optimal policies in decentralized stochastic control.

Journal ArticleDOI
TL;DR: An event-triggered H∞ control method based on adaptive dynamic programming (ADP) with concurrent learning is proposed for unknown continuous-time nonlinear systems with control constraints, with a neural-network-based system identification technique used to identify the completely unknown systems.
Abstract: In this article, an event-triggered H∞ control method is proposed based on adaptive dynamic programming (ADP) with concurrent learning for unknown continuous-time nonlinear systems with control constraints. First, a system identification technique based on neural networks (NNs) is adopted to identify completely unknown systems. Second, a critic NN is employed to approximate the value function. A novel weight updating rule is developed based on the event-triggered control law and time-triggered disturbance law, which reduces controller execution times and guarantees the stability of the system. Subsequently, concurrent learning is applied to the weight updating rule to relax the demand for the traditional persistence of excitation condition that is difficult to implement online. Finally, the comparison between the time-triggered method and event-triggered method in simulation demonstrates the effectiveness of the developed constrained event-triggered ADP method.