
Showing papers on "Dynamic programming published in 2019"


Journal ArticleDOI
TL;DR: The proposed ADPED algorithm can adapt to both day-ahead and intra-day operation under uncertainty and can make full use of the historical prediction error distribution to reduce the influence of inaccurate forecasts on system operation.
Abstract: This paper proposes an approximate dynamic programming (ADP)-based approach for the economic dispatch (ED) of a microgrid with distributed generations. The time-variant renewable generation, electricity price, and power demand are considered as stochastic variables in this paper. An ADP-based ED (ADPED) algorithm is proposed to optimally operate the microgrid under these uncertainties. To deal with the uncertainties, the Monte Carlo method is adopted to sample the training scenarios that give empirical knowledge to ADPED. A piecewise linear function (PLF) approximation with an improved slope updating strategy is employed for the proposed method. With sufficient information extracted from these scenarios and embedded in the PLF, the proposed ADPED algorithm can be used not only in day-ahead scheduling but also in the intra-day optimization process. The algorithm can make full use of the historical prediction error distribution to reduce the influence of inaccurate forecasts on system operation. Numerical simulations demonstrate the effectiveness of the proposed approach: the near-optimal decision obtained by ADPED is very close to the global optimum, and it adapts to both day-ahead and intra-day operation under uncertainty.
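
As an illustration of the kind of piecewise linear value-function approximation with slope updating that the abstract mentions, here is a minimal Python sketch; the segment width, step size, and the concavity-restoring projection rule are our own simplifying assumptions, not details taken from the paper.

```python
import numpy as np

N_SEG = 10                      # resource level (e.g., stored energy) split into segments
slopes = np.zeros(N_SEG)        # one slope per segment; concavity = non-increasing slopes

def update_slope(slopes, seg, sampled_marginal, stepsize=0.1):
    """Smooth a sampled marginal value into segment `seg`, then restore concavity
    with a simple 'leveling' projection (one of several possible projection rules)."""
    s = slopes.copy()
    s[seg] = (1 - stepsize) * s[seg] + stepsize * sampled_marginal
    v = s[seg]
    for i in range(seg - 1, -1, -1):     # slopes to the left must stay >= v
        if s[i] < v:
            s[i] = v
        else:
            break
    for i in range(seg + 1, N_SEG):      # slopes to the right must stay <= v
        if s[i] > v:
            s[i] = v
        else:
            break
    return s

def value(slopes, x, seg_width=1.0):
    """Evaluate the concave piecewise linear value function at resource level x."""
    full, frac = int(x // seg_width), x % seg_width
    full = min(full, N_SEG)
    tail = frac * slopes[full] if full < N_SEG else 0.0
    return seg_width * slopes[:full].sum() + tail

# one training step: a Monte Carlo scenario produced a marginal value of 3.2 at segment 4
slopes = update_slope(slopes, seg=4, sampled_marginal=3.2)
print(value(slopes, 4.5))
```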

198 citations


Journal ArticleDOI
TL;DR: An extension to SDDP—called stochastic dual dynamic integer programming (SDDiP)—for solving MSIP problems with binary state variables is proposed and it is shown that, under fairly reasonable assumptions, an MSIP problem with general state variables can be approximated by one with binary state variables to desired precision with only a modest increase in problem size.
Abstract: Multistage stochastic integer programming (MSIP) combines the difficulty of uncertainty, dynamics, and non-convexity, and constitutes a class of extremely challenging problems. A common formulation for these problems is a dynamic programming formulation involving nested cost-to-go functions. In the linear setting, the cost-to-go functions are convex polyhedral, and decomposition algorithms, such as nested Benders’ decomposition and its stochastic variant, stochastic dual dynamic programming (SDDP), which proceed by iteratively approximating these functions by cuts or linear inequalities, have been established as effective approaches. However, it is difficult to directly adapt these algorithms to MSIP due to the nonconvexity of integer programming value functions. In this paper we propose an extension to SDDP—called stochastic dual dynamic integer programming (SDDiP)—for solving MSIP problems with binary state variables. The crucial component of the algorithm is a new reformulation of the subproblems in each stage and a new class of cuts, termed Lagrangian cuts, derived from a Lagrangian relaxation of a specific reformulation of the subproblems in each stage, where local copies of state variables are introduced. We show that the Lagrangian cuts satisfy a tightness condition and provide a rigorous proof of the finite convergence of SDDiP with probability one. We show that, under fairly reasonable assumptions, an MSIP problem with general state variables can be approximated by one with binary state variables to desired precision with only a modest increase in problem size. Thus our proposed SDDiP approach is applicable to very general classes of MSIP problems. Extensive computational experiments on three classes of real-world problems, namely electric generation expansion, financial portfolio management, and network revenue management, show that the proposed methodology is very effective in solving large-scale multistage stochastic integer optimization problems.
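
A small hedged sketch of the binary approximation idea referenced above: a bounded continuous state variable is encoded, to a chosen precision, by binary variables so that the stage subproblems only carry binary state. The helper names and the precision parameter are illustrative, not the paper's notation.

```python
import math

def binarize(x, upper, eps):
    """Binary digits lam[0..K-1] with x ≈ eps * sum_k 2**k * lam[k], 0 <= x <= upper."""
    K = math.ceil(math.log2(upper / eps + 1))   # number of binary state variables needed
    q = round(x / eps)                          # nearest point on the eps-grid
    return [(q >> k) & 1 for k in range(K)]

def debinarize(lam, eps):
    return eps * sum(bit << k for k, bit in enumerate(lam))

lam = binarize(7.36, upper=10.0, eps=0.01)      # e.g., a reservoir level of 7.36
print(len(lam), debinarize(lam, 0.01))          # 10 binary variables, ~7.36 recovered
```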

196 citations


Journal ArticleDOI
TL;DR: A modular and tractable framework for solving an adaptive distributionally robust linear optimization problem, where the worst-case expected cost is minimized over an ambiguity set of probability distributions, and it is shown that the adaptive distributionally robust linear optimization problem can be formulated as a classical robust optimization problem.
Abstract: We develop a modular and tractable framework for solving an adaptive distributionally robust linear optimization problem, where we minimize the worst-case expected cost over an ambiguity set of probability distributions.

192 citations


Journal ArticleDOI
TL;DR: A novel dynamic energy management system is developed to incorporate efficient management of the energy storage system into MG real-time dispatch while considering power flow constraints and uncertainties in load, renewable generation and real-time electricity price.
Abstract: This paper focuses on the economical operation of a microgrid (MG) in real time. A novel dynamic energy management system is developed to incorporate efficient management of the energy storage system into MG real-time dispatch while considering power flow constraints and uncertainties in load, renewable generation and real-time electricity price. The developed dynamic energy management mechanism does not require long-term forecasts, long-term optimization, or knowledge of the uncertainty distributions, but can still optimize the long-term operational costs of MGs. First, the real-time scheduling problem is modeled as a finite-horizon Markov decision process over a day. Then, approximate dynamic programming and deep recurrent neural network learning are employed to derive a near-optimal real-time scheduling policy. Finally, using real power grid data from the California Independent System Operator, a detailed simulation study is carried out to validate the effectiveness of the proposed method.
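
To make the finite-horizon Markov decision process structure concrete, here is a toy Python sketch that solves a deterministic, discretized one-battery version of the dispatch problem by exact backward induction; the paper instead handles uncertainty and replaces the exact value table with ADP and a deep recurrent network, and all numbers below are invented.

```python
import numpy as np

T = 24                                                     # hours in the horizon
prices = 30 + 20 * np.sin(np.arange(T) / T * 2 * np.pi)    # toy price curve ($/MWh)
soc_levels = np.arange(0, 11)                              # battery state of charge, 0..10 MWh
actions = [-1, 0, 1]                                       # discharge 1 MWh, idle, charge 1 MWh

V = np.zeros((T + 1, len(soc_levels)))                     # terminal value = 0
policy = np.zeros((T, len(soc_levels)), dtype=int)         # greedy action per (hour, state)

for t in range(T - 1, -1, -1):                             # backward induction
    for i, soc in enumerate(soc_levels):
        best = -np.inf
        for a in actions:
            nxt = soc + a
            if nxt < 0 or nxt > 10:
                continue
            reward = -prices[t] * a                        # buying energy costs, selling earns
            q = reward + V[t + 1, nxt]
            if q > best:
                best, policy[t, i] = q, a
        V[t, i] = best

print(V[0, 5])                                             # value of starting the day half full
```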

155 citations


Journal ArticleDOI
TL;DR: An efficient DP-SH (dynamic programming with shooting heuristic as a subroutine) algorithm is proposed for the integrated optimization problem that can simultaneously optimize the trajectories of CAVs and intersection controllers, and a two-step approach is developed to effectively obtain near-optimal intersection and trajectory control plans.
Abstract: Connected and automated vehicle (CAV) technologies offer promising solutions to challenges that face today’s transportation systems. Vehicular trajectory control and intersection controller optimization based on CAV technologies are two approaches that have significant potential to mitigate congestion, lessen the risk of crashes, reduce fuel consumption, and decrease emissions at intersections. These two approaches should be integrated into a single process such that both aspects can be optimized simultaneously to achieve maximum benefits. This paper proposes an efficient DP-SH (dynamic programming with shooting heuristic as a subroutine) algorithm for the integrated optimization problem that can simultaneously optimize the trajectories of CAVs and intersection controllers (i.e., signal timing and phasing of traffic signals), and develops a two-step approach (DP-SH and trajectory optimization) to effectively obtain near-optimal intersection and trajectory control plans. The proposed DP-SH algorithm can also handle mixed traffic stream scenarios with different levels of CAV market penetration. Numerical experiments are conducted, and the results prove the efficiency and sound performance of the proposed optimization framework. Compared to adaptive signal control, the proposed DP-SH algorithm can reduce the average travel time by up to 35.72% and fuel consumption by up to 31.5%. In mixed traffic scenarios, system performance improves with increasing market penetration rates; even with low levels of penetration, there are significant benefits in fuel consumption savings. The computational efficiency, as evidenced in the case studies, indicates the applicability of DP-SH for real-time implementation.

155 citations


Journal ArticleDOI
TL;DR: The proposed energy management strategy, based on a double deep Q-learning algorithm, prevents the training process from falling into overoptimistic estimates of the policy value and shows significant advantages in terms of iterative convergence rate and optimization performance.
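
The core trick behind double (deep) Q-learning is to decouple action selection from action evaluation so that values are not systematically overestimated; below is a hedged tabular sketch of that update. The paper applies the idea with neural-network function approximation, and the sizes and rates here are arbitrary.

```python
import numpy as np

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Randomly pick which table to update; select the action with one table,
    evaluate it with the other, which suppresses overoptimistic value estimates."""
    global Q_a, Q_b
    if rng.random() < 0.5:
        a_star = np.argmax(Q_a[s_next])            # select with Q_a ...
        target = r + gamma * Q_b[s_next, a_star]   # ... evaluate with Q_b
        Q_a[s, a] += alpha * (target - Q_a[s, a])
    else:
        a_star = np.argmax(Q_b[s_next])
        target = r + gamma * Q_a[s_next, a_star]
        Q_b[s, a] += alpha * (target - Q_b[s, a])

double_q_update(s=0, a=1, r=1.0, s_next=2)
```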

143 citations


Journal ArticleDOI
TL;DR: A joint computation offloading and multiuser scheduling algorithm is proposed for an NB-IoT edge computing system that minimizes the long-term average weighted sum of delay and power consumption under stochastic traffic arrivals.
Abstract: The Internet of Things (IoT) connects a huge number of resource-constrained IoT devices to the Internet, which generate a massive amount of data that can be offloaded to the cloud for computation. As some of the applications may require very low latency, the emerging mobile edge computing (MEC) architecture offers cloud services by deploying MEC servers at the mobile base stations (BSs). The IoT devices can transmit the offloaded data to the BS for computation at the MEC server. Narrowband-IoT (NB-IoT) is a new cellular technology for the transmission of IoT data to the BS. In this paper, we propose a joint computation offloading and multiuser scheduling algorithm for an NB-IoT edge computing system that minimizes the long-term average weighted sum of delay and power consumption under stochastic traffic arrivals. We formulate the dynamic optimization problem as an infinite-horizon average-reward continuous-time Markov decision process (CTMDP) model. To deal with the curse-of-dimensionality problem, we use approximate dynamic programming techniques, i.e., linear value-function approximation and temporal-difference learning with a post-decision state and a semi-gradient descent method, to derive a simple algorithm for the solution of the CTMDP model. The proposed algorithm is semi-distributed: the offloading algorithm is performed locally at the IoT devices, while the scheduling algorithm is auction-based, with the IoT devices submitting bids to the BS, which makes the scheduling decision centrally. Simulation results show that the proposed algorithm provides significant performance improvement over the two baseline algorithms and the MUMTO algorithm, which is designed based on a deterministic task model.
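
A hedged sketch of the approximation machinery named in the abstract: a linear value function over state features updated by semi-gradient temporal-difference learning. The feature map, step size, and the discounted form are our simplifications; the paper works with an average-reward CTMDP and a post-decision state.

```python
import numpy as np

n_features = 8
w = np.zeros(n_features)                         # linear value-function weights

def features(state):
    """Placeholder feature map phi(s); the paper would build it from queue/channel state."""
    rng = np.random.default_rng(hash(state) % (2**32))
    return rng.standard_normal(n_features)

def td0_update(s, cost, s_next, alpha=0.05, gamma=0.95):
    """Semi-gradient TD(0): move w along phi(s) by the temporal-difference error."""
    global w
    phi, phi_next = features(s), features(s_next)
    td_error = cost + gamma * w @ phi_next - w @ phi
    w = w + alpha * td_error * phi

td0_update(s=("queue", 3), cost=2.0, s_next=("queue", 2))
```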

103 citations


Journal ArticleDOI
TL;DR: This work proposes a new type of decomposition algorithm, based on the recently proposed framework of stochastic dual dynamic integer programming (SDDiP), to solve the multistage stochastic unit commitment (MSUC) problem, and proposes a variety of computational enhancements to SDDiP.
Abstract: Unit commitment (UC) is a key operational problem in power systems for the optimal schedule of daily generation commitment. Incorporating uncertainty in this already difficult mixed-integer optimization problem introduces significant computational challenges. Most existing stochastic UC models consider either a two-stage decision structure, where the commitment schedule for the entire planning horizon is decided before the uncertainty is realized, or a multistage stochastic programming model with relatively small scenario trees to ensure tractability. We propose a new type of decomposition algorithm, based on the recently proposed framework of stochastic dual dynamic integer programming (SDDiP), to solve the multistage stochastic unit commitment (MSUC) problem. We propose a variety of computational enhancements to SDDiP, and conduct systematic and extensive computational experiments to demonstrate that the proposed method is able to handle elaborate stochastic processes and can solve MSUCs with a huge number of scenarios that are impossible to handle by existing methods.

99 citations


Journal ArticleDOI
TL;DR: This paper proposes an approximate dynamic programming (ADP) based algorithm for the real-time operation of a microgrid under uncertainties, which decomposes the original multi-time-period MINLP problem into single-time-period nonlinear programming problems.
Abstract: This paper proposes an approximate dynamic programming (ADP) based algorithm for the real-time operation of a microgrid under uncertainties. First, the optimal operation of the microgrid is formulated as a stochastic mixed-integer nonlinear programming (MINLP) problem, combining the ac power flow and the detailed operational characteristics of the battery. For this NP-hard problem, the proposed ADP-based energy management algorithm decomposes the original multi-time-period MINLP problem into single-time-period nonlinear programming problems. Thus, the sequential decisions can be made by solving Bellman's equation. Historical data are utilized offline to improve the optimality of the real-time decision, and the dependency on forecast information is reduced. Comparative numerical simulations with several existing methods demonstrate the effectiveness and efficiency of the proposed algorithm.

95 citations


Journal ArticleDOI
TL;DR: This work presents the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure; it leads to significantly more accurate predictions on the longest sequence families in the benchmark database, as well as improved accuracies for long-range base pairs.
Abstract: Motivation: Predicting the secondary structure of a ribonucleic acid (RNA) sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative O(n³)-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in O(n) time and O(n) space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability and implementation: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000 nt). Supplementary information: Supplementary data are available at Bioinformatics online.
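
The beam pruning idea is easy to see in isolation: scan left to right and keep only the b highest-scoring partial structures at each position, which is what turns the cubic-time search into a linear-time approximation. The sketch below is generic and assumes a user-supplied `expand` function; it is not the LinearFold code.

```python
import heapq

def beam_step(candidates, beam_size):
    """candidates: list of (score, state). Keep only the top `beam_size` by score."""
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[0])

def scan(sequence, expand, beam_size=100):
    """Left-to-right scan: extend every surviving partial structure, then prune."""
    beam = [(0.0, None)]                        # (score, partial structure)
    for j, nucleotide in enumerate(sequence):
        candidates = []
        for score, state in beam:
            candidates.extend(expand(score, state, j, nucleotide))
        beam = beam_step(candidates, beam_size)
    return max(beam, key=lambda c: c[0])

# `expand` would enumerate the legal ways to extend a partial structure at position j;
# here a dummy that simply carries the score forward.
best = scan("GCAUCG", expand=lambda s, st, j, nt: [(s + 0.0, st)], beam_size=10)
```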

88 citations


Journal ArticleDOI
TL;DR: In the improved BAP, to speed up the solution of the pricing problem, a multi-vehicle approximate dynamic programming (MVADP) algorithm based on the labeling algorithm is developed; it reduces the number of labels by integrating the calculation of the pricing problems for all vehicle types.
Abstract: Heterogeneous fleet vehicles can be used to reduce carbon emissions. We propose an improved branch-and-price (BAP) algorithm to precisely solve the heterogeneous fleet green vehicle routing problem with time windows (HFGVRPTW). In the improved BAP, to speed up the solution for the pricing problem, we develop a multi-vehicle approximate dynamic programming (MVADP) algorithm that is based on the labeling algorithm. The MVADP algorithm reduces labels by integrating the calculation of pricing problems for all vehicle types. In addition, to rapidly obtain a tighter upper bound, we propose an integer branch method. For each branch, we solve the master problem with the integer constraint by the CPLEX solver using the columns produced by column generation. We retain the smaller of the obtained integer solution and the current upper bound, and the branches are thus reduced significantly. Extensive computational experiments were performed on the Solomon benchmark instances. The results show that the branches and computational time were reduced significantly by the improved BAP algorithm.

Journal ArticleDOI
01 Jan 2019-Energy
TL;DR: An approximate optimization method, called rapid dynamic programming (Rapid-DP), is developed and discussed in this paper, and is leveraged, for the first time, to optimize key powertrain parameters for power split hybrid electric vehicles.

Journal ArticleDOI
TL;DR: A novel hybrid modeling method combining recurrent neural networks and the Ornstein–Uhlenbeck process is developed to obtain accurate power models for both photovoltaic panels and loads; the energy management issue is then formulated as a stochastic optimal control problem and solved via a dynamic programming approach.
Abstract: In this paper, an energy management issue is considered for the energy Internet, where microgrids (MGs) are interconnected via energy routers (ERs). Focusing on an individual MG, we propose controllers in the microturbines (MTs) and the ER such that the following three criteria hold simultaneously. First, a bottom-up energy management approach is realized. Second, the operation cost of utilizing battery energy storage devices is minimized. Third, overcontrol of the MTs is avoided. In addition, we develop a novel hybrid modeling method combining recurrent neural networks and the Ornstein–Uhlenbeck process to obtain accurate power models for both photovoltaic panels and loads. Next, we formulate the energy management issue as a stochastic optimal control problem and solve it via a dynamic programming approach. Finally, examples illustrating the feasibility of the proposed methods are provided.
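
For the stochastic half of the hybrid model, an Ornstein–Uhlenbeck process can be simulated with a simple Euler–Maruyama scheme, as sketched below; the parameter values are made up and the coupling to the recurrent-network prediction is omitted.

```python
import numpy as np

def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """dX = theta*(mu - X) dt + sigma dW, discretised with step dt (Euler-Maruyama)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + theta * (mu - x[k]) * dt + sigma * dw
    return x

# e.g., mean-reverting fluctuation around a zero-mean prediction error, 4 hours at 1-min steps
path = simulate_ou(x0=0.0, theta=1.5, mu=0.0, sigma=0.3, dt=1/60, n_steps=240)
```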

Journal ArticleDOI
TL;DR: A policy iteration algorithm based on distributed asynchronous update mechanism is proposed to learn the coupled Hamilton–Jacobi–Bellman equations online and the measured data-based critic-actor neural networks are adopted to approximate the value functions and the control policies, respectively.
Abstract: This paper is concerned with data-driven distributed optimal consensus control for unknown multiagent systems (MASs) with input delays. The input-delayed MAS model is first converted into a delay-free form using a model reduction method. By establishing an equivalent relationship between the predesigned performance indices of the two MASs, optimal consensus control of the input-delayed MAS can be fully transformed into that of the delay-free MAS. Based on the coupled Hamilton–Jacobi equations and Bellman’s optimality principle, optimal consensus control policies are derived for the transformed delay-free MAS. Then a policy iteration algorithm based on a distributed asynchronous update mechanism is proposed to learn the coupled Hamilton–Jacobi–Bellman equations online. To implement the proposed data-driven adaptive dynamic programming algorithm, we adopt measured-data-based critic and actor neural networks to approximate the value functions and the control policies, respectively. Finally, a simulation example is given to illustrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A novel adaptive dynamic programming (ADP) algorithm is developed to solve the optimal tracking control problem of discrete-time multi-agent systems, and an actor-critic neural network is used to approximate both the iterative control laws and the iterative performance index functions.

Journal ArticleDOI
TL;DR: A novel iterative adaptive dynamic programming (ADP) algorithm is developed to obtain the desired suboptimal solution with the help of an auxiliary quasi-HJB equation, and the convergence of the algorithm is investigated via intensive mathematical analysis.

Journal ArticleDOI
TL;DR: This paper addresses the energy maximization problem of wave energy converters subject to nonlinearities and constraints, and presents an efficient online control strategy based on the principle of adaptive dynamic programming (ADP) for solving the associated Hamilton–Jacobi–Bellman equation.
Abstract: In this paper, we address the energy maximization problem of wave energy converters (WECs) subject to nonlinearities and constraints, and present an efficient online control strategy based on the principle of adaptive dynamic programming (ADP) for solving the associated Hamilton–Jacobi–Bellman equation. To solve the derived constrained nonlinear optimal control problem, a critic neural network (NN) is used to approximate the time-dependent optimal cost value and then calculate the practical suboptimal causal control action. The proposed novel WEC control strategy leads to a simplified ADP framework without involving the widely used actor NN. The significantly improved computational efficacy of the proposed control makes it attractive for practical implementation on a WEC to achieve a reduced unit cost of energy output, which is especially important when the dynamics of a WEC are complicated and need to be described accurately by a high-order model with nonlinearities and constraints. Simulation results are provided to show the efficacy of the proposed control method.

Proceedings ArticleDOI
02 Jul 2019
TL;DR: It is proved that, for any given sampling strategy, the Maximum Age First (MAF) scheduling strategy provides the best age performance among all scheduling strategies.
Abstract: In this paper, we consider the problem of minimizing the age of information in a multi-source system, where samples are taken from multiple sources and sent to a destination via a channel with random delay. Due to interference, only one source can be scheduled at a time. We consider the problem of finding a decision policy that determines the sampling times and transmission order of the sources for minimizing the total average peak age (TaPA) and the total average age (TaA) of the sources. Our investigation of this problem results in an important separation principle: The optimal scheduling strategy and the optimal sampling strategy are independent of each other. In particular, we prove that, for any given sampling strategy, the Maximum Age First (MAF) scheduling strategy provides the best age performance among all scheduling strategies. This transforms our overall optimization problem into an optimal sampling problem, given that the decision policy follows the MAF scheduling strategy. While the zero-wait sampling strategy (in which a sample is generated once the channel becomes idle) is shown to be optimal for minimizing the TaPA, it does not always minimize the TaA. We use Dynamic Programming (DP) to investigate the optimal sampling problem for minimizing the TaA. Finally, we provide an approximate analysis of Bellman's equation to approximate the TaA-optimal sampling strategy by a water-filling solution which is shown to be very close to optimal through numerical evaluations.
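
The Maximum Age First rule itself is one line: whenever the channel becomes idle and a new sample is to be delivered, serve the source whose information at the destination is currently the oldest. A minimal sketch, with a data layout assumed by us:

```python
def maximum_age_first(ages):
    """ages[i] = current age of source i at the destination; return the source to serve."""
    return max(range(len(ages)), key=lambda i: ages[i])

# example: three sources whose last delivered samples are 4.2, 1.0 and 7.5 seconds old
print(maximum_age_first([4.2, 1.0, 7.5]))   # -> 2
```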

Journal ArticleDOI
TL;DR: The case study demonstrates that it is possible but time-consuming to solve the MTHS problem to optimality, and shows that a new type of cut, known as the strengthened Benders cut, contributes significantly to closing the optimality gap compared to classical Benders cuts.
Abstract: Hydropower producers rely on stochastic optimization when scheduling their resources over long periods of time. Due to its computational complexity, the optimization problem is normally cast as a stochastic linear program. In a future power market with more volatile power prices, it becomes increasingly important to capture parts of the hydropower operational characteristics that are not easily linearized, e.g., unit commitment and nonconvex generation curves. Stochastic dual dynamic programming (SDDP) is a state-of-the-art algorithm for long- and medium-term hydropower scheduling with a linear problem formulation. A recently proposed extension of the SDDP method, known as stochastic dual dynamic integer programming (SDDiP), has proven convergence also in the nonconvex case. We apply the SDDiP algorithm to the medium-term hydropower scheduling (MTHS) problem and elaborate on how to incorporate stagewise-dependent stochastic variables in the right-hand sides and the objective of the optimization problem. Finally, we demonstrate the capability of the SDDiP algorithm on a case study for a Norwegian hydropower producer. The case study demonstrates that it is possible but time-consuming to solve the MTHS problem to optimality. However, it also shows that a new type of cut, known as the strengthened Benders cut, contributes significantly to closing the optimality gap compared to classical Benders cuts.

Journal ArticleDOI
TL;DR: A load-adaptive rule-based control strategy is proposed that has a stronger capability of battery protection and energy saving under unknown load patterns, and can achieve near-optimal energy management in real time with low computational cost.

Journal ArticleDOI
TL;DR: The whole controller, consisting of a distributed adaptive feedforward tracking controller and a distributed optimal feedback controller, not only guarantees that all signals in the closed-loop system are uniformly ultimately bounded, but also guarantees that the cooperative cost function is minimized.
Abstract: This paper investigates the distributed optimal tracking control problem for nonlinear multiagent systems with a fixed directed graph. The dynamics of the followers are in strict-feedback form with unknown nonlinearities and input saturation. Fuzzy logic systems and an auxiliary system are introduced to identify the unknown nonlinearities and compensate for the effect of input saturation, respectively. Then, by using the command-filtered backstepping technique, the distributed optimal tracking control problem is transformed into a distributed optimal regulation problem of tracking error dynamics in affine form. Subsequently, the distributed optimal feedback controller is derived via the adaptive dynamic programming technique, in which a critic network is constructed to approximate the associated cost function online with a designed weight update law. Therefore, the whole controller, consisting of a distributed adaptive feedforward tracking controller and a distributed optimal feedback controller, not only guarantees that all signals in the closed-loop system are uniformly ultimately bounded, but also guarantees that the cooperative cost function is minimized. The effectiveness of the proposed method is demonstrated by simulation on the cooperative guidance problem of multimissile systems.

Journal ArticleDOI
TL;DR: A novel near-optimal control scheme for a class of unknown nonlinear continuous-time non-zero-sum (NZS) differential games is investigated and an identifier-critic architecture is developed to obtain the event-triggered controller.
Abstract: In this paper, by incorporating the event-triggered mechanism and the adaptive dynamic programming algorithm, a novel near-optimal control scheme for a class of unknown nonlinear continuous-time non-zero-sum (NZS) differential games is investigated. First, a generalized fuzzy hyperbolic model based identifier is established, using only the input–output data, to relax the requirement for the complete system dynamics. Then, under the event-based framework, the coupled Hamilton–Jacobi equations are derived for the multiplayer NZS games. Next, the adaptive critic design method is employed to approximate the optimal control policies; thus, an identifier-critic architecture is developed to obtain the event-triggered controller. By virtue of Lyapunov theory, a state-dependent triggering condition, which is different from those in existing works, is developed to achieve the stability of the closed-loop control system for both the continuous and jump dynamics. Finally, two numerical examples are simulated to substantiate the feasibility of the analytical design.

Journal ArticleDOI
TL;DR: The convergence of the MsHDP algorithm is proved by demonstrating that it converges to the solution of the Bellman equation.
Abstract: In this paper, the optimal output tracking control problem of discrete-time nonlinear systems is considered. First, the augmented system is derived and the tracking control problem is converted to a regulation problem with a discounted performance index, which relies on the solution of the Bellman equation. It is known that policy iteration and value iteration are two classical algorithms for solving the Bellman equation. Through analysis of the two algorithms, it is found that policy iteration converges quickly but requires an initial admissible control policy, while value iteration avoids the requirement of an initial admissible control policy but converges slowly. To achieve a tradeoff between policy iteration and value iteration, the multistep heuristic dynamic programming (MsHDP) is proposed by using a multistep policy evaluation scheme. The convergence of the MsHDP algorithm is proved by demonstrating that it converges to the solution of the Bellman equation. Subsequently, a neural network-based actor-critic structure is developed to implement the MsHDP algorithm. The effectiveness and advantages of the developed MsHDP method are validated through comparative simulation studies.
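
A hedged tabular sketch of the multistep evaluation idea behind MsHDP: the new value estimate backs up h applications of the current policy's Bellman operator and then bootstraps on the previous estimate, so h = 1 resembles value iteration's cheap evaluation while large h approaches full policy evaluation. The toy MDP, names, and parameters below are ours, not the paper's.

```python
import numpy as np

def multistep_evaluation(P, R, policy, V, gamma=0.95, h=3):
    """P[a][s, s'] transition probs, R[a][s] rewards, policy[s] action, V previous values."""
    V_new = V.copy()
    for _ in range(h):                        # h applications of the policy's Bellman operator
        V_next = np.empty_like(V_new)
        for s, a in enumerate(policy):
            V_next[s] = R[a][s] + gamma * P[a][s] @ V_new
        V_new = V_next
    return V_new

n = 4
P = [np.full((n, n), 1 / n), np.eye(n)]       # two toy actions: diffuse or stay
R = [np.ones(n), np.zeros(n)]
V = multistep_evaluation(P, R, policy=[0, 0, 1, 1], V=np.zeros(n))
```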

Journal ArticleDOI
TL;DR: This work studies the traffic signal control problem with connected vehicles by assuming a fixed cycle length and proposes a two-step method to make sure that the obtained optimal solution can lead to the fixed cycle length.
Abstract: We study the traffic signal control problem with connected vehicles by assuming a fixed cycle length so that the proposed model can be extended readily for the coordination of multiple signals. The problem can be first formulated as a mixed-integer nonlinear program, by considering the information of individual vehicle’s trajectories (i.e., second-by-second vehicle locations and speeds) and their realistic driving/car-following behaviors. The objective function is to minimize the weighted sum of total fuel consumption and travel time. Due to the large dimension of the problem and the complexity of the nonlinear car-following model, solving the nonlinear program directly is challenging. We then reformulate the problem as a dynamic programming model by dividing the timing decisions into stages (one stage for a signal phase) and approximating the fuel consumption and travel time of a stage as functions of the state and decision variables of the stage. We also propose a two-step method to make sure that the obtained optimal solution can lead to the fixed cycle length. Numerical experiments are provided to test the performance of the proposed model using data generated by traffic simulation.
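
A minimal sketch of the stage-wise dynamic program described above: one stage per signal phase, the state is the green time still available in the fixed cycle, the decision is the phase's green duration, and the stage cost stands in for the approximated fuel-plus-travel-time function. All numbers and the cost model are placeholders of ours, not the paper's.

```python
import math
from functools import lru_cache

CYCLE = 60                       # fixed cycle length (s)
PHASES = 4
G_MIN, G_MAX = 5, 40             # allowed green duration per phase (s)

def stage_cost(phase, green):
    """Placeholder for the approximated fuel + travel-time cost of one phase."""
    return (green - 10 - 3 * phase) ** 2

@lru_cache(maxsize=None)
def best(phase, time_left):
    """Minimum remaining cost given the green time still unallocated in the cycle."""
    if phase == PHASES:
        return 0.0 if time_left == 0 else math.inf   # the whole cycle must be used
    cost = math.inf
    for g in range(G_MIN, min(G_MAX, time_left) + 1):
        cost = min(cost, stage_cost(phase, g) + best(phase + 1, time_left - g))
    return cost

print(best(0, CYCLE))
```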

Journal ArticleDOI
TL;DR: The results revealed the superior performance of the branch and bound, dynamic programming, and hybrid genetic algorithm with simulated annealing methods over all the compared algorithms, and indicated that the hybrid algorithm can be applied as an alternative to solve small- and large-sized 0–1 knapsack problems.
Abstract: In this paper, we present some initial results of several meta-heuristic optimization algorithms, namely, genetic algorithms, simulated annealing, branch and bound, dynamic programming, a greedy search algorithm, and a hybrid genetic algorithm-simulated annealing, for solving 0-1 knapsack problems. Each algorithm is designed in such a way that it penalizes infeasible solutions and optimizes the feasible solution. The experiments are carried out using both low-dimensional and high-dimensional knapsack problems. The numerical results of the hybrid algorithm are compared with the results achieved by the individual algorithms. The results revealed the superior performance of the branch and bound, dynamic programming, and hybrid genetic algorithm with simulated annealing methods over all the compared algorithms. This performance was established by taking into account both the algorithm computational time and the solution quality. In addition, the obtained results also indicated that the hybrid algorithm can be applied as an alternative to solve small- and large-sized 0-1 knapsack problems.
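
For reference, the textbook dynamic program for the 0-1 knapsack problem that the comparison includes looks like this (the instance data is invented):

```python
def knapsack(values, weights, capacity):
    """Return the best total value using each item at most once (O(n * capacity) table)."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):   # iterate downwards so each item is used once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))   # -> 220
```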

Journal ArticleDOI
TL;DR: In the paper “Robust Dual Dynamic Programming,” Angelos Georghiou, Angelos Tsoukalas, and Wolfram Wiesemann propose a novel solution scheme for addressing planning problems with long horizons.
Abstract: In the paper “Robust Dual Dynamic Programming,” Angelos Georghiou, Angelos Tsoukalas, and Wolfram Wiesemann propose a novel solution scheme for addressing planning problems with long horizons. Such...

Journal ArticleDOI
TL;DR: This paper investigates the robust control issues of nonlinear multiplayer systems by utilizing adaptive dynamic programming methods and fills a gap in the ADP field, where actuator uncertainties for multiplayer systems are still not addressed.
Abstract: This paper investigates the robust control issues of nonlinear multiplayer systems by utilizing adaptive dynamic programming (ADP) methods and fills a gap in the ADP field, where actuator uncertainties for multiplayer systems are still not addressed. Two types of actuator uncertainties including bounded nonlinear perturbation and unknown constant actuator fault are taken into consideration. First, a data-driven reinforcement learning (RL) approach is derived to learn the optimal solutions of multiplayer nonzero-sum games. Then, based on the obtained optimal control policies, two robust control schemes are developed to handle these two different types of uncertainties, respectively, and the associated stability analysis is also provided. To implement the proposed iterative RL approach, a single neural network (NN) architecture with least-square-based updating law is given, which reduces the computation burden compared with the traditional dual NN architecture. Finally, two numerical examples are shown to test the feasibility of our proposed schemes.

Journal ArticleDOI
TL;DR: A new output feedback-based Q-learning approach is presented for solving the linear quadratic regulation (LQR) control problem for discrete-time systems, and it is shown that the proposed algorithms converge to the solution of the LQR Riccati equation.
Abstract: Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input–output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.

Journal ArticleDOI
Ji Li, Quan Zhou, Yinglong He, Bin Shuai, Ziyang Li, Huw Williams, Hongming Xu
TL;DR: An online predictive control strategy for series-parallel plug-in hybrid electric vehicles (PHEVs) is investigated, resulting in a novel online optimization methodology, named dual-loop online intelligent programming (DOIP), for velocity prediction and energy-flow control.

Journal ArticleDOI
01 Dec 2019-Energy
TL;DR: A hybrid optimization method that combines the genetic algorithm (GA) and dynamic programming (DP) is proposed; it increases the overall performance, and its computing time is acceptable for the scheduling of the energy system.