
Showing papers on "Dynamic programming published in 2018"


Journal ArticleDOI
TL;DR: Simulations show that this approach is able to dramatically enhance the scalability of task admission at a marginal cost of extra energy, as compared with the optimal branch and bound method, and can be efficiently implemented for online programming.
Abstract: Task admission is critical to delay-sensitive applications in mobile edge computing, but is technically challenging due to its mixed combinatorial nature and consequently limited scalability. We propose an asymptotically optimal task admission approach which guarantees task delays and achieves a $(1-\epsilon)$-approximation of the computationally prohibitive maximum energy saving at a time complexity that scales linearly with the number of devices, where $\epsilon$ is linear in the quantization interval of energy. The key idea is to transform the mixed integer programming formulation of task admission into an integer programming (IP) problem with the optimal substructure by pre-admitting resource-constrained devices. Another important aspect is a new quantized dynamic programming algorithm which we develop to exploit the optimal substructure and solve the IP. The quantization interval of energy is optimized to achieve an $[\mathcal {O}(\epsilon),\mathcal {O}(1/\epsilon)]$-tradeoff between the optimality loss and the time complexity of the algorithm. Simulations show that our approach dramatically enhances the scalability of task admission at a marginal cost of extra energy, as compared with the optimal branch-and-bound method, and can be efficiently implemented for online programming.
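
The quantized DP is in the spirit of the classic FPTAS for knapsack: round each saving to a multiple of a quantization interval delta and index the DP by quantized total saving. Below is a minimal Python sketch of that generic scheme, assuming a single abstract capacity constraint and illustrative names; it is not the paper's exact admission algorithm.

```python
def quantized_dp(savings, loads, capacity, eps=0.1):
    """Knapsack-style quantized DP: maximize total energy saving
    subject to a resource budget, with savings rounded to multiples
    of delta so the DP table size is governed by 1/eps rather than
    by the numeric range of the (continuous) savings."""
    n = len(savings)
    delta = eps * max(savings) / n            # quantization interval
    q = [int(s / delta) for s in savings]     # quantized savings
    top = sum(q)
    INF = float("inf")
    # min_load[v] = least resource usage achieving quantized saving v
    min_load = [0.0] + [INF] * top
    for qi, li in zip(q, loads):
        for v in range(top, qi - 1, -1):      # 0/1 admission per device
            if min_load[v - qi] + li < min_load[v]:
                min_load[v] = min_load[v - qi] + li
    best = max(v for v in range(top + 1) if min_load[v] <= capacity)
    return best * delta                       # (1 - O(eps))-optimal saving
```

Shrinking eps tightens the approximation while enlarging the DP table proportionally, mirroring the $[\mathcal {O}(\epsilon),\mathcal {O}(1/\epsilon)]$ tradeoff described above.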

163 citations


Journal ArticleDOI
TL;DR: This work proposes a dynamic programming approach that takes advantage of the nested structure of the battery storage problem by solving smaller subproblems with reduced state spaces, over different time scales.
Abstract: We are interested in optimizing the use of battery storage for multiple applications, in particular energy arbitrage and frequency regulation. The nature of this problem requires the battery to make charging and discharging decisions at different time scales while accounting for stochastic information such as load demand, electricity prices, and regulation signals. Solving the problem for even a single day of operation would be computationally intractable due to the large state space and the number of time steps. We propose a dynamic programming approach that takes advantage of the nested structure of the problem by solving smaller subproblems with reduced state spaces over different time scales.
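
In spirit, the nesting can be sketched as an outer DP over coarse (say hourly) state-of-charge levels, with each stage value supplied by an inner fine-time-step subproblem. A minimal Python sketch, assuming a hypothetical hour_value(t, s0, s1) routine that solves the inner subproblem; this illustrates the decomposition idea only, not the paper's exact formulation:

```python
def two_scale_dp(n_hours, soc_grid, hour_value):
    """Outer DP over hourly state-of-charge (SOC) levels.

    hour_value(t, s0, s1) is assumed to return the best profit
    (arbitrage plus regulation revenue) achievable during hour t
    when the SOC moves from s0 to s1, itself computed by a small
    inner DP at a fine time step. The outer recursion never sees
    the fine-grained state, which is the nesting idea.
    """
    V = {s: 0.0 for s in soc_grid}        # value after the last hour
    for t in reversed(range(n_hours)):
        V = {s0: max(hour_value(t, s0, s1) + V[s1] for s1 in soc_grid)
             for s0 in soc_grid}
    return V                              # V[s]: best profit starting at SOC s
```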

163 citations


Journal ArticleDOI
TL;DR: It is theoretically proved that the iterative value function sequence strictly converges to the solution of the coupled Hamilton–Jacobi–Bellman equation, and a novel online iterative scheme is proposed, which runs on data sampled from the augmented system and the gradient of the value function.
Abstract: This paper focuses on distributed optimal cooperative control for continuous-time nonlinear multiagent systems (MASs) with completely unknown dynamics via adaptive dynamic programming (ADP). By introducing predesigned extra compensators, the augmented neighborhood error systems are derived, which circumvents the system-knowledge requirement of ADP. It is revealed that the optimal consensus protocols actually work as the solutions of the MAS differential game. A policy iteration algorithm is adopted, and it is theoretically proved that the iterative value function sequence strictly converges to the solution of the coupled Hamilton–Jacobi–Bellman equation. Based on this result, a novel online iterative scheme is proposed, which runs on data sampled from the augmented system and the gradient of the value function. Neural networks are employed to implement the algorithm, and the weights are updated, in the least-squares sense, toward the ideal values, which yields approximate optimal consensus protocols. Finally, a numerical example is given to illustrate the effectiveness of the proposed scheme.

142 citations


Journal ArticleDOI
01 Jul 2018
TL;DR: A novel dynamic programming (DP) algorithm is designed to accelerate the insertion operation from cubic or quadratic time in previous work to only linear time, and on the basis of the DP algorithm, a greedy-based solution to the URPSM problem is proposed.
Abstract: There has been dramatic growth in shared mobility applications such as ride-sharing, food delivery and crowdsourced parcel delivery. Shared mobility refers to transportation services that are shared among users, where a central issue is route planning. Given a set of workers and requests, route planning finds for each worker a route, i.e., a sequence of locations to pick up and drop off passengers/parcels that arrive from time to time, with different optimization objectives. Previous studies lack practicability due to their conflicting objectives and their inefficiency in inserting a new request into a route, a basic operation called insertion. In this paper, we present a unified formulation of route planning called URPSM. It has a well-defined parameterized objective function which eliminates the contradictory objectives in previous studies and enables flexible multi-objective route planning for shared mobility. We prove that the problem is NP-hard and that there is no polynomial-time algorithm with a constant competitive ratio for the URPSM problem and its variants. In response, we devise an effective and efficient solution that addresses the URPSM problem approximately. We design a novel dynamic programming (DP) algorithm to accelerate the insertion operation from cubic or quadratic time in previous work to only linear time. On the basis of the DP algorithm, we propose a greedy-based solution to the URPSM problem. Experimental results on real datasets show that our solution outperforms state-of-the-art methods by 1.2 to 12.8 times in effectiveness, and also runs 2.6 to 20.7 times faster.
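
The flavor of linear-time insertion can be conveyed with a simplified deadline model: one O(n) backward pass computes suffix slacks, after which every candidate insertion position is checked in O(1). A hedged Python sketch under illustrative assumptions (arr, ddl and detour arrays; no capacity constraints), not the paper's full URPSM insertion DP:

```python
def best_insertion(arr, ddl, detour):
    """Linear-time feasibility scan for inserting one new stop.

    arr[k]: current arrival time at stop k; ddl[k]: its deadline;
    detour[k]: extra travel time if the new stop is inserted right
    after stop k (delaying every later stop by that amount).
    slack[k] = min over j >= k of (ddl[j] - arr[j]) lets each
    candidate position be checked in O(1), so the whole scan is
    O(n) instead of re-simulating the route per position (O(n^2)).
    """
    n = len(arr)
    slack = [0.0] * (n + 1)
    slack[n] = float("inf")
    for k in range(n - 1, -1, -1):          # suffix minima of slack
        slack[k] = min(ddl[k] - arr[k], slack[k + 1])
    best, best_pos = float("inf"), None
    for k in range(n):                      # try inserting after stop k
        if detour[k] <= slack[k + 1] and detour[k] < best:
            best, best_pos = detour[k], k
    return best_pos                         # None if no feasible position
```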

138 citations


Journal ArticleDOI
TL;DR: Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions of the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum.
Abstract: In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The present local iterative ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. Employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively reduces the computational burden. A new analysis method for the convergence property is developed to prove that the iterative value functions will converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions on the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
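
A tabular caricature of the local update idea: each sweep relaxes the Bellman operator only on a chosen subset of states, each with its own learning rate. The discount factor and all names below are assumptions of the sketch; the paper treats the general optimal control setting with function approximation.

```python
import numpy as np

def local_value_iteration(P, C, subset, alpha, V0, gamma=0.95, sweeps=100):
    """P[a]: (n_s, n_s) transition matrix under action a; C: (n_s, n_a)
    stage costs; subset: the states updated each sweep; alpha[s]: a
    state-dependent learning rate; V0: arbitrary nonnegative initial
    values. Only `subset` is swept, which is the burden-reducing
    trick the abstract describes."""
    V = V0.astype(float).copy()
    n_a = C.shape[1]
    for _ in range(sweeps):
        for s in subset:
            target = min(C[s, a] + gamma * P[a][s] @ V for a in range(n_a))
            V[s] += alpha[s] * (target - V[s])   # partial (local) update
    return V
```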

128 citations


Journal ArticleDOI
TL;DR: Stability analysis shows that the system in closed-loop with the developed control policy is leader-to-formation stable, with guaranteed robustness to unmeasurable leader disturbance.
Abstract: This note proposes a novel data-driven solution to the cooperative adaptive optimal control problem of leader-follower multiagent systems under switching network topology. The dynamics of all the followers are unknown, and the leader is modeled by a perturbed exosystem. Through the combination of adaptive dynamic programming and the internal model principle, an approximate optimal controller is iteratively learned online using real-time input-state data. Rigorous stability analysis shows that the system in closed loop with the developed control policy is leader-to-formation stable, with guaranteed robustness to unmeasurable leader disturbance. Numerical results illustrate the effectiveness of the proposed data-driven algorithm.

126 citations


Journal ArticleDOI
TL;DR: A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required.
Abstract: In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, and the upper and lower performance index functions are proved not to be equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.
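
The upper/lower iterations can be demonstrated on a toy discounted repeated matrix game: run the min-max and max-min recursions side by side from nonnegative initial values; the two limits coincide exactly when the stage game has a saddle point. A toy tabular sketch, not the paper's neural-network setting:

```python
import numpy as np

def upper_lower_iteration(C, gamma=0.9, iters=200):
    """C[u, w]: stage cost the minimizer (row player u) pays the
    maximizer (column player w). V_up iterates min-max, V_lo
    iterates max-min. Always V_lo <= V_up; they are equal iff the
    stage game has a pure saddle point, which is the case
    distinguished in the paper."""
    V_up, V_lo = 0.0, 0.0                   # arbitrary init >= 0
    for _ in range(iters):
        V_up = np.max(C + gamma * V_up, axis=1).min()   # min_u max_w
        V_lo = np.min(C + gamma * V_lo, axis=0).max()   # max_w min_u
    return V_up, V_lo
```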

114 citations


Journal ArticleDOI
TL;DR: An assessment study of a novel approach that combines discrete state-space Dynamic Programming and Pontryagin's Maximum Principle for online optimal control of hybrid electric vehicles (HEV) yields a close-to-optimal solution by solving the optimal control problem over one hundred thousand times faster than the benchmark method.
Abstract: An assessment study of a novel approach is presented that combines discrete state-space Dynamic Programming and Pontryagin's Maximum Principle for online optimal control of hybrid electric vehicles (HEV). In addition to electric energy storage, engine state and gear, kinetic energy, and travel time are considered states in this paper. After presenting the corresponding model using a parallel HEV as an example, a benchmark method with Dynamic Programming is introduced which is used to show the solution quality of the novel approach. It is illustrated that the proposed method yields a close-to-optimal solution by solving the optimal control problem over one hundred thousand times faster than the benchmark method. Finally, a potential online usage is assessed by comparing solution quality and calculation time with regard to the quantization of the state space.

111 citations


Journal ArticleDOI
TL;DR: A novel policy iteration technique for solving positive semidefinite HJB equations with rigorous convergence analysis is proposed and a two-phase data-driven learning method is developed and implemented online by ADP.
Abstract: This paper proposes a novel data-driven control approach to address the problem of adaptive optimal tracking for a class of nonlinear systems in strict-feedback form. Adaptive dynamic programming (ADP) and nonlinear output regulation theories are integrated for the first time to compute an adaptive near-optimal tracker without any a priori knowledge of the system dynamics. In fundamental contrast to adaptive optimal stabilization problems, the solution to the Hamilton–Jacobi–Bellman (HJB) equation here is not necessarily a positive definite function and cannot be approximated by existing iterative methods. This paper proposes a novel policy iteration technique for solving positive semidefinite HJB equations with rigorous convergence analysis. A two-phase data-driven learning method is developed and implemented online by ADP. The efficacy of the proposed adaptive optimal tracking control methodology is demonstrated via a Van der Pol oscillator with time-varying exogenous signals.

105 citations


Journal ArticleDOI
TL;DR: This paper presents an adaptive large neighborhood search which is enhanced by local search and dynamic programming components, derives new penalty functions for time-efficient neighborhood evaluation, assesses the competitiveness of the algorithm on the electric vehicle routing problem with time windows for full and partial recharging, and derives new best known solutions for both problem variants.
Abstract: Recent research on location-routing problems has been focusing on locating facilities as the starting and end point of routes. In this paper, we investigate a new type of location-routing problem. In the location-routing problem with intra-route facilities, the location of depots is known, whereas the location of facilities for intermediate stops has to be determined to keep vehicles operational. We present an adaptive large neighborhood search which is enhanced by local search and dynamic programming components, and derive new penalty functions for time-efficient neighborhood evaluation. We show that this algorithm is suitable for solving various problems with intra-route facilities by deriving new best known solutions for the recently published electric location-routing problem with time windows and partial recharging, as well as for the battery swap station electric vehicle location-routing problem. Additionally, we create new real-world benchmark instances and report results on them as well. Furthermore, we assess the competitiveness of our algorithm on the electric vehicle routing problem with time windows for full and partial recharging, and derive new best known solutions for both problem variants. The online appendix is available at https://doi.org/10.1287/trsc.2017.0746 .

103 citations


Journal ArticleDOI
TL;DR: The MPSO algorithm incorporates a diversification strategy and a local search strategy into a basic particle swarm optimization algorithm to minimize the maximum lateness for the single batch-processing machine problem with non-identical job sizes and release dates.

Journal ArticleDOI
TL;DR: A faster semi-global alignment algorithm, "difference recurrence relations," is proposed that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1 and will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages.
Abstract: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith–Waterman–Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, “difference recurrence relations,” that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
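
The heart of the difference recurrence idea can be shown on the simpler linear-gap (Needleman–Wunsch) global DP: propagate the differences between adjacent cells rather than the absolute scores, since the differences stay within a small range set by the scoring parameters instead of growing with sequence length, which is what makes narrow (e.g., 8-bit) integers sufficient in the SIMD setting. A scalar Python sketch of this simplified variant (global alignment, linear gaps, no SIMD), not the paper's affine-gap semi-global algorithm:

```python
def diff_score(a, b, match=1, mismatch=-1, gap=2):
    """Global alignment score via difference recurrences.

    Instead of absolute DP scores H[i][j], propagate
      dh[j] = H[i][j] - H[i][j-1]   (horizontal difference)
      dv    = H[i][j] - H[i-1][j]   (vertical difference),
    both bounded by the scoring parameters, not sequence length.
    """
    n, m = len(a), len(b)
    dh = [-gap] * (m + 1)            # row 0: H[0][j] = -gap * j
    score = -gap * m                 # running H[i][m], starts at H[0][m]
    for i in range(1, n + 1):
        dv = -gap                    # H[i][0] - H[i-1][0] = -gap
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            new_dv = max(s - dh[j], -gap, dv - dh[j] - gap)
            # identity: dv_new + dh_old = dh_new + dv_old
            dh[j] = new_dv + dh[j] - dv
            dv = new_dv
        score += dv                  # accumulate H[i][m] - H[i-1][m]
    return score                     # equals H[n][m]
```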

Journal ArticleDOI
TL;DR: The results show that the proposed method not only solves the four dynamic programming problems above, but also achieves much shorter computation times and higher calculation accuracy compared with Basic Dynamic Programming and Level-Set Dynamic Programming.

Journal ArticleDOI
TL;DR: A 3D dynamic programming based ship voyage optimization method is proposed, aiming to select the optimal path and speed profile for a ship voyage on the basis of weather forecast maps, with the optimization carried out in a discretized space-time domain.

Proceedings Article
11 Feb 2018
TL;DR: Theoretically, this work provides a new probabilistic perspective on backpropagating through these DP operators and relates them to inference in graphical models, and derives two particular instantiations of the framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment.
Abstract: Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by back-propagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This makes it possible to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.
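
The smoothing step itself is compact: replace max with the log-sum-exp smoothed maximum, which is differentiable everywhere and recovers the hard maximum as the temperature goes to zero. A minimal NumPy sketch of the smoothed Viterbi value under assumed emission and transition score arrays; the paper's framework is broader (arbitrary strongly convex regularizers, and gradients of the optimal solution as well as of the value):

```python
import numpy as np

def smoothed_max(x, gamma):
    """gamma * logsumexp(x / gamma), numerically stabilized;
    tends to max(x) as gamma -> 0."""
    m = x.max()
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_viterbi_value(theta, trans, gamma=1.0):
    """theta[t, s]: emission score at time t, state s;
    trans[i, j]: score of the transition i -> j. Same recursion as
    Viterbi with max replaced by smoothed_max, so the returned
    value is differentiable in theta and trans."""
    T, S = theta.shape
    v = theta[0].copy()
    for t in range(1, T):
        v = theta[t] + np.array(
            [smoothed_max(v + trans[:, j], gamma) for j in range(S)])
    return smoothed_max(v, gamma)
```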

Journal ArticleDOI
TL;DR: A multistage stochastic programming formulation of transmission-constrained economic dispatch subject to multiarea renewable production uncertainty is presented, with a focus on optimizing the dispatch of storage in real-time operations.
Abstract: This paper presents a multistage stochastic programming formulation of a transmission-constrained economic dispatch subject to multiarea renewable production uncertainty, with a focus on optimizing the dispatch of storage in real-time operations. This problem is solved using stochastic dual dynamic programming. The applicability of the proposed approach is demonstrated on a realistic case study of the German power system calibrated against the solar and wind power integration levels of 2013–2014, with a 24-h horizon and 15-min time step. The value of the stochastic solution relative to the cost of a deterministic policy amounts to 1.1%, while the value of perfect foresight relative to the cost of the stochastic programming policy amounts to 0.8%. The relative performance of various alternative real-time dispatch policies is analyzed, and the sensitivity of the results is explored.

Journal ArticleDOI
TL;DR: An adaptive optimal control approach is developed using value iteration-based Q-learning (VIQL) with a critic-only structure, and the adaptive constrained optimal controller is designed based on a gradient descent scheme.
Abstract: Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem for nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require a specific performance index and suit only linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation is first introduced with a general performance index. Then, the constrained optimal control problem is converted into an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established under an easy-to-realize initial condition $Q^{(0)}(x,a)\geqslant 0$. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on a gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples via computer simulation.
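
A tabular, model-based caricature makes the initialization condition concrete: start from any Q satisfying $Q^{(0)}(x,a)\geqslant 0$ and repeatedly apply the Bellman operator for costs. The transition and cost matrices, and the discount factor that makes the toy iteration contract, are assumptions of the sketch; the paper learns from data with a critic network instead of a table.

```python
import numpy as np

def viql_tabular(P, C, gamma=0.95, iters=500):
    """P[a]: (n_s, n_s) transition matrix under action a;
    C[s, a]: nonnegative stage cost. Q starts at zero, which
    satisfies the Q^(0)(x, a) >= 0 condition, and each iteration
    performs Q <- C + gamma * E[min over a' of Q(s', a')]."""
    n_s, n_a = C.shape
    Q = np.zeros((n_s, n_a))                 # Q^(0) >= 0
    for _ in range(iters):
        V = Q.min(axis=1)                    # cost-minimizing value
        Q = C + gamma * np.stack([P[a] @ V for a in range(n_a)], axis=1)
    return Q
```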

Journal ArticleDOI
TL;DR: This work considers a general formulation of the principal–agent problem with a lump-sum payment on a finite horizon, providing a systematic method for solving such problems, and relies on the backward stochastic differential equations approach to non-Markovian stochastic control.
Abstract: We consider a general formulation of the principal–agent problem with a lump-sum payment on a finite horizon, providing a systematic method for solving such problems. Our approach is the following. We first find the contract that is optimal among those for which the agent’s value process allows a dynamic programming representation, in which case the agent’s optimal effort is straightforward to find. We then show that the optimization over this restricted family of contracts represents no loss of generality. As a consequence, we have reduced a non-zero-sum stochastic differential game to a stochastic control problem which may be addressed by standard tools of control theory. Our proofs rely on the backward stochastic differential equations approach to non-Markovian stochastic control, and more specifically on the recent extensions to the second order case.

Journal ArticleDOI
TL;DR: A distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented, using a novel hybrid learning scheme based on approximate dynamic programming with online exploration.
Abstract: In this paper, a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented, using a novel hybrid learning scheme based on approximate dynamic programming with online exploration. First, an approximate solution to the Hamilton–Jacobi–Bellman equation is generated with event-sampled neural network (NN) approximation, and subsequently a near-optimal control policy for each subsystem is derived. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifier and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of the system dynamics and bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration into the online control framework, using the identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. System states and the NN weight estimation errors are regulated, and locally uniformly ultimately bounded results are achieved. The analytical results are substantiated using simulation studies.

Journal ArticleDOI
TL;DR: An extensive computational study reveals significant cost savings as compared to myopic and non-storage policies, as well as policies obtained using a two-stage SP model, and demonstrates the scalability of the solution procedure.
Abstract: A microgrid is a small-scale version of a centralized power grid that generates, distributes and regulates electricity flow to local entities using distributed generation and the main grid. Distributed energy storage systems can be used to mitigate adverse effects of intermittent renewable sources in a microgrid in which operators dynamically adjust electricity procurement and storage decisions in response to randomly-evolving demand, renewable supply and pricing information. We formulate a multistage stochastic programming (SP) model whose objective is to minimize the expected total energy costs incurred within a microgrid over a finite planning horizon. The model prescribes the amount of energy to procure, store and discharge in each decision stage of the horizon. However, for even a moderate number of stages, the model is computationally intractable; therefore, we customize the stochastic dual dynamic programming (SDDP) algorithm to obtain high-quality approximate solutions. Computation times and optimization gaps are significantly reduced by implementing a dynamic cut selection procedure and a lower bound improvement scheme within the SDDP framework. An extensive computational study reveals significant cost savings as compared to myopic and non-storage policies, as well as policies obtained using a two-stage SP model. The study also demonstrates the scalability of our solution procedure.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the stochastic optimal control problem of a McKean–Vlasov stochastic differential equation where the coefficients may depend upon the joint law of the state and control.
Abstract: We consider the stochastic optimal control problem of a McKean–Vlasov stochastic differential equation where the coefficients may depend upon the joint law of the state and control. By using feedback controls, we reformulate the problem into a deterministic control problem with only the marginal distribution of the process as controlled state variable, and prove that the dynamic programming principle holds in its general form. Then, by relying on the notion of differentiability with respect to probability measures recently introduced by P.L. Lions in [32], and a special Itô formula for flows of probability measures, we derive the (dynamic programming) Bellman equation for the mean-field stochastic control problem, and prove a verification theorem in our McKean–Vlasov framework. We give explicit solutions to the Bellman equation for the linear-quadratic mean-field control problem, with applications to mean-variance portfolio selection and a systemic risk model. We also consider a notion of lifted viscosity solutions for the Bellman equation, and show the viscosity property and uniqueness of the value function of the McKean–Vlasov control problem. Finally, we consider the case of the McKean–Vlasov control problem with open-loop controls and discuss the associated dynamic programming equation, which we compare with the case of closed-loop controls.

Journal ArticleDOI
TL;DR: In this article, the authors propose a continuous-time, rolling-horizon formulation of the microgrid energy management problem, which allows for accurate modeling and is computationally cheap.
Abstract: We propose a novel method for the microgrid energy management problem by introducing a continuous-time, rolling-horizon formulation. The energy management problem is formulated as a deterministic optimal control problem (OCP). We solve the OCP with two classical approaches: the direct method [1] and Bellman's Dynamic Programming Principle (DPP) [2]. In both cases we use the optimal control toolbox BOCOP [3] for the numerical simulations. For the DPP approach we implement a semi-Lagrangian scheme [4] adapted to handle the optimization of switching times for the on/off modes of the diesel generator. The DPP approach allows for accurate modeling and is computationally cheap. It finds the global optimum in less than 3 seconds, a CPU time similar to that of the Mixed Integer Linear Programming (MILP) approach used in [5]. We achieve this performance by introducing a trick based on the Pontryagin Maximum Principle (PMP), which increases the computation speed by several orders of magnitude and also improves the precision of the solution. For validation purposes, simulations are performed using datasets from an actual isolated microgrid located in northern Chile. Results show that the DPP method is very well suited for this type of problem when compared with the MILP approach.
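
For reference, a bare-bones semi-Lagrangian backward sweep for a one-dimensional optimal control problem: follow the characteristic one step, interpolate the next value function at the foot of the characteristic, and minimize over controls. The sketch omits the switching-time handling and the PMP-based trick from the abstract; the dynamics and cost callables are assumptions.

```python
import numpy as np

def semi_lagrangian(xs, controls, f, cost, V_final, dt, n_steps):
    """Backward sweep of the semi-Lagrangian scheme on grid xs:
    V_t(x) = min over u of [ dt * cost(x, u) + V_{t+1}(x + dt * f(x, u)) ],
    with the off-grid point handled by linear interpolation."""
    V = np.asarray(V_final, dtype=float).copy()
    for _ in range(n_steps):
        V = np.array([
            min(dt * cost(x, u) + np.interp(x + dt * f(x, u), xs, V)
                for u in controls)
            for x in xs])
    return V
```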

Journal ArticleDOI
TL;DR: A unified dynamic programming model and its solution method are proposed for fuel cell electric vehicles; the results demonstrate that the proposed method is much better than Basic Dynamic Programming and Level-Set Dynamic Programming in both calculation time and computation accuracy.

Journal ArticleDOI
TL;DR: The proposed DP-based EMS with a prediction horizon offers an effective solution for PHEVs operating online on trip paths without GPS data and can further reduce fuel consumption and emissions.
Abstract: Plug-in hybrid electric vehicles (PHEVs) powered by fuel and electricity have demonstrated the capability to reduce fuel consumption and emissions by adopting appropriate energy management strategies. Among existing energy management strategies, the dynamic programming (DP)-based energy management strategy (EMS) can realize globally optimal fuel consumption if the global vehicle-speed trajectory is known in advance. The global vehicle-speed trajectory can be obtained from GPS data when the trip path is determined. However, for a trip path without GPS data, the global vehicle-speed trajectory is difficult to obtain. In this case, the DP-based EMS cannot be utilized to achieve globally optimal fuel consumption, which is the issue discussed in this paper. This paper makes two contributions to solve this issue. First, the cell transmission model of road traffic flow and vehicle kinematics are introduced to obtain the traffic speeds of road segments and the accelerations of the PHEV. On this basis, a hybrid trip model is presented to obtain the vehicle-speed trajectory for a trip path without GPS data. Next, a DP-based EMS with a prediction horizon is proposed; moreover, to improve its real-time implementation, a search-range optimization algorithm for the state of charge (SOC) is designed to reduce the computational load of DP. In summary, we propose a computation-optimized DP-based EMS built on the hybrid trip model. Finally, a simulation study is conducted by applying the proposed EMS to a practical trip path in the Beijing road network. The results show that the hybrid trip model can effectively construct the vehicle-speed trajectory online, with an average accuracy above 78%. In addition, compared with the existing optimization algorithm for DP calculation, the SOC search-range optimization algorithm further reduces the computational load of DP. More importantly, compared to the globally optimal DP-based EMS, the proposed EMS increases fuel consumption by less than 5.36% while being implementable in real time. Moreover, compared with existing real-time strategies, it further reduces fuel consumption and emissions. Thus, the proposed EMS offers an effective solution for PHEVs operating online on trip paths without GPS data.
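
A skeletal backward DP over an SOC grid shows where the search-range restriction enters: at each step only the transitions returned by a feasible_next callable are evaluated, which is where the computational load is cut. fuel_cost and feasible_next are hypothetical stand-ins for the vehicle model, not the paper's implementation.

```python
def dp_ems(T, soc_levels, fuel_cost, feasible_next):
    """Backward DP sketch for a PHEV energy management strategy.

    State: battery SOC level; decision: SOC level at the next step
    (i.e., how much electric energy to spend). fuel_cost(t, s, s2)
    is the fuel burned over step t moving the SOC from s to s2
    given the predicted speed; feasible_next(t, s) is the
    restricted SOC search range. Returns value table and policy.
    """
    V = {s: 0.0 for s in soc_levels}          # terminal cost
    policy = [dict() for _ in range(T)]
    for t in reversed(range(T)):
        V_new = {}
        for s in soc_levels:
            best, arg = float("inf"), None
            for s2 in feasible_next(t, s):    # restricted search range
                c = fuel_cost(t, s, s2) + V[s2]
                if c < best:
                    best, arg = c, s2
            V_new[s], policy[t][s] = best, arg
        V = V_new
    return V, policy
```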

Journal ArticleDOI
TL;DR: A data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems is presented, and the convergence of the iterative ADP algorithm and the stability of the weight estimation errors under the HDP structure are intensively analyzed.
Abstract: This paper presents a data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems. Iterative adaptive dynamic programming (ADP) is used to approximately solve the Hamilton–Jacobi–Bellman equation by minimizing the cost function in finite time. The idea is implemented via heuristic dynamic programming (HDP) with a model network, so that the iterative control at the first step can be obtained without the system function; meanwhile, the action network is used to obtain the approximate optimal control law and the critic network is utilized to approximate the optimal cost function. The convergence of the iterative ADP algorithm and the stability of the weight estimation errors under the HDP structure are intensively analyzed. Finally, two simulation examples are provided to demonstrate the theoretical results and show the performance of the proposed method.

Journal ArticleDOI
TL;DR: A simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming, and actor and critic networks are employed to approximate the optimal control and the optimal value function.
Abstract: This paper presents a novel robust regulation method for a class of continuous-time nonlinear systems subject to unmatched perturbations. To begin with, the robust regulation problem is transformed into an optimal regulation problem by constructing a value function for the auxiliary system. Then, a simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming. To implement the SPI algorithm, actor and critic networks are employed to approximate the optimal control and the optimal value function, respectively, and the Monte Carlo integration method is applied to obtain the unknown weight parameters. Finally, two examples, including a power system, are provided to demonstrate the applicability of the developed approach.

Journal ArticleDOI
TL;DR: A dynamic discount factor embedded in the iterative Bellman equation is proposed to prevent biased estimation of the action-value function due to inconstant time-step intervals, and the trained agent outperforms a fixed timing plan in all testing cases, reducing total system delay by 20%.
Abstract: By improving the efficiency of road networks through advanced traffic signal control methods, intelligent transportation systems help characterize a smart city. Recently, owing to significant progress in artificial intelligence, machine learning-based frameworks for adaptive traffic signal control have attracted considerable attention. In particular, the deep Q-learning neural network is a model-free technique that can be applied to optimal action selection problems. However, setting variable green times is a key mechanism for reflecting traffic fluctuations, so time steps in the reinforcement learning framework need not be fixed intervals. In this study, the authors propose a dynamic discount factor embedded in the iterative Bellman equation to prevent biased estimation of the action-value function due to the effects of inconstant time-step intervals. Moreover, the action is added to the input layer of the neural network in the training process, and the output layer is the estimated action value for the denoted action. The trained neural network can then be used to generate the action that leads to an optimal estimated value within a finite set as the agent's policy. Preliminary results show that the trained agent outperforms a fixed timing plan in all testing cases, reducing total system delay by 20%.
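
The mechanics of a duration-dependent discount are easy to state: when the interval between decisions varies with the chosen green time, the discount exponent should scale with the elapsed time so the effective per-second discount rate stays constant. A one-function hedged sketch; the base interval and the exact scaling are assumptions for illustration, not the authors' formula.

```python
def td_target(reward, dt, v_next, gamma=0.95, base_dt=5.0):
    """TD target with a dynamic discount factor: gamma is raised to
    dt / base_dt, so a 10 s green phase is discounted twice as
    strongly as a 5 s one, avoiding the bias a fixed exponent would
    introduce under variable time steps."""
    return reward + gamma ** (dt / base_dt) * v_next
```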

Journal ArticleDOI
TL;DR: Under a Bayesian framework, this work formulates the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem, and derives an approximately optimal allocation policy that possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions.
Abstract: Under a Bayesian framework, we formulate the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem, and derive the associated Bellman equation. Using a value function approximation, we derive an approximately optimal allocation policy. We show that this policy is not only computationally efficient but also possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions. Moreover, the proposed allocation policy is easily generalizable in the approximate dynamic programming paradigm.

Journal ArticleDOI
TL;DR: An intuitive threshold method and a dynamic programming approach are proposed for scheduling jobs and PMs under a given job sequence, and a scatter simulated annealing (SSA) algorithm is developed in which a scatter-search mechanism leads SSA to explore more potential solutions.

Journal ArticleDOI
TL;DR: An approximate solution to the ${H_\infty }$ optimal control of polynomial nonlinear systems is proposed.
Abstract: Sum of squares (SOS) polynomials have provided a computationally tractable way to deal with inequality constraints appearing in many control problems. They can also act as approximators in the framework of adaptive dynamic programming. In this paper, an approximate solution to the ${H_\infty}$ optimal control of polynomial nonlinear systems is proposed. Under a given attenuation coefficient, the Hamilton–Jacobi–Isaacs equation is relaxed to an optimization problem with a set of inequalities. After applying the policy iteration technique and constraining the inequalities to be SOS, the optimization problem is divided into a sequence of feasible semidefinite programming problems. With the converged solution, the attenuation coefficient is further minimized to a lower value. After iterations, approximate solutions to the smallest ${L_2}$-gain and the associated ${H_\infty}$ optimal controller are obtained. Four examples are employed to verify the effectiveness of the proposed algorithm.