
Showing papers on "Dynamic programming published in 2018"


Journal ArticleDOI
TL;DR: Simulations show that this approach is able to dramatically enhance the scalability of task admission at a marginal cost of extra energy, as compared with the optimal branch and bound method, and can be efficiently implemented for online programming.
Abstract: Task admission is critical to delay-sensitive applications in mobile edge computing, but is technically challenging due to its mixed combinatorial nature and consequently limited scalability. We propose an asymptotically optimal task admission approach which guarantees task delays and achieves a $(1-\epsilon)$-approximation of the computationally prohibitive maximum energy saving at a time complexity that scales linearly with the number of devices, where $\epsilon$ is linear in the quantization interval of energy. The key idea is to transform the mixed integer programming formulation of task admission into an integer programming (IP) problem with the optimal substructure by pre-admitting resource-constrained devices. Another important aspect is a new quantized dynamic programming algorithm which we develop to exploit the optimal substructure and solve the IP. The quantization interval of energy is optimized to achieve an $[\mathcal {O}(\epsilon),\mathcal {O}(1/\epsilon)]$-tradeoff between the optimality loss and the time complexity of the algorithm. Simulations show that our approach dramatically enhances the scalability of task admission at a marginal cost of extra energy, as compared with the optimal branch-and-bound method, and can be efficiently implemented for online programming.
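
The quantized DP is in the spirit of the classic FPTAS for knapsack: round each saving to a multiple of a quantization interval delta and index the DP by quantized total saving. Below is a minimal Python sketch of that generic scheme, assuming a single abstract capacity constraint and illustrative names; it is not the paper's exact admission algorithm.

```python
def quantized_dp(savings, loads, capacity, eps=0.1):
    """Knapsack-style quantized DP: maximize total energy saving
    subject to a resource budget, with savings rounded to multiples
    of delta so the DP table size is governed by 1/eps rather than
    by the numeric range of the (continuous) savings."""
    n = len(savings)
    delta = eps * max(savings) / n            # quantization interval
    q = [int(s / delta) for s in savings]     # quantized savings
    top = sum(q)
    INF = float("inf")
    # min_load[v] = least resource usage achieving quantized saving v
    min_load = [0.0] + [INF] * top
    for qi, li in zip(q, loads):
        for v in range(top, qi - 1, -1):      # 0/1 admission per device
            if min_load[v - qi] + li < min_load[v]:
                min_load[v] = min_load[v - qi] + li
    best = max(v for v in range(top + 1) if min_load[v] <= capacity)
    return best * delta                       # (1 - O(eps))-optimal saving
```

Shrinking eps tightens the approximation while enlarging the DP table proportionally, mirroring the $[\mathcal {O}(\epsilon),\mathcal {O}(1/\epsilon)]$ tradeoff described above.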

163 citations


Journal ArticleDOI
TL;DR: This work proposes a dynamic programming approach that takes advantage of the nested structure of the battery storage problem by solving smaller subproblems with reduced state spaces, over different time scales.
Abstract: We are interested in optimizing the use of battery storage for multiple applications, in particular energy arbitrage and frequency regulation. The nature of this problem requires the battery to make charging and discharging decisions at different time scales while accounting for stochastic information such as load demand, electricity prices, and regulation signals. Solving the problem for even a single day of operation would be computationally intractable due to the large state space and the number of time steps. We propose a dynamic programming approach that takes advantage of the nested structure of the problem by solving smaller subproblems with reduced state spaces over different time scales.
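
In spirit, the nesting can be sketched as an outer DP over coarse (say hourly) state-of-charge levels, with each stage value supplied by an inner fine-time-step subproblem. A minimal Python sketch, assuming a hypothetical hour_value(t, s0, s1) routine that solves the inner subproblem; this illustrates the decomposition idea only, not the paper's exact formulation:

```python
def two_scale_dp(n_hours, soc_grid, hour_value):
    """Outer DP over hourly state-of-charge (SOC) levels.

    hour_value(t, s0, s1) is assumed to return the best profit
    (arbitrage plus regulation revenue) achievable during hour t
    when the SOC moves from s0 to s1, itself computed by a small
    inner DP at a fine time step. The outer recursion never sees
    the fine-grained state, which is the nesting idea.
    """
    V = {s: 0.0 for s in soc_grid}        # value after the last hour
    for t in reversed(range(n_hours)):
        V = {s0: max(hour_value(t, s0, s1) + V[s1] for s1 in soc_grid)
             for s0 in soc_grid}
    return V                              # V[s]: best profit starting at SOC s
```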

163 citations


Journal ArticleDOI
TL;DR: It is theoretically proved that the iterative value function sequence strictly converges to the solution of the coupled Hamilton–Jacobi–Bellman equation, and a novel online iterative scheme is proposed, which runs on data sampled from the augmented system and the gradient of the value function.
Abstract: This paper focuses on distributed optimal cooperative control for continuous-time nonlinear multiagent systems (MASs) with completely unknown dynamics via adaptive dynamic programming (ADP). By introducing predesigned extra compensators, the augmented neighborhood error systems are derived, which circumvents the system-knowledge requirement of ADP. It is revealed that the optimal consensus protocols actually work as the solutions of the MAS differential game. A policy iteration algorithm is adopted, and it is theoretically proved that the iterative value function sequence strictly converges to the solution of the coupled Hamilton–Jacobi–Bellman equation. Based on this result, a novel online iterative scheme is proposed, which runs on data sampled from the augmented system and the gradient of the value function. Neural networks are employed to implement the algorithm, and the weights are updated, in the least-squares sense, toward the ideal values, which yields approximate optimal consensus protocols. Finally, a numerical example is given to illustrate the effectiveness of the proposed scheme.

142 citations


Journal ArticleDOI
01 Jul 2018
TL;DR: A novel dynamic programming (DP) algorithm is designed to accelerate the insertion operation from cubic or quadratic time in previous work to only linear time, and on the basis of the DP algorithm, a greedy-based solution to the URPSM problem is proposed.
Abstract: There has been dramatic growth in shared mobility applications such as ride-sharing, food delivery and crowdsourced parcel delivery. Shared mobility refers to transportation services that are shared among users, where a central issue is route planning. Given a set of workers and requests, route planning finds for each worker a route, i.e., a sequence of locations to pick up and drop off passengers/parcels that arrive from time to time, with different optimization objectives. Previous studies lack practicability due to their conflicting objectives and their inefficiency in inserting a new request into a route, a basic operation called insertion. In this paper, we present a unified formulation of route planning called URPSM. It has a well-defined parameterized objective function which eliminates the contradictory objectives in previous studies and enables flexible multi-objective route planning for shared mobility. We prove that the problem is NP-hard and that there is no polynomial-time algorithm with a constant competitive ratio for the URPSM problem and its variants. In response, we devise an effective and efficient solution that addresses the URPSM problem approximately. We design a novel dynamic programming (DP) algorithm to accelerate the insertion operation from cubic or quadratic time in previous work to only linear time. On the basis of the DP algorithm, we propose a greedy-based solution to the URPSM problem. Experimental results on real datasets show that our solution outperforms state-of-the-art methods by 1.2 to 12.8 times in effectiveness, and also runs 2.6 to 20.7 times faster.
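
The flavor of linear-time insertion can be conveyed with a simplified deadline model: one O(n) backward pass computes suffix slacks, after which every candidate insertion position is checked in O(1). A hedged Python sketch under illustrative assumptions (arr, ddl and detour arrays; no capacity constraints), not the paper's full URPSM insertion DP:

```python
def best_insertion(arr, ddl, detour):
    """Linear-time feasibility scan for inserting one new stop.

    arr[k]: current arrival time at stop k; ddl[k]: its deadline;
    detour[k]: extra travel time if the new stop is inserted right
    after stop k (delaying every later stop by that amount).
    slack[k] = min over j >= k of (ddl[j] - arr[j]) lets each
    candidate position be checked in O(1), so the whole scan is
    O(n) instead of re-simulating the route per position (O(n^2)).
    """
    n = len(arr)
    slack = [0.0] * (n + 1)
    slack[n] = float("inf")
    for k in range(n - 1, -1, -1):          # suffix minima of slack
        slack[k] = min(ddl[k] - arr[k], slack[k + 1])
    best, best_pos = float("inf"), None
    for k in range(n):                      # try inserting after stop k
        if detour[k] <= slack[k + 1] and detour[k] < best:
            best, best_pos = detour[k], k
    return best_pos                         # None if no feasible position
```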

138 citations


Journal ArticleDOI
TL;DR: Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions of the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum.
Abstract: In this paper, convergence properties are established for the newly developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The present local iterative ADP algorithm permits an arbitrary positive semidefinite function to initialize the algorithm. Employing a state-dependent learning rate function, for the first time, the iterative value function and iterative control law can be updated in a subset of the state space instead of the whole state space, which effectively reduces the computational burden. A new analysis method for the convergence property is developed to prove that the iterative value functions will converge to the optimum under some mild constraints. Monotonicity of the local value iteration ADP algorithm is presented, which shows that under some special conditions on the initial value function and the learning rate function, the iterative value function can monotonically converge to the optimum. Finally, three simulation examples and comparisons are given to illustrate the performance of the developed algorithm.
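
A tabular caricature of the local update idea: each sweep relaxes the Bellman operator only on a chosen subset of states, each with its own learning rate. The discount factor and all names below are assumptions of the sketch; the paper treats the general optimal control setting with function approximation.

```python
import numpy as np

def local_value_iteration(P, C, subset, alpha, V0, gamma=0.95, sweeps=100):
    """P[a]: (n_s, n_s) transition matrix under action a; C: (n_s, n_a)
    stage costs; subset: the states updated each sweep; alpha[s]: a
    state-dependent learning rate; V0: arbitrary nonnegative initial
    values. Only `subset` is swept, which is the burden-reducing
    trick the abstract describes."""
    V = V0.astype(float).copy()
    n_a = C.shape[1]
    for _ in range(sweeps):
        for s in subset:
            target = min(C[s, a] + gamma * P[a][s] @ V for a in range(n_a))
            V[s] += alpha[s] * (target - V[s])   # partial (local) update
    return V
```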

128 citations


Journal ArticleDOI
TL;DR: Stability analysis shows that the system in closed-loop with the developed control policy is leader-to-formation stable, with guaranteed robustness to unmeasurable leader disturbance.
Abstract: This note proposes a novel data-driven solution to the cooperative adaptive optimal control problem of leader-follower multiagent systems under switching network topology. The dynamics of all the followers are unknown, and the leader is modeled by a perturbed exosystem. Through the combination of adaptive dynamic programming and the internal model principle, an approximate optimal controller is iteratively learned online using real-time input-state data. Rigorous stability analysis shows that the system in closed loop with the developed control policy is leader-to-formation stable, with guaranteed robustness to unmeasurable leader disturbance. Numerical results illustrate the effectiveness of the proposed data-driven algorithm.

126 citations


Journal ArticleDOI
TL;DR: A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required.
Abstract: In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, and the upper and lower performance index functions are proved not to be equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.
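
The upper/lower iterations can be demonstrated on a toy discounted repeated matrix game: run the min-max and max-min recursions side by side from nonnegative initial values; the two limits coincide exactly when the stage game has a saddle point. A toy tabular sketch, not the paper's neural-network setting:

```python
import numpy as np

def upper_lower_iteration(C, gamma=0.9, iters=200):
    """C[u, w]: stage cost the minimizer (row player u) pays the
    maximizer (column player w). V_up iterates min-max, V_lo
    iterates max-min. Always V_lo <= V_up; they are equal iff the
    stage game has a pure saddle point, which is the case
    distinguished in the paper."""
    V_up, V_lo = 0.0, 0.0                   # arbitrary init >= 0
    for _ in range(iters):
        V_up = np.max(C + gamma * V_up, axis=1).min()   # min_u max_w
        V_lo = np.min(C + gamma * V_lo, axis=0).max()   # max_w min_u
    return V_up, V_lo
```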

114 citations


Journal ArticleDOI
TL;DR: An assessment study of a novel approach that combines discrete state-space Dynamic Programming and Pontryagin's Maximum Principle for online optimal control of hybrid electric vehicles (HEV) yields a close-to-optimal solution by solving the optimal control problem over one hundred thousand times faster than the benchmark method.
Abstract: An assessment study of a novel approach is presented that combines discrete state-space Dynamic Programming and Pontryagin's Maximum Principle for online optimal control of hybrid electric vehicles (HEV). In addition to electric energy storage, engine state and gear, kinetic energy, and travel time are considered states in this paper. After presenting the corresponding model using a parallel HEV as an example, a benchmark method with Dynamic Programming is introduced which is used to show the solution quality of the novel approach. It is illustrated that the proposed method yields a close-to-optimal solution by solving the optimal control problem over one hundred thousand times faster than the benchmark method. Finally, a potential online usage is assessed by comparing solution quality and calculation time with regard to the quantization of the state space.

111 citations


Journal ArticleDOI
TL;DR: A novel policy iteration technique for solving positive semidefinite HJB equations with rigorous convergence analysis is proposed and a two-phase data-driven learning method is developed and implemented online by ADP.
Abstract: This paper proposes a novel data-driven control approach to address the problem of adaptive optimal tracking for a class of nonlinear systems in strict-feedback form. Adaptive dynamic programming (ADP) and nonlinear output regulation theories are integrated for the first time to compute an adaptive near-optimal tracker without any a priori knowledge of the system dynamics. In fundamental contrast to adaptive optimal stabilization problems, the solution to the Hamilton–Jacobi–Bellman (HJB) equation here is not necessarily a positive definite function and cannot be approximated by existing iterative methods. This paper proposes a novel policy iteration technique for solving positive semidefinite HJB equations with rigorous convergence analysis. A two-phase data-driven learning method is developed and implemented online by ADP. The efficacy of the proposed adaptive optimal tracking control methodology is demonstrated via a Van der Pol oscillator with time-varying exogenous signals.

105 citations


Journal ArticleDOI
TL;DR: This paper presents an adaptive large neighborhood search which is enhanced by local search and dynamic programming components, derives new penalty functions for time-efficient neighborhood evaluation, assesses the competitiveness of the algorithm on the electric vehicle routing problem with time windows for full and partial recharging, and derives new best known solutions for both problem variants.
Abstract: Recent research on location-routing problems has been focusing on locating facilities as the starting and end point of routes. In this paper, we investigate a new type of location-routing problem. In the location-routing problem with intra-route facilities, the location of depots is known, whereas the location of facilities for intermediate stops has to be determined to keep vehicles operational. We present an adaptive large neighborhood search which is enhanced by local search and dynamic programming components, and derive new penalty functions for time-efficient neighborhood evaluation. We show that this algorithm is suitable for solving various problems with intra-route facilities by deriving new best known solutions for the recently published electric location-routing problem with time windows and partial recharging, as well as for the battery swap station electric vehicle location-routing problem. Additionally, we create new real-world benchmark instances and report results on them as well. Furthermore, we assess the competitiveness of our algorithm on the electric vehicle routing problem with time windows for full and partial recharging, and derive new best known solutions for both problem variants. The online appendix is available at https://doi.org/10.1287/trsc.2017.0746 .

103 citations


Journal ArticleDOI
TL;DR: The MPSO algorithm incorporates a diversification strategy and a local search strategy into a basic particle swarm optimization algorithm to minimize the maximum lateness for the single batch-processing machine problem with non-identical job sizes and release dates.

Journal ArticleDOI
TL;DR: A faster semi-global alignment algorithm, "difference recurrence relations," is proposed that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1 and will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages.
Abstract: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith–Waterman–Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, “difference recurrence relations,” that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
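
The heart of the difference recurrence idea can be shown on the simpler linear-gap (Needleman–Wunsch) global DP: propagate the differences between adjacent cells rather than the absolute scores, since the differences stay within a small range set by the scoring parameters instead of growing with sequence length, which is what makes narrow (e.g., 8-bit) integers sufficient in the SIMD setting. A scalar Python sketch of this simplified variant (global alignment, linear gaps, no SIMD), not the paper's affine-gap semi-global algorithm:

```python
def diff_score(a, b, match=1, mismatch=-1, gap=2):
    """Global alignment score via difference recurrences.

    Instead of absolute DP scores H[i][j], propagate
      dh[j] = H[i][j] - H[i][j-1]   (horizontal difference)
      dv    = H[i][j] - H[i-1][j]   (vertical difference),
    both bounded by the scoring parameters, not sequence length.
    """
    n, m = len(a), len(b)
    dh = [-gap] * (m + 1)            # row 0: H[0][j] = -gap * j
    score = -gap * m                 # running H[i][m], starts at H[0][m]
    for i in range(1, n + 1):
        dv = -gap                    # H[i][0] - H[i-1][0] = -gap
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            new_dv = max(s - dh[j], -gap, dv - dh[j] - gap)
            # identity: dv_new + dh_old = dh_new + dv_old
            dh[j] = new_dv + dh[j] - dv
            dv = new_dv
        score += dv                  # accumulate H[i][m] - H[i-1][m]
    return score                     # equals H[n][m]
```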

Journal ArticleDOI
TL;DR: The results show that the proposed method not only solves the four dynamic programming problems above, but also achieves much shorter computation times and higher calculation accuracy compared with Basic Dynamic Programming and Level-Set Dynamic Programming.

Journal ArticleDOI
TL;DR: A 3D dynamic programming based ship voyage optimization method is proposed, aiming to select the optimal path and speed profile for a ship voyage on the basis of weather forecast maps, with the optimization carried out in a discretized space-time domain.

Proceedings Article
11 Feb 2018
TL;DR: Theoretically, this work provides a new probabilistic perspective on backpropagating through these DP operators and relates them to inference in graphical models, and derives two particular instantiations of the framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment.
Abstract: Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by back-propagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This makes it possible to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.
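
The smoothing step itself is compact: replace max with the log-sum-exp smoothed maximum, which is differentiable everywhere and recovers the hard maximum as the temperature goes to zero. A minimal NumPy sketch of the smoothed Viterbi value under assumed emission and transition score arrays; the paper's framework is broader (arbitrary strongly convex regularizers, and gradients of the optimal solution as well as of the value):

```python
import numpy as np

def smoothed_max(x, gamma):
    """gamma * logsumexp(x / gamma), numerically stabilized;
    tends to max(x) as gamma -> 0."""
    m = x.max()
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_viterbi_value(theta, trans, gamma=1.0):
    """theta[t, s]: emission score at time t, state s;
    trans[i, j]: score of the transition i -> j. Same recursion as
    Viterbi with max replaced by smoothed_max, so the returned
    value is differentiable in theta and trans."""
    T, S = theta.shape
    v = theta[0].copy()
    for t in range(1, T):
        v = theta[t] + np.array(
            [smoothed_max(v + trans[:, j], gamma) for j in range(S)])
    return smoothed_max(v, gamma)
```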

Journal ArticleDOI
TL;DR: A multistage stochastic programming formulation of transmission-constrained economic dispatch subject to multiarea renewable production uncertainty is presented, with a focus on optimizing the dispatch of storage in real-time operations.
Abstract: This paper presents a multistage stochastic programming formulation of a transmission-constrained economic dispatch subject to multiarea renewable production uncertainty, with a focus on optimizing the dispatch of storage in real-time operations. This problem is solved using stochastic dual dynamic programming. The applicability of the proposed approach is demonstrated on a realistic case study of the German power system calibrated against the solar and wind power integration levels of 2013–2014, with a 24-h horizon and 15-min time step. The value of the stochastic solution relative to the cost of a deterministic policy amounts to 1.1%, while the value of perfect foresight relative to the cost of the stochastic programming policy amounts to 0.8%. The relative performance of various alternative real-time dispatch policies is analyzed, and the sensitivity of the results is explored.

Journal ArticleDOI
TL;DR: An adaptive optimal control approach is developed using value iteration-based Q-learning (VIQL) with a critic-only structure, and the adaptive constrained optimal controller is designed based on a gradient descent scheme.
Abstract: Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem for nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require a specific performance index and suit only linear or affine nonlinear systems, which is restrictive in practice. To overcome this limitation, a system transformation is first introduced with a general performance index. Then, the constrained optimal control problem is converted into an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established under an easy-to-realize initial condition $Q^{(0)}(x,a)\geqslant 0$. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on a gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples via computer simulation.
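
A tabular, model-based caricature makes the initialization condition concrete: start from any Q satisfying $Q^{(0)}(x,a)\geqslant 0$ and repeatedly apply the Bellman operator for costs. The transition and cost matrices, and the discount factor that makes the toy iteration contract, are assumptions of the sketch; the paper learns from data with a critic network instead of a table.

```python
import numpy as np

def viql_tabular(P, C, gamma=0.95, iters=500):
    """P[a]: (n_s, n_s) transition matrix under action a;
    C[s, a]: nonnegative stage cost. Q starts at zero, which
    satisfies the Q^(0)(x, a) >= 0 condition, and each iteration
    performs Q <- C + gamma * E[min over a' of Q(s', a')]."""
    n_s, n_a = C.shape
    Q = np.zeros((n_s, n_a))                 # Q^(0) >= 0
    for _ in range(iters):
        V = Q.min(axis=1)                    # cost-minimizing value
        Q = C + gamma * np.stack([P[a] @ V for a in range(n_a)], axis=1)
    return Q
```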

Journal ArticleDOI
TL;DR: This work considers a general formulation of the principal–agent problem with a lump-sum payment on a finite horizon, providing a systematic method for solving such problems, and relies on the backward stochastic differential equations approach to non-Markovian stochastic control.
Abstract: We consider a general formulation of the principal–agent problem with a lump-sum payment on a finite horizon, providing a systematic method for solving such problems. Our approach is the following. We first find the contract that is optimal among those for which the agent’s value process allows a dynamic programming representation, in which case the agent’s optimal effort is straightforward to find. We then show that the optimization over this restricted family of contracts represents no loss of generality. As a consequence, we have reduced a non-zero-sum stochastic differential game to a stochastic control problem which may be addressed by standard tools of control theory. Our proofs rely on the backward stochastic differential equations approach to non-Markovian stochastic control, and more specifically on the recent extensions to the second order case.

Journal ArticleDOI
TL;DR: A distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented, using a novel hybrid learning scheme based on approximate dynamic programming with online exploration.
Abstract: In this paper, a distributed control scheme for an interconnected system composed of uncertain input-affine nonlinear subsystems with event-triggered state feedback is presented, using a novel hybrid learning scheme based on approximate dynamic programming with online exploration. First, an approximate solution to the Hamilton–Jacobi–Bellman equation is generated with event-sampled neural network (NN) approximation, and subsequently a near-optimal control policy for each subsystem is derived. Artificial NNs are utilized as function approximators to develop a suite of identifiers and learn the dynamics of each subsystem. The NN weight tuning rules for the identifier and the event-triggering condition are derived using Lyapunov stability theory. Taking into account the effects of NN approximation of the system dynamics and bootstrapping, a novel NN weight update is presented to approximate the optimal value function. Finally, a novel strategy to incorporate exploration into the online control framework, using the identifiers, is introduced to reduce the overall cost at the expense of additional computations during the initial online learning phase. System states and the NN weight estimation errors are regulated, and locally uniformly ultimately bounded results are achieved. The analytical results are substantiated using simulation studies.

Journal ArticleDOI
TL;DR: An extensive computational study reveals significant cost savings as compared to myopic and non-storage policies, as well as policies obtained using a two-stage SP model, and demonstrates the scalability of the solution procedure.
Abstract: A microgrid is a small-scale version of a centralized power grid that generates, distributes and regulates electricity flow to local entities using distributed generation and the main grid. Distributed energy storage systems can be used to mitigate adverse effects of intermittent renewable sources in a microgrid in which operators dynamically adjust electricity procurement and storage decisions in response to randomly-evolving demand, renewable supply and pricing information. We formulate a multistage stochastic programming (SP) model whose objective is to minimize the expected total energy costs incurred within a microgrid over a finite planning horizon. The model prescribes the amount of energy to procure, store and discharge in each decision stage of the horizon. However, for even a moderate number of stages, the model is computationally intractable; therefore, we customize the stochastic dual dynamic programming (SDDP) algorithm to obtain high-quality approximate solutions. Computation times and optimization gaps are significantly reduced by implementing a dynamic cut selection procedure and a lower bound improvement scheme within the SDDP framework. An extensive computational study reveals significant cost savings as compared to myopic and non-storage policies, as well as policies obtained using a two-stage SP model. The study also demonstrates the scalability of our solution procedure.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the stochastic optimal control problem of a McKean–Vlasov stochastic differential equation where the coefficients may depend upon the joint law of the state and control.
Abstract: We consider the stochastic optimal control problem of a McKean–Vlasov stochastic differential equation where the coefficients may depend upon the joint law of the state and control. By using feedback controls, we reformulate the problem into a deterministic control problem with only the marginal distribution of the process as controlled state variable, and prove that the dynamic programming principle holds in its general form. Then, by relying on the notion of differentiability with respect to probability measures recently introduced by P.L. Lions in [32], and a special Itô formula for flows of probability measures, we derive the (dynamic programming) Bellman equation for the mean-field stochastic control problem, and prove a verification theorem in our McKean–Vlasov framework. We give explicit solutions to the Bellman equation for the linear-quadratic mean-field control problem, with applications to mean-variance portfolio selection and a systemic risk model. We also consider a notion of lifted viscosity solutions for the Bellman equation, and show the viscosity property and uniqueness of the value function of the McKean–Vlasov control problem. Finally, we consider the case of the McKean–Vlasov control problem with open-loop controls and discuss the associated dynamic programming equation, which we compare with the case of closed-loop controls.

Journal ArticleDOI
TL;DR: In this article, the authors propose a continuous-time, rolling-horizon formulation of the microgrid energy management problem, which allows for accurate modeling and is computationally cheap.
Abstract: We propose a novel method for the microgrid energy management problem by introducing a continuous-time, rolling-horizon formulation. The energy management problem is formulated as a deterministic optimal control problem (OCP). We solve the OCP with two classical approaches: the direct method [1] and Bellman's Dynamic Programming Principle (DPP) [2]. In both cases we use the optimal control toolbox BOCOP [3] for the numerical simulations. For the DPP approach we implement a semi-Lagrangian scheme [4] adapted to handle the optimization of switching times for the on/off modes of the diesel generator. The DPP approach allows for accurate modeling and is computationally cheap. It finds the global optimum in less than 3 seconds, a CPU time similar to that of the Mixed Integer Linear Programming (MILP) approach used in [5]. We achieve this performance by introducing a trick based on the Pontryagin Maximum Principle (PMP), which increases the computation speed by several orders of magnitude and also improves the precision of the solution. For validation purposes, simulations are performed using datasets from an actual isolated microgrid located in northern Chile. Results show that the DPP method is very well suited for this type of problem when compared with the MILP approach.
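
For reference, a bare-bones semi-Lagrangian backward sweep for a one-dimensional optimal control problem: follow the characteristic one step, interpolate the next value function at the foot of the characteristic, and minimize over controls. The sketch omits the switching-time handling and the PMP-based trick from the abstract; the dynamics and cost callables are assumptions.

```python
import numpy as np

def semi_lagrangian(xs, controls, f, cost, V_final, dt, n_steps):
    """Backward sweep of the semi-Lagrangian scheme on grid xs:
    V_t(x) = min over u of [ dt * cost(x, u) + V_{t+1}(x + dt * f(x, u)) ],
    with the off-grid point handled by linear interpolation."""
    V = np.asarray(V_final, dtype=float).copy()
    for _ in range(n_steps):
        V = np.array([
            min(dt * cost(x, u) + np.interp(x + dt * f(x, u), xs, V)
                for u in controls)
            for x in xs])
    return V
```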

Journal ArticleDOI
TL;DR: A unified dynamic programming model and its solution method are proposed for fuel cell electric vehicles; the results demonstrate that the proposed method is much better than Basic Dynamic Programming and Level-Set Dynamic Programming in both calculation time and computation accuracy.

Journal ArticleDOI
TL;DR: The proposed DP-based EMS with a prediction horizon offers an effective solution for PHEVs operating online on trip paths without GPS data and can further reduce fuel consumption and emissions.
Abstract: Plug-in hybrid electric vehicles (PHEVs) powered by fuel and electricity have demonstrated the capability to reduce fuel consumption and emissions by adopting appropriate energy management strategies. Among existing energy management strategies, the dynamic programming (DP)-based energy management strategy (EMS) can realize globally optimal fuel consumption if the global vehicle-speed trajectory is known in advance. The global vehicle-speed trajectory can be obtained from GPS data when the trip path is determined. However, for a trip path without GPS data, the global vehicle-speed trajectory is difficult to obtain. In this case, the DP-based EMS cannot be utilized to achieve globally optimal fuel consumption, which is the issue discussed in this paper. This paper makes two contributions to solve this issue. First, the cell transmission model of road traffic flow and vehicle kinematics are introduced to obtain the traffic speeds of road segments and the accelerations of the PHEV. On this basis, a hybrid trip model is presented to obtain the vehicle-speed trajectory for a trip path without GPS data. Next, a DP-based EMS with a prediction horizon is proposed; moreover, to improve its real-time implementation, a search-range optimization algorithm for the state of charge (SOC) is designed to reduce the computational load of DP. In summary, we propose a computation-optimized DP-based EMS built on the hybrid trip model. Finally, a simulation study is conducted by applying the proposed EMS to a practical trip path in the Beijing road network. The results show that the hybrid trip model can effectively construct the vehicle-speed trajectory online, with an average accuracy above 78%. In addition, compared with the existing optimization algorithm for DP calculation, the SOC search-range optimization algorithm further reduces the computational load of DP. More importantly, compared to the globally optimal DP-based EMS, the proposed EMS increases fuel consumption by less than 5.36% while being implementable in real time. Moreover, compared with existing real-time strategies, it further reduces fuel consumption and emissions. Thus, the proposed EMS offers an effective solution for PHEVs operating online on trip paths without GPS data.
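
A skeletal backward DP over an SOC grid shows where the search-range restriction enters: at each step only the transitions returned by a feasible_next callable are evaluated, which is where the computational load is cut. fuel_cost and feasible_next are hypothetical stand-ins for the vehicle model, not the paper's implementation.

```python
def dp_ems(T, soc_levels, fuel_cost, feasible_next):
    """Backward DP sketch for a PHEV energy management strategy.

    State: battery SOC level; decision: SOC level at the next step
    (i.e., how much electric energy to spend). fuel_cost(t, s, s2)
    is the fuel burned over step t moving the SOC from s to s2
    given the predicted speed; feasible_next(t, s) is the
    restricted SOC search range. Returns value table and policy.
    """
    V = {s: 0.0 for s in soc_levels}          # terminal cost
    policy = [dict() for _ in range(T)]
    for t in reversed(range(T)):
        V_new = {}
        for s in soc_levels:
            best, arg = float("inf"), None
            for s2 in feasible_next(t, s):    # restricted search range
                c = fuel_cost(t, s, s2) + V[s2]
                if c < best:
                    best, arg = c, s2
            V_new[s], policy[t][s] = best, arg
        V = V_new
    return V, policy
```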

Journal ArticleDOI
TL;DR: A data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems is presented, and the convergence of the iterative ADP algorithm and the stability of the weight estimation errors under the HDP structure are intensively analyzed.
Abstract: This paper presents a data-based finite-horizon optimal control approach for discrete-time nonlinear affine systems. Iterative adaptive dynamic programming (ADP) is used to approximately solve the Hamilton–Jacobi–Bellman equation by minimizing the cost function in finite time. The idea is implemented via heuristic dynamic programming (HDP) with a model network, so that the iterative control at the first step can be obtained without the system function; meanwhile, the action network is used to obtain the approximate optimal control law and the critic network is utilized to approximate the optimal cost function. The convergence of the iterative ADP algorithm and the stability of the weight estimation errors under the HDP structure are intensively analyzed. Finally, two simulation examples are provided to demonstrate the theoretical results and show the performance of the proposed method.

Journal ArticleDOI
TL;DR: A simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming, and actor and critic networks are employed to approximate the optimal control and the optimal value function.
Abstract: This paper presents a novel robust regulation method for a class of continuous-time nonlinear systems subject to unmatched perturbations. To begin with, the robust regulation problem is transformed into an optimal regulation problem by constructing a value function for the auxiliary system. Then, a simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming. To implement the SPI algorithm, actor and critic networks are employed to approximate the optimal control and the optimal value function, respectively, and the Monte Carlo integration method is applied to obtain the unknown weight parameters. Finally, two examples, including a power system, are provided to demonstrate the applicability of the developed approach.

Journal ArticleDOI
TL;DR: A dynamic discount factor embedded in the iterative Bellman equation is proposed to prevent biased estimation of the action-value function due to inconstant time-step intervals, and the trained agent outperforms a fixed timing plan in all testing cases, reducing total system delay by 20%.
Abstract: By improving the efficiency of road networks through advanced traffic signal control methods, intelligent transportation systems help characterize a smart city. Recently, owing to significant progress in artificial intelligence, machine learning-based frameworks for adaptive traffic signal control have attracted considerable attention. In particular, the deep Q-learning neural network is a model-free technique that can be applied to optimal action selection problems. However, setting variable green times is a key mechanism for reflecting traffic fluctuations, so time steps in the reinforcement learning framework need not be fixed intervals. In this study, the authors propose a dynamic discount factor embedded in the iterative Bellman equation to prevent biased estimation of the action-value function due to the effects of inconstant time-step intervals. Moreover, the action is added to the input layer of the neural network in the training process, and the output layer is the estimated action value for the denoted action. The trained neural network can then be used to generate the action that leads to an optimal estimated value within a finite set as the agent's policy. Preliminary results show that the trained agent outperforms a fixed timing plan in all testing cases, reducing total system delay by 20%.
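
The mechanics of a duration-dependent discount are easy to state: when the interval between decisions varies with the chosen green time, the discount exponent should scale with the elapsed time so the effective per-second discount rate stays constant. A one-function hedged sketch; the base interval and the exact scaling are assumptions for illustration, not the authors' formula.

```python
def td_target(reward, dt, v_next, gamma=0.95, base_dt=5.0):
    """TD target with a dynamic discount factor: gamma is raised to
    dt / base_dt, so a 10 s green phase is discounted twice as
    strongly as a 5 s one, avoiding the bias a fixed exponent would
    introduce under variable time steps."""
    return reward + gamma ** (dt / base_dt) * v_next
```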

Journal ArticleDOI
TL;DR: Under a Bayesian framework, this work formulates the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem, and derives an approximately optimal allocation policy that possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions.
Abstract: Under a Bayesian framework, we formulate the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem, and derive the associated Bellman equation. Using a value function approximation, we derive an approximately optimal allocation policy. We show that this policy is not only computationally efficient but also possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions. Moreover, the proposed allocation policy is easily generalizable in the approximate dynamic programming paradigm.

Journal ArticleDOI
TL;DR: An intuitive threshold method and a dynamic programming approach are proposed for scheduling jobs and PMs under a given job sequence, and a scatter simulated annealing (SSA) algorithm is developed in which a scatter-search mechanism leads SSA to explore more potential solutions.

Journal ArticleDOI
TL;DR: An approximate solution to the ${H_\infty }$ optimal control of polynomial nonlinear systems is proposed.
Abstract: Sum of squares (SOS) polynomials have provided a computationally tractable way to deal with inequality constraints appearing in many control problems. They can also act as approximators in the framework of adaptive dynamic programming. In this paper, an approximate solution to the ${H_\infty}$ optimal control of polynomial nonlinear systems is proposed. Under a given attenuation coefficient, the Hamilton–Jacobi–Isaacs equation is relaxed to an optimization problem with a set of inequalities. After applying the policy iteration technique and constraining the inequalities to be SOS, the optimization problem is divided into a sequence of feasible semidefinite programming problems. With the converged solution, the attenuation coefficient is further minimized to a lower value. After iterations, approximate solutions to the smallest ${L_2}$-gain and the associated ${H_\infty}$ optimal controller are obtained. Four examples are employed to verify the effectiveness of the proposed algorithm.