
Showing papers on "Dynamic programming" published in 2014


Book
25 Sep 2014

1,100 citations


Journal ArticleDOI
TL;DR: It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation, and it is proven that any of the iterative control laws can stabilize the nonlinear systems.
Abstract: This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Neural networks are used to approximate the performance index function and to compute the optimal control law, facilitating the implementation of the iterative ADP algorithm; the convergence of the weight matrices is also analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
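As background for the policy iteration recursion this abstract describes, here is a minimal sketch of the standard discrete-time form (generic notation assumed here, not quoted from the paper): for a system x_{k+1} = F(x_k, u_k) with utility U,

```latex
\[
\begin{aligned}
\text{policy evaluation:} \quad & V_i(x_k) = U\bigl(x_k, v_i(x_k)\bigr) + V_i\bigl(F(x_k, v_i(x_k))\bigr),\\
\text{policy improvement:} \quad & v_{i+1}(x_k) = \arg\min_{u_k}\Bigl\{\, U(x_k, u_k) + V_i\bigl(F(x_k, u_k)\bigr) \Bigr\},\\
\text{HJB fixed point:} \quad & V^{*}(x_k) = \min_{u_k}\Bigl\{\, U(x_k, u_k) + V^{*}\bigl(F(x_k, u_k)\bigr) \Bigr\}.
\end{aligned}
\]
```

The paper's results concern this kind of iteration: the sequence V_i is nonincreasing and converges to V*, and every intermediate control law v_i is stabilizing.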

535 citations


Journal ArticleDOI
TL;DR: The proposed RADP methodology can be viewed as an extension of ADP to uncertain nonlinear systems and has been applied to the controller design problems for a jet engine and a one-machine power system.
Abstract: This paper studies the robust optimal control design for a class of uncertain nonlinear systems from the perspective of robust adaptive dynamic programming (RADP). The objective is to fill a gap in the existing literature on adaptive dynamic programming (ADP), where dynamic uncertainties or unmodeled dynamics are not addressed. A key strategy is to integrate tools from modern nonlinear control theory, such as the robust redesign and backstepping techniques as well as the nonlinear small-gain theorem, with the theory of ADP. The proposed RADP methodology can be viewed as an extension of ADP to uncertain nonlinear systems. Practical learning algorithms are developed and applied to the controller design problems for a jet engine and a one-machine power system.

328 citations


Journal ArticleDOI
TL;DR: This paper systematically compares PPS with a random initialization strategy and a hybrid initialization strategy on a variety of test instances with linear or nonlinear correlation between design variables to show that PPS is promising for dealing with dynamic environments.
Abstract: This paper investigates how to use prediction strategies to improve the performance of multiobjective evolutionary optimization algorithms in dealing with dynamic environments. Prediction-based methods have been applied to predict some isolated points in both dynamic single objective optimization and dynamic multiobjective optimization. We extend this idea to predict a whole population by considering the properties of continuous dynamic multiobjective optimization problems. In our approach, called population prediction strategy (PPS), a Pareto set is divided into two parts: a center point and a manifold. A sequence of center points is maintained to predict the next center, and the previous manifolds are used to estimate the next manifold. Thus, PPS could initialize a whole population by combining the predicted center and estimated manifold when a change is detected. We systematically compare PPS with a random initialization strategy and a hybrid initialization strategy on a variety of test instances with linear or nonlinear correlation between design variables. The statistical results show that PPS is promising for dealing with dynamic environments.
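To make the center-plus-manifold idea concrete, here is a rough, hypothetical sketch of one prediction step; the linear center extrapolation and the variable names are assumptions for illustration, not the authors' exact predictor.

```python
import numpy as np

def predict_next_population(center_history, last_pareto_set):
    """Hypothetical PPS-like prediction step (illustrative sketch only).

    center_history  : list of past Pareto-set centers, each an n-dim array
    last_pareto_set : array (pop_size, n), the population at the last change
    """
    centers = np.asarray(center_history)
    # Predict the next center from the maintained sequence of centers
    # (here a simple linear extrapolation; the paper's predictor may differ).
    next_center = centers[-1] + (centers[-1] - centers[-2])
    # Estimate the next manifold by translating the previous manifold
    # (population relative to its center) to the predicted center.
    manifold = last_pareto_set - last_pareto_set.mean(axis=0)
    return next_center + manifold

# Toy usage: a 2-variable problem whose optimum drifts between changes
history = [np.array([0.0, 0.0]), np.array([0.5, 0.2]), np.array([1.0, 0.4])]
pareto = np.random.rand(20, 2) * 0.1 + history[-1]
initial_population = predict_next_population(history, pareto)
print(initial_population.shape)  # (20, 2)
```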

315 citations


Journal ArticleDOI
TL;DR: It is proven that the decentralized control strategy of the overall system can be established by adding appropriate feedback gains to the optimal control policies of the isolated subsystems, and an online policy iteration algorithm is presented to solve the Hamilton-Jacobi-Bellman equations.
Abstract: In this paper, using a neural-network-based online learning optimal control approach, a novel decentralized control strategy is developed to stabilize a class of continuous-time nonlinear interconnected large-scale systems. First, optimal controllers of the isolated subsystems are designed with cost functions reflecting the bounds of the interconnections. Then, it is proven that the decentralized control strategy of the overall system can be established by adding appropriate feedback gains to the optimal control policies of the isolated subsystems. Next, an online policy iteration algorithm is presented to solve the Hamilton-Jacobi-Bellman equations related to the optimal control problem. Through constructing a set of critic neural networks, the cost functions can be obtained approximately, followed by the control policies. Furthermore, the dynamics of the estimation errors of the critic networks are verified to be uniformly ultimately bounded. Finally, a simulation example is provided to illustrate the effectiveness of the present decentralized control scheme.

273 citations


Journal ArticleDOI
TL;DR: An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real-time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem.
Abstract: The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are utilized to find suitable approximations of the optimal value function and the saddle-point feedback control and disturbance policies. Novel weight updating laws are given to tune the critic, actor, and disturbance NNs simultaneously by using data generated in real time along the system trajectories. Considering NN approximation errors, we provide the stability analysis of the proposed algorithm via a Lyapunov approach. Moreover, the need for knowledge of the system input dynamics is relaxed by using an NN identification scheme. Finally, simulation examples show the effectiveness of the proposed algorithm.
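For context, the discrete-time HJI equation that such a critic/actor/disturbance scheme approximates can be written in a generic zero-sum form (notation assumed here, not the paper's exact statement):

```latex
\[
V^{*}(x_k) \;=\; \min_{u_k}\,\max_{w_k}\Bigl\{\, x_k^{\top}Q x_k + u_k^{\top}R u_k
\;-\; \gamma^{2}\, w_k^{\top} w_k \;+\; V^{*}\bigl(F(x_k,u_k,w_k)\bigr) \Bigr\},
\]
```

where u is the control, w the disturbance, and γ the prescribed attenuation level; the three networks respectively approximate V*, the minimizing control policy, and the maximizing disturbance policy at the saddle point.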

197 citations


Journal ArticleDOI
TL;DR: A theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle is developed.
Abstract: We develop a theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle. We attack these problems by viewing them within a game theoretic framework, and we look for subgame perfect Nash equilibrium points. For a general controlled Markov process and a fairly general objective functional, we derive an extension of the standard Bellman equation, in the form of a system of nonlinear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. Most known examples of time-inconsistent stochastic control problems in the literature are easily seen to be special cases of the present theory. We also prove that for every time-inconsistent problem there exists an associated time-consistent problem such that the optimal control and the optimal value function for the consistent problem coincide, respectively, with the equilibrium control and value function for the time-inconsistent problem. To exemplify the theory, we study some concrete examples, such as hyperbolic discounting and mean–variance control.
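As a one-line illustration of why hyperbolic discounting, one of the examples mentioned, breaks the Bellman principle (this is the standard textbook argument, not text from the paper): with discount weights

```latex
\[
\varphi(s-t) \;=\; \frac{1}{1+\beta\,(s-t)}, \qquad
J(t,x,\mathbf{u}) \;=\; \mathbb{E}\Bigl[\,\sum_{s=t}^{T}\varphi(s-t)\,C(X_s,u_s)\Bigr],
\]
```

the relative weight φ(s−t)/φ(s′−t) placed on two future dates s < s′ depends on the evaluation time t, unlike the exponential case where it depends only on s′−s; a control law that is optimal as judged at time t is therefore generally suboptimal as judged at time t+1, and dynamic programming in the usual sense does not apply.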

188 citations


Journal ArticleDOI
TL;DR: A novel optimal control design scheme is proposed for continuous-time nonaffine nonlinear dynamic systems with unknown dynamics by adaptive dynamic programming (ADP), which iteratively updates the control policy online by using the state and input information without identifying the system dynamics.

184 citations


Journal ArticleDOI
TL;DR: A new data-based iterative optimal learning control scheme for discrete-time nonlinear systems using iterative adaptive dynamic programming (ADP) approach is established and the developed control scheme is applied to solve a coal gasification optimal tracking control problem.
Abstract: In this paper, we establish a new data-based iterative optimal learning control scheme for discrete-time nonlinear systems using an iterative adaptive dynamic programming (ADP) approach and apply the developed control scheme to solve a coal gasification optimal tracking control problem. According to the system data, neural networks (NNs) are used to construct the dynamics of the coal gasification process, the coal quality, and the reference control, respectively, where a mathematical model of the system is unnecessary. The approximation errors from the neural network construction of the disturbance and the controls are both considered. Via a system transformation, the optimal tracking control problem with approximation errors and disturbances is effectively transformed into a two-person zero-sum optimal control problem. A new iterative ADP algorithm is then developed to obtain the optimal control laws for the transformed system. A convergence property is developed to guarantee that the performance index function converges to a finite neighborhood of the optimal performance index function, and the convergence criterion is also obtained. Finally, numerical results are given to illustrate the performance of the present method.

182 citations


Journal ArticleDOI
19 Feb 2014-Energies
TL;DR: In this article, a combination of deterministic dynamic programming (DP) and convex optimization is proposed to solve the energy management problem for hybrid electric vehicles (HEVs) with engine start and gearshift costs.
Abstract: This paper presents a novel method to solve the energy management problem for hybrid electric vehicles (HEVs) with engine start and gearshift costs. The method is based on a combination of deterministic dynamic programming (DP) and convex optimization. As demonstrated in a case study, the method yields globally optimal results while returning the solution in much less time than the conventional DP method. In addition, the proposed method handles state constraints, which allows for the application to scenarios where the battery state of charge (SOC) reaches its boundaries.
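To show what the deterministic-DP half of such an energy management problem looks like in code, here is a heavily reduced sketch with battery state of charge (SOC) as the only state and engine power as the decision; the grids, the linear fuel-cost model, and all constants are illustrative assumptions, and the engine-start and gearshift costs that motivate the paper are omitted.

```python
import numpy as np

def hev_dp(power_demand, soc_grid, eng_grid, dt=1.0, batt_capacity=3600.0,
           fuel_cost=lambda p: 0.25 * p):
    """Backward deterministic DP for a toy power-split problem.

    power_demand : requested power at each time step [kW]
    soc_grid     : discretized battery state of charge (ascending values)
    eng_grid     : candidate engine power levels [kW]
    """
    T, nS = len(power_demand), len(soc_grid)
    cost_to_go = np.zeros(nS)                         # terminal cost = 0
    policy = np.zeros((T, nS), dtype=int)
    for t in reversed(range(T)):
        new_ctg = np.full(nS, np.inf)
        for i, soc in enumerate(soc_grid):
            for a, p_eng in enumerate(eng_grid):
                p_batt = power_demand[t] - p_eng              # battery covers the rest
                soc_next = soc - p_batt * dt / batt_capacity
                if not (soc_grid[0] <= soc_next <= soc_grid[-1]):
                    continue                                   # SOC (state) constraint
                j = int(np.abs(soc_grid - soc_next).argmin())  # nearest-grid approximation
                c = fuel_cost(max(p_eng, 0.0)) * dt + cost_to_go[j]
                if c < new_ctg[i]:
                    new_ctg[i], policy[t, i] = c, a
        cost_to_go = new_ctg
    return policy, cost_to_go

demand = np.array([10.0, 25.0, 5.0, 30.0])            # toy drive cycle [kW]
policy, ctg = hev_dp(demand, np.linspace(0.4, 0.8, 21), np.linspace(0.0, 40.0, 9))
print(ctg[10])                                        # cost-to-go from a mid-range SOC at t = 0
```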

175 citations


Journal ArticleDOI
TL;DR: A novel data-driven stable iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal temperature control problems for water-gas shift (WGS) reaction systems where neural networks are used to construct the dynamics of the WGS system and solve the reference control.
Abstract: In this paper, a novel data-driven stable iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal temperature control problems for water–gas shift (WGS) reaction systems. According to the system data, neural networks (NNs) are used to construct the dynamics of the WGS system and solve the reference control, respectively, where the mathematical model of the WGS system is unnecessary. Considering the reconstruction errors of NNs and the disturbances of the system and control input, a new stable iterative ADP algorithm is developed to obtain the optimal control law. The convergence property is developed to guarantee that the iterative performance index function converges to a finite neighborhood of the optimal performance index function. The stability property is developed to guarantee that each of the iterative control laws can make the tracking error uniformly ultimately bounded (UUB). NNs are developed to implement the stable iterative ADP algorithm. Finally, numerical results are given to illustrate the effectiveness of the developed method.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an optimal solution to the energy management problem in fuel-cell hybrid vehicles with a dual storage buffer, optimizing fuel economy over a standard driving cycle using multi-dimensional dynamic programming (MDDP).

Journal ArticleDOI
TL;DR: A multi-criteria dynamic programming map-matching (MDP-MM) algorithm is proposed for online matching FCD that is competitive with existing algorithms in both accuracy and computational performance.
Abstract: Large-scale global positioning system (GPS) positioning information of floating cars has been recognised as a major data source for many transportation applications. Mapping large-scale low-frequency floating car data (FCD) onto the road network is very challenging for traditional map-matching (MM) algorithms developed for in-vehicle navigation. In this paper, a multi-criteria dynamic programming map-matching (MDP-MM) algorithm is proposed for online matching of FCD. In the proposed MDP-MM algorithm, the MDP technique is used to minimise the number of candidate routes maintained at each GPS point, while guaranteeing to determine the best matching route. In addition, several useful techniques are developed to improve the running time of the shortest path calculation in the MM process. Case studies based on real FCD demonstrate the accuracy and computational performance of the MDP-MM algorithm. Results indicate that the MDP-MM algorithm is competitive with existing algorithms in both accuracy and computational performance.
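A bare-bones sketch of the dynamic-programming backbone behind this kind of map matching, in the spirit of a Viterbi-style candidate lattice; the cost functions and candidate generation below are placeholders, and the multi-criteria scoring, candidate pruning, and shortest-path speedups of MDP-MM are not reproduced.

```python
import math

def match_trajectory(gps_points, candidates_for, emission_cost, transition_cost):
    """Viterbi-style DP over per-point road-segment candidates (illustrative only).

    candidates_for  : fix -> iterable of candidate road segments
    emission_cost   : (fix, segment) -> cost, e.g. distance from fix to segment
    transition_cost : (seg_prev, seg_next, fix_prev, fix_next) -> cost,
                      e.g. mismatch between network route length and fix spacing
    """
    # layers[k][segment] = (best cumulative cost up to fix k, back-pointer)
    layers = [{c: (emission_cost(gps_points[0], c), None)
               for c in candidates_for(gps_points[0])}]
    for k in range(1, len(gps_points)):
        layer = {}
        for c in candidates_for(gps_points[k]):
            best_cost, best_prev = math.inf, None
            for p, (cost_p, _) in layers[-1].items():
                cost = (cost_p
                        + transition_cost(p, c, gps_points[k - 1], gps_points[k])
                        + emission_cost(gps_points[k], c))
                if cost < best_cost:
                    best_cost, best_prev = cost, p
            layer[c] = (best_cost, best_prev)
        layers.append(layer)
    # Backtrack the minimum-cost matched route
    route = [min(layers[-1], key=lambda c: layers[-1][c][0])]
    for k in range(len(layers) - 1, 0, -1):
        route.append(layers[k][route[-1]][1])
    return list(reversed(route))

# Toy usage: two fixes, two candidate segments, made-up costs
fixes = [(0.0, 0.0), (1.0, 0.0)]
print(match_trajectory(
    fixes,
    candidates_for=lambda fix: ['a', 'b'],
    emission_cost=lambda fix, seg: 0.1 if seg == 'a' else 0.5,
    transition_cost=lambda p, c, f1, f2: 0.0 if p == c else 0.3,
))  # -> ['a', 'a']
```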

Journal ArticleDOI
TL;DR: A new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation.
Abstract: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton–Jacobi–Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function in each iteration cannot accurately be obtained, for the first time a new “design method of the convergence criteria” for the finite-approximation-error-based generalized value iteration algorithm is established. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
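In standard notation (assumed here, not quoted from the paper), the generalized value iteration referred to above can be sketched as:

```latex
\[
\begin{aligned}
V_0(x) &= \Psi(x) \quad (\text{an arbitrary positive semi-definite function}),\\
u_i(x_k) &= \arg\min_{u_k}\Bigl\{\, U(x_k,u_k) + V_i\bigl(F(x_k,u_k)\bigr) \Bigr\},\\
V_{i+1}(x_k) &= U\bigl(x_k,u_i(x_k)\bigr) + V_i\bigl(F(x_k,u_i(x_k))\bigr),
\end{aligned}
\]
```

with classical value iteration recovered for Ψ ≡ 0; the paper's concern is how large the per-iteration approximation error in computing u_i and V_{i+1} may be while the sequence still converges to a finite neighborhood of the optimal performance index function.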

Journal ArticleDOI
TL;DR: It is demonstrated that the optimal engine on/off strategy is to switch the engine on if and only if the requested power exceeds a certain nonconstant threshold.
Abstract: Convex optimization has recently been suggested for solving the optimal energy management problem of hybrid electric vehicles. Compared with dynamic programming, this approach can significantly reduce the computational time, but the price to pay is additional model approximations and heuristics for discrete decision variables such as engine on/off control. In this paper, the globally optimal engine on/off conditions are derived analytically. It is demonstrated that the optimal engine on/off strategy is to switch the engine on if and only if the requested power exceeds a certain nonconstant threshold. By iteratively computing the threshold and the power split using convex optimization, the optimal solution to the energy management problem is found. The effectiveness of the presented approach is demonstrated in two sizing case studies. The first case study deals with high-energy-capacity batteries, whereas the second case study deals with supercapacitors that have much lower energy capacity. In both cases, the proposed algorithm yields optimal results much faster than the dynamic programming algorithm.

Journal ArticleDOI
TL;DR: An identifier is established for the unknown systems to approximate system states, and an optimal control approach for nonlinear MJSs is developed to solve the Hamilton-Jacobi-Bellman equation based on the adaptive dynamic programming technique.
Abstract: In this paper, we develop and analyze an optimal control method for a class of discrete-time nonlinear Markov jump systems (MJSs) with unknown system dynamics. Specifically, an identifier is established for the unknown systems to approximate system states, and an optimal control approach for nonlinear MJSs is developed to solve the Hamilton-Jacobi-Bellman equation based on the adaptive dynamic programming technique. We also develop detailed stability analysis of the control approach, including the convergence of the performance index function for nonlinear MJSs and the existence of the corresponding admissible control. Neural network techniques are used to approximate the proposed performance index function and the control law. To demonstrate the effectiveness of our approach, three simulation studies, one linear case, one nonlinear case, and one single link robot arm case, are used to validate the performance of the proposed optimal control method.

Journal ArticleDOI
TL;DR: An optimal control scheme based on adaptive dynamic programming (ADP) is developed to solve infinite-horizon optimal control problems of continuous-time complex-valued nonlinear systems, and a new performance index function is established on the basis of complex-valued state and control.
Abstract: In this brief, an optimal control scheme based on adaptive dynamic programming (ADP) is developed to solve infinite-horizon optimal control problems of continuous-time complex-valued nonlinear systems. A new performance index function is established on the basis of complex-valued state and control. Using system transformations, the complex-valued system is transformed into a real-valued one, which effectively circumvents the restrictions imposed by the Cauchy-Riemann conditions. With the transformed system and the performance index function, a new ADP method is developed to obtain the optimal control law by using neural networks. A compensation controller is developed to compensate for the approximation errors of the neural networks. Stability properties of the nonlinear system are analyzed and convergence properties of the weights for the neural networks are presented. Finally, simulation results demonstrate the performance of the developed optimal control scheme for complex-valued nonlinear systems.
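The transformation mentioned here is, in outline, the standard real-composite trick (my paraphrase with assumed notation, not the paper's exact construction):

```latex
\[
z \;=\; \zeta_R + j\,\zeta_I \;\longmapsto\;
\bar{x} \;=\; \begin{bmatrix}\zeta_R\\ \zeta_I\end{bmatrix}, \qquad
\dot{\bar{x}} \;=\; \begin{bmatrix}\operatorname{Re} f(z)\\ \operatorname{Im} f(z)\end{bmatrix},
\]
```

which matters because a non-constant real-valued cost of a complex variable can never satisfy the Cauchy-Riemann conditions, i.e. is never complex-differentiable, so value-function gradients have to be taken with respect to the stacked real variables.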

Journal ArticleDOI
TL;DR: The neural-network-based robust optimal control design for a class of uncertain nonlinear systems via an adaptive dynamic programming approach is investigated, and it is shown that this robust controller can achieve optimality under a specified cost function.

Journal ArticleDOI
01 Mar 2014-Energy
TL;DR: In this article, the authors present a methodology that allows optimizing transmission investments under flow-based market coupling and complements established electricity market models implemented in a linear programming environment that are suitable for solving large-scale problems.

Journal ArticleDOI
TL;DR: Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated and a relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided.
Abstract: Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to the solution are obtained. Afterwards, the results are formulated as a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.

Journal ArticleDOI
TL;DR: It is proved that all the iterative controls obtained in the iterative θ-ADP algorithm can stabilize the nonlinear system, which means that the iterative θ-ADP algorithm is feasible for implementation both online and offline.
Abstract: This paper is concerned with a new iterative θ-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law which optimizes the iterative performance index function. In the present iterative θ-ADP algorithm, the requirement of an initial admissible control, needed in policy iteration algorithms, is avoided. It is proved that all the iterative controls obtained in the iterative θ-ADP algorithm can stabilize the nonlinear system, which means that the iterative θ-ADP algorithm is feasible for implementation both online and offline. Convergence analysis of the performance index function is presented to guarantee that the iterative performance index function converges to the optimum monotonically. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, for facilitating the implementation of the iterative θ-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.

Proceedings Article
21 Jun 2014
TL;DR: This work develops a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs and shows that the proposed method provably succeeds under certain technical conditions, and its effectiveness through simulation of an option pricing problem.
Abstract: We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handling uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. To the best of our knowledge, this is the first attempt to scale up the robust MDP paradigm.
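To make the setting concrete, below is the small-scale tabular robust value iteration that, per the abstract, does not scale to real-life problems and therefore motivates the approximate method; the interval uncertainty set, the greedy inner minimization, and all numbers are illustrative assumptions, and this is not the authors' projected-fixed-point algorithm.

```python
import numpy as np

def robust_value_iteration(P_lo, P_hi, R, gamma=0.95, iters=500):
    """Tabular robust VI: nature picks the worst transition law inside a box.

    P_lo, P_hi : (A, S, S) element-wise bounds on transition probabilities
    R          : (A, S) immediate rewards
    """
    A, S, _ = P_lo.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                # Worst case inside the box: put as much probability mass as
                # possible on the lowest-value successors, subject to the bounds.
                order = np.argsort(V)            # ascending: worst successors first
                p = P_lo[a, s].copy()
                slack = 1.0 - p.sum()
                for s2 in order:
                    add = min(P_hi[a, s, s2] - p[s2], slack)
                    p[s2] += add
                    slack -= add
                Q[a, s] = R[a, s] + gamma * p @ V
        V = Q.max(axis=0)                        # agent maximizes the robust value
    return V

# Toy 2-state, 2-action example with +/-0.1 uncertainty around nominal dynamics
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0], [0.5, 0.5]])
V = robust_value_iteration(np.clip(P - 0.1, 0, 1), np.clip(P + 0.1, 0, 1), R)
print(V)
```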

Journal ArticleDOI
TL;DR: A novel theoretic formulation based on adaptive dynamic programming (ADP) is developed to solve online the optimal tracking problem of the continuous-time linear system with unknown dynamics, where the original system dynamics and the reference trajectory dynamics are transformed into an augmented system.
Abstract: In this paper, a novel theoretic formulation based on adaptive dynamic programming (ADP) is developed to solve online the optimal tracking problem of the continuous-time linear system with unknown dynamics. First, the original system dynamics and the reference trajectory dynamics are transformed into an augmented system. Then, under the same performance index with the original system dynamics, an augmented algebraic Riccati equation is derived. Furthermore, the solutions for the optimal control problem of the augmented system are proven to be equal to the standard solutions for the optimal tracking problem of the original system dynamics. Moreover, a new online algorithm based on the ADP technique is presented to solve the optimal tracking problem of the linear system with unknown system dynamics. Finally, simulation results are given to verify the effectiveness of the theoretic results.
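As a point of comparison, the model-based baseline that this kind of formulation recovers online can be computed directly when A and B are known; the matrices, the reference generator, and the discount factor below are my own illustrative choices (the discount is one common way to keep the augmented Riccati equation well posed for a persistent reference), not the paper's example.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, block_diag

# Plant: x_dot = A x + B u   (assumed example system)
A = np.array([[0., 1.], [-1., -2.]])
B = np.array([[0.], [1.]])
# Reference generator: r_dot = F r   (a 2 rad/s sinusoidal reference)
F = np.array([[0., 2.], [-2., 0.]])

Q = np.eye(2)            # weight on the tracking error e = x - r
R = np.array([[1.]])
alpha = 0.1              # discount rate (assumption, see lead-in)

# Augmented state X = [x; r]
T  = block_diag(A, F)
B1 = np.vstack([B, np.zeros((2, 1))])
Q1 = np.block([[Q, -Q], [-Q, Q]])        # e' Q e written in augmented coordinates

# Discounted augmented ARE: (T - a/2 I)'P + P(T - a/2 I) - P B1 R^-1 B1' P + Q1 = 0
P = solve_continuous_are(T - 0.5 * alpha * np.eye(4), B1, Q1, R)
K = np.linalg.solve(R, B1.T @ P)          # feedback/feedforward gain: u = -K [x; r]
print(K)
```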

Proceedings Article
08 Dec 2014
TL;DR: Compared with the classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.
Abstract: We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). PDDP takes into account uncertainty explicitly for dynamics models using Gaussian processes (GPs). Based on the second-order local approximation of the value function, PDDP performs Dynamic Programming around a nominal trajectory in Gaussian belief spaces. Different from typical gradient-based policy search methods, PDDP does not require a policy parameterization and learns a locally optimal, time-varying control policy. We demonstrate the effectiveness and efficiency of the proposed algorithm using two nontrivial tasks. Compared with the classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.
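For readers who want a reference point for the "second-order local approximation of the value function" step, here is the classical deterministic backward pass on a linear-quadratic approximation along a nominal trajectory (plain DDP/iLQR-style machinery with an assumed toy model, not PDDP's Gaussian-belief-space version):

```python
import numpy as np

def lq_backward_pass(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion along a nominal trajectory.

    A_list, B_list : local linearizations f_x, f_u at each time step
    Q, R, Qf       : quadratic stage and terminal cost weights
    Returns time-varying gains K_t with u_t = -K_t (x_t - x_nominal_t).
    """
    P = Qf
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return list(reversed(gains))

# Toy double integrator, discretized; already linear, so the "linearization" is exact
dt = 0.1
A = np.array([[1., dt], [0., 1.]])
B = np.array([[0.], [dt]])
K_seq = lq_backward_pass([A] * 50, [B] * 50, np.eye(2), np.eye(1), 10 * np.eye(2))
print(K_seq[0])
```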

01 Jan 2014
TL;DR: This article places a variety of competing strategies into a common framework, which makes it easier to see the close relationship between communities such as stochastic programming, (approximate) dynamic programming, simulation, and stochastic search.
Abstract: Whereas deterministic optimization enjoys an almost universally accepted canonical form, stochastic optimization is a jungle of competing notational systems and algorithmic strategies. This is especially problematic in the context of sequential (multistage) stochastic optimization problems, which is the focus of our presentation. In this article, we place a variety of competing strategies into a common framework, which makes it easier to see the close relationship between communities such as stochastic programming, (approximate) dynamic programming, simulation, and stochastic search. What have previously been viewed as competing approaches (e.g., simulation versus optimization, stochastic programming versus dynamic programming) can be reduced to four fundamental classes of policies that are evaluated in a simulation-based setting we call the base model. The result is a single coherent framework that encompasses all of these methods, which can often be combined to create powerful hybrid policies to address complex problems.

Journal ArticleDOI
TL;DR: ADP techniques for design and adaptation (learning) of approximately optimal control laws for this model are introduced and a parameterization is proposed, based on an analysis of the mean-field PDE model for the game.
Abstract: The purpose of this paper is to show how insight obtained from a mean-field model can be used to create an architecture for approximate dynamic programming (ADP) for a certain class of games comprising a large number of agents. The general technique is illustrated with the aid of a mean-field oscillator game model introduced in our prior work. The states of the model are interpreted as the phase angles for a collection of nonhomogeneous oscillators, and in this way the model may be regarded as an extension of the classical coupled oscillator model of Kuramoto. The paper introduces ADP techniques for design and adaptation (learning) of approximately optimal control laws for this model. For this purpose, a parameterization is proposed, based on an analysis of the mean-field PDE model for the game. In an offline setting, a Galerkin procedure is introduced to choose the optimal parameters, while in an online setting, a steepest descent algorithm is proposed. The paper provides a detailed analysis of the optimal parameter values as well as the Bellman error with both the Galerkin approximation and the online algorithm. Finally, a phase transition result is described for the large population limit when each oscillator uses the approximately optimal control law. A critical value of the control penalty parameter is identified: above this value, the oscillators are incoherent; and below this value (when control is sufficiently cheap) the oscillators synchronize. These conclusions are illustrated with results from numerical experiments.
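For reference, the classical Kuramoto model that the game extends is (standard form, not the paper's controlled mean-field version):

```latex
\[
\dot{\theta}_i \;=\; \omega_i \;+\; \frac{K}{N}\sum_{j=1}^{N}\sin\bigl(\theta_j-\theta_i\bigr),
\qquad i = 1,\dots,N,
\]
```

where θ_i is the phase of oscillator i, ω_i its natural frequency, and K the coupling strength; synchronization sets in above a critical coupling, which mirrors the control-penalty phase transition described in the abstract.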

Journal ArticleDOI
01 Mar 2014-Energy
TL;DR: In this paper, a trigeneration plant is designed to meet the thermal and electrical loads of a user and is connected to the electrical grid; the problem is discretized in time and plant states, represented as a weighted graph, and the strategy that minimizes the total cost is determined using backward dynamic programming.

Journal ArticleDOI
TL;DR: This technical note is concerned with a partially observed optimal control problem whose novel feature is that the cost functional is of mean-field type; hence the problem is time-inconsistent in the sense that Bellman's dynamic programming principle does not hold.
Abstract: This technical note is concerned with a partially observed optimal control problem, whose novel feature is that the cost functional is of mean-field type. Hence the problem is time-inconsistent in the sense that Bellman's dynamic programming principle does not hold. A maximum principle is established using Girsanov's theorem and convex variation. Some nonlinear filtering results for backward stochastic differential equations (BSDEs) are developed by expressing the solutions of the BSDEs as Itô processes. An illustrative example is worked out in terms of the maximum principle and the filtering.

Journal ArticleDOI
TL;DR: The conclusion is that the proposed SADP algorithm is an effective control methodology for addressing the full-range ACC problem.

Dissertation
15 Jul 2014
TL;DR: Results show that the stochastic approach leads to more robust Unit Commitment solutions than the deterministic one.
Abstract: In this work we evaluate the impact of considering a stochastic approach on the day-ahead basis Unit Commitment. Comparisons between stochastic and deterministic Unit Commitment solutions are provided. The Unit Commitment model consists in the minimization of the total operation costs considering units’ technical constraints like ramping rates and minimum up and down time. Load shedding and wind power spilling is acceptable, but at inflated operational costs. The generation of Unit Commitment solution is guaranteed by DEEPSO, which is a hybrid DE-EA-PSO algorithm, where DE stands for Differential Evolution, EA for Evolutionary Algorithms and PSO for Particle Swarm Optimization. The evaluation process consists in the calculation of the optimal economic dispatch and in verifying the fulfillment of the considered constraints. For the calculation of the optimal economic dispatch an algorithm based on the Benders Decomposition, namely on the Dual Dynamic Programming, was developed. If possible, the constraints added to the dispatch problem by the Benders Decomposition algorithm will provide a feasible and optimal dispatch solution. Two approaches were considered on the construction of stochastic solutions. Either the top 5 more probable wind power output scenarios are used, or a set of extreme scenarios are considered instead. Data related to wind power outputs from two different operational days is considered on the analysis. Stochastic and deterministic solutions are compared based on the actual measured wind power output at the operational day. Through a technique capable of finding representative wind power scenarios and their probabilities we were able to analyze in a more detailed process the expected final operational costs. Also, we expose the probability that the system operator has on the operational costs being under/above certain value. Results show that the stochastic approach leads to more robust Unit Commitment solutions than the deterministic one. The method of using the top 5 more probable scenarios on the search for the stochastic solution proved to produce preferable results. Index Terms – unit commitment, stochastic, wind power, forecasting, uncertainty, DEEPSO, Benders Decomposition.