
Showing papers on "Dynamic programming published in 2016"


Journal ArticleDOI
TL;DR: In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws.
Abstract: In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, respectively, to facilitate the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

324 citations
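A minimal tabular sketch may help fix ideas: the update above is V_{i+1}(x) = min_u [U(x,u) + V_i(f(x,u))], started from an arbitrary positive semi-definite function. The discretized toy system below is hypothetical; the paper itself uses neural networks and general nonlinear dynamics.

```python
import numpy as np

# Tabular undiscounted value iteration:
#   V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ]
# Toy deterministic system with an absorbing zero-cost goal state.

n_states, n_actions = 50, 5
rng = np.random.default_rng(0)
f = rng.integers(0, n_states, size=(n_states, n_actions))  # next-state table
f[:, 0] = np.maximum(np.arange(n_states) - 1, 0)           # action 0 steps toward goal
f[0, :] = 0                                                # state 0 is absorbing (goal)
U = rng.uniform(0.1, 1.0, size=(n_states, n_actions))      # positive stage cost
U[0, :] = 0.0                                              # zero cost at the goal

V = np.zeros(n_states)     # any positive semi-definite initialization is permitted
for _ in range(1000):
    Q = U + V[f]           # Q[x, u] = U(x, u) + V(f(x, u))
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # termination criterion
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)  # greedy iterative control law
```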


Journal ArticleDOI
TL;DR: This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection by minimizing some predefined costs by employing reinforcement learning and adaptive dynamic programming techniques.
Abstract: This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection by minimizing some predefined costs. Reinforcement learning and adaptive dynamic programming techniques are employed to compute an approximated optimal controller using input/partial-state data despite unknown system dynamics and unmeasurable disturbance. Rigorous stability analysis shows that the proposed controller exponentially stabilizes the closed-loop system and the output of the plant asymptotically tracks the given reference signal. Simulation results on an LCL-coupled inverter-based distributed generation system demonstrate the effectiveness of the proposed approach.

251 citations


Journal ArticleDOI
TL;DR: A new time-discretized multi-commodity network flow model for the VRPPDTW based on the integration of vehicles' carrying states within space-time transportation networks is proposed, so as to allow a joint optimization of passenger-to-vehicle assignment and turn-by-turn routing in congested transportation networks.
Abstract: Optimization of on-demand transportation systems and ride-sharing services involves solving a class of complex vehicle routing problems with pickup and delivery with time windows (VRPPDTW). This paper first proposes a new time-discretized multi-commodity network flow model for the VRPPDTW based on the integration of vehicles’ carrying states within space–time transportation networks, so as to allow a joint optimization of passenger-to-vehicle assignment and turn-by-turn routing in congested transportation networks. Our three-dimensional state–space–time network construct is able to comprehensively enumerate possible transportation states at any given time along vehicle space–time paths, and further allows a forward dynamic programming solution algorithm to solve the single vehicle VRPPDTW problem. By utilizing a Lagrangian relaxation approach, the primal multi-vehicle routing problem is decomposed to a sequence of single vehicle routing sub-problems, with Lagrangian multipliers for individual passengers’ requests being updated by sub-gradient-based algorithms. We further discuss a number of search space reduction strategies and test our algorithms, implemented through a specialized program in C++, on medium-scale and large-scale transportation networks, namely the Chicago sketch and Phoenix regional networks.

242 citations
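The decomposition logic can be sketched compactly. Below, `solve_single_vehicle` is a placeholder for the paper's forward dynamic program over the state-space-time network, assumed to return the priced route cost and the set of requests served; the loop and step rule are a standard subgradient scheme, not the authors' exact implementation.

```python
# Lagrangian-relaxation loop: multipliers price each passenger request,
# each vehicle solves its own DP subproblem under those prices, and a
# subgradient step pushes toward "serve each request exactly once".

def lagrangian_relaxation(vehicles, requests, solve_single_vehicle,
                          n_iters=100, step0=1.0):
    multipliers = {r: 0.0 for r in requests}
    best_bound = float("-inf")
    for k in range(n_iters):
        served_count = {r: 0 for r in requests}
        dual_value = sum(multipliers.values())
        for v in vehicles:
            # Each vehicle independently picks its best route at current prices.
            route_cost, served = solve_single_vehicle(v, multipliers)
            dual_value += route_cost
            for r in served:
                served_count[r] += 1
        best_bound = max(best_bound, dual_value)   # lower bound on the primal
        # Subgradient step: requests served more or less than once adjust price.
        step = step0 / (k + 1)
        for r in requests:
            multipliers[r] += step * (1 - served_count[r])
    return multipliers, best_bound
```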


Proceedings ArticleDOI
04 Aug 2016
TL;DR: A large improvement in action detection performance is achieved, with gains in mAP of 20% and 11% reported on the UCF-101 and J-HMDB-21 datasets, respectively, compared to the state of the art.
Abstract: In this work, we propose an approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, appearance and motion detection networks are employed to localise and score actions from colour images and optical flow. In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two energy maximisation problems via dynamic programming. While in the first pass, action paths spanning the whole video are built by linking detection boxes over time using their class-specific scores and their spatial overlap, in the second pass, temporal trimming is performed by ensuring label consistency for all constituting detection boxes. We demonstrate the performance of our algorithm on the challenging UCF101, J-HMDB-21 and LIRIS-HARL datasets, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time.

223 citations
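The first-pass linking step is essentially a Viterbi dynamic program over per-frame detections; here is a sketch under that reading, where the box format and the overlap weight `lam` are illustrative assumptions.

```python
import numpy as np

# Link per-frame detection boxes into one action path maximizing
# class-specific score plus spatial-overlap continuity (first pass of
# the tube-building step, as described above).

def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_action_path(boxes, scores, lam=1.0):
    # boxes[t]: list of boxes in frame t; scores[t]: class-specific scores.
    T = len(boxes)
    dp = [np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, T):
        cur = np.empty(len(boxes[t]))
        ptr = np.empty(len(boxes[t]), dtype=int)
        for j, bj in enumerate(boxes[t]):
            trans = [dp[t - 1][i] + lam * iou(bi, bj)
                     for i, bi in enumerate(boxes[t - 1])]
            ptr[j] = int(np.argmax(trans))
            cur[j] = scores[t][j] + trans[ptr[j]]
        dp.append(cur)
        back.append(ptr)
    # Backtrack the highest-scoring path.
    path = [int(np.argmax(dp[-1]))]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return list(reversed(path))   # chosen box index in each frame
```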


Journal ArticleDOI
TL;DR: The obtained adaptive and optimal output-feedback controllers differ from the existing literature on ADP in that they are derived from sampled-data systems theory and are guaranteed to be robust to dynamic uncertainties.

183 citations


Journal ArticleDOI
TL;DR: A continuous-time version of the traditional value iteration (VI) algorithm is presented with rigorous convergence analysis, crucial for developing new adaptive dynamic programming methods to solve the adaptive optimal control problem and the stochastic robust optimal control problem for linear continuous-time systems.

149 citations


Journal ArticleDOI
TL;DR: A control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals is proposed.
Abstract: This paper proposes a control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals. The algorithm is based on an actor/critic framework, where a critic neural network (NN) is used to learn the optimal cost, and an actor NN is used to learn the optimal control policy. The adaptive control nature of the algorithm requires a persistence of excitation condition to be a priori validated, but this can be relaxed using previously stored data concurrently with current data in the update of the critic NN. A robustifying control term is added to the controller to eliminate the effect of residual errors, leading to asymptotic stability of the closed-loop system. Simulation results show the effectiveness of the proposed approach for a controlled Van der Pol oscillator and also for a power system plant.

138 citations


Proceedings Article
12 Feb 2016
TL;DR: An operator for tabular representations, the consistent Bellman operator, is described, which incorporates a notion of local policy consistency that leads to an increase in the action gap at each state; increasing this gap mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies.
Abstract: This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.

121 citations
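For a deterministic tabular MDP the consistent Bellman operator admits a very short implementation; a sketch follows, with toy MDP arrays as illustrative assumptions (the stochastic case replaces the branch with an expectation over next states).

```python
import numpy as np

# Consistent Bellman operator (deterministic transitions):
#   (T_C Q)(x, a) = r(x, a) + gamma * ( Q(x, a)        if f(x, a) == x
#                                       max_b Q(x',b)  otherwise )
# Keeping Q(x, a) on self-transitions enforces local policy consistency
# and increases the action gap relative to the usual operator.

def consistent_bellman(Q, R, next_state, gamma=0.99):
    n_states, n_actions = Q.shape
    TQ = np.empty_like(Q)
    for x in range(n_states):
        for a in range(n_actions):
            x2 = next_state[x, a]
            boot = Q[x, a] if x2 == x else Q[x2].max()
            TQ[x, a] = R[x, a] + gamma * boot
    return TQ

rng = np.random.default_rng(1)
ns, na = 10, 3
R = rng.uniform(size=(ns, na))
nxt = rng.integers(0, ns, size=(ns, na))
Q = np.zeros((ns, na))
for _ in range(2000):          # value iteration with the consistent operator
    Q = consistent_bellman(Q, R, nxt)
```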


Journal ArticleDOI
TL;DR: The proposed sub-optimal strategy is compared with the optimal solution provided by dynamic programming for validation purposes, and it is shown that the low computational load of the presented approach enables robustness and makes it very appealing for online use.
Abstract: The problem of eco-driving is analyzed for an urban traffic network in the presence of signalized intersections. It is assumed that the traffic light timings are known and available to the vehicles via infrastructure-to-vehicle (I2V) communication. This work provides a solution to the energy consumption minimization problem, while traveling through a sequence of signalized intersections and always catching a green light. The optimal control problem is non-convex due to the constraints coming from the traffic lights, so a sub-optimal strategy to restore convexity and solve the problem is proposed. First, a pruning algorithm reduces the optimization domain by considering only the portions of the traffic lights' green phases that allow driving in compliance with the city speed limits. Then, a graph is created in the feasible region, in order to approximate the energy consumption associated with each available path in the driving horizon. Lastly, after the problem's convexity is recovered, a simple optimization problem is solved on the selected path to calculate the optimal crossing times at each intersection. The optimal speeds are then suggested to the driver. The proposed sub-optimal strategy is compared with the optimal solution provided by dynamic programming for validation purposes. It is also shown that the low computational load of the presented approach enables robustness and makes it very appealing for online use.

108 citations
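A compact way to see the pruning-plus-graph construction: each node is an (intersection, crossing-time) pair kept only if it lies inside a green window and is reachable within the speed limits, and a dynamic program propagates an energy cost. The consumption model, discretization, and numbers below are illustrative stand-ins, not the paper's vehicle model.

```python
# DP over (intersection, crossing-time) nodes for the eco-driving sketch.

def energy(dist, dt):
    v = dist / dt                         # average speed on the segment
    return dist * (0.05 + 0.002 * v**2)   # toy consumption model (assumption)

def eco_dp(dists, greens, v_min, v_max, t_step=1.0):
    # dists[i]: distance to intersection i; greens[i]: list of (start, end)
    # green windows at intersection i.
    layers = [{0.0: 0.0}]                 # crossing time -> best cost so far
    for i, d in enumerate(dists):
        layer = {}
        for t_prev, c_prev in layers[-1].items():
            for g0, g1 in greens[i]:
                lo = max(g0, t_prev + d / v_max)   # earliest feasible crossing
                hi = min(g1, t_prev + d / v_min)   # latest feasible crossing
                t = lo
                while t <= hi:                     # discretized crossing times
                    c = c_prev + energy(d, t - t_prev)
                    if c < layer.get(t, float("inf")):
                        layer[t] = c
                    t += t_step
        layers.append(layer)
    return min(layers[-1].values()) if layers[-1] else None

# Illustrative call: two intersections, one green window each.
best = eco_dp(dists=[200.0, 300.0], greens=[[(10, 30)], [(40, 60)]],
              v_min=5.0, v_max=15.0)
```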


Journal ArticleDOI
TL;DR: It is shown that the proposed OPFB method is more powerful than the static OPFB as it is equivalent to a state-feedback control policy and is successfully used to solve a regulation and a tracking problem.
Abstract: A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem. Conditions on the existence of a solution to the discounted ARE are provided, and an upper bound for the discount factor is found to assure the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be constructed using a limited history of observations of the system output. A Bellman equation is then developed to evaluate a control policy and find an improved policy simultaneously, using only these limited output observations. Then, using this Bellman equation, a model-free off-policy RL-based OPFB controller is developed without requiring knowledge of the system state or the system dynamics. It is shown that the proposed OPFB method is more powerful than static OPFB, as it is equivalent to a state-feedback control policy. The proposed method is successfully used to solve a regulation and a tracking problem.

102 citations


Journal ArticleDOI
TL;DR: This paper presents an adaptive dynamic programming-based guaranteed cost neural tracking control algorithm for a class of continuous-time matched uncertain nonlinear systems by introducing an augmented system and employing a modified cost function with a discount factor.

Journal ArticleDOI
TL;DR: This brief addresses the energy management problem within the framework of receding horizon optimization for power-split plug-in hybrid electric vehicles (HEVs), proposing an online iterative algorithm to solve the optimization problem based on the continuation/generalized minimum residual algorithm.
Abstract: This brief addresses the energy management problem within the framework of receding horizon optimization. For power-split plug-in hybrid electric vehicles (HEVs), the real-time power-split decision is formulated as a nonlinear receding horizon optimization problem. Then, an online iterative algorithm, based on the continuation/generalized minimum residual algorithm, is proposed to solve the optimization problem. It should be noted that the proposed energy management strategy aims for optimality over the targeted horizon, but the solution is not optimal for the full driving route, unlike many solutions obtained using dynamic programming approaches. At each decision step, only the initial value of the optimal solution is implemented, according to the receding horizon optimization approach. Finally, to demonstrate a comparison of the proposed scheme with other schemes, numerical validations conducted on a full-scale GT-SUITE HEV simulator are presented.
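The receding-horizon pattern itself is compact and worth seeing in isolation. In the sketch below, `solve_horizon` stands in for the brief's continuation/generalized minimum residual solver, and the plant interface is an assumption of this sketch.

```python
# Generic receding-horizon (MPC) skeleton: solve a finite-horizon problem at
# each step, apply only the first control, then shift the horizon forward.

def receding_horizon_control(x0, plant_step, solve_horizon, n_steps, horizon):
    x, trajectory = x0, [x0]
    u_guess = [0.0] * horizon               # controls as a plain list here
    for _ in range(n_steps):
        u_seq = solve_horizon(x, u_guess)   # optimal controls over the horizon
        u = u_seq[0]                        # apply only the first input
        x = plant_step(x, u)
        trajectory.append(x)
        u_guess = u_seq[1:] + [u_seq[-1]]   # warm start, shifted by one step
    return trajectory
```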

Journal ArticleDOI
01 Oct 2016-Energy
TL;DR: Simulation results demonstrate that the overall performance of the optimized fuzzy-logic-based energy management strategy can be improved significantly and can even approach the optimal results of dynamic programming.

Journal ArticleDOI
TL;DR: This work investigates the exact solution of the vehicle routing problem with time windows in which multiple trips are allowed for the vehicles, specifically considering the case where it is mandatory to visit all customers and there is no limitation on duration.

Journal ArticleDOI
TL;DR: The SDDP algorithm is embedded into the scenario tree framework, essentially combining the nested Benders decomposition method on trees with the sampling procedure of SDDP, which allows for the incorporation of different types of uncertainties in multi-stage stochastic optimization while still maintaining an efficient solution algorithm.
Abstract: Nested Benders decomposition is a widely used and accepted solution methodology for multi-stage stochastic linear programming problems. Motivated by large-scale applications in the context of hydro-thermal scheduling, in 1991, Pereira and Pinto introduced a sampling-based variant of the Benders decomposition method, known as stochastic dual dynamic programming (SDDP). In this paper, we embed the SDDP algorithm into the scenario tree framework, essentially combining the nested Benders decomposition method on trees with the sampling procedure of SDDP. This allows for the incorporation of different types of uncertainties in multi-stage stochastic optimization while still maintaining an efficient solution algorithm. We provide an illustration of the applicability of our method towards a least-cost hydro-thermal scheduling problem by examining an illustrative example combining both fuel cost with inflow uncertainty and by studying the Panama power system incorporating both electricity demand and inflow uncertainties.
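A skeleton of one SDDP iteration in this scenario-tree setting may clarify the forward/backward structure. Here `solve_stage(t, x_in, xi, future_cuts)` is a placeholder for the stage-t linear program, assumed to return the optimal cost, the next state, and a subgradient with respect to the incoming state; `sample_noise` and `noise_support` are likewise placeholders, not the authors' implementation.

```python
# One SDDP iteration: sampled forward pass to get trial states, then a
# backward pass that adds an (expected) Benders cut per stage.

def sddp_iteration(T, x0, cuts, solve_stage, sample_noise, noise_support):
    # Forward pass: simulate one scenario under the current cut models.
    xs = [x0]
    for t in range(T):
        _, x_next, _ = solve_stage(t, xs[-1], sample_noise(t), cuts[t + 1])
        xs.append(x_next)
    # Backward pass: refine the cost-to-go models at the trial points.
    for t in range(T - 1, -1, -1):
        vals, grads = [], []
        for xi in noise_support(t):         # all noise realizations at stage t
            v, _, g = solve_stage(t, xs[t], xi, cuts[t + 1])
            vals.append(v)
            grads.append(g)                 # subgradients as vectors (lists)
        v_bar = sum(vals) / len(vals)                       # expected value
        g_bar = [sum(col) / len(col) for col in zip(*grads)]
        # Cut: V_t(x) >= v_bar + g_bar . (x - xs[t])
        cuts[t].append((v_bar, g_bar, xs[t]))
    return cuts
```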

Journal ArticleDOI
TL;DR: The logistics field has seen increasing usage of electric vehicles, and the resulting distribution planning problems present new computational challenges, which this paper addresses.
Abstract: To minimize greenhouse gas emissions, the logistics field has seen increasing usage of electric vehicles. The resulting distribution planning problems present new computational challenges. We address one such problem, called the Electric Traveling Salesman Problem with Time Windows. We propose a mixed integer linear formulation that can solve 20-customer instances in short computing times, and a Three-Phase Heuristic algorithm based on General Variable Neighborhood Search and Dynamic Programming. Computational results show that the heuristic algorithm finds the optimal solution in most small-size instances within a tenth of a second and achieves good solutions on instances with up to 200 customers.

Journal ArticleDOI
TL;DR: A near-optimal tracking control method is presented for WMRs based on receding-horizon dual heuristic programming (RHDHP) and it is illustrated that the proposed method has lower computational burden than conventional MPC, which is very beneficial for real-time tracking control.
Abstract: Trajectory tracking control of wheeled mobile robots (WMRs) has been an important research topic in control theory and robotics. Although various tracking control methods with stability have been developed for WMRs, it is still difficult to design optimal or near-optimal tracking controller under uncertainties and disturbances. In this paper, a near-optimal tracking control method is presented for WMRs based on receding-horizon dual heuristic programming (RHDHP). In the proposed method, a backstepping kinematic controller is designed to generate desired velocity profiles and the receding horizon strategy is used to decompose the infinite-horizon optimal control problem into a series of finite-horizon optimal control problems. In each horizon, a closed-loop tracking control policy is successively updated using a class of approximate dynamic programming algorithms called finite-horizon dual heuristic programming (DHP). The convergence property of the proposed method is analyzed and it is shown that the tracking control system based on RHDHP is asymptotically stable by using the Lyapunov approach. Simulation results on three tracking control problems demonstrate that the proposed method has improved control performance when compared with conventional model predictive control (MPC) and DHP. It is also illustrated that the proposed method has lower computational burden than conventional MPC, which is very beneficial for real-time tracking control.


Proceedings ArticleDOI
TL;DR: In this article, the authors present an efficient dynamic programming framework for optimal planning and control of legged robots, where the switching times as well as the contact forces and the joint velocities are optimized for different locomotion tasks.
Abstract: In this paper, we present an efficient Dynamic Programming framework for optimal planning and control of legged robots. First, we formulate this problem as an optimal control problem for switched systems. Then we propose a multi-level optimization approach to find the optimal switching times and the optimal continuous control inputs. Through this scheme, the decomposed optimization can potentially be done more efficiently than the combined approach. Finally, we present a continuous-time constrained LQR algorithm which simultaneously optimizes the feedforward and feedback controller with $O(n)$ time complexity. In order to validate our approach, we show the performance of our framework on a quadrupedal robot. We choose the Center of Mass dynamics and the full kinematic formulation as the switched system model, where the switching times as well as the contact forces and the joint velocities are optimized for different locomotion tasks such as gap crossing, walking and trotting.
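The backward Riccati sweep underlying any LQR stage runs in time linear in the horizon length, which is presumably the $O(n)$ complexity referred to above. A plain unconstrained, discrete-time sketch follows (the paper's algorithm is continuous-time and handles constraints); the double-integrator numbers are illustrative.

```python
import numpy as np

# Finite-horizon discrete-time LQR via the backward Riccati recursion.

def lqr_backward_pass(A, B, Q, R, Qf, N):
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
        P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
        gains.append(K)
    return list(reversed(gains))   # gains[k] applies at time step k

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double integrator (illustrative)
B = np.array([[0.0], [0.1]])
K = lqr_backward_pass(A, B, np.eye(2), np.eye(1), 10 * np.eye(2), N=50)
x = np.array([1.0, 0.0])
for k in range(50):
    x = A @ x - B @ (K[k] @ x)           # closed-loop rollout
```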

Journal ArticleDOI
TL;DR: There is actually a common theme to these strategies, and underpinning the entire field remain the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950s and 1960s.
Abstract: Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems. More so than other communities, operations research continued to develop the theory behind the basic model introduced by Bellman with discrete states and actions, even while authors as early as Bellman himself recognized its limits due to the "curse of dimensionality" inherent in discrete state spaces. In response to these limitations, subcommunities in computer science, control theory and operations research have developed a variety of methods for solving different classes of stochastic, dynamic optimization problems, creating the appearance of a jungle of competing approaches. In this article, we show that there is actually a common theme to these strategies, and underpinning the entire field remain the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950s and 1960s. Dynamic programming involves making decisions over time, under uncertainty. These problems arise in a wide range of applications, spanning business, science, engineering, economics, medicine and health, and operations. While tremendous successes have been achieved in specific problem settings, we lack general purpose tools with the broad applicability enjoyed by algorithmic strategies such as linear, nonlinear and integer programming. This paper provides an introduction to the challenges of dynamic programming, and describes the contributions made by different subcommunities, with special emphasis on computer science, which pioneered a field known as reinforcement learning, and the operations research community, which has made contributions through several subcommunities, including stochastic programming, simulation optimization and approximate dynamic programming. Our presentation recognizes, but does not do justice to, the important contributions made in the engineering controls communities.
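Since value and policy iteration are the survey's unifying theme, a tabular policy iteration sketch is worth having alongside the value iteration sketch given earlier in this list; the random MDP below is illustrative.

```python
import numpy as np

# Tabular policy iteration: exact policy evaluation alternated with greedy
# policy improvement. (Value iteration applies the same Bellman update
# without the exact evaluation step.)

def policy_iteration(P, R, gamma=0.95):
    # P[a] is the n x n transition matrix under action a; R is n x n_actions.
    n, n_actions = R.shape
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = R[np.arange(n), policy]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        Q = np.stack([R[:, a] + gamma * P[a] @ v for a in range(n_actions)],
                     axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

rng = np.random.default_rng(2)
n, na = 20, 4
P = [rng.dirichlet(np.ones(n), size=n) for _ in range(na)]  # row-stochastic
R = rng.uniform(size=(n, na))
pi, v = policy_iteration(P, R)
```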

Journal ArticleDOI
TL;DR: A mixed algorithm, which combines dynamic programming and a greedy algorithm (0-1 programming), is proposed to select the optimal switching combination from all the redundant switching combinations for the voltage-balancing MPC, achieving a global optimization.
Abstract: To solve the issue of exponentially increasing computational burden of finite control set model predictive control (FCS-MPC) for multilevel converters, a fast MPC scheme for a multilevel cascaded H-bridge STATCOM is presented. The proposed approach consists of three steps. First, with a partially stratified optimization approach, the multiobjective programming of MPC is divided into two suboptimization problems, i.e., current-control MPC and voltage-balancing MPC. Second, a dynamic programming algorithm is proposed for the optimization of the current-control MPC. Third, a mixed algorithm, which combines dynamic programming and a greedy algorithm (0-1 programming), is proposed to select the optimal switching combination from all the redundant switching combinations for the voltage-balancing MPC, achieving a global optimization. Through the analysis of the time complexity, with the proposed scheme, the total computation of FCS-MPC can be reduced from exponential to polynomial time. The proposed approaches do not deteriorate the control performance, which is validated by simulation results; the effectiveness is further demonstrated by implementing the algorithm on a low-cost DSP (TMS320F28335) in real time.

Proceedings Article
01 Dec 2016
TL;DR: This work considers the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed, shows how to approximate the solution of the resulting dynamic programming problem using rollout, and proposes rollout heuristics specifically designed for the Bayesian optimization setting.
Abstract: We consider the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed. In that context, the optimal solution strategy for Bayesian optimization can be formulated as a dynamic programming instance. This results in a complex problem with uncountable, dimension-increasing state space and an uncountable control space. We show how to approximate the solution of this dynamic programming problem using rollout, and propose rollout heuristics specifically designed for the Bayesian optimization setting. We present numerical experiments showing that the resulting algorithm for optimization with a finite budget outperforms several popular Bayesian optimization algorithms.
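The rollout idea can be sketched independently of the Gaussian-process details: score a candidate evaluation by simulating a cheap base policy for the remaining budget on fantasized observations, then pick the best-scoring candidate. All helper functions below (`posterior_sample`, `update`, `base_policy`) are placeholders for the probabilistic-model machinery, not the authors' heuristics.

```python
# Rollout scoring for budget-constrained optimization (minimization).

def rollout_value(model, candidate, budget_left, posterior_sample, update,
                  base_policy, n_sims=20):
    total = 0.0
    for _ in range(n_sims):
        m = model
        x, y_best = candidate, float("inf")
        for _ in range(budget_left):
            y = posterior_sample(m, x)       # fantasized observation at x
            y_best = min(y_best, y)
            m = update(m, x, y)              # condition the model on (x, y)
            x = base_policy(m)               # base heuristic picks next point
        total += y_best
    return total / n_sims                    # average best value found

def choose_next(model, candidates, budget_left, **kw):
    # Evaluate the rollout score of each candidate and pick the minimizer.
    return min(candidates,
               key=lambda x: rollout_value(model, x, budget_left, **kw))
```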

Journal ArticleDOI
TL;DR: A model-free adaptive optimal tracking algorithm based on the framework of reinforcement learning and adaptive dynamic programming is proposed in this study which learns the optimal solution online in real time without any information of the system dynamics.
Abstract: The optimal tracking of non-linear systems without knowing system dynamics is an important and intractable problem. Based on the framework of reinforcement learning (RL) and adaptive dynamic programming, a model-free adaptive optimal tracking algorithm is proposed in this study. After constructing an augmented system with the tracking errors and the reference states, the tracking problem is converted to a regulation problem with respect to the new system. Several RL techniques are synthesised to form a novel algorithm which learns the optimal solution online in real time without any information of the system dynamics. Continuous adaptation laws are defined by the current observations and the past experience. The convergence is guaranteed by Lyapunov analysis. Two simulations on a linear and a non-linear system demonstrate the performance of the proposed approach.

Journal ArticleDOI
TL;DR: A method based on Markov decision processes is presented to optimally schedule energy storage devices in power distribution networks with renewable generation; other properties, such as energy storage placement and size, can also be assessed and compared in optimized systems with different layouts.
Abstract: The paper presents a method based on Markov decision processes to optimally schedule energy storage devices in power distribution networks with renewable generation. The time series of renewable generation is modeled as a Markov chain which allows for the implementation of a stochastic dynamic programming algorithm. The output of this algorithm is an optimal scheduling policy for the storage device achieving the minimization of an objective function including cost of energy and network losses. Besides this, other properties, such as energy storage placement and size, can be assessed and compared in optimized systems with different layouts.
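A backward stochastic-DP sketch of this idea: renewable output follows a Markov chain, and the DP state is (hour, storage level, generation state). All numbers below (levels, prices, demand, transition matrix) are illustrative assumptions, not the paper's case study, and network losses are omitted.

```python
import numpy as np

# Backward induction over (time, storage level, Markov generation state),
# minimizing expected cost of energy bought from the grid.

T, levels = 24, 11                      # hours; storage levels 0..10 (units)
gen_states = np.array([0.0, 2.0, 4.0])  # renewable output per Markov state
Pg = np.array([[.7, .3, .0],
               [.2, .6, .2],
               [.0, .3, .7]])           # Markov-chain transition matrix
price = 1.0 + 0.5 * np.sin(np.arange(T) * 2 * np.pi / 24)   # energy price
demand = 3.0

V = np.zeros((levels, len(gen_states)))              # terminal value = 0
policy = np.zeros((T, levels, len(gen_states)), dtype=int)
for t in range(T - 1, -1, -1):
    V_new = np.full_like(V, np.inf)
    for s in range(levels):
        for g in range(len(gen_states)):
            for u in (-1, 0, 1):                     # discharge / idle / charge
                s2 = s + u
                if not 0 <= s2 < levels:
                    continue
                grid = demand + u - gen_states[g]    # energy bought from grid
                cost = price[t] * max(grid, 0.0)
                exp_future = Pg[g] @ V[s2]           # expectation over next state
                if cost + exp_future < V_new[s, g]:
                    V_new[s, g] = cost + exp_future
                    policy[t, s, g] = u
    V = V_new
```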

Journal ArticleDOI
TL;DR: Both adaptive dynamic programming and robust ADP algorithms are developed, along with rigorous stability and convergence analysis, to iteratively update the control policy online by directly using data of the system state and input.
Abstract: In this technical note, the adaptive optimal control problem is investigated for a class of continuous-time stochastic systems subject to multiplicative noise. A novel non-model-based optimal control design methodology is employed to iteratively update the control policy on-line by using directly the data of the system state and input. Both adaptive dynamic programming (ADP) and robust ADP algorithms are developed, along with rigorous stability and convergence analysis. The effectiveness of the obtained methods is illustrated by an example arising from biological sensorimotor control.

Journal ArticleDOI
TL;DR: Two new formulations are provided for the two-dimensional variant of the hypervolume subset selection problem: a (linear) integer programming formulation that can be solved by solving its linear programming relaxation, and a k-link shortest path formulation on a special digraph with the Monge property that can be solved by dynamic programming.
Abstract: The hypervolume subset selection problem consists of finding a subset, with a given cardinality k, of a set of nondominated points that maximizes the hypervolume indicator. This problem arises in selection procedures of evolutionary algorithms for multiobjective optimization, for which practically efficient algorithms are required. In this article, two new formulations are provided for the two-dimensional variant of this problem. The first is a linear integer programming formulation that can be solved by solving its linear programming relaxation. The second formulation is a k-link shortest path formulation on a special digraph with the Monge property that can be solved by dynamic programming. This improves upon the result of Bader (2009), and slightly improves upon the result of Bringmann et al. (2014b), which was developed independently of this work using different techniques. Numerical results are shown for several values of n and k.
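For intuition, here is the plain O(kn^2) dynamic program behind the k-link shortest-path view, without the Monge speedup the paper exploits. Maximization convention is assumed: nondominated points sorted by x ascending (hence y descending), with a reference point dominated by all points; the example points are illustrative.

```python
# 2-D hypervolume subset selection by dynamic programming.
# For points sorted by x ascending / y descending, the hypervolume of a
# selected subset telescopes as sum_i (x_i - x_prev) * (y_i - ry).

def hv_subset_2d(points, k, ref):
    pts = sorted(points)                  # x ascending -> y descending
    rx, ry = ref
    n = len(pts)
    NEG = float("-inf")
    # best[j][i]: max hypervolume of a j-subset whose rightmost point is i.
    best = [[NEG] * n for _ in range(k + 1)]
    for i, (x, y) in enumerate(pts):
        best[1][i] = (x - rx) * (y - ry)
    for j in range(2, k + 1):
        for i, (xi, yi) in enumerate(pts):
            for p in range(i):            # previous (next-leftmost) point
                if best[j - 1][p] == NEG:
                    continue
                cand = best[j - 1][p] + (xi - pts[p][0]) * (yi - ry)
                best[j][i] = max(best[j][i], cand)
    return max(best[k])

pts = [(1, 9), (3, 7), (5, 6), (7, 3), (9, 1)]   # nondominated (maximization)
print(hv_subset_2d(pts, k=3, ref=(0, 0)))
```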


Journal ArticleDOI
TL;DR: In this article, a Branch-and-Price approach is proposed to find proven optimal solutions to TOP, with the pricing sub-problem solved by a bounded bidirectional dynamic programming algorithm with decremental state space relaxation; a Branch-and-Cut-and-Price variant using subset-row inequalities is also proposed.
Abstract: The Team Orienteering Problem (TOP) is one of the most investigated problems in the family of vehicle routing problems with profits. In this paper, we propose a Branch-and-Price approach to find proven optimal solutions to TOP. The pricing sub-problem is solved by a bounded bidirectional dynamic programming algorithm with decremental state space relaxation featuring a two-phase dominance rule relaxation. The new method is able to close 17 previously unsolved benchmark instances. In addition, we propose a Branch-and-Cut-and-Price approach using subset-row inequalities and show the effectiveness of these cuts in solving TOP.

Journal ArticleDOI
15 Nov 2016-Energy
TL;DR: The proposed trip-oriented stochastic optimal energy management strategy for a plug-in hybrid electric bus outperforms both the well-tuned equivalent consumption minimization strategy and the rule-based strategy in terms of fuel economy, and is even shown to be close to the optimal result obtained by dynamic programming.

Journal ArticleDOI
TL;DR: This work considers integrated production and batch delivery scheduling in a make-to-order production system involving two competing agents, each of which has its own job set and competes to process its jobs on a shared single machine.