
Showing papers on "Dynamic programming published in 2016"


Journal ArticleDOI
TL;DR: In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws.
Abstract: In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, respectively, to facilitate the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

324 citations
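A minimal tabular sketch may help fix ideas: the update above is V_{i+1}(x) = min_u [U(x,u) + V_i(f(x,u))], started from an arbitrary positive semi-definite function. The discretized toy system below is hypothetical; the paper itself uses neural networks and general nonlinear dynamics.

```python
import numpy as np

# Tabular undiscounted value iteration:
#   V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ]
# Toy deterministic system with an absorbing zero-cost goal state.

n_states, n_actions = 50, 5
rng = np.random.default_rng(0)
f = rng.integers(0, n_states, size=(n_states, n_actions))  # next-state table
f[:, 0] = np.maximum(np.arange(n_states) - 1, 0)           # action 0 steps toward goal
f[0, :] = 0                                                # state 0 is absorbing (goal)
U = rng.uniform(0.1, 1.0, size=(n_states, n_actions))      # positive stage cost
U[0, :] = 0.0                                              # zero cost at the goal

V = np.zeros(n_states)     # any positive semi-definite initialization is permitted
for _ in range(1000):
    Q = U + V[f]           # Q[x, u] = U(x, u) + V(f(x, u))
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # termination criterion
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)  # greedy iterative control law
```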


Journal ArticleDOI
TL;DR: This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection by minimizing some predefined costs by employing reinforcement learning and adaptive dynamic programming techniques.
Abstract: This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection by minimizing some predefined costs. Reinforcement learning and adaptive dynamic programming techniques are employed to compute an approximated optimal controller using input/partial-state data despite unknown system dynamics and unmeasurable disturbance. Rigorous stability analysis shows that the proposed controller exponentially stabilizes the closed-loop system and the output of the plant asymptotically tracks the given reference signal. Simulation results on an LCL-coupled inverter-based distributed generation system demonstrate the effectiveness of the proposed approach.

251 citations


Journal ArticleDOI
TL;DR: A new time-discretized multi-commodity network flow model for the VRPPDTW based on the integration of vehicles' carrying states within space-time transportation networks is proposed, so as to allow a joint optimization of passenger-to-vehicle assignment and turn-by-turn routing in congested transportation networks.
Abstract: Optimization of on-demand transportation systems and ride-sharing services involves solving a class of complex vehicle routing problems with pickup and delivery with time windows (VRPPDTW). This paper first proposes a new time-discretized multi-commodity network flow model for the VRPPDTW based on the integration of vehicles’ carrying states within space–time transportation networks, so as to allow a joint optimization of passenger-to-vehicle assignment and turn-by-turn routing in congested transportation networks. Our three-dimensional state–space–time network construct is able to comprehensively enumerate possible transportation states at any given time along vehicle space–time paths, and further allows a forward dynamic programming solution algorithm to solve the single vehicle VRPPDTW problem. By utilizing a Lagrangian relaxation approach, the primal multi-vehicle routing problem is decomposed to a sequence of single vehicle routing sub-problems, with Lagrangian multipliers for individual passengers’ requests being updated by sub-gradient-based algorithms. We further discuss a number of search space reduction strategies and test our algorithms, implemented through a specialized program in C++, on medium-scale and large-scale transportation networks, namely the Chicago sketch and Phoenix regional networks.

242 citations
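The decomposition logic can be sketched compactly. Below, `solve_single_vehicle` is a placeholder for the paper's forward dynamic program over the state-space-time network, assumed to return the priced route cost and the set of requests served; the loop and step rule are a standard subgradient scheme, not the authors' exact implementation.

```python
# Lagrangian-relaxation loop: multipliers price each passenger request,
# each vehicle solves its own DP subproblem under those prices, and a
# subgradient step pushes toward "serve each request exactly once".

def lagrangian_relaxation(vehicles, requests, solve_single_vehicle,
                          n_iters=100, step0=1.0):
    multipliers = {r: 0.0 for r in requests}
    best_bound = float("-inf")
    for k in range(n_iters):
        served_count = {r: 0 for r in requests}
        dual_value = sum(multipliers.values())
        for v in vehicles:
            # Each vehicle independently picks its best route at current prices.
            route_cost, served = solve_single_vehicle(v, multipliers)
            dual_value += route_cost
            for r in served:
                served_count[r] += 1
        best_bound = max(best_bound, dual_value)   # lower bound on the primal
        # Subgradient step: requests served more or less than once adjust price.
        step = step0 / (k + 1)
        for r in requests:
            multipliers[r] += step * (1 - served_count[r])
    return multipliers, best_bound
```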


Proceedings ArticleDOI
04 Aug 2016
TL;DR: A large improvement in action detection performance is achieved, with gains in mAP of 20% and 11% reported on the UCF-101 and J-HMDB-21 datasets, respectively, compared to the state of the art.
Abstract: In this work, we propose an approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, appearance and motion detection networks are employed to localise and score actions from colour images and optical flow. In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two energy maximisation problems via dynamic programming. While in the first pass, action paths spanning the whole video are built by linking detection boxes over time using their class-specific scores and their spatial overlap, in the second pass, temporal trimming is performed by ensuring label consistency for all constituting detection boxes. We demonstrate the performance of our algorithm on the challenging UCF101, J-HMDB-21 and LIRIS-HARL datasets, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time.

223 citations
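The first-pass linking step is essentially a Viterbi dynamic program over per-frame detections; here is a sketch under that reading, where the box format and the overlap weight `lam` are illustrative assumptions.

```python
import numpy as np

# Link per-frame detection boxes into one action path maximizing
# class-specific score plus spatial-overlap continuity (first pass of
# the tube-building step, as described above).

def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_action_path(boxes, scores, lam=1.0):
    # boxes[t]: list of boxes in frame t; scores[t]: class-specific scores.
    T = len(boxes)
    dp = [np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, T):
        cur = np.empty(len(boxes[t]))
        ptr = np.empty(len(boxes[t]), dtype=int)
        for j, bj in enumerate(boxes[t]):
            trans = [dp[t - 1][i] + lam * iou(bi, bj)
                     for i, bi in enumerate(boxes[t - 1])]
            ptr[j] = int(np.argmax(trans))
            cur[j] = scores[t][j] + trans[ptr[j]]
        dp.append(cur)
        back.append(ptr)
    # Backtrack the highest-scoring path.
    path = [int(np.argmax(dp[-1]))]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return list(reversed(path))   # chosen box index in each frame
```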


Journal ArticleDOI
TL;DR: The obtained adaptive and optimal output-feedback controllers differ from the existing literature on ADP in that they are derived from sampled-data systems theory and are guaranteed to be robust to dynamic uncertainties.

183 citations


Journal ArticleDOI
TL;DR: A continuous-time version of the traditional value iteration (VI) algorithm is presented with rigorous convergence analysis, crucial for developing new adaptive dynamic programming methods to solve the adaptive optimal control problem and the stochastic robust optimal control problem for linear continuous-time systems.

149 citations


Journal ArticleDOI
TL;DR: A control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals is proposed.
Abstract: This paper proposes a control algorithm based on adaptive dynamic programming to solve the infinite-horizon optimal control problem for known deterministic nonlinear systems with saturating actuators and nonquadratic cost functionals. The algorithm is based on an actor/critic framework, where a critic neural network (NN) is used to learn the optimal cost, and an actor NN is used to learn the optimal control policy. The adaptive control nature of the algorithm requires a persistence of excitation condition to be a priori validated, but this can be relaxed using previously stored data concurrently with current data in the update of the critic NN. A robustifying control term is added to the controller to eliminate the effect of residual errors, leading to asymptotic stability of the closed-loop system. Simulation results show the effectiveness of the proposed approach for a controlled Van der Pol oscillator and also for a power system plant.

138 citations


Proceedings Article
12 Feb 2016
TL;DR: An operator for tabular representations, the consistent Bellman operator, is described, which incorporates a notion of local policy consistency that leads to an increase in the action gap at each state; increasing this gap mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies.
Abstract: This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.

121 citations
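For a deterministic tabular MDP the consistent Bellman operator admits a very short implementation; a sketch follows, with toy MDP arrays as illustrative assumptions (the stochastic case replaces the branch with an expectation over next states).

```python
import numpy as np

# Consistent Bellman operator (deterministic transitions):
#   (T_C Q)(x, a) = r(x, a) + gamma * ( Q(x, a)        if f(x, a) == x
#                                       max_b Q(x',b)  otherwise )
# Keeping Q(x, a) on self-transitions enforces local policy consistency
# and increases the action gap relative to the usual operator.

def consistent_bellman(Q, R, next_state, gamma=0.99):
    n_states, n_actions = Q.shape
    TQ = np.empty_like(Q)
    for x in range(n_states):
        for a in range(n_actions):
            x2 = next_state[x, a]
            boot = Q[x, a] if x2 == x else Q[x2].max()
            TQ[x, a] = R[x, a] + gamma * boot
    return TQ

rng = np.random.default_rng(1)
ns, na = 10, 3
R = rng.uniform(size=(ns, na))
nxt = rng.integers(0, ns, size=(ns, na))
Q = np.zeros((ns, na))
for _ in range(2000):          # value iteration with the consistent operator
    Q = consistent_bellman(Q, R, nxt)
```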


Journal ArticleDOI
TL;DR: The proposed sub-optimal strategy is compared with the optimal solution provided by dynamic programming for validation purposes, and it is shown that the low computational load of the presented approach enables robustness and makes it very appealing for online use.
Abstract: The problem of eco-driving is analyzed for an urban traffic network in the presence of signalized intersections. It is assumed that the traffic light timings are known and available to the vehicles via infrastructure-to-vehicle (I2V) communication. This work provides a solution to the energy consumption minimization problem, while traveling through a sequence of signalized intersections and always catching a green light. The optimal control problem is non-convex due to the constraints coming from the traffic lights, so a sub-optimal strategy to restore convexity and solve the problem is proposed. First, a pruning algorithm reduces the optimization domain by considering only the portions of the traffic lights' green phases that allow driving in compliance with the city speed limits. Then, a graph is created in the feasible region, in order to approximate the energy consumption associated with each available path in the driving horizon. Lastly, after the problem's convexity is recovered, a simple optimization problem is solved on the selected path to calculate the optimal crossing times at each intersection. The optimal speeds are then suggested to the driver. The proposed sub-optimal strategy is compared with the optimal solution provided by dynamic programming for validation purposes. It is also shown that the low computational load of the presented approach enables robustness and makes it very appealing for online use.

108 citations
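A compact way to see the pruning-plus-graph construction: each node is an (intersection, crossing-time) pair kept only if it lies inside a green window and is reachable within the speed limits, and a dynamic program propagates an energy cost. The consumption model, discretization, and numbers below are illustrative stand-ins, not the paper's vehicle model.

```python
# DP over (intersection, crossing-time) nodes for the eco-driving sketch.

def energy(dist, dt):
    v = dist / dt                         # average speed on the segment
    return dist * (0.05 + 0.002 * v**2)   # toy consumption model (assumption)

def eco_dp(dists, greens, v_min, v_max, t_step=1.0):
    # dists[i]: distance to intersection i; greens[i]: list of (start, end)
    # green windows at intersection i.
    layers = [{0.0: 0.0}]                 # crossing time -> best cost so far
    for i, d in enumerate(dists):
        layer = {}
        for t_prev, c_prev in layers[-1].items():
            for g0, g1 in greens[i]:
                lo = max(g0, t_prev + d / v_max)   # earliest feasible crossing
                hi = min(g1, t_prev + d / v_min)   # latest feasible crossing
                t = lo
                while t <= hi:                     # discretized crossing times
                    c = c_prev + energy(d, t - t_prev)
                    if c < layer.get(t, float("inf")):
                        layer[t] = c
                    t += t_step
        layers.append(layer)
    return min(layers[-1].values()) if layers[-1] else None

# Illustrative call: two intersections, one green window each.
best = eco_dp(dists=[200.0, 300.0], greens=[[(10, 30)], [(40, 60)]],
              v_min=5.0, v_max=15.0)
```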


Journal ArticleDOI
TL;DR: It is shown that the proposed OPFB method is more powerful than the static OPFB as it is equivalent to a state-feedback control policy and is successfully used to solve a regulation and a tracking problem.
Abstract: A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem. Conditions on the existence of a solution to the discounted ARE are provided, and an upper bound for the discount factor is found to assure the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be constructed using a limited history of observations of the system output. A Bellman equation is then developed to evaluate a control policy and find an improved policy simultaneously, using only these limited output observations. Then, using this Bellman equation, a model-free off-policy RL-based OPFB controller is developed without requiring knowledge of the system state or the system dynamics. It is shown that the proposed OPFB method is more powerful than static OPFB, as it is equivalent to a state-feedback control policy. The proposed method is successfully used to solve a regulation and a tracking problem.

102 citations


Journal ArticleDOI
TL;DR: This paper presents an adaptive dynamic programming-based guaranteed cost neural tracking control algorithm for a class of continuous-time matched uncertain nonlinear systems by introducing an augmented system and employing a modified cost function with a discount factor.

Journal ArticleDOI
TL;DR: This brief addresses the energy management problem within the framework of receding horizon optimization for power-split plug-in hybrid electric vehicles (HEVs), proposing an online iterative algorithm to solve the optimization problem based on the continuation/generalized minimum residual algorithm.
Abstract: This brief addresses the energy management problem within the framework of receding horizon optimization. For power-split plug-in hybrid electric vehicles (HEVs), the real-time power-split decision is formulated as a nonlinear receding horizon optimization problem. Then, an online iterative algorithm, based on the continuation/generalized minimum residual algorithm, is proposed to solve the optimization problem. It should be noted that the proposed energy management strategy aims for optimality over the targeted horizon, but the solution is not optimal for the full driving route, unlike many solutions obtained using dynamic programming approaches. At each decision step, only the initial value of the optimal solution is implemented, according to the receding horizon optimization approach. Finally, to demonstrate a comparison of the proposed scheme with other schemes, numerical validations conducted on a full-scale GT-SUITE HEV simulator are presented.
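The receding-horizon pattern itself is compact and worth seeing in isolation. In the sketch below, `solve_horizon` stands in for the brief's continuation/generalized minimum residual solver, and the plant interface is an assumption of this sketch.

```python
# Generic receding-horizon (MPC) skeleton: solve a finite-horizon problem at
# each step, apply only the first control, then shift the horizon forward.

def receding_horizon_control(x0, plant_step, solve_horizon, n_steps, horizon):
    x, trajectory = x0, [x0]
    u_guess = [0.0] * horizon               # controls as a plain list here
    for _ in range(n_steps):
        u_seq = solve_horizon(x, u_guess)   # optimal controls over the horizon
        u = u_seq[0]                        # apply only the first input
        x = plant_step(x, u)
        trajectory.append(x)
        u_guess = u_seq[1:] + [u_seq[-1]]   # warm start, shifted by one step
    return trajectory
```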

Journal ArticleDOI
01 Oct 2016-Energy
TL;DR: Simulation results demonstrate that the overall performance of the optimized fuzzy-logic-based energy management strategy can be improved significantly and can even approach the optimal results of dynamic programming.

Journal ArticleDOI
TL;DR: This work investigates the exact solution of the vehicle routing problem with time windows in which multiple trips are allowed for the vehicles, specifically considering the case where it is mandatory to visit all customers and there is no limitation on duration.

Journal ArticleDOI
TL;DR: The SDDP algorithm is embedded into the scenario tree framework, essentially combining the nested Benders decomposition method on trees with the sampling procedure of SDDP, which allows for the incorporation of different types of uncertainties in multi-stage stochastic optimization while still maintaining an efficient solution algorithm.
Abstract: Nested Benders decomposition is a widely used and accepted solution methodology for multi-stage stochastic linear programming problems. Motivated by large-scale applications in the context of hydro-thermal scheduling, in 1991, Pereira and Pinto introduced a sampling-based variant of the Benders decomposition method, known as stochastic dual dynamic programming (SDDP). In this paper, we embed the SDDP algorithm into the scenario tree framework, essentially combining the nested Benders decomposition method on trees with the sampling procedure of SDDP. This allows for the incorporation of different types of uncertainties in multi-stage stochastic optimization while still maintaining an efficient solution algorithm. We provide an illustration of the applicability of our method towards a least-cost hydro-thermal scheduling problem by examining an illustrative example combining both fuel cost with inflow uncertainty and by studying the Panama power system incorporating both electricity demand and inflow uncertainties.
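A skeleton of one SDDP iteration in this scenario-tree setting may clarify the forward/backward structure. Here `solve_stage(t, x_in, xi, future_cuts)` is a placeholder for the stage-t linear program, assumed to return the optimal cost, the next state, and a subgradient with respect to the incoming state; `sample_noise` and `noise_support` are likewise placeholders, not the authors' implementation.

```python
# One SDDP iteration: sampled forward pass to get trial states, then a
# backward pass that adds an (expected) Benders cut per stage.

def sddp_iteration(T, x0, cuts, solve_stage, sample_noise, noise_support):
    # Forward pass: simulate one scenario under the current cut models.
    xs = [x0]
    for t in range(T):
        _, x_next, _ = solve_stage(t, xs[-1], sample_noise(t), cuts[t + 1])
        xs.append(x_next)
    # Backward pass: refine the cost-to-go models at the trial points.
    for t in range(T - 1, -1, -1):
        vals, grads = [], []
        for xi in noise_support(t):         # all noise realizations at stage t
            v, _, g = solve_stage(t, xs[t], xi, cuts[t + 1])
            vals.append(v)
            grads.append(g)                 # subgradients as vectors (lists)
        v_bar = sum(vals) / len(vals)                       # expected value
        g_bar = [sum(col) / len(col) for col in zip(*grads)]
        # Cut: V_t(x) >= v_bar + g_bar . (x - xs[t])
        cuts[t].append((v_bar, g_bar, xs[t]))
    return cuts
```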

Journal ArticleDOI
TL;DR: The logistics field has seen increasing usage of electric vehicles, and the resulting distribution planning problems present new computational challenges, which this paper addresses.
Abstract: To minimize greenhouse gas emissions, the logistics field has seen increasing usage of electric vehicles. The resulting distribution planning problems present new computational challenges. We address one such problem, called the Electric Traveling Salesman Problem with Time Windows. We propose a mixed integer linear formulation that can solve 20-customer instances in short computing times, and a Three-Phase Heuristic algorithm based on General Variable Neighborhood Search and Dynamic Programming. Computational results show that the heuristic algorithm finds the optimal solution in most small-size instances within a tenth of a second and achieves good solutions on instances with up to 200 customers.

Journal ArticleDOI
TL;DR: A near-optimal tracking control method is presented for WMRs based on receding-horizon dual heuristic programming (RHDHP) and it is illustrated that the proposed method has lower computational burden than conventional MPC, which is very beneficial for real-time tracking control.
Abstract: Trajectory tracking control of wheeled mobile robots (WMRs) has been an important research topic in control theory and robotics. Although various tracking control methods with stability have been developed for WMRs, it is still difficult to design optimal or near-optimal tracking controller under uncertainties and disturbances. In this paper, a near-optimal tracking control method is presented for WMRs based on receding-horizon dual heuristic programming (RHDHP). In the proposed method, a backstepping kinematic controller is designed to generate desired velocity profiles and the receding horizon strategy is used to decompose the infinite-horizon optimal control problem into a series of finite-horizon optimal control problems. In each horizon, a closed-loop tracking control policy is successively updated using a class of approximate dynamic programming algorithms called finite-horizon dual heuristic programming (DHP). The convergence property of the proposed method is analyzed and it is shown that the tracking control system based on RHDHP is asymptotically stable by using the Lyapunov approach. Simulation results on three tracking control problems demonstrate that the proposed method has improved control performance when compared with conventional model predictive control (MPC) and DHP. It is also illustrated that the proposed method has lower computational burden than conventional MPC, which is very beneficial for real-time tracking control.


Proceedings ArticleDOI
TL;DR: In this article, the authors present an efficient dynamic programming framework for optimal planning and control of legged robots, where the switching times as well as the contact forces and the joint velocities are optimized for different locomotion tasks.
Abstract: In this paper, we present an efficient Dynamic Programming framework for optimal planning and control of legged robots. First, we formulate this problem as an optimal control problem for switched systems. Then we propose a multi-level optimization approach to find the optimal switching times and the optimal continuous control inputs. Through this scheme, the decomposed optimization can potentially be done more efficiently than the combined approach. Finally, we present a continuous-time constrained LQR algorithm which simultaneously optimizes the feedforward and feedback controller with $O(n)$ time complexity. In order to validate our approach, we show the performance of our framework on a quadrupedal robot. We choose the Center of Mass dynamics and the full kinematic formulation as the switched system model, where the switching times as well as the contact forces and the joint velocities are optimized for different locomotion tasks such as gap crossing, walking and trotting.
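The backward Riccati sweep underlying any LQR stage runs in time linear in the horizon length, which is presumably the $O(n)$ complexity referred to above. A plain unconstrained, discrete-time sketch follows (the paper's algorithm is continuous-time and handles constraints); the double-integrator numbers are illustrative.

```python
import numpy as np

# Finite-horizon discrete-time LQR via the backward Riccati recursion.

def lqr_backward_pass(A, B, Q, R, Qf, N):
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
        P = Q + A.T @ P @ (A - B @ K)                      # Riccati update
        gains.append(K)
    return list(reversed(gains))   # gains[k] applies at time step k

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double integrator (illustrative)
B = np.array([[0.0], [0.1]])
K = lqr_backward_pass(A, B, np.eye(2), np.eye(1), 10 * np.eye(2), N=50)
x = np.array([1.0, 0.0])
for k in range(50):
    x = A @ x - B @ (K[k] @ x)           # closed-loop rollout
```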

Journal ArticleDOI
TL;DR: There is actually a common theme to these strategies, and underpinning the entire field remain the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950s and 1960s.
Abstract: Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems. More so than other communities, operations research continued to develop the theory behind the basic model introduced by Bellman with discrete states and actions, even while authors as early as Bellman himself recognized its limits due to the "curse of dimensionality" inherent in discrete state spaces. In response to these limitations, subcommunities in computer science, control theory and operations research have developed a variety of methods for solving different classes of stochastic, dynamic optimization problems, creating the appearance of a jungle of competing approaches. In this article, we show that there is actually a common theme to these strategies, and underpinning the entire field remain the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950s and 1960s. Dynamic programming involves making decisions over time, under uncertainty. These problems arise in a wide range of applications, spanning business, science, engineering, economics, medicine and health, and operations. While tremendous successes have been achieved in specific problem settings, we lack general purpose tools with the broad applicability enjoyed by algorithmic strategies such as linear, nonlinear and integer programming. This paper provides an introduction to the challenges of dynamic programming, and describes the contributions made by different subcommunities, with special emphasis on computer science, which pioneered a field known as reinforcement learning, and the operations research community, which has made contributions through several subcommunities, including stochastic programming, simulation optimization and approximate dynamic programming. Our presentation recognizes, but does not do justice to, the important contributions made in the engineering controls communities.
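Since value and policy iteration are the survey's unifying theme, a tabular policy iteration sketch is worth having alongside the value iteration sketch given earlier in this list; the random MDP below is illustrative.

```python
import numpy as np

# Tabular policy iteration: exact policy evaluation alternated with greedy
# policy improvement. (Value iteration applies the same Bellman update
# without the exact evaluation step.)

def policy_iteration(P, R, gamma=0.95):
    # P[a] is the n x n transition matrix under action a; R is n x n_actions.
    n, n_actions = R.shape
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = R[np.arange(n), policy]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        Q = np.stack([R[:, a] + gamma * P[a] @ v for a in range(n_actions)],
                     axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

rng = np.random.default_rng(2)
n, na = 20, 4
P = [rng.dirichlet(np.ones(n), size=n) for _ in range(na)]  # row-stochastic
R = rng.uniform(size=(n, na))
pi, v = policy_iteration(P, R)
```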

Journal ArticleDOI
TL;DR: A mixed algorithm, which combines dynamic programming and a greedy algorithm (0-1 programming), is proposed to select the optimal switching combination from all the redundant switching combinations for the voltage-balancing MPC, achieving a global optimization.
Abstract: To solve the issue of exponentially increasing computational burden of finite control set model predictive control (FCS-MPC) for multilevel converters, a fast MPC scheme for a multilevel cascaded H-bridge STATCOM is presented. The proposed approach consists of three steps. First, with a partially stratified optimization approach, the multiobjective programming of MPC is divided into two suboptimization problems, i.e., current-control MPC and voltage-balancing MPC. Second, a dynamic programming algorithm is proposed for the optimization of the current-control MPC. Third, a mixed algorithm, which combines dynamic programming and a greedy algorithm (0-1 programming), is proposed to select the optimal switching combination from all the redundant switching combinations for the voltage-balancing MPC, achieving a global optimization. Through the analysis of the time complexity, with the proposed scheme, the total computation of FCS-MPC can be reduced from exponential to polynomial time. The proposed approaches do not deteriorate the control performance, which is validated by simulation results; the effectiveness is further demonstrated by implementing the algorithm on a low-cost DSP (TMS320F28335) in real time.

Proceedings Article
01 Dec 2016
TL;DR: This work considers the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed, shows how to approximate the solution of the resulting dynamic programming problem using rollout, and proposes rollout heuristics specifically designed for the Bayesian optimization setting.
Abstract: We consider the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed. In that context, the optimal solution strategy for Bayesian optimization can be formulated as a dynamic programming instance. This results in a complex problem with uncountable, dimension-increasing state space and an uncountable control space. We show how to approximate the solution of this dynamic programming problem using rollout, and propose rollout heuristics specifically designed for the Bayesian optimization setting. We present numerical experiments showing that the resulting algorithm for optimization with a finite budget outperforms several popular Bayesian optimization algorithms.
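The rollout idea can be sketched independently of the Gaussian-process details: score a candidate evaluation by simulating a cheap base policy for the remaining budget on fantasized observations, then pick the best-scoring candidate. All helper functions below (`posterior_sample`, `update`, `base_policy`) are placeholders for the probabilistic-model machinery, not the authors' heuristics.

```python
# Rollout scoring for budget-constrained optimization (minimization).

def rollout_value(model, candidate, budget_left, posterior_sample, update,
                  base_policy, n_sims=20):
    total = 0.0
    for _ in range(n_sims):
        m = model
        x, y_best = candidate, float("inf")
        for _ in range(budget_left):
            y = posterior_sample(m, x)       # fantasized observation at x
            y_best = min(y_best, y)
            m = update(m, x, y)              # condition the model on (x, y)
            x = base_policy(m)               # base heuristic picks next point
        total += y_best
    return total / n_sims                    # average best value found

def choose_next(model, candidates, budget_left, **kw):
    # Evaluate the rollout score of each candidate and pick the minimizer.
    return min(candidates,
               key=lambda x: rollout_value(model, x, budget_left, **kw))
```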

Journal ArticleDOI
TL;DR: A model-free adaptive optimal tracking algorithm based on the framework of reinforcement learning and adaptive dynamic programming is proposed in this study which learns the optimal solution online in real time without any information of the system dynamics.
Abstract: The optimal tracking of non-linear systems without knowing system dynamics is an important and intractable problem. Based on the framework of reinforcement learning (RL) and adaptive dynamic programming, a model-free adaptive optimal tracking algorithm is proposed in this study. After constructing an augmented system with the tracking errors and the reference states, the tracking problem is converted to a regulation problem with respect to the new system. Several RL techniques are synthesised to form a novel algorithm which learns the optimal solution online in real time without any information of the system dynamics. Continuous adaptation laws are defined by the current observations and the past experience. The convergence is guaranteed by Lyapunov analysis. Two simulations on a linear and a non-linear system demonstrate the performance of the proposed approach.

Journal ArticleDOI
TL;DR: A method based on Markov decision processes is presented to optimally schedule energy storage devices in power distribution networks with renewable generation; other properties, such as energy storage placement and size, can also be assessed and compared in optimized systems with different layouts.
Abstract: The paper presents a method based on Markov decision processes to optimally schedule energy storage devices in power distribution networks with renewable generation. The time series of renewable generation is modeled as a Markov chain which allows for the implementation of a stochastic dynamic programming algorithm. The output of this algorithm is an optimal scheduling policy for the storage device achieving the minimization of an objective function including cost of energy and network losses. Besides this, other properties, such as energy storage placement and size, can be assessed and compared in optimized systems with different layouts.
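A backward stochastic-DP sketch of this idea: renewable output follows a Markov chain, and the DP state is (hour, storage level, generation state). All numbers below (levels, prices, demand, transition matrix) are illustrative assumptions, not the paper's case study, and network losses are omitted.

```python
import numpy as np

# Backward induction over (time, storage level, Markov generation state),
# minimizing expected cost of energy bought from the grid.

T, levels = 24, 11                      # hours; storage levels 0..10 (units)
gen_states = np.array([0.0, 2.0, 4.0])  # renewable output per Markov state
Pg = np.array([[.7, .3, .0],
               [.2, .6, .2],
               [.0, .3, .7]])           # Markov-chain transition matrix
price = 1.0 + 0.5 * np.sin(np.arange(T) * 2 * np.pi / 24)   # energy price
demand = 3.0

V = np.zeros((levels, len(gen_states)))              # terminal value = 0
policy = np.zeros((T, levels, len(gen_states)), dtype=int)
for t in range(T - 1, -1, -1):
    V_new = np.full_like(V, np.inf)
    for s in range(levels):
        for g in range(len(gen_states)):
            for u in (-1, 0, 1):                     # discharge / idle / charge
                s2 = s + u
                if not 0 <= s2 < levels:
                    continue
                grid = demand + u - gen_states[g]    # energy bought from grid
                cost = price[t] * max(grid, 0.0)
                exp_future = Pg[g] @ V[s2]           # expectation over next state
                if cost + exp_future < V_new[s, g]:
                    V_new[s, g] = cost + exp_future
                    policy[t, s, g] = u
    V = V_new
```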

Journal ArticleDOI
TL;DR: Both adaptive dynamic programming and robust ADP algorithms are developed, along with rigorous stability and convergence analysis, to iteratively update the control policy online by directly using data of the system state and input.
Abstract: In this technical note, the adaptive optimal control problem is investigated for a class of continuous-time stochastic systems subject to multiplicative noise. A novel non-model-based optimal control design methodology is employed to iteratively update the control policy on-line by using directly the data of the system state and input. Both adaptive dynamic programming (ADP) and robust ADP algorithms are developed, along with rigorous stability and convergence analysis. The effectiveness of the obtained methods is illustrated by an example arising from biological sensorimotor control.

Journal ArticleDOI
TL;DR: Two new formulations are provided for the two-dimensional variant of the hypervolume subset selection problem: a (linear) integer programming formulation that can be solved by solving its linear programming relaxation, and a k-link shortest path formulation on a special digraph with the Monge property that can be solved by dynamic programming.
Abstract: The hypervolume subset selection problem consists of finding a subset, with a given cardinality k, of a set of nondominated points that maximizes the hypervolume indicator. This problem arises in selection procedures of evolutionary algorithms for multiobjective optimization, for which practically efficient algorithms are required. In this article, two new formulations are provided for the two-dimensional variant of this problem. The first is a linear integer programming formulation that can be solved by solving its linear programming relaxation. The second formulation is a k-link shortest path formulation on a special digraph with the Monge property that can be solved by dynamic programming. This improves upon the result of Bader (2009), and slightly improves upon the result of Bringmann et al. (2014b), which was developed independently of this work using different techniques. Numerical results are shown for several values of n and k.
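For intuition, here is the plain O(kn^2) dynamic program behind the k-link shortest-path view, without the Monge speedup the paper exploits. Maximization convention is assumed: nondominated points sorted by x ascending (hence y descending), with a reference point dominated by all points; the example points are illustrative.

```python
# 2-D hypervolume subset selection by dynamic programming.
# For points sorted by x ascending / y descending, the hypervolume of a
# selected subset telescopes as sum_i (x_i - x_prev) * (y_i - ry).

def hv_subset_2d(points, k, ref):
    pts = sorted(points)                  # x ascending -> y descending
    rx, ry = ref
    n = len(pts)
    NEG = float("-inf")
    # best[j][i]: max hypervolume of a j-subset whose rightmost point is i.
    best = [[NEG] * n for _ in range(k + 1)]
    for i, (x, y) in enumerate(pts):
        best[1][i] = (x - rx) * (y - ry)
    for j in range(2, k + 1):
        for i, (xi, yi) in enumerate(pts):
            for p in range(i):            # previous (next-leftmost) point
                if best[j - 1][p] == NEG:
                    continue
                cand = best[j - 1][p] + (xi - pts[p][0]) * (yi - ry)
                best[j][i] = max(best[j][i], cand)
    return max(best[k])

pts = [(1, 9), (3, 7), (5, 6), (7, 3), (9, 1)]   # nondominated (maximization)
print(hv_subset_2d(pts, k=3, ref=(0, 0)))
```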


Journal ArticleDOI
TL;DR: In this article, a Branch-and-Price approach is proposed to find proven optimal solutions to TOP, with the pricing sub-problem solved by a bounded bidirectional dynamic programming algorithm with decremental state space relaxation; a Branch-and-Cut-and-Price variant using subset-row inequalities is also proposed.
Abstract: The Team Orienteering Problem (TOP) is one of the most investigated problems in the family of vehicle routing problems with profits. In this paper, we propose a Branch-and-Price approach to find proven optimal solutions to TOP. The pricing sub-problem is solved by a bounded bidirectional dynamic programming algorithm with decremental state space relaxation featuring a two-phase dominance rule relaxation. The new method is able to close 17 previously unsolved benchmark instances. In addition, we propose a Branch-and-Cut-and-Price approach using subset-row inequalities and show the effectiveness of these cuts in solving TOP.

Journal ArticleDOI
15 Nov 2016-Energy
TL;DR: The proposed trip-oriented stochastic optimal energy management strategy for a plug-in hybrid electric bus outperforms both the well-tuned equivalent consumption minimization strategy and the rule-based strategy in terms of fuel economy, and is even shown to be close to the optimal result obtained by dynamic programming.

Journal ArticleDOI
TL;DR: This work considers integrated production and batch delivery scheduling in a make-to-order production system involving two competing agents, each of which has its own job set and competes to process its jobs on a shared single machine.