
Showing papers on "Dynamic programming published in 2011"


BookDOI
04 Aug 2011
TL;DR: This book presents approximate dynamic programming (ADP) as a practical response to the three curses of dimensionality, covering the modeling of dynamic programs, stochastic approximation and stepsize rules, value function approximation, and ADP strategies for finite- and infinite-horizon problems and online applications.
Abstract: Preface. Acknowledgments.
1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Bibliographic notes.
2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems.
3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid value-policy iteration. 3.7 The linear programming method for dynamic programs. 3.8 Monotone policies. 3.9 Why does it work? 3.10 Bibliographic notes. Problems.
4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Sampling random variables. 4.4 ADP using the post-decision state variable. 4.5 Low-dimensional representations of value functions. 4.6 So just what is approximate dynamic programming? 4.7 Experimental issues. 4.8 Dynamic programming with missing or incomplete models. 4.9 Relationship to reinforcement learning. 4.10 But does it work? 4.11 Bibliographic notes. Problems.
5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The contribution function. 5.9 The objective function. 5.10 A measure-theoretic view of information. 5.11 Bibliographic notes. Problems.
6. Stochastic approximation methods. 6.1 A stochastic gradient algorithm. 6.2 Some stepsize recipes. 6.3 Stochastic stepsizes. 6.4 Computing bias and variance. 6.5 Optimal stepsizes. 6.6 Some experimental comparisons of stepsize formulas. 6.7 Convergence. 6.8 Why does it work? 6.9 Bibliographic notes. Problems.
7. Approximating value functions. 7.1 Approximation using aggregation. 7.2 Approximation methods using regression models. 7.3 Recursive methods for regression models. 7.4 Neural networks. 7.5 Batch processes. 7.6 Why does it work? 7.7 Bibliographic notes. Problems.
8. ADP for finite horizon problems. 8.1 Strategies for finite horizon problems. 8.2 Q-learning. 8.3 Temporal difference learning. 8.4 Policy iteration. 8.5 Monte Carlo value and policy iteration. 8.6 The actor-critic paradigm. 8.7 Bias in value function estimation. 8.8 State sampling strategies. 8.9 Starting and stopping. 8.10 A taxonomy of approximate dynamic programming strategies. 8.11 Why does it work? 8.12 Bibliographic notes. Problems.
9. Infinite horizon problems. 9.1 From finite to infinite horizon. 9.2 Algorithmic strategies. 9.3 Stepsizes for infinite horizon problems. 9.4 Error measures. 9.5 Direct ADP for online applications. 9.6 Finite horizon models for steady state applications. 9.7 Why does it work? 9.8 Bibliographic notes. Problems.
10. Exploration vs. exploitation. 10.1 A learning exercise: the nomadic trucker. 10.2 Learning strategies. 10.3 A simple information acquisition problem. 10.4 Gittins indices and the information acquisition problem. 10.5 Variations. 10.6 The knowledge gradient algorithm. 10.7 Information acquisition in dynamic programming. 10.8 Bibliographic notes. Problems.
11. Value function approximations for special functions. 11.1 Value functions versus gradients. 11.2 Linear approximations. 11.3 Piecewise linear approximations. 11.4 The SHAPE algorithm. 11.5 Regression methods. 11.6 Cutting planes. 11.7 Why does it work? 11.8 Bibliographic notes. Problems.
12. Dynamic resource allocation. 12.1 An asset acquisition problem. 12.2 The blood management problem. 12.3 A portfolio optimization problem. 12.4 A general resource allocation problem. 12.5 A fleet management problem. 12.6 A driver management problem. 12.7 Bibliographic references. Problems.
13. Implementation challenges. 13.1 Will ADP work for your problem? 13.2 Designing an ADP algorithm for complex problems. 13.3 Debugging an ADP algorithm. 13.4 Convergence issues. 13.5 Modeling your problem. 13.6 Online vs. offline models. 13.7 If it works, patent it!
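The book opens (Section 1.1) with a shortest path problem as its canonical dynamic programming example. A minimal sketch of that style of backward recursion on an acyclic network, with toy node names and arc costs of our own choosing:

```python
# Backward dynamic program for a shortest path on an acyclic network;
# node names and arc costs are illustrative toy data, not the book's.
def shortest_path(nodes, arcs, dest):
    """nodes: topologically ordered list; arcs: dict {(i, j): cost}."""
    V = {dest: 0.0}                    # cost-to-go (the value function)
    succ = {}                          # best decision at each node
    for i in reversed(nodes):
        if i == dest:
            continue
        best = min(((c + V[j], j) for (u, j), c in arcs.items()
                    if u == i and j in V), default=None)
        if best is not None:
            V[i], succ[i] = best       # Bellman's optimality recursion
    return V, succ

nodes = ['s', 'a', 'b', 'd']
arcs = {('s', 'a'): 1.0, ('s', 'b'): 4.0, ('a', 'b'): 1.0,
        ('a', 'd'): 6.0, ('b', 'd'): 2.0}
V, succ = shortest_path(nodes, arcs, 'd')
print(V['s'], succ)                    # 4.0: s -> a -> b -> d
```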

2,300 citations


Journal ArticleDOI
TL;DR: This paper shows that reformulating that step as a constrained flow optimization results in a convex problem and takes advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast.
Abstract: Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: If an object is not detected in a frame but is in previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This is usually dealt with by sampling or greedy search based on variants of Dynamic Programming which can easily miss the global optimum. In this paper, we show that reformulating that step as a constrained flow optimization results in a convex problem. We take advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast. This new approach is far simpler formally and algorithmically than existing techniques and lets us demonstrate excellent performance in two very different contexts.

1,076 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A near-optimal algorithm based on dynamic programming is given that runs in time linear in the number of objects and linear in the sequence length, resulting in state-of-the-art performance.
Abstract: We analyze the computational problem of multi-object tracking in video sequences. We formulate the problem using a cost function that requires estimating the number of tracks, as well as their birth and death states. We show that the global solution can be obtained with a greedy algorithm that sequentially instantiates tracks using shortest path computations on a flow network. Greedy algorithms allow one to embed pre-processing steps, such as nonmax suppression, within the tracking algorithm. Furthermore, we give a near-optimal algorithm based on dynamic programming which runs in time linear in the number of objects and linear in the sequence length. Our algorithms are fast, simple, and scalable, allowing us to process dense input data. This results in state-of-the-art performance.
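A toy sketch of the greedy scheme described here, under assumed data structures (per-frame detection lists whose entries carry a negative log-likelihood `score` field): each track is the cheapest source-to-sink path through the remaining detections, found by one DP sweep that is linear in the detections, and its detections are then removed.

```python
# Greedy track instantiation sketch (assumed data layout, not the authors'
# code): repeatedly extract the cheapest source->sink path, then remove it.
def best_path(frames, link_cost, birth_cost, death_cost):
    """One DP sweep (linear in detections) for the cheapest single track."""
    cost, parent = {}, {}
    for t, dets in enumerate(frames):
        for i, d in enumerate(dets):
            c, p = birth_cost + d['score'], None        # start a track here
            if t > 0:
                for j, e in enumerate(frames[t - 1]):   # or link from t-1
                    cj = cost[(t - 1, j)] + link_cost(e, d) + d['score']
                    if cj < c:
                        c, p = cj, (t - 1, j)
            cost[(t, i)], parent[(t, i)] = c, p
    end = min(cost, key=cost.get)
    total = cost[end] + death_cost
    path, node = [], end
    while node is not None:
        path.append(node)
        node = parent[node]
    return total, path[::-1]

def greedy_tracks(frames, link_cost, birth_cost=1.0, death_cost=1.0):
    tracks = []
    frames = [list(f) for f in frames]
    while any(frames):
        total, path = best_path(frames, link_cost, birth_cost, death_cost)
        if total >= 0:                    # no more profitable tracks
            break
        tracks.append(path)
        for (t, i) in path:               # remove the used detections
            frames[t][i] = None
        frames = [[d for d in f if d is not None] for f in frames]
    return tracks
```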

904 citations


Journal ArticleDOI
TL;DR: In static simulation for a power-split hybrid vehicle, the fuel economy of the vehicle using the control algorithm proposed in this brief is found to be very close (typically within 1%) to the fuel economy achieved through global optimal control based on dynamic programming (DP).
Abstract: A number of strategies for the power management of hybrid electric vehicles (HEVs) have been proposed in the literature. A key challenge is to achieve near-optimality while keeping the methodology simple. Pontryagin's minimum principle (PMP) has been suggested as a viable real-time strategy. In this brief, the global optimality of the principle under reasonable assumptions is established from a mathematical viewpoint. Instantaneous optimal control with an appropriate equivalent parameter for battery usage is shown to be a possibly global optimal solution under the assumption that the internal resistance and open-circuit voltage of the battery are independent of the state-of-charge (SOC). This brief also demonstrates that the optimality of the equivalent consumption minimization strategy (ECMS) results from the close relation of ECMS to the optimal-control-theoretic concept of PMP. In static simulation for a power-split hybrid vehicle, the fuel economy of the vehicle using the control algorithm proposed in this brief is found to be very close (typically within 1%) to the fuel economy achieved through global optimal control based on dynamic programming (DP).
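A minimal sketch of the ECMS instantaneous minimization at the heart of this comparison, with an illustrative fuel-rate map and equivalence factor of our own (not the brief's vehicle model):

```python
# ECMS sketch: at each instant pick the battery power that minimizes
# fuel power plus s * battery power.  All numbers are toy assumptions.
import numpy as np

def ecms_step(p_demand, s, fuel_rate, p_batt_grid):
    """p_demand: driver power request [W]; s: equivalence factor [-]."""
    p_engine = p_demand - p_batt_grid              # engine supplies the rest
    cost = fuel_rate(p_engine) + s * p_batt_grid   # equivalent consumption
    cost[p_engine < 0.0] = np.inf                  # toy feasibility limit
    k = int(np.argmin(cost))
    return p_batt_grid[k], p_engine[k]

fuel_rate = lambda p: 1e-4 * p + 2e-9 * p ** 2     # toy convex fuel map
p_batt_grid = np.linspace(-20e3, 20e3, 201)        # candidate battery powers [W]
p_b, p_e = ecms_step(p_demand=30e3, s=2.5e-4,
                     fuel_rate=fuel_rate, p_batt_grid=p_batt_grid)
```

Under PMP, the (near-)constant costate plays the role of the equivalence factor s, which is why the SOC-independence assumption makes this instantaneous rule globally optimal.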

768 citations


Journal ArticleDOI
TL;DR: A novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method and a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method.
Abstract: In this paper, a novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method. In the design of the controller, only available input-output data is required instead of known system dynamics. A data-driven model is established by a recurrent neural network (NN) to reconstruct the unknown system dynamics using available input-output data. By adding a novel adjustable term related to the modeling error, the resultant modeling error is first guaranteed to converge to zero. Then, based on the obtained data-driven model, the ADP method is utilized to design the approximate optimal tracking controller, which consists of the steady-state controller and the optimal feedback controller. Further, a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method. Based on the Lyapunov approach, stability analysis of the closed-loop system is performed to show that the proposed controller guarantees that the system state asymptotically tracks the desired trajectory. Additionally, the obtained control input is proven to be close to the optimal control input within a small bound. Finally, two numerical examples are used to demonstrate the effectiveness of the proposed control scheme.

530 citations


Journal ArticleDOI
TL;DR: This paper discusses statistical properties and convergence of the Stochastic Dual Dynamic Programming (SDDP) method applied to multistage linear stochastic programming problems, and argues that the computational complexity of the corresponding SDDP algorithm is almost the same as in the risk-neutral case.

399 citations


Journal ArticleDOI
TL;DR: A new iterative adaptive dynamic programming (ADP) method is proposed to solve a class of continuous-time nonlinear two-person zero-sum differential games and the convergence property of the performance index function is proved.

365 citations


Journal ArticleDOI
TL;DR: In this paper, a dynamic programming algorithm for optimal one-dimensional clustering is proposed, which is implemented as an R package called Ckmeans.1d.dp.
Abstract: The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
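The underlying recurrence is compact: on sorted data, let D[m][i] be the minimal within-cluster sum of squares for the first i points split into m clusters; then D[m][i] = min over j of D[m-1][j] + SSE(x_{j+1..i}). A minimal O(k n^2) sketch of this dynamic program (the released package adds substantial speedups; variable names are ours):

```python
# Optimal 1-D k-means by dynamic programming (the idea behind Ckmeans.1d.dp).
import numpy as np

def ckmeans_1d(x, k):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i, j):                                    # cost of cluster x[i..j]
        m, s = j - i + 1, s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - s * s / m

    D = np.full((k + 1, n + 1), np.inf)   # D[m][i]: first i points, m clusters
    B = np.zeros((k + 1, n + 1), dtype=int)
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for i in range(m, n + 1):
            for j in range(m - 1, i):                 # last cluster is x[j..i-1]
                c = D[m - 1][j] + sse(j, i - 1)
                if c < D[m][i]:
                    D[m][i], B[m][i] = c, j
    bounds, i = [], n                                 # backtrack boundaries
    for m in range(k, 0, -1):
        bounds.append((B[m][i], i - 1))
        i = B[m][i]
    return D[k][n], bounds[::-1]

sse_total, clusters = ckmeans_1d([1, 2, 3, 10, 11, 12, 30], k=3)
# clusters -> [(0, 2), (3, 5), (6, 6)] on the sorted data
```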

328 citations


Journal ArticleDOI
TL;DR: This paper studies the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach and uses an iterative ADP algorithm to obtain the optimal control law.
Abstract: In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within an ε-error bound. The optimal number of control steps can also be obtained by the proposed ADP algorithms. A convergence analysis of the proposed ADP algorithms in terms of performance index function and control policy is made. In order to facilitate the implementation of the iterative ADP algorithms, neural networks are used for approximating the performance index function, computing the optimal control policy, and modeling the nonlinear system. Finally, two simulation examples are employed to illustrate the applicability of the proposed method.
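For contrast with the ADP approach, exact finite-horizon backward dynamic programming on a state grid looks as follows; this is what the iterative algorithm and its neural networks approximate when exact gridding becomes intractable. Dynamics, costs, and horizon below are toy assumptions:

```python
# Exact finite-horizon DP on a grid for x_{t+1} = f(x_t, u_t),
# J = sum_t (x_t^2 + u_t^2).  The system is an illustrative stand-in.
import numpy as np

f = lambda x, u: 0.9 * x + np.sin(u)        # toy nonlinear scalar system
xs = np.linspace(-2, 2, 81)                 # state grid
us = np.linspace(-1, 1, 41)                 # control grid
T = 10

V = np.zeros_like(xs)                       # terminal cost V_T = 0
policy = []
for t in range(T - 1, -1, -1):              # backward Bellman recursion
    Q = np.empty((len(xs), len(us)))
    for j, u in enumerate(us):
        xn = np.clip(f(xs, u), xs[0], xs[-1])
        Q[:, j] = xs**2 + u**2 + np.interp(xn, xs, V)
    policy.append(us[np.argmin(Q, axis=1)])
    V = Q.min(axis=1)
policy.reverse()                            # policy[t][i]: action at xs[i], time t
```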

276 citations


Journal ArticleDOI
TL;DR: A weak version of the dynamic programming principle is proved for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument.
Abstract: We prove a weak version of the dynamic programming principle for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument. In the Markov case, our result is tailor-made for the derivation of the dynamic programming equation in the sense of viscosity solutions.
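Schematically, for stopping times θ valued in [t, T], the weak version replaces the value function inside the expectation by its semicontinuous envelopes V_* ≤ V ≤ V^*, sidestepping measurable selection (precise conditions are in the paper):

```latex
% Weak dynamic programming principle (schematic two-sided form):
V(t,x) \;\ge\; \sup_{\nu}\,
  \mathbb{E}\!\left[\int_t^{\theta} f\bigl(s, X^{t,x,\nu}_s, \nu_s\bigr)\,ds
  \;+\; V_*\bigl(\theta, X^{t,x,\nu}_{\theta}\bigr)\right],
\qquad
V(t,x) \;\le\; \sup_{\nu}\,
  \mathbb{E}\!\left[\int_t^{\theta} f\bigl(s, X^{t,x,\nu}_s, \nu_s\bigr)\,ds
  \;+\; V^*\bigl(\theta, X^{t,x,\nu}_{\theta}\bigr)\right].
```

These two inequalities are exactly what is needed to derive the dynamic programming equation in the viscosity sense.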

242 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions.
Abstract: Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
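The single-agent recursion that the paper generalizes is ordinary Q-value iteration on a finite MDP: Q*(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) max_{a'} Q*(s',a'). A compact sketch (toy tensors assumed):

```python
# Q-value iteration for a finite MDP, the single-agent base case.
import numpy as np

def q_value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P: (S, A, S) transition tensor; R: (S, A) expected rewards."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    while True:
        Q_new = R + gamma * P @ Q.max(axis=1)   # Bellman optimality backup
        if np.abs(Q_new - Q).max() < tol:
            return Q_new                        # fixed point = Q*
        Q = Q_new
```

The Dec-POMDP difficulty is that no such compact sufficient statistic exists per agent, which is why the paper works with Q-value functions over joint policies and histories instead.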

Journal ArticleDOI
TL;DR: This paper discusses representative examples of how dynamic programming and graph algorithms have been applied to some classical vision problems, and focuses on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high-level problem of model-based recognition.
Abstract: Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas, and has been successfully applied to many vision problems. Discrete optimization techniques are especially interesting since, by carefully exploiting problem structure, they often provide nontrivial guarantees concerning solution quality. In this paper, we review dynamic programming and graph algorithms, and discuss representative examples of how these discrete optimization techniques have been applied to some classical vision problems. We focus on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high-level problem of model-based recognition.
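For the stereo case, the classic use of dynamic programming is per-scanline disparity optimization: a matching cost per pixel and disparity plus a smoothness penalty between neighboring pixels. A toy sketch (real systems add occlusion handling and better data terms):

```python
# Scanline-stereo DP sketch: choose a disparity per pixel on one image row.
import numpy as np

def scanline_dp(left_row, right_row, max_disp, smooth=0.1):
    n, D = len(left_row), max_disp + 1
    cost = np.full((n, D), np.inf)
    for d in range(D):                      # absolute-difference data term
        cost[d:, d] = np.abs(left_row[d:] - right_row[:n - d])
    acc = cost.copy()                       # accumulated DP costs
    back = np.zeros((n, D), dtype=int)
    for x in range(1, n):                   # forward pass over pixels
        trans = acc[x - 1][:, None] + smooth * np.abs(
            np.arange(D)[:, None] - np.arange(D)[None, :])
        back[x] = trans.argmin(axis=0)
        acc[x] += trans.min(axis=0)
    disp = np.zeros(n, dtype=int)           # backtrack the best path
    disp[-1] = int(acc[-1].argmin())
    for x in range(n - 2, -1, -1):
        disp[x] = back[x + 1][disp[x + 1]]
    return disp
```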

Journal ArticleDOI
TL;DR: A corridor method inspired algorithm for the blocks relocation problem in block stacking systems is presented; computational results on medium- and large-size problem instances allow us to draw conclusions about the effectiveness of the proposed scheme.
Abstract: In this paper, we present a corridor method inspired algorithm for a blocks relocation problem in block stacking systems. Typical applications of such a problem are found in the stacking of containers in a terminal yard, of pallets and boxes in a warehouse, etc. The proposed algorithm applies a recently proposed metaheuristic. In a method-based neighborhood, we define a two-dimensional "corridor" around the incumbent blocks configuration by imposing exogenous constraints on the solution space of the problem, and apply a dynamic programming algorithm capturing the state of the system after each block movement for exploring the neighborhoods. Computational results on medium- and large-size problem instances allow us to draw conclusions about the effectiveness of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper considers several easy-to-compute heuristic trading strategies that are based on optimizing simpler models, and complements these heuristics with upper bounds on the performance of an optimal trading strategy, based on the dual approach developed in Brown et al.
Abstract: We consider the problem of dynamic portfolio optimization in a discrete-time, finite-horizon setting. Our general model considers risk aversion, portfolio constraints (e.g., no short positions), return predictability, and transaction costs. This problem is naturally formulated as a stochastic dynamic program. Unfortunately, with nonzero transaction costs, the dimension of the state space is at least as large as the number of assets, and the problem is very difficult to solve with more than one or two assets. In this paper, we consider several easy-to-compute heuristic trading strategies that are based on optimizing simpler models. We complement these heuristics with upper bounds on the performance of an optimal trading strategy. These bounds are based on the dual approach developed in Brown et al. (Brown, D. B., J. E. Smith, P. Sun. 2009. Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4) 785--801). In this context, these bounds are given by considering an investor who has access to perfect information about future returns but is penalized for using this advance information. These heuristic strategies and bounds can be evaluated using Monte Carlo simulation. We evaluate these heuristics and bounds in numerical experiments with a risk-free asset and 3 or 10 risky assets. In many cases, the performance of the heuristic strategy is very close to the upper bound, indicating that the heuristic strategies are very nearly optimal. This paper was accepted by Dimitris Bertsimas, optimization.
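A Monte Carlo sketch of the evaluation pattern, on a toy single-asset, risk-neutral model of our own (not the paper's): a feasible heuristic gives a lower bound on the optimal value, and a clairvoyant investor who sees future returns gives an upper bound. Setting the dual penalty of Brown et al. to zero, as done here, still yields a valid if looser upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_paths, tc = 10, 100_000, 0.001            # horizon, paths, transaction cost
returns = rng.normal(0.01, 0.05, size=(n_paths, T))

# Feasible heuristic: buy and hold one unit (pay the cost once)
heuristic = returns.sum(axis=1) - tc

# Perfect-information relaxation, zero penalty: long only in up periods,
# transaction costs ignored -- an upper bound on any adapted strategy
clairvoyant = np.clip(returns, 0.0, None).sum(axis=1)

print(heuristic.mean(), clairvoyant.mean())    # the gap bounds suboptimality
```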

Book ChapterDOI
01 Jan 2011
TL;DR: The author explains how a so-called “approximation algorithm” can find a tour that is maybe not the shortest one but one whose length usually is quite close to the optimum.
Abstract: In this chapter the author considers how to work out the shortest round-trip through a number of cities, a hard problem for which we do not know how to find an optimal solution. The author first demonstrates why a “brute-force” approach is disastrous, and he then shows how dynamic programming offers a significant improvement in running time. Finally he explains how a so-called “approximation algorithm” can find a tour that is maybe not the shortest one but one whose length usually is quite close to the optimum.
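The dynamic-programming improvement described here is the classic Held-Karp recursion, which replaces O(n!) brute-force enumeration with O(n^2 2^n) work; a compact implementation:

```python
# Held-Karp: C[(S, j)] is the cheapest path starting at city 0, visiting
# exactly the cities in bitmask S, and ending at city j.
from itertools import combinations

def held_karp(dist):
    """dist[i][j]: inter-city distances; returns the shortest tour length."""
    n = len(dist)
    C = {(1 << 0 | 1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            S = 1 | sum(1 << j for j in subset)
            for j in subset:
                C[(S, j)] = min(C[(S ^ (1 << j), k)] + dist[k][j]
                                for k in subset if k != j)
    full = (1 << n) - 1
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 2, 9, 10], [1, 0, 6, 4], [15, 7, 0, 8], [6, 3, 12, 0]]
print(held_karp(dist))   # 21
```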

Book
09 Aug 2011
TL;DR: This paper shows how branch-and-bound methods can be used to reduce storage and, possibly, computational requirements in discrete dynamic programs.
Abstract: This paper shows how branch-and-bound methods can be used to reduce storage and, possibly, computational requirements in discrete dynamic programs. Relaxations and fathoming criteria are used to identify and to eliminate states whose corresponding subpolicies could not lead to optimal policies. The general dynamic programming/branch-and-bound approach is applied to the traveling-salesman problem and the nonlinear knapsack problem. Our computational experience demonstrates that the hybrid approach yields dramatic savings in both computer storage and computational requirements.
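A sketch of the hybrid idea on a 0/1 knapsack: generate DP states item by item, discard dominated states, and fathom states whose optimistic bound cannot beat the incumbent. The bound used below (total value of all remaining items) is deliberately crude; the paper's relaxations are sharper.

```python
def dp_with_fathoming(items, capacity):
    """items: list of (weight, value); returns the optimal value."""
    states = {0: 0}                          # weight -> value (Pareto states)
    incumbent = 0
    for idx, (w, v) in enumerate(items):
        new = dict(states)                   # option: skip item idx
        for cw, cv in states.items():        # option: take item idx
            if cw + w <= capacity and cv + v > new.get(cw + w, -1):
                new[cw + w] = cv + v
        rest = sum(val for _, val in items[idx + 1:])   # optimistic bound term
        pruned, best = {}, -1
        for cw in sorted(new):
            cv = new[cw]
            if cv <= best:                   # dominated by a lighter state
                continue
            best = cv
            incumbent = max(incumbent, cv)
            if cv + rest >= incumbent:       # fathom: can this state still win?
                pruned[cw] = cv
        states = pruned
    return incumbent

print(dp_with_fathoming([(3, 4), (4, 5), (2, 3)], capacity=6))   # 8
```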

Journal ArticleDOI
TL;DR: A novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays.
Abstract: In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The algorithm consists of state updating, control policy iteration, and performance index iteration. To obtain the optimal states, the states are updated as well, with a "backward iteration" applied to the state updating. Two neural networks are used to approximate the performance index function and compute the optimal control policy, facilitating the implementation of the HDP iteration algorithm. Finally, we present two examples to demonstrate the effectiveness of the proposed HDP iteration algorithm.

Journal ArticleDOI
TL;DR: This paper proposes a discriminative semi-Markov model approach, and defines a set of features over boundary frames, segments, as well as neighboring segments that enable it to conveniently capture a combination of local and global features that best represent each specific action type.
Abstract: A challenging problem in human action understanding is to jointly segment and recognize human actions from an unseen video sequence, where one person performs a sequence of continuous actions. In this paper, we propose a discriminative semi-Markov model approach, and define a set of features over boundary frames, segments, as well as neighboring segments. This enables us to conveniently capture a combination of local and global features that best represent each specific action type. To efficiently solve the inference problem of simultaneous segmentation and recognition, a Viterbi-like dynamic programming algorithm is utilized, which in practice is able to process 20 frames per second. Moreover, the model is discriminatively learned from the large margin principle, and is formulated as an optimization problem with exponentially many constraints. To solve it efficiently, we present two different optimization algorithms, namely the cutting plane method and the bundle method, and demonstrate that each can be alternatively deployed in a "plug and play" fashion. On the theoretical side, we also analyze the generalization error of the proposed approach and provide a PAC-Bayes bound. The proposed approach is evaluated on a variety of datasets, and is shown to perform competitively to the state-of-the-art methods. For example, on the KTH dataset, it achieves 95.0% recognition accuracy, where the best known result on this dataset is 93.4% (Reddy and Shah in ICCV, 2009).
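The Viterbi-like recursion is worth making explicit: with best[t] the score of the best labeled segmentation of frames 1..t, best[t] = max over label y and duration d of best[t-d] + score(y, t-d, t). A sketch with a stand-in score function for the paper's learned segment features:

```python
# Semi-Markov Viterbi: jointly choose segment boundaries and labels.
import numpy as np

def semi_markov_viterbi(T, labels, score, max_len):
    """score(y, start, end): segment score, a stand-in for learned features."""
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for d in range(1, min(max_len, t) + 1):      # candidate durations
            for y in labels:                         # candidate labels
                s = best[t - d] + score(y, t - d, t)
                if s > best[t]:
                    best[t], back[t] = s, (t - d, y)
    segs, t = [], T                                  # backtrack segments
    while t > 0:
        start, y = back[t]
        segs.append((start, t, y))
        t = start
    return best[T], segs[::-1]
```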

Book
01 Jan 2011
TL;DR: This book develops dynamic programming from first principles, covering preliminary analysis, the Markovian decomposition scheme, the optimality equation, dynamic programming problems, the final state model, the principle of optimality, and solution methods.
Abstract: Introduction: Welcome to Dynamic Programming! How to Read This Book.
SCIENCE.
Fundamentals: Introduction. Meta-Recipe Revisited. Problem Formulation. Decomposition of the Solution Set. Principle of Conditional Optimization. Conditional Problems. Optimality Equation. Solution Procedure. Time Out: Direct Enumeration! Equivalent Conditional Problems. Modified Problems. The Role of a Decomposition Scheme. Dynamic Programming Problem - Revisited. Trivial Decomposition Scheme. Summary and a Look Ahead.
Multistage Decision Model: Introduction. A Prototype Multistage Decision Model. Problem vs Problem Formulation. Policies. Markovian Policies. Remarks on the Notation. Summary. Bibliographic Notes.
Dynamic Programming - An Outline: Introduction. Preliminary Analysis. Markovian Decomposition Scheme. Optimality Equation. Dynamic Programming Problems. The Final State Model. Principle of Optimality. Summary.
Solution Methods: Introduction. Additive Functional Equations. Truncated Functional Equations. Nontruncated Functional Equations. Summary.
Successive Approximation Methods: Introduction. Motivation. Preliminaries. Functional Equations of Type One. Functional Equations of Type Two. Truncation Method. Stationary Models. Truncation and Successive Approximation. Summary. Bibliographic Notes.
Optimal Policies: Introduction. Preliminary Analysis. Truncated Functional Equations. Nontruncated Functional Equations. Successive Approximation in the Policy Space. Summary. Bibliographic Notes.
The Curse of Dimensionality: Introduction. Motivation. Discrete Problems. Special Cases. Complete Enumeration. Conclusions.
The Rest Is Mathematics and Experience: Introduction. Choice of Model. Dynamic Programming Models. Forward Decomposition Models. Practice What You Preach! Computational Schemes. Applications. Dynamic Programming Software. Summary.
ART.
Refinements: Introduction. Weak-Markovian Condition. Markovian Formulations. Decomposition Schemes. Sequential Decision Models. Example. Shortest Path Model. The Art of Dynamic Programming Modeling. Summary. Bibliographic Notes.
The State: Introduction. Preliminary Analysis. Mathematically Speaking. Decomposition Revisited. Infeasible States and Decisions. State Aggregation. Nodes as States. Multistage vs Sequential Models. Models vs Functional Equations. Easy Problems. Modeling Tips. Concluding Remarks. Summary.
Parametric Schemes: Introduction. Background and Motivation. Fractional Programming Scheme. C-Programming Scheme. Lagrange Multiplier Scheme. Summary. Bibliographic Notes.
The Principle of Optimality: Introduction. Bellman's Principle of Optimality. Prevailing Interpretation. Variations on a Theme. Criticism. So What Is Amiss? The Final State Model Revisited. Bellman's Treatment of Dynamic Programming. Summary. Post Script: Pontryagin's Maximum Principle.
Forward Decomposition: Introduction. Function Decomposition. Initial Problem. Separable Objective Functions Revisited. Modified Problems Revisited. Backward Conditional Problems Revisited. Markovian Condition Revisited. Forward Functional Equation. Impact on the State Space. Anomaly. Pathologic Cases. Summary and Conclusions. Bibliographic Notes.
Push!: Introduction. The Pull Method. The Push Method. Monotone Accumulated Return Processes. Dijkstra's Algorithm. Summary. Bibliographic Notes.
EPILOGUE: What Then Is Dynamic Programming? Review. Non-Optimization Problems. An Abstract Dynamic Programming Model. Examples. The Towers of Hanoi Problem. Optimization-Free Dynamic Programming. Concluding Remarks.
Appendix A: Contraction Mapping. Appendix B: Fractional Programming. Appendix C: Composite Concave Programming. Appendix D: The Principle of Optimality in Stochastic Processes. Appendix E: The Corridor Method.
Bibliography. Index.
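The epilogue's "optimization-free" flavor of dynamic programming is easy to illustrate with its own Towers of Hanoi example: a recurrence with no optimization step, solved by memoization (a minimal sketch of the idea, not the book's treatment):

```python
# Optimization-free DP: the Hanoi move count f(n) = 2 f(n-1) + 1,
# computed once per state via memoization.
from functools import lru_cache

@lru_cache(maxsize=None)
def hanoi_moves(n: int) -> int:
    return 0 if n == 0 else 2 * hanoi_moves(n - 1) + 1

assert hanoi_moves(10) == 1023   # closed form: 2**n - 1
```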

Proceedings ArticleDOI
05 Jun 2011
TL;DR: A modified FA approach combined with chaotic sequences (FAC) applied to reliability-redundancy optimization is introduced and is found to outperform the best-known solutions previously available.
Abstract: The reliability-redundancy allocation problem can be approached as a mixed-integer programming problem. It has been solved by using optimization techniques such as dynamic programming, integer programming, and mixed-integer nonlinear programming. On the other hand, a broad class of metaheuristics has been developed for reliability-redundancy optimization. Recently, a new metaheuristic called the firefly algorithm (FA) has emerged. The FA is a stochastic metaheuristic based on the idealized flashing behavior of fireflies: the flashing light is associated with the objective function to be optimized, which makes it possible to formulate the algorithm. This paper introduces a modified FA combined with chaotic sequences (FAC) applied to reliability-redundancy optimization. In this context, an example of mixed-integer programming in the reliability-redundancy design of an overspeed protection system for a gas turbine is evaluated. In this application domain, FAC was found to outperform the previously best-known solutions available.
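The canonical firefly move (Yang's formulation) that FAC builds on is sketched below; the chaotic variant replaces the uniform random draws with chaotic sequences. Parameter values are illustrative:

```python
# One firefly-algorithm step: each firefly moves toward every brighter one.
import numpy as np

def firefly_step(X, f, alpha=0.2, beta0=1.0, gamma=1.0, rng=None):
    """X: (n, d) positions; f: objective to minimize (brighter = lower f)."""
    rng = rng or np.random.default_rng()
    X = np.asarray(X, dtype=float)
    brightness = -np.array([f(x) for x in X])
    X_new = X.copy()
    for i in range(len(X)):
        for j in range(len(X)):
            if brightness[j] > brightness[i]:          # i moves toward j
                r2 = np.sum((X[i] - X[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)     # attractiveness decays with distance
                X_new[i] += (beta * (X[j] - X[i])
                             + alpha * (rng.random(X.shape[1]) - 0.5))
    return X_new
```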

Journal ArticleDOI
TL;DR: In this article, two alternative sampling strategies, namely Latin hypercube sampling and randomized quasi-Monte Carlo, are proposed for the generation of scenario trees, as well as for the sampling of scenarios that is part of the SDDP algorithm.
Abstract: The long-term hydrothermal scheduling is one of the most important problems to be solved in the power systems area. This problem aims to obtain an optimal policy, under water (energy) resources uncertainty, for hydro and thermal plants over a multi-annual planning horizon. It is natural to model the problem as a multi-stage stochastic program, a class of models for which algorithms have been developed. The original stochastic process is represented by a finite scenario tree and, because of the large number of stages, a sampling-based method such as the Stochastic Dual Dynamic Programming (SDDP) algorithm is required. The purpose of this paper is two-fold. Firstly, we study the application of two alternative sampling strategies to the standard Monte Carlo—namely, Latin hypercube sampling and randomized quasi-Monte Carlo—for the generation of scenario trees, as well as for the sampling of scenarios that is part of the SDDP algorithm. Secondly, we discuss the formulation of stopping criteria for the optimization algorithm in terms of statistical hypothesis tests, which allows us to propose an alternative criterion that is more robust than that originally proposed for the SDDP. We test these ideas on a problem associated with the whole Brazilian power system, with a three-year planning horizon.
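A minimal Latin hypercube sampler of the kind used for the scenario draws: one stratified sample per equal-probability bin, independently permuted per dimension (the array shapes and use are our illustration, not the paper's code):

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng=None):
    rng = rng or np.random.default_rng()
    # One draw per equal-probability bin [i/n, (i+1)/n) in every dimension
    u = (rng.random((n_samples, n_dims))
         + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_dims):            # decouple dimensions by permutation
        u[:, j] = rng.permutation(u[:, j])
    return u                           # stratified uniform [0, 1) marginals

inflows = latin_hypercube(100, 4)      # e.g., hypothetical inflow factors
```

Compared with plain Monte Carlo, each marginal is guaranteed to cover all strata, which is the variance-reduction effect exploited for scenario generation.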

Journal ArticleDOI
TL;DR: In this paper, an approximate/adaptive dynamic programming (ADP) algorithm is proposed to determine the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost.
Abstract: This paper presents an approximate/adaptive dynamic programming (ADP) algorithm, using the idea of integral reinforcement learning (IRL), to determine online the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and infinite-horizon quadratic cost. The algorithm is built around an iterative method that has been developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show here how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without the requirement of complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the control policy that optimally rejects the largest load disturbance.
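The offline iterative method in question can be sketched as Newton/Kleinman iterations; for brevity we show the single-player algebraic Riccati case (the CT-GARE adds the maximizing disturbance player). This is the baseline that the IRL-based ADP scheme reproduces online without full model knowledge:

```python
# Kleinman iterations: alternate policy evaluation (a Lyapunov solve)
# and policy improvement, converging to the Riccati solution.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman(A, B, Q, R, K0, iters=50):
    K = K0                                          # must be stabilizing
    for _ in range(iters):
        Acl = A - B @ K
        S = Q + K.T @ R @ K
        P = solve_continuous_lyapunov(Acl.T, -S)    # Acl'P + P Acl = -S
        K = np.linalg.solve(R, B.T @ P)             # policy improvement
    return P, K

A = np.array([[0., 1.], [-1., -2.]])                # toy stable system
B = np.array([[0.], [1.]])
Q, R = np.eye(2), np.eye(1)
P, K = kleinman(A, B, Q, R, K0=np.zeros((1, 2)))    # A stable, so K0 = 0 works
```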

Journal ArticleDOI
01 Feb 2011
TL;DR: The honey-bee mating optimisation (HBMO) algorithm, which is based on the mating procedure of honey-bees in nature, is presented and tested on three benchmark multireservoir operation problems in both discrete and continuous domains; the performance of the model compares well with results of the well-developed genetic algorithm.
Abstract: In this paper, the honey-bee mating optimisation (HBMO) algorithm, which is based on the mating procedure of honey-bees in nature, is presented and tested on three benchmark multireservoir operation problems in both discrete and continuous domains. To test the applicability of the algorithm, results are compared with those from different analytical and evolutionary algorithms (linear programming, dynamic programming, differential dynamic programming, discrete differential dynamic programming and genetic algorithm). The first example is a multireservoir operation optimisation problem in a discrete domain with discrete decision and state variables. It is shown that the performance of the model compares well with results of the well-developed genetic algorithm. The second example is a four-reservoir problem in a continuous domain that has recently been approached with different evolutionary algorithms. The third example is a ten-reservoir problem in series and parallel. The best solution obtained is quite ...

Proceedings ArticleDOI
12 Nov 2011
TL;DR: An analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures gives the optimal solution for exponentially distributed failure inter-arrival times, which is the first rigorous proof that periodic checkpointing is optimal.
Abstract: This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. In the case of both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure inter-arrival times, which, to the best of our knowledge, is the first rigorous proof that periodic checkpointing is optimal. For non-exponentially distributed failures, we develop a dynamic programming algorithm to maximize the amount of work completed before the next failure, which provides a good heuristic for minimizing the expected execution time. Our work considers various models of job parallelism and of parallel checkpointing overhead. We first perform extensive simulation experiments assuming that failures follow Exponential or Weibull distributions, the latter being more representative of real-world systems. The obtained results not only corroborate our theoretical findings, but also show that our dynamic programming algorithm significantly outperforms previously proposed solutions in the case of Weibull failures. We then discuss results from simulation experiments that use failure logs from production clusters. These results confirm that our dynamic programming algorithm significantly outperforms existing solutions for real-world clusters.
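For the exponential case, a classical first-order model makes the optimization concrete: with failure rate lam, checkpoint cost C, recovery time R, and W units of work split into segments of length T, the expected makespan has a closed form whose minimizer lies near the familiar sqrt(2C/lam) rule. A sketch of one standard variant of this model (not the paper's exact formulation):

```python
import numpy as np

def expected_makespan(T, W, C, R, lam):
    """Expected total time with periodic checkpoints every T units of work."""
    segments = W / T
    # Expected time to push one segment of length T + C through, with
    # exponential failures (rate lam) and recovery cost R on each failure:
    per_segment = np.exp(lam * R) * (np.exp(lam * (T + C)) - 1.0) / lam
    return segments * per_segment

W, C, R, lam = 10_000.0, 10.0, 10.0, 1e-4
Ts = np.linspace(10, 5000, 2000)
T_star = Ts[np.argmin(expected_makespan(Ts, W, C, R, lam))]
print(T_star, np.sqrt(2 * C / lam))   # compare with the sqrt(2C/lambda) rule
```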

Journal ArticleDOI
TL;DR: The goal of this article is to provide the theoretical basis for enabling tractable solutions to the "arriving on time" problem and its use in real-time mobile phone applications, and to present an efficient algorithm for finding an optimal routing policy with well-bounded computational complexity.
Abstract: The goal of this article is to provide the theoretical basis for enabling tractable solutions to the "arriving on time" problem and enabling its use in real-time mobile phone applications. Optimal routing in transportation networks with highly varying traffic conditions is a challenging problem due to the stochastic nature of travel-times on links of the network. The definition of optimality criteria and the design of solution methods must account for the random nature of the travel-time on each link. Most common routing algorithms consider the expected value of link travel-time as a sufficient statistic for the problem and produce least expected travel-time paths without consideration of travel-time variability. However, in numerous practical settings the reliability of the route is also an important decision factor. In this article, the authors consider the following optimality criterion: maximizing the probability of arriving on time at a destination given a departure time and a time budget. The authors present an efficient algorithm for finding an optimal routing policy with well-bounded computational complexity, improving on an existing solution that takes an unbounded number of iterations to converge to the optimal solution. A routing policy is an adaptive algorithm that determines the optimal solution based on en route travel-times and therefore provides better reliability guarantees than an a priori solution. Novel speed-up techniques to efficiently compute the adaptive optimal strategy and methods to prune the search space of the problem are also investigated. Finally, an extension of this algorithm which allows for both time-varying traffic conditions and spatio-temporal correlations of link travel-time distributions is presented. The dramatic runtime improvements provided by the algorithm are demonstrated for practical scenarios in California.
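In discrete time, the criterion yields a clean recursion: with u_v(b) the probability of reaching the destination from node v within budget b, and p_vw the travel-time pmf of edge (v, w), u_v(b) = max over outgoing edges of sum_t p_vw(t) u_w(b - t). Since every edge takes at least one time step, a single sweep in increasing budget order suffices. A toy sketch (graph layout and pmfs are our own assumptions):

```python
import numpy as np

def sota(succ, pmf, dest, budget):
    """succ[v]: successor list; pmf[(v, w)][k] = P(travel time = k + 1)."""
    nodes = set(succ) | {dest}
    u = {v: np.zeros(budget + 1) for v in nodes}
    u[dest][:] = 1.0                          # already at the destination
    choice = {v: np.full(budget + 1, -1) for v in nodes}
    for b in range(1, budget + 1):            # increasing-budget sweep
        for v in nodes - {dest}:
            for k, w in enumerate(succ[v]):
                p = pmf[(v, w)]
                val = sum(p[t] * u[w][b - t - 1]
                          for t in range(min(len(p), b)))
                if val > u[v][b]:
                    u[v][b], choice[v][b] = val, k
    return u, choice

succ = {'a': ['b', 'd'], 'b': ['d'], 'd': []}
pmf = {('a', 'b'): np.array([0.5, 0.5]),      # 1 or 2 steps
       ('a', 'd'): np.array([0.0, 0.1, 0.9]), # mostly 3 steps
       ('b', 'd'): np.array([1.0])}           # always 1 step
u, choice = sota(succ, pmf, 'd', budget=3)    # u['a'][b]: best on-time prob.
```

The result is a policy, not a path: `choice[v][b]` tells the driver which edge to take given the budget remaining, which is exactly the adaptivity the article argues for.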

Journal ArticleDOI
TL;DR: This paper investigates a generalized multiple-input-multiple-output (GMIMO) ADP design for online learning and control, which is more applicable to a wide range of practical real-world applications, and tests the performance of this approach on a practical complex system.
Abstract: Adaptive dynamic programming (ADP) is a promising research field for the design of intelligent controllers, which can both learn on-the-fly and exhibit optimal behavior. Over the past decades, several generations of ADP design have been proposed in the literature, which have demonstrated many successful applications in various benchmarks and industrial applications. While much of the existing research focuses on multiple-input-single-output systems with steepest descent search, in this paper we investigate a generalized multiple-input-multiple-output (GMIMO) ADP design for online learning and control, which is more applicable to a wide range of practical real-world applications. Furthermore, an improved weight-updating algorithm based on recursive Levenberg-Marquardt methods is presented and embodied in the GMIMO approach to improve its performance. Finally, we test the performance of this approach on a practical complex system, namely, the learning and control of the tension and height of the looper system in a hot strip mill. Experimental results demonstrate that the proposed approach can achieve effective and robust performance.
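The Levenberg-Marquardt step that replaces steepest descent blends gradient and Gauss-Newton directions through a damping term; a one-function sketch (J and e are the network's error Jacobian and error vector, shapes assumed, batch rather than recursive form):

```python
import numpy as np

def lm_update(w, J, e, mu):
    """w: weights; J: (n_samples, n_weights) Jacobian; e: error vector."""
    H = J.T @ J + mu * np.eye(J.shape[1])   # damped Gauss-Newton Hessian
    return w - np.linalg.solve(H, J.T @ e)  # mu large -> gradient step,
                                            # mu small -> Newton-like step
```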

Reference EntryDOI
14 Jan 2011
TL;DR: This article provides a flexible modeling framework based on a classic control-theoretic formulation, avoiding devices such as one-step transition matrices, and describes the five fundamental elements of any stochastic, dynamic program.
Abstract: The first step in solving a stochastic optimization problem is providing a mathematical model. How the problem is modeled can impact the solution strategy. In this article, we provide a flexible modeling framework that uses a classic control-theoretic framework, avoiding devices such as one-step transition matrices. We describe the five fundamental elements of any stochastic, dynamic program. Different notational conventions are introduced, and the types of policies that can be used to guide decisions are described in detail. This discussion puts approximate dynamic programming in the context of a variety of other algorithmic strategies by using the modeling framework to describe a wide range of policies. A brief discussion of model-free programming is also provided. Keywords: approximate dynamic programming; Markov decision process; state variable; transition function; model-free dynamic programming
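Those five elements translate directly into a code skeleton (names are illustrative, following the article's control-theoretic framing):

```python
# Skeleton of a stochastic dynamic program in the five-element framing.
class DynamicProgram:
    def initial_state(self): ...              # 1. state variable S_t
    def decisions(self, state): ...           # 2. decision x_t
    def exogenous(self, state, rng): ...      # 3. exogenous information W_{t+1}
    def transition(self, state, x, w): ...    # 4. S_{t+1} = S^M(S_t, x_t, W_{t+1})
    def contribution(self, state, x): ...     # 5. contribution C(S_t, x_t)
```

A policy is then any rule mapping states to decisions, and the different policy classes the article surveys (lookahead, value function approximation, policy function approximation, and so on) all plug into this same skeleton.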

Dissertation
01 Jan 2011
TL;DR: A new nogood learning technique based on constraint projection allows us to exploit subproblem dominances that arise when two different search paths lead to subproblems that are identical on the remaining unlabeled variables.
Abstract: Combinatorial Optimization is an important area of computer science that has many theoretical and practical applications. In this thesis, we present important contributions to several different areas of combinatorial optimization, including nogood learning, symmetry breaking, dominance, relaxations and parallelization. We develop a new nogood learning technique based on constraint projection that allows us to exploit subproblem dominances that arise when two different search paths lead to subproblems which are identical on the remaining unlabeled variables. On appropriate problems, this nogood learning technique provides orders of magnitude speedup compared to a base solver which does not learn nogoods. We present a new symmetry breaking technique called SBDS-1UIP, which is an extension of Symmetry Breaking During Search (SBDS). SBDS-1UIP uses symmetric versions of the 1UIP nogoods derived by Lazy Clause Generation solvers to prune symmetric parts of the search space. We show that SBDS-1UIP can exploit at least as many symmetries as SBDS, and that it is strictly more powerful on some problems, allowing us to exploit types of symmetries that no previous general symmetry breaking technique is capable of exploiting. We present two new general methods for exploiting almost symmetries (symmetries which are broken by a small number of constraints). The first is to treat almost symmetries as conditional symmetries and exploit them via conditional symmetry breaking constraints. The second is to modify SBDS-1UIP to handle almost symmetries. Both techniques are capable of producing exponential speedups on appropriate problems. We examine three reasonably well-known problems: the Minimization of Open

01 Jan 2011
TL;DR: A classification of the methods that have been proposed to design an Adaptive-ECMS (A-ECMS) controller is proposed, and a comparative analysis in simulation of three adaptation laws falling into the class of algorithms of adaptation through feedback of SOC is carried out.
Abstract: The problem of adapting the equivalence factor of the Equivalent Consumption Minimization Strategy (ECMS) to achieve a real-time implementable sub-optimal solution of the energy management problem in hybrid electric vehicles (HEVs) has been the object of extensive research over the last decade. Contributions in the open literature range from methods based on prediction of the driving cycle, to driving pattern recognition, to feedback from the state of charge. In this paper, we first propose a classification of the methods that have been proposed to design an Adaptive-ECMS (A-ECMS) controller, and then we carry out a comparative analysis in simulation of three adaptation laws falling into the class of algorithms of adaptation through feedback of SOC. Simulations are performed on a parallel hybrid vehicle and show the performance of the three adaptation laws as compared to the optimal ECMS (a suitable proxy for the global optimal solution given by the dynamic programming algorithm).
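A representative member of that class, sketched as a PI correction of the equivalence factor around a nominal value s0 (the gains and exact form are illustrative of the class compared, not a specific law from the paper):

```python
# SOC-feedback adaptation of the ECMS equivalence factor (sketch).
def adapt_equivalence_factor(s0, soc, soc_ref, integral,
                             kp=1.0, ki=0.01, dt=1.0):
    """Returns the adapted factor s(t) and the updated integral state."""
    err = soc_ref - soc              # SOC below reference -> raise s,
    integral += ki * err * dt        # making battery energy more "expensive"
    return s0 + kp * err + integral, integral
```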

Journal ArticleDOI
TL;DR: In this article, a scenario-based dynamic programming model consisting of a number of integer linear programming formulations for each single planning period is proposed, and the model can be solved efficiently by a shortest path algorithm on an acyclic network.
Abstract: This paper proposes a more realistic multi-period liner ship fleet planning problem for a liner container shipping company than has been studied in previous literature. The proposed problem is formulated as a scenario-based dynamic programming model consisting of a number of integer linear programming formulations for each single planning period, and the model can be solved efficiently by a shortest path algorithm on an acyclic network. A numerical example is carried out to illustrate the applicability of the proposed model and solution method. The numerical results show that chartering in ships may not always be a better policy for a long-term planning horizon though it is much cheaper than buying ships in the short-term. Purchasing ships seems to be a more profitable investment in the long run.