Showing papers on "Dynamic programming published in 2011"

PDF

Open Access

Book•DOI•

Approximate dynamic programming : solving the curses of dimensionality

[...]

04 Aug 2011

TL;DR: This book discusses the challenges of dynamic programming, the three curses of dimensionality, and some experimental comparisons of stepsize formulas that led to the creation of ADP for online applications.

...read moreread less

Abstract: Preface. Acknowledgments. 1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Bibliographic notes. 2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems. 3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid valuepolicy iteration. 3.7 The linear programming method for dynamic programs. 3.8 Monotone policies. 3.9 Why does it work? 3.10 Bibliographic notes. Problems 4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Sampling random variables . 4.4 ADP using the postdecision state variable. 4.5 Lowdimensional representations of value functions. 4.6 So just what is approximate dynamic programming? 4.7 Experimental issues. 4.8 Dynamic programming with missing or incomplete models. 4.9 Relationship to reinforcement learning. 4.10 But does it work? 4.11 Bibliographic notes. Problems. 5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The contribution function. 5.9 The objective function. 5.10 A measuretheoretic view of information. 5.11 Bibliographic notes. Problems. 6. Stochastic approximation methods. 6.1 A stochastic gradient algorithm. 6.2 Some stepsize recipes. 6.3 Stochastic stepsizes. 6.4 Computing bias and variance. 6.5 Optimal stepsizes. 6.6 Some experimental comparisons of stepsize formulas. 6.7 Convergence. 6.8 Why does it work? 6.9 Bibliographic notes. Problems. 7. Approximating value functions. 7.1 Approximation using aggregation. 7.2 Approximation methods using regression models. 7.3 Recursive methods for regression models. 7.4 Neural networks. 7.5 Batch processes. 7.6 Why does it work? 7.7 Bibliographic notes. Problems. 8. ADP for finite horizon problems. 8.1 Strategies for finite horizon problems. 8.2 Qlearning. 8.3 Temporal difference learning. 8.4 Policy iteration. 8.5 Monte Carlo value and policy iteration. 8.6 The actorcritic paradigm. 8.7 Bias in value function estimation. 8.8 State sampling strategies. 8.9 Starting and stopping. 8.10 A taxonomy of approximate dynamic programming strategies. 8.11 Why does it work? 8.12 Bibliographic notes. Problems. 9. Infinite horizon problems. 9.1 From finite to infinite horizon. 9.2 Algorithmic strategies. 9.3 Stepsizes for infinite horizon problems. 9.4 Error measures. 9.5 Direct ADP for online applications. 9.6 Finite horizon models for steady state applications. 9.7 Why does it work? 9.8 Bibliographic notes. Problems. 10. Exploration vs. exploitation. 10.1 A learning exercise: the nomadic trucker. 10.2 Learning strategies. 10.3 A simple information acquisition problem. 10.4 Gittins indices and the information acquisition problem. 10.5 Variations. 10.6 The knowledge gradient algorithm. 10.7 Information acquisition in dynamic programming. 10.8 Bibliographic notes. Problems. 11. Value function approximations for special functions. 11.1 Value functions versus gradients. 11.2 Linear approximations. 11.3 Piecewise linear approximations. 11.4 The SHAPE algorithm. 11.5 Regression methods. 11.6 Cutting planes. 11.7 Why does it work? 11.8 Bibliographic notes. Problems. 12. Dynamic resource allocation. 12.1 An asset acquisition problem. 12.2 The blood management problem. 12.3 A portfolio optimization problem. 12.4 A general resource allocation problem. 12.5 A fleet management problem. 12.6 A driver management problem. 12.7 Bibliographic references. Problems. 13. Implementation challenges. 13.1 Will ADP work for your problem? 13.2 Designing an ADP algorithm for complex problems. 13.3 Debugging an ADP algorithm. 13.4 Convergence issues. 13.5 Modeling your problem. 13.6 Online vs. offline models. 13.7 If it works, patent it!

...read moreread less

2,300 citations

Journal Article•DOI•

Multiple Object Tracking Using K-Shortest Paths Optimization

[...]

Jérôme Berclaz¹, François Fleuret², Engin Türetken¹, Pascal Fua¹•Institutions (2)

École Normale Supérieure¹, Idiap Research Institute²

01 Sep 2011-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper shows that reformulating that step as a constrained flow optimization results in a convex problem and takes advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast.

...read moreread less

Abstract: Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: If an object is not detected in a frame but is in previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This is usually dealt with by sampling or greedy search based on variants of Dynamic Programming which can easily miss the global optimum. In this paper, we show that reformulating that step as a constrained flow optimization results in a convex problem. We take advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast. This new approach is far simpler formally and algorithmically than existing techniques and lets us demonstrate excellent performance in two very different contexts.

...read moreread less

1,076 citations

Proceedings Article•DOI•

Globally-optimal greedy algorithms for tracking a variable number of objects

[...]

Hamed Pirsiavash¹, Deva Ramanan¹, Charless C. Fowlkes¹•Institutions (1)

University of California, Irvine¹

20 Jun 2011

TL;DR: A near-optimal algorithm based on dynamic programming which runs in time linear in the number of objects andlinear in the sequence length is given which results in state-of-the-art performance.

...read moreread less

Abstract: We analyze the computational problem of multi-object tracking in video sequences. We formulate the problem using a cost function that requires estimating the number of tracks, as well as their birth and death states. We show that the global solution can be obtained with a greedy algorithm that sequentially instantiates tracks using shortest path computations on a flow network. Greedy algorithms allow one to embed pre-processing steps, such as nonmax suppression, within the tracking algorithm. Furthermore, we give a near-optimal algorithm based on dynamic programming which runs in time linear in the number of objects and linear in the sequence length. Our algorithms are fast, simple, and scalable, allowing us to process dense input data. This results in state-of-the-art performance.

...read moreread less

904 citations

Journal Article•DOI•

Optimal Control of Hybrid Electric Vehicles Based on Pontryagin's Minimum Principle

[...]

Namwook Kim¹, Suk Won Cha¹, Huei Peng²•Institutions (2)

Seoul National University¹, University of Michigan²

01 Sep 2011-IEEE Transactions on Control Systems and Technology

TL;DR: In static simulation for a power-split hybrid vehicle, the fuel economy of the vehicle using the control algorithm proposed in this brief is found to be very close-typically within 1%-to the fuel Economy through global optimal control that is based on dynamic programming (DP).

...read moreread less

Abstract: A number of strategies for the power management of hybrid electric vehicles (HEVs) are proposed in the literature. A key challenge is to achieve near-optimality while keeping the methodology simple. The Pontryagin's minimum principle (PMP) is suggested as a viable real-time strategy. In this brief, the global optimality of the principle under reasonable assumptions is described from a mathematical viewpoint. Instantaneous optimal control with an appropriate equivalent parameter for battery usage is shown to be possibly a global optimal solution under the assumption that the internal resistance and open-circuit voltage of a battery are independent of the state-of-charge (SOC). This brief also demonstrates that the optimality of the equivalent consumption minimization strategy (ECMS) results from the close relation of ECMS to the optimal-control-theoretic concept of PMP. In static simulation for a power-split hybrid vehicle, the fuel economy of the vehicle using the control algorithm proposed in this brief is found to be very close-typically within 1%-to the fuel economy through global optimal control that is based on dynamic programming (DP).

...read moreread less

768 citations

Journal Article•DOI•

Data-Driven Robust Approximate Optimal Tracking Control for Unknown General Nonlinear Systems Using Adaptive Dynamic Programming Method

[...]

Huaguang Zhang¹, Lili Cui¹, Xin Zhang¹, Yanhong Luo¹•Institutions (1)

Northeastern University (China)¹

01 Dec 2011-IEEE Transactions on Neural Networks

TL;DR: A novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method and a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method.

...read moreread less

Abstract: In this paper, a novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method. In the design of the controller, only available input-output data is required instead of known system dynamics. A data-driven model is established by a recurrent neural network (NN) to reconstruct the unknown system dynamics using available input-output data. By adding a novel adjustable term related to the modeling error, the resultant modeling error is first guaranteed to converge to zero. Then, based on the obtained data-driven model, the ADP method is utilized to design the approximate optimal tracking controller, which consists of the steady-state controller and the optimal feedback controller. Further, a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method. Based on Lyapunov approach, stability analysis of the closed-loop system is performed to show that the proposed controller guarantees the system state asymptotically tracking the desired trajectory. Additionally, the obtained control input is proven to be close to the optimal control input within a small bound. Finally, two numerical examples are used to demonstrate the effectiveness of the proposed control scheme.

...read moreread less

530 citations

Journal Article•DOI•

Analysis of stochastic dual dynamic programming method

[...]

Alexander Shapiro¹•Institutions (1)

Georgia Institute of Technology¹

16 Feb 2011-European Journal of Operational Research

TL;DR: This paper discusses statistical properties and convergence of the Stochastic Dual Dynamic Programming method applied to multistage linear stochastic programming problems, and argues that the computational complexity of the corresponding SDDP algorithm is almost the same as in the risk neutral case.

...read moreread less

399 citations

Journal Article•DOI•

An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games

[...]

Huaguang Zhang¹, Qinglai Wei², Derong Liu²•Institutions (2)

Northeastern University (China)¹, Chinese Academy of Sciences²

01 Jan 2011-Automatica

TL;DR: A new iterative adaptive dynamic programming (ADP) method is proposed to solve a class of continuous-time nonlinear two-person zero-sum differential games and the convergence property of the performance index function is proved.

...read moreread less

365 citations

Journal Article•DOI•

Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming

[...]

Haizhou Wang¹, Mingzhou Song¹•Institutions (1)

New Mexico State University¹

01 Dec 2011-R Journal

TL;DR: In this paper, a dynamic programming algorithm for optimal one-dimensional clustering is proposed, which is implemented as an R package called Ckmeans.1d.dp.

...read moreread less

Abstract: The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.

...read moreread less

328 citations

Journal Article•DOI•

Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$ -Error Bound

[...]

Fei-Yue Wang¹, Ning Jin², Derong Liu², Qinglai Wei¹•Institutions (2)

Chinese Academy of Sciences¹, University of Illinois at Chicago²

01 Jan 2011-IEEE Transactions on Neural Networks

TL;DR: This paper studies the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach and uses an iterative ADP algorithm to obtain the optimal control law.

...read moreread less

Abstract: In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within an -error bound. The optimal number of control steps can also be obtained by the proposed ADP algorithms. A convergence analysis of the proposed ADP algorithms in terms of performance index function and control policy is made. In order to facilitate the implementation of the iterative ADP algorithms, neural networks are used for approximating the performance index function, computing the optimal control policy, and modeling the nonlinear system. Finally, two simulation examples are employed to illustrate the applicability of the proposed method.

...read moreread less

276 citations

Journal Article•DOI•

Weak Dynamic Programming Principle for Viscosity Solutions

[...]

Bruno Bouchard¹, Nizar Touzi²•Institutions (2)

CEREMADE¹, Chicago Metropolitan Agency for Planning²

01 May 2011-Siam Journal on Control and Optimization

TL;DR: A weak version of the dynamic programming principle is proved for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument.

...read moreread less

Abstract: We prove a weak version of the dynamic programming principle for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument. In the Markov case, our result is tailor-made for the derivation of the dynamic programming equation in the sense of viscosity solutions.

...read moreread less

242 citations

Journal Article•DOI•

Optimal and Approximate Q-value Functions for Decentralized POMDPs

[...]

Frans A. Oliehoek¹, Matthijs T. J. Spaan², Nikos Vlassis³•Institutions (3)

University of Amsterdam¹, Instituto Superior Técnico², Technical University of Crete³

31 Oct 2011-arXiv: Artificial Intelligence

TL;DR: In this paper, the authors study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions.

...read moreread less

Abstract: Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.

...read moreread less

Journal Article•DOI•

Dynamic Programming and Graph Algorithms in Computer Vision

[...]

Pedro F. Felzenszwalb¹, Ramin Zabih²•Institutions (2)

University of Chicago¹, Cornell University²

01 Apr 2011-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper discusses representative examples of how dynamic programming and graph algorithms have been applied to some classical vision problems, and focuses on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high- level problem of model-based recognition.

...read moreread less

Abstract: Optimization is a powerful paradigm for expressing and solving problems in a wide range of areas, and has been successfully applied to many vision problems. Discrete optimization techniques are especially interesting since, by carefully exploiting problem structure, they often provide nontrivial guarantees concerning solution quality. In this paper, we review dynamic programming and graph algorithms, and discuss representative examples of how these discrete optimization techniques have been applied to some classical vision problems. We focus on the low-level vision problem of stereo, the mid-level problem of interactive object segmentation, and the high-level problem of model-based recognition.

...read moreread less

Journal Article•DOI•

Applying the corridor method to a blocks relocation problem

[...]

Marco Caserta¹, Stefan Voβ¹, Moshe Sniedovich²•Institutions (2)

University of Hamburg¹, University of Melbourne²

01 Oct 2011-OR Spectrum

TL;DR: A corridor method inspired algorithm for a blocks relocation problem in block stacking systems and computational results on medium- and large-size problem instances allow to draw conclusions about the effectiveness of the proposed scheme.

...read moreread less

Abstract: In this paper, we present a corridor method inspired algorithm for a blocks relocation problem in block stacking systems Typical applications of such problem are found in the stacking of container terminals in a yard, of pallets and boxes in a warehouse, etc The proposed algorithm applies a recently proposed metaheuristic In a method-based neighborhood we define a two-dimensional "corridor" around the incumbent blocks configuration by imposing exogenous constraints on the solution space of the problem and apply a dynamic programming algorithm capturing the state of the system after each block movement for exploring the neighborhoods Computational results on medium- and large-size problem instances allow to draw conclusions about the effectiveness of the proposed scheme

...read moreread less

Journal Article•DOI•

Dynamic Portfolio Optimization with Transaction Costs: Heuristics and Dual Bounds

[...]

David B. Brown¹, James E. Smith¹•Institutions (1)

Duke University¹

01 Oct 2011-Management Science

TL;DR: This paper considers several easy-to-compute heuristic trading strategies that are based on optimizing simpler models and complement these heuristics with upper bounds on the performance with an optimal trading strategy based on the dual approach developed in Brown et al.

...read moreread less

Abstract: We consider the problem of dynamic portfolio optimization in a discrete-time, finite-horizon setting. Our general model considers risk aversion, portfolio constraints (e.g., no short positions), return predictability, and transaction costs. This problem is naturally formulated as a stochastic dynamic program. Unfortunately, with nonzero transaction costs, the dimension of the state space is at least as large as the number of assets, and the problem is very difficult to solve with more than one or two assets. In this paper, we consider several easy-to-compute heuristic trading strategies that are based on optimizing simpler models. We complement these heuristics with upper bounds on the performance with an optimal trading strategy. These bounds are based on the dual approach developed in Brown et al. (Brown, D. B., J. E. Smith, P. Sun. 2009. Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4) 785--801). In this context, these bounds are given by considering an investor who has access to perfect information about future returns but is penalized for using this advance information. These heuristic strategies and bounds can be evaluated using Monte Carlo simulation. We evaluate these heuristics and bounds in numerical experiments with a risk-free asset and 3 or 10 risky assets. In many cases, the performance of the heuristic strategy is very close to the upper bound, indicating that the heuristic strategies are very nearly optimal. This paper was accepted by Dimitris Bertsimas, optimization.

...read moreread less

Book Chapter•DOI•

The Travelling Salesman Problem

[...]

Stefan Näher¹•Institutions (1)

University of Trier¹

01 Jan 2011

TL;DR: The author explains how a so-called “approximation algorithm” can find a tour that is maybe not the shortest one but one whose length usually is quite close to the optimum.

...read moreread less

Abstract: In this chapter the author considers how to work out the shortest round-trip through a number of cities, a hard problem for which we do not know how to find an optimal solution. The author first demonstrates why a “brute-force” approach is disastrous, and he then shows how dynamic programming offers a significant improvement in running time. Finally he explains how a so-called “approximation algorithm” can find a tour that is maybe not the shortest one but one whose length usually is quite close to the optimum.

...read moreread less

Book•

Branch-and-bound strategies for dynamic programming

[...]

Thomas L. Morin¹, Roy E. Marsten²•Institutions (2)

Purdue University¹, Massachusetts Institute of Technology²

09 Aug 2011

TL;DR: This paper shows how branch-and-bound methods can be used to reduce storage and, possibly, computational requirements in discrete dynamic programs.

...read moreread less

Abstract: This paper shows how branch-and-bound methods can be used to reduce storage and, possibly, computational requirements in discrete dynamic programs. Relaxations and fathoming criteria are used to identify and to eliminate states whose corresponding subpolicies could not lead to optimal policies. The general dynamic programming/branch-and-bound approach is applied to the traveling-salesman problem and the nonlinear knapsack problem. Our computational experience demonstrates that the hybrid approach yields dramatic savings in both computer storage and computational requirements.

...read moreread less

Journal Article•DOI•

Optimal Tracking Control for a Class of Nonlinear Discrete-Time Systems With Time Delays Based on Heuristic Dynamic Programming

[...]

Huaguang Zhang¹, Ruizhuo Song¹, Qinglai Wei, Tieyan Zhang²•Institutions (2)

Northeastern University (China)¹, Shenyang Institute of Engineering²

01 Dec 2011-IEEE Transactions on Neural Networks

TL;DR: A novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays.

...read moreread less

Abstract: In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The novel algorithm contains state updating, control policy iteration, and performance index iteration. To get the optimal states, the states are also updated. Furthermore, the “backward iteration” is applied to state updating. Two neural networks are used to approximate the performance index function and compute the optimal control policy for facilitating the implementation of HDP iteration algorithm. At last, we present two examples to demonstrate the effectiveness of the proposed HDP iteration algorithm.

...read moreread less

Journal Article•DOI•

Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models

[...]

Qinfeng Shi¹, Li Cheng², Li Wang³, Alexander J. Smola⁴•Institutions (4)

University of Adelaide¹, Agency for Science, Technology and Research², Nanjing Forestry University³, Yahoo!⁴

01 May 2011-International Journal of Computer Vision

TL;DR: This paper proposes a discriminative semi-Markov model approach, and defines a set of features over boundary frames, segments, as well as neighboring segments that enable it to conveniently capture a combination of local and global features that best represent each specific action type.

...read moreread less

Abstract: A challenging problem in human action understanding is to jointly segment and recognize human actions from an unseen video sequence, where one person performs a sequence of continuous actions. In this paper, we propose a discriminative semi-Markov model approach, and define a set of features over boundary frames, segments, as well as neighboring segments. This enable us to conveniently capture a combination of local and global features that best represent each specific action type. To efficiently solve the inference problem of simultaneous segmentation and recognition, a Viterbi-like dynamic programming algorithm is utilized, which in practice is able to process 20 frames per second. Moreover, the model is discriminatively learned from large margin principle, and is formulated as an optimization problem with exponentially many constraints. To solve it efficiently, we present two different optimization algorithms, namely cutting plane method and bundle method, and demonstrate that each can be alternatively deployed in a "plug and play" fashion. From its theoretical aspect, we also analyze the generalization error of the proposed approach and provide a PAC-Bayes bound. The proposed approach is evaluated on a variety of datasets, and is shown to perform competitively to the state-of-the-art methods. For example, on KTH dataset, it achieves 95.0% recognition accuracy, where the best known result on this dataset is 93.4% (Reddy and Shah in ICCV, 2009).

...read moreread less

Book•

Dynamic programming : foundations and principles

[...]

Moshe Sniedovich

01 Jan 2011

TL;DR: Dynamic Programming - An Outline Preliminary Analysis Markovian Decomposition Scheme Optimality Equation Dynamic Programming Problems The Final State Model Principle of Optimality Summary Solution Methods.

...read moreread less

Abstract: Introduction Welcome to Dynamic Programming! How to Read This Book SCIENCE Fundamentals Introduction Meta-Recipe Revisited Problem Formulation Decomposition of the Solution Set Principle of Conditional Optimization Conditional Problems Optimality Equation Solution Procedure Time Out: Direct Enumeration! Equivalent Conditional Problems Modified Problems The Role of a Decomposition Scheme Dynamic Programming Problem - Revisited Trivial Decomposition Scheme Summary and a Look Ahead Multistage Decision Model Introduction A Prototype Multistage Decision Model Problem vs Problem Formulation Policies Markovian Policies Remarks on the Notation Summary Bibliographic Notes Dynamic Programming - An Outline Introduction Preliminary Analysis Markovian Decomposition Scheme Optimality Equation Dynamic Programming Problems The Final State Model Principle of Optimality Summary Solution Methods Introduction Additive Functional Equations Truncated Functional Equations Nontruncated Functional Equations Summary Successive Approximation Methods Introduction Motivation Preliminaries Functional Equations of Type One Functional Equations of Type Two Truncation Method Stationary Models Truncation and Successive Approximation Summary Bibliographic Notes Optimal Policies Introduction Preliminary Analysis Truncated Functional Equations Nontruncated Functional Equations Successive Approximation in the Policy Space Summary Bibliographic Notes The Curse of Dimensionality Introduction Motivation Discrete Problems Special Cases Complete Enumeration Conclusions The Rest Is Mathematics and Experience Introduction Choice of Model Dynamic Programming Models Forward Decomposition Models Practice What You Preach! Computational Schemes Applications Dynamic Programming Software Summary ART Refinements Introduction Weak-Markovian Condition Markovian Formulations Decomposition Schemes Sequential Decision Models Example Shortest Path Model The Art of Dynamic Programming Modeling Summary Bibliographic Notes The State Introduction Preliminary Analysis Mathematically Speaking Decomposition Revisited Infeasible States and Decisions State Aggregation Nodes as States Multistage vs Sequential Models Models vs Functional Equations Easy Problems Modeling Tips Concluding Remarks Summary Parametric Schemes Introduction Background and Motivation Fractional Programming Scheme C-Programming Scheme Lagrange Multiplier Scheme Summary Bibliographic Notes The Principle of Optimality Introduction Bellman's Principle of Optimality Prevailing Interpretation Variations on a Theme Criticism So What Is Amiss? The Final State Model Revisited Bellman's Treatment of Dynamic Programming Summary Post Script: Pontryagin's Maximum Principle Forward Decomposition Introduction Function Decomposition Initial Problem Separable Objective Functions Revisited Modified Problems Revisited Backward Conditional Problems Revisited Markovian Condition Revisited Forward Functional Equation Impact on the State Space Anomaly Pathologic Cases Summary and Conclusions Bibliographic Notes Push! Introduction The Pull Method The Push Method Monotone Accumulated Return Processes Dijkstra's Algorithm Summary Bibliographic Notes EPILOGUE What Then Is Dynamic Programming? Review Non-Optimization Problems An Abstract Dynamic Programming Model Examples The Towers of Hanoi Problem Optimization-Free Dynamic Programming Concluding Remarks Appendix A: Contraction Mapping Appendix B: Fractional Programming Appendix C: Composite Concave Programming Appendix D: The Principle of Optimality in Stochastic Processes Appendix E: The Corridor Method Bibliography Index

...read moreread less

Proceedings Article•DOI•

A chaotic firefly algorithm applied to reliability-redundancy optimization

[...]

Leandro dos Santos Coelho¹, Diego Luis de Andrade Bernert¹, Viviana Cocco Mariani¹•Institutions (1)

Pontifícia Universidade Católica do Paraná¹

05 Jun 2011

TL;DR: A modified FA approach combined with chaotic sequences (FAC) applied to reliability-redundancy optimization is introduced and was found to outperform the previously best-known solutions available.

...read moreread less

Abstract: The reliability-redundancy allocation problem can be approached as a mixed-integer programming problem. It has been solved by using optimization techniques such as dynamic programming, integer programming, and mixed-integer nonlinear programming. On the other hand, a broad class of meta-heuristics has been developed for reliability-redundancy optimization. Recently, a new meta-heuristics called firefly algorithm (FA) algorithm has emerged. The FA is a stochastic metaheuristic approach based on the idealized behavior of the flashing characteristics of fireflies. In FA, the flashing light can be formulated in such a way that it is associated with the objective function to be optimized, which makes it possible to formulate the firefly algorithm. This paper introduces a modified FA approach combined with chaotic sequences (FAC) applied to reliability-redundancy optimization. In this context, an example of mixed integer programming in reliability-redundancy design of an overspeed protection system for a gas turbine is evaluated. In this application domain, FAC was found to outperform the previously best-known solutions available.

...read moreread less

Journal Article•DOI•

Sampling strategies and stopping criteria for stochastic dual dynamic programming: a case study in long-term hydrothermal scheduling

[...]

Tito Homem-de-Mello¹, Vitor L. de Matos², Erlon Cristian Finardi²•Institutions (2)

University of Illinois at Chicago¹, Universidade Federal de Santa Catarina²

22 Jan 2011-Energy Systems

TL;DR: In this article, two alternative sampling strategies, namely Latin hypercube sampling and randomized quasi-Monte Carlo, were proposed for the generation of scenario trees, as well as for the sampling of scenarios that is part of the SDDP algorithm.

...read moreread less

Abstract: The long-term hydrothermal scheduling is one of the most important problems to be solved in the power systems area. This problem aims to obtain an optimal policy, under water (energy) resources uncertainty, for hydro and thermal plants over a multi-annual planning horizon. It is natural to model the problem as a multi-stage stochastic program, a class of models for which algorithms have been developed. The original stochastic process is represented by a finite scenario tree and, because of the large number of stages, a sampling-based method such as the Stochastic Dual Dynamic Programming (SDDP) algorithm is required. The purpose of this paper is two-fold. Firstly, we study the application of two alternative sampling strategies to the standard Monte Carlo—namely, Latin hypercube sampling and randomized quasi-Monte Carlo—for the generation of scenario trees, as well as for the sampling of scenarios that is part of the SDDP algorithm. Secondly, we discuss the formulation of stopping criteria for the optimization algorithm in terms of statistical hypothesis tests, which allows us to propose an alternative criterion that is more robust than that originally proposed for the SDDP. We test these ideas on a problem associated with the whole Brazilian power system, with a three-year planning horizon.

...read moreread less

Journal Article•DOI•

Adaptive dynamic programming for online solution of a zero-sum differential game

[...]

Draguna Vrabie, Frank L. Lewis¹•Institutions (1)

University of Texas at Arlington¹

19 Jul 2011-Journal of Control Theory and Applications

TL;DR: In this paper, an approximate/adaptive dynamic programming (ADP) algorithm was proposed to determine the Nash equilibrium solution for the two-player zerosum differential game with linear dynamics and infinite horizon quadratic cost.

...read moreread less

Abstract: This paper will present an approximate/adaptive dynamic programming (ADP) algorithm, that uses the idea of integral reinforcement learning (IRL), to determine online the Nash equilibrium solution for the two-player zerosum differential game with linear dynamics and infinite horizon quadratic cost. The algorithm is built around an iterative method that has been developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We here show how the ADP techniques will enhance the capabilities of the offline method allowing an online solution without the requirement of complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application. The adaptation goal is the best control policy that will face in an optimal manner the highest load disturbance.

...read moreread less

Journal Article•DOI•

Multireservoir optimisation in discrete and continuous domains

[...]

Omid Bozorg Haddad, Abbas Afshar, Miguel A. Marino

01 Feb 2011

TL;DR: The honey-bee mating optimisation algorithm, which is based on the mating procedure of honey-bees in nature, is presented and tested with three benchmark multireservoir operation problems in both discrete and continuous domains and it is shown that the performance of the model compares well with results of the well-developed genetic algorithm.

...read moreread less

Abstract: In this paper, the honey-bee mating optimisation (HBMO) algorithm, which is based on the mating procedure of honey-bees in nature, is presented and tested with three benchmark multireservoir operation problems in both discrete and continuous domains. To test the applicability of the algorithm, results are compared with those from different analytical and evolutionary algorithms (linear programming, dynamic programming, differential dynamic programming, discrete differential dynamic programming and genetic algorithm). The first example is a multireservoir operation optimisation problem in a discrete domain with discrete decision and state variables. It is shown that the performance of the model compares well with results of the well-developed genetic algorithm. The second example is a four-reservoir problem in a continuous domain that has recently been approached with different evolutionary algorithms. The third example is a ten-reservoir problem in series and parallel. The best solution obtained is quite ...

...read moreread less

Proceedings Article•DOI•

Checkpointing strategies for parallel jobs

[...]

Marin Bougeret¹, Henri Casanova, Mikaël Rabie¹, Yves Robert¹, Frédéric Vivien² - Show less +1 more•Institutions (2)

École normale supérieure de Lyon¹, French Institute for Research in Computer Science and Automation²

12 Nov 2011

TL;DR: An analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures gives the optimal solution for exponentially distributed failure inter-arrival times, which is the first rigorous proof that periodic checkpointing is optimal.

...read moreread less

Abstract: This work provides an analysis of checkpointing strategies for minimizing expected job execution times in an environment that is subject to processor failures. In the case of both sequential and parallel jobs, we give the optimal solution for exponentially distributed failure inter-arrival times, which, to the best of our knowledge, is the first rigorous proof that periodic checkpointing is optimal. For non-ex-ponentially distributed failures, we develop a dynamic programming algorithm to maximize the amount of work completed before the next failure, which provides a good heuristic for minimizing the expected execution time. Our work considers various models of job parallelism and of parallel checkpointing overhead. We first perform extensive simulation experiments assuming that failures follow Exponential or Weibull distributions, the latter being more representative of real-world systems. The obtained results not only corroborate our theoretical findings, but also show that our dynamic programming algorithm significantly outperforms previously proposed solutions in the case of Weibull failures. We then discuss results from simulation experiments that use failure logs from production clusters. These results confirm that our dynamic programming algorithm significantly outperforms existing solutions for real-world clusters.

...read moreread less

Journal Article•DOI•

A Tractable Class of Algorithms for Reliable Routing in Stochastic Networks

[...]

Samitha Samaranayake¹, Sebastien Blandin¹, Alexandre M. Bayen¹•Institutions (1)

University of California, Berkeley¹

01 Jan 2011-Transportation Research Part C-emerging Technologies

TL;DR: The goal of this article is to provide the theoretical basis for enabling tractable solutions to the "arriving on time" problem and enabling its use in real-time mobile phone applications and to present an efficient algorithm for finding an optimal routing policy with a well bounded computational complexity.

...read moreread less

Abstract: The goal of this article is to provide the theoretical basis for enabling tractable solutions to the "arriving on time" problem and enabling its use in real-time mobile phone applications. Optimal routing in transportation networks with highly varying traffic conditions is a challenging problem due to the stochastic nature of travel-times on links of the network. The definition of optimality criteria and the design of solution methods must account for the random nature of the travel-time on each link. Most common routing algorithms consider the expected value of link travel-time as a sufficient statistic for the problem and produce least expected travel-time paths without consideration of travel-time variability. However, in numerous practical settings the reliability of the route is also an important decision factor. In this article, the authors consider the following optimality criterion: maximizing the probability of arriving on time at a destination given a departure time and a time budget. The authors present an efficient algorithm for finding an optimal routing policy with a well bounded computational complexity, improving on an existing solution that takes an unbounded number of iterations to converge to the optimal solution. A routing policy is an adaptive algorithm that determines the optimal solution based on en route travel-times and therefore provides better reliability guarantees than an a-priori solution. Novel speed-up techniques to efficiently compute the adaptive optimal strategy and methods to prune the search space of the problem are also investigated. Finally, an extension of this algorithm which allows for both time varying traffic conditions and spatio-temporal correlations of link travel-time distributions is presented. The dramatic runtime improvements provided by the algorithm are demonstrated for practical scenarios in California.

...read moreread less

Journal Article•DOI•

Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming

[...]

Jian Fu¹, Haibo He², Xinmin Zhou¹•Institutions (2)

Wuhan University of Technology¹, University of Rhode Island²

01 Jul 2011-IEEE Transactions on Neural Networks

TL;DR: This paper investigates a generalized multiple-input-multiple-output (GMIMO) ADP design for online learning and control, which is more applicable to a wide range of practical real-world applications and test the performance of this approach based on a practical complex system.

...read moreread less

Abstract: Adaptive dynamic programming (ADP) is a promising research field for design of intelligent controllers, which can both learn on-the-fly and exhibit optimal behavior. Over the past decades, several generations of ADP design have been proposed in the literature, which have demonstrated many successful applications in various benchmarks and industrial applications. While many of the existing researches focus on multiple-inputs-single-output system with steepest descent search, in this paper we investigate a generalized multiple-input-multiple-output (GMIMO) ADP design for online learning and control, which is more applicable to a wide range of practical real-world applications. Furthermore, an improved weight-updating algorithm based on recursive Levenberg-Marquardt methods is presented and embodied in the GMIMO approach to improve its performance. Finally, we test the performance of this approach based on a practical complex system, namely, the learning and control of the tension and height of the looper system in a hot strip mill. Experimental results demonstrate that the proposed approach can achieve effective and robust performance.

...read moreread less

Reference Entry•DOI•

Approximate Dynamic Programming I: Modeling

[...]

Warren B. Powell¹•Institutions (1)

Princeton University¹

14 Jan 2011

TL;DR: This article provides a flexible modeling framework that uses a classic control-theoretic framework, avoiding devices such as one-step transition matrices, and describes the five fundamental elements of any stochastic, dynamic program.

...read moreread less

Abstract: The first step in solving a stochastic optimization problem is providing a mathematical model. How the problem is modeled can impact the solution strategy. In this article, we provide a flexible modeling framework that uses a classic control-theoretic framework, avoiding devices such as one-step transition matrices. We describe the five fundamental elements of any stochastic, dynamic program. Different notational conventions are introduced, and the types of policies that can be used to guide decisions are described in detail. This discussion puts approximate dynamic programming in the context of a variety of other algorithmic strategies by using the modeling framework to describe a wide range of policies. A brief discussion of model-free programming is also provided. Keywords: approximate dynamic programming; Markov decision process; state variable; transition function; model-free dynamic programming

...read moreread less

Dissertation•

Improving combinatorial optimization

[...]

Geoffrey Chu

01 Jan 2011

TL;DR: A new nogood learning technique based on constraint projection that allows us to exploit subproblem dominances that arise when two different search paths lead to subproblems which are identical on the remaining unlabeled variables.

...read moreread less

Abstract: Combinatorial Optimization is an important area of computer science that has many theoretical and practical applications. In this thesis, we present important contributions to several different areas of combinatorial optimization, including nogood learning, symmetry breaking, dominance, relaxations and parallelization. We develop a new nogood learning technique based on constraint projection that allows us to exploit subproblem dominances that arise when two different search paths lead to subproblems which are identical on the remaining unlabeled variables. On appropriate problems, this nogood learning technique provides orders of magnitude speedup compared to a base solver which does not learn nogoods. We present a new symmetry breaking technique called SBDS-1UIP, which is an extension of Symmetry Breaking During Search (SBDS). SBDS-1UIP uses symmetric versions of the 1UIP nogoods derived by Lazy Clause Generation solvers to prune symmetric parts of the search space. We show that SBDS-1UIP can exploit at least as many symmetries as SBDS, and that it is strictly more powerful on some problems, allowing us to exploit types of symmetries that no previous general symmetry breaking technique is capable of exploiting. We present two new general methods for exploiting almost symmetries (symmetries which are broken by a small number of constraints). The first is to treat almost symmetries as conditional symmetries and exploit them via conditional symmetry breaking constraints. The second is to modify SDBS-1UIP to handle almost symmetries. Both techniques are capable of producing exponential speedups on appropriate problems. We examine three reasonably well known problems: the Minimization of Open

...read moreread less

On Adaptive-ECMS strategies for hybrid electric vehicles

[...]

Simona Onori¹, Lorenzo Serrao•Institutions (1)

Ohio State University¹

01 Jan 2011

TL;DR: A classification of the methods that have been proposed to design an Adaptive- ECMS (A-ECMS) controller is proposed and a comparative analysis in simulation of three adaptation laws falling into the class of algorithms of adaptation through feedback of SOC is carried out.

...read moreread less

Abstract: The problem of adapting the equivalence factor of the Equivalent Consumption Mini- mization Strategy (ECMS) to achieve a real time implementable sub-optimal solution of the problem of energy management in hybrid electric vehicle (HEV) has been the object of extensive research over the last decade. Contributions in the open literature range from methods based on prediction of driving cycle to driving pattern recognition to feedback from state of charge. In this paper, we first propose a classification of the methods that have been proposed to design an Adaptive-ECMS (A-ECMS) controller and then we carry out a comparative analysis in simulation of three adaptation laws falling into the class of algorithms of adaptation through feedback of SOC. Simulation results are performed on a parallel hybrid vehicle and show the performances of the three adaptation laws as compared to the optimal ECMS (a suitable proxi for the global optimal solution given by the dynamic programming algorithm).

...read moreread less

Journal Article•DOI•

A scenario-based dynamic programming model for multi-period liner ship fleet planning

[...]

Qiang Meng¹, Tingsong Wang¹•Institutions (1)

National University of Singapore¹

01 Jul 2011-Transportation Research Part E-logistics and Transportation Review

TL;DR: In this article, a scenario-based dynamic programming model consisting of a number of integer linear programming formulations for each single planning period is proposed, and the model can be solved efficiently by a shortest path algorithm on an acyclic network.

...read moreread less

Abstract: This paper proposes a more realistic multi-period liner ship fleet planning problem for a liner container shipping company than has been studied in previous literature. The proposed problem is formulated as a scenario-based dynamic programming model consisting of a number of integer linear programming formulations for each single planning period, and the model can be solved efficiently by a shortest path algorithm on an acyclic network. A numerical example is carried out to illustrate the applicability of the proposed model and solution method. The numerical results show that chartering in ships may not always be a better policy for a long-term planning horizon though it is much cheaper than buying ships in the short-term. Purchasing ships seems to be a more profitable investment in the long run.

...read moreread less

Collapse