scispace - formally typeset

Showing papers on "Dynamic programming published in 2012"


Book
05 Jun 2012
TL;DR: In this book, the authors survey the central algorithmic techniques for designing approximation algorithms, including greedy and local search algorithms, dynamic programming, linear and semidefinite programming, and randomization.
Abstract: Discrete optimization problems are everywhere, from traditional operations research planning problems, such as scheduling, facility location, and network design; to computer science problems in databases; to advertising issues in viral marketing. Yet most such problems are NP-hard. Thus unless P = NP, there are no efficient algorithms to find optimal solutions to such problems. This book shows how to design approximation algorithms: efficient algorithms that find provably near-optimal solutions. The book is organized around central algorithmic techniques for designing approximation algorithms, including greedy and local search algorithms, dynamic programming, linear and semidefinite programming, and randomization. Each chapter in the first part of the book is devoted to a single algorithmic technique, which is then applied to several different problems. The second part revisits the techniques but offers more sophisticated treatments of them. The book also covers methods for proving that optimization problems are hard to approximate. Designed as a textbook for graduate-level algorithms courses, the book will also serve as a reference for researchers interested in the heuristic solution of discrete optimization problems.
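As a concrete instance of the greedy technique the book covers, the classic set-cover heuristic picks, at every step, the set covering the most still-uncovered elements, which yields an H_n ≈ ln n approximation guarantee. A minimal sketch (the instance below is hypothetical, not from the book):

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation for set cover: repeatedly pick the subset
    covering the most still-uncovered elements (H_n-approximation)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= best
    return chosen

cover = greedy_set_cover({1, 2, 3, 4, 5},
                         [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
print(cover)  # two sets cover all five elements
```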

759 citations


Journal ArticleDOI
TL;DR: This paper presents a novel policy iteration approach for finding online adaptive optimal controllers for continuous-time linear systems with completely unknown system dynamics, using the approximate/adaptive dynamic programming technique to iteratively solve the algebraic Riccati equation using the online information of state and input.

723 citations


Journal ArticleDOI
TL;DR: An intelligent-optimal control scheme for unknown nonaffine nonlinear discrete-time systems with a discount factor in the cost function is developed and implemented via the globalized dual heuristic programming technique.

360 citations


Book
27 Sep 2012
TL;DR: A graduate text on stochastic control and optimal stopping via dynamic programming, covering verification arguments, viscosity solutions of the dynamic programming equation, stochastic target problems, backward SDEs, and probabilistic numerical methods for nonlinear PDEs.
Abstract: Preface.- 1. Conditional Expectation and Linear Parabolic PDEs.- 2. Stochastic Control and Dynamic Programming.- 3. Optimal Stopping and Dynamic Programming.- 4. Solving Control Problems by Verification.- 5. Introduction to Viscosity Solutions.- 6. Dynamic Programming Equation in the Viscosity Sense.- 7. Stochastic Target Problems.- 8. Second Order Stochastic Target Problems.- 9. Backward SDEs and Stochastic Control.- 10. Quadratic Backward SDEs.- 11. Probabilistic Numerical Methods for Nonlinear PDEs.- 12. Introduction to Finite Differences Methods.- References.
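The backbone of chapters 2 through 6 is the dynamic programming (Hamilton-Jacobi-Bellman) equation; in generic notation for a controlled diffusion with drift b, diffusion σ, running reward f, and terminal reward g (symbols assumed here, not taken from the book), it reads:

```latex
% Dynamic programming (HJB) equation for a finite-horizon control problem
-\partial_t v(t,x)
  - \sup_{u \in U} \Big\{ b(x,u) \cdot D_x v(t,x)
  + \tfrac{1}{2}\operatorname{Tr}\!\big[\sigma\sigma^{\top}(x,u)\, D_x^2 v(t,x)\big]
  + f(x,u) \Big\} = 0,
\qquad v(T,x) = g(x).
```

When v fails to be smooth, the equation is understood in the viscosity sense, the subject of chapters 5 and 6.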

244 citations


Proceedings Article
22 Jul 2012
TL;DR: Time-critical influence maximization under the time-delayed IC model maintains desired properties such as submodularity, which allows a greedy algorithm to circumvent the NP-hardness of the problem and achieve an approximation ratio of 1 - 1/e.
Abstract: Influence maximization is a problem of finding a small set of highly influential users in a social network such that the spread of influence under certain propagation models is maximized. In this paper, we consider time-critical influence maximization, in which one wants to maximize influence spread within a given deadline. Since timing is considered in the optimization, we also extend the Independent Cascade (IC) model to incorporate the time delay aspect of influence diffusion in social networks. We show that time-critical influence maximization under the time-delayed IC model maintains desired properties such as submodularity, which allows a greedy algorithm to circumvent the NP-hardness of the problem and achieve an approximation ratio of 1 - 1/e. To overcome the inefficiency of the approximation algorithm, we design two heuristic algorithms: the first one is based on a dynamic programming procedure that computes exact influence in tree structures, while the second one converts the problem to one in the original IC model and then applies existing fast heuristics to it. Our simulation results demonstrate that our heuristics achieve the same level of influence spread as the greedy algorithm while running a few orders of magnitude faster, and they also outperform existing algorithms that disregard the deadline constraint and delays in diffusion.
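The greedy algorithm the paper starts from can be sketched with Monte-Carlo spread estimation. The model below (uniform random integer delays, a fixed propagation probability p, and all function and variable names) is a deliberate simplification of the paper's meeting-probability model, kept only to show the deadline-aware greedy loop:

```python
import random

def simulate_spread(graph, seeds, deadline, p=0.1, max_delay=3, runs=200):
    """Monte-Carlo estimate of expected influence spread under a simple
    time-delayed IC model: activations arriving after the deadline
    do not count."""
    total = 0
    for _ in range(runs):
        active = {s: 0 for s in seeds}          # node -> activation time
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        t = active[u] + random.randint(1, max_delay)
                        if t <= deadline:       # deadline-aware counting
                            active[v] = t
                            nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def greedy_seeds(graph, k, deadline):
    """Greedy seed selection: add the node with the largest marginal
    estimated spread (the 1 - 1/e guarantee relies on submodularity)."""
    seeds = []
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: simulate_spread(graph, seeds + [v], deadline))
        seeds.append(best)
    return seeds

random.seed(7)
g = {0: [1, 2, 3], 1: [4], 2: [4], 3: [], 4: []}
print(greedy_seeds(g, 2, deadline=4))
```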

244 citations


Book
01 Jan 2012
TL;DR: A book on approximate dynamic programming.

235 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a novel convex modeling approach which allows for a simultaneous optimization of battery size and energy management of a plug-in hybrid powertrain by solving a semidefinite convex problem.

234 citations


Journal ArticleDOI
TL;DR: The iterative adaptive dynamic programming algorithm using the globalized dual heuristic programming technique is introduced to obtain, forward-in-time, the optimal controller for a class of unknown discrete-time nonlinear systems, with convergence analysis in terms of cost function and control law.
Abstract: In this paper, a neuro-optimal control scheme for a class of unknown discrete-time nonlinear systems with discount factor in the cost function is developed. The iterative adaptive dynamic programming algorithm using the globalized dual heuristic programming technique is introduced to obtain the optimal controller with convergence analysis in terms of cost function and control law. In order to carry out the iterative algorithm, a neural network is constructed first to identify the unknown controlled system. Then, based on the learned system model, two other neural networks are employed as parametric structures to facilitate the implementation of the iterative algorithm, which aims at approximating at each iteration the cost function and its derivatives and the control law, respectively. Finally, a simulation example is provided to verify the effectiveness of the proposed optimal control approach. Note to Practitioners: The increasing complexity of real-world industry processes inevitably leads to the occurrence of nonlinearity and high dimensions, and their mathematical models are often difficult to build. How to design the optimal controller for nonlinear systems without the requirement of knowing the explicit model has become one of the main foci of control practitioners. However, this problem cannot be handled by only relying on the traditional dynamic programming technique because of the "curse of dimensionality". To make things worse, the backward direction of the solving process of dynamic programming precludes its wide application in practice. Therefore, in this paper, the iterative adaptive dynamic programming algorithm is proposed to deal with the optimal control problem for a class of unknown nonlinear systems forward-in-time. Moreover, the detailed implementation of the iterative ADP algorithm through the globalized dual heuristic programming technique is also presented by using neural networks.
Finally, the effectiveness of the control strategy is illustrated via simulation study.
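The optimal cost that the iterative algorithm approximates satisfies the discrete-time Bellman (dynamic programming) equation; in generic notation with utility U, discount factor γ, and dynamics F (symbols assumed, not necessarily the paper's):

```latex
% Discounted Bellman equation and a value-iteration-style recursion
J^{*}(x_k) = \min_{u_k} \big\{ U(x_k, u_k) + \gamma\, J^{*}(x_{k+1}) \big\},
\qquad x_{k+1} = F(x_k, u_k), \quad 0 < \gamma \le 1,
\\[4pt]
V_{i+1}(x_k) = \min_{u} \big\{ U(x_k, u) + \gamma\, V_i\big(F(x_k, u)\big) \big\}.
```

The globalized dual heuristic programming variant trains the critic to approximate both $V_i$ and its derivative $\partial V_i / \partial x_k$, alongside an action network for the control law.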

229 citations


Journal ArticleDOI
TL;DR: The Hamilton-Jacobi-Bellman equation is solved forward-in-time for the optimal control of a class of general affine nonlinear discrete-time systems without using value and policy iterations and the end result is the systematic design of an optimal controller with guaranteed convergence that is suitable for hardware implementation.
Abstract: In this paper, the Hamilton-Jacobi-Bellman equation is solved forward-in-time for the optimal control of a class of general affine nonlinear discrete-time systems without using value and policy iterations. The proposed approach, referred to as adaptive dynamic programming, uses two neural networks (NNs), to solve the infinite horizon optimal regulation control of affine nonlinear discrete-time systems in the presence of unknown internal dynamics and a known control coefficient matrix. One NN approximates the cost function and is referred to as the critic NN, while the second NN generates the control input and is referred to as the action NN. The cost function and policy are updated once at the sampling instant and thus the proposed approach can be referred to as time-based ADP. Novel update laws for tuning the unknown weights of the NNs online are derived. Lyapunov techniques are used to show that all signals are uniformly ultimately bounded and that the approximated control signal approaches the optimal control input with small bounded error over time. In the absence of disturbances, an optimal control is demonstrated. Simulation results are included to show the effectiveness of the approach. The end result is the systematic design of an optimal controller with guaranteed convergence that is suitable for hardware implementation.

217 citations


Journal ArticleDOI
TL;DR: This paper presents the detailed design architecture and its associated learning algorithm to explain how effective learning and optimization can be achieved in this new ADP architecture and test the performance both on the cart-pole balancing task and the triple-link inverted pendulum balancing task.

208 citations


Journal ArticleDOI
TL;DR: A finite-horizon neuro-optimal tracking control strategy for a class of discrete-time nonlinear systems and three neural networks are used as parametric structures to implement the algorithm, which aims at approximating the cost function, the control law, and the error dynamics.

Journal ArticleDOI
TL;DR: Lower bounds for the optimal total cost are established using results in dynamic programming and the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability are characterized.
Abstract: Consider a decision maker who is responsible to dynamically collect observations so as to enhance his information about an underlying phenomenon of interest in a speedy manner while accounting for the penalty of wrong declaration. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones. In this paper, using results in dynamic programming, lower bounds for the optimal total cost are established. The lower bounds characterize the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability. Moreover, upper bounds are obtained via an analysis of two heuristic policies for dynamic selection of actions. It is shown that the first proposed heuristic achieves asymptotic optimality, where the notion of asymptotic optimality, due to Chernoff, implies that the relative difference between the total cost achieved by the proposed policy and the optimal total cost approaches zero as the penalty of wrong declaration (hence the number of collected samples) increases. The second heuristic is shown to achieve asymptotic optimality only in a limited setting such as the problem of a noisy dynamic search. However, by considering the dependency on the number of hypotheses, under a technical condition, this second heuristic is shown to achieve a nonzero information acquisition rate, establishing a lower bound for the maximum achievable rate and error exponent. In the case of a noisy dynamic search with size-independent noise, the obtained nonzero rate and error exponent are shown to be maximum.

Book
14 Dec 2012
TL;DR: Adaptive Dynamic Programming in Discrete Time, as discussed by the authors, applies adaptive dynamic programming (ADP) to the optimal control of nonlinear discrete-time systems, going beyond stabilization alone.
Abstract: There are many methods of stable controller design for nonlinear systems. In seeking to go beyond the minimum requirement of stability, Adaptive Dynamic Programming in Discrete Time approaches the challenging topic of optimal control for nonlinear systems using the tools of adaptive dynamic programming (ADP). The range of systems treated is extensive; affine, switched, singularly perturbed and time-delay nonlinear systems are discussed as are the uses of neural networks and techniques of value and policy iteration. The text features three main aspects of ADP in which the methods proposed for stabilization and for tracking and games benefit from the incorporation of optimal control methods: infinite-horizon control for which the difficulty of solving Hamilton-Jacobi-Bellman partial differential equations directly is overcome, and proof provided that the iterative value function updating sequence converges to the infimum of all the value functions obtained by admissible control law sequences; finite-horizon control, implemented in discrete-time nonlinear systems showing the reader how to obtain suboptimal control solutions within a fixed number of control steps and with results more easily applied in real systems than those usually gained from infinite-horizon control; nonlinear games for which a pair of mixed optimal policies are derived for solving games both when the saddle point does not exist, and, when it does, avoiding the existence conditions of the saddle point. Non-zero-sum games are studied in the context of a single network scheme in which policies are obtained guaranteeing system stability and minimizing the individual performance function yielding a Nash equilibrium.
In order to make the coverage suitable for the student as well as for the expert reader, Adaptive Dynamic Programming in Discrete Time: establishes the fundamental theory involved clearly with each chapter devoted to a clearly identifiable control paradigm; demonstrates convergence proofs of the ADP algorithms to deepen understanding of the derivation of stability and convergence with the iterative computational methods used; and shows how ADP methods can be put to use both in simulation and in real applications. This text will be of considerable interest to researchers interested in optimal control and its applications in operations research, applied mathematics, computational intelligence, and engineering. Graduate students working in control and operations research will also find the ideas presented here to be a source of powerful methods for furthering their study.

Proceedings ArticleDOI
01 Dec 2012
TL;DR: A procedure from probabilistic model checking is used to combine the system model with an automaton representing the specification and this new MDP is transformed into an equivalent form that satisfies assumptions for stochastic shortest path dynamic programming.
Abstract: We present a method for designing a robust control policy for an uncertain system subject to temporal logic specifications. The system is modeled as a finite Markov Decision Process (MDP) whose transition probabilities are not exactly known but are known to belong to a given uncertainty set. A robust control policy is generated for the MDP that maximizes the worst-case probability of satisfying the specification over all transition probabilities in this uncertainty set. To this end, we use a procedure from probabilistic model checking to combine the system model with an automaton representing the specification. This new MDP is then transformed into an equivalent form that satisfies assumptions for stochastic shortest path dynamic programming. A robust version of dynamic programming solves for an ε-suboptimal robust control policy with time complexity O(log(1/ε)) times that for the non-robust case.
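The robust dynamic programming recursion takes the worst case over the uncertainty set inside each Bellman backup. A minimal sketch under two simplifying assumptions of our own (a finite uncertainty set of candidate distributions and a discounted cost; the paper's stochastic-shortest-path transformation and temporal-logic layer are omitted):

```python
def robust_value_iteration(n_states, actions, cost, P_sets, gamma=0.95,
                           iters=1000, tol=1e-9):
    """Robust value iteration: the adversary picks, per (state, action),
    the worst transition distribution from a finite uncertainty set.
    P_sets[(s, a)] is a list of candidate distributions over next states."""
    V = [0.0] * n_states
    for _ in range(iters):
        V_new = []
        for s in range(n_states):
            best = float("inf")
            for a in actions:
                # Inner maximization: worst-case expected cost-to-go.
                worst = max(sum(p[s2] * V[s2] for s2 in range(n_states))
                            for p in P_sets[(s, a)])
                best = min(best, cost[(s, a)] + gamma * worst)
            V_new.append(best)
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V

cost = {(0, "a"): 1.0, (1, "a"): 0.0}
P_sets = {(0, "a"): [[0.0, 1.0], [1.0, 0.0]],   # adversary may keep us in state 0
          (1, "a"): [[0.0, 1.0]]}               # state 1 is absorbing and free
V = robust_value_iteration(2, ["a"], cost, P_sets)
print(V)  # V[0] converges toward 1/(1 - 0.95) = 20 in the worst case
```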

Journal ArticleDOI
TL;DR: Wind generation performance can be enhanced and adapted to load demand, yielding an increased economic gain measured by the difference between the revenue obtained with and without the proposed generation-shifting policy.
Abstract: The paper proposes the modeling and the optimal management of a hot-temperature (sodium nickel chloride) battery system coupled with wind generators connected to a medium voltage grid. A discrete-time model of the storage device reproducing the battery main dynamics (i.e., state of charge, temperature, current, protection, and limitation systems) has been developed. The model has been validated through some experimental tests. An optimal management strategy has been implemented based on a forward dynamic programming algorithm, specifically developed to exploit the energy price arbitrage along the optimization time horizon (“generation shifting”). Taking advantage of this strategy, wind generation performance can be enhanced and adapted to load demand, obtaining an increased economic gain measured by the difference between the economic revenue obtained with and without the proposed generation shifting policy.
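The forward dynamic program behind price arbitrage can be sketched over a discretized state of charge. This is a toy under assumptions of our own (integer SoC levels, lossless charging, illustrative prices); the paper's battery dynamics, temperature, protection and limitation systems are deliberately omitted:

```python
def generation_shifting(prices, capacity, max_step):
    """Forward DP over hours: best[soc] is the maximum arbitrage revenue
    achievable while ending the hour at state-of-charge soc."""
    n_levels = capacity + 1
    NEG = float("-inf")
    best = [NEG] * n_levels
    best[0] = 0.0                    # start with an empty battery
    parent = []                      # per hour: delta used to reach each soc
    for price in prices:
        new = [NEG] * n_levels
        choice = [0] * n_levels
        for soc in range(n_levels):
            if best[soc] == NEG:
                continue
            for delta in range(-max_step, max_step + 1):
                nxt = soc + delta
                if 0 <= nxt < n_levels:
                    # delta > 0 charges (buy energy), delta < 0 discharges (sell)
                    rev = best[soc] - delta * price
                    if rev > new[nxt]:
                        new[nxt] = rev
                        choice[nxt] = delta
        parent.append(choice)
        best = new
    soc = max(range(n_levels), key=lambda s: best[s])
    revenue = best[soc]
    plan = []
    for choice in reversed(parent):  # backtrack the hourly charging plan
        delta = choice[soc]
        plan.append(delta)
        soc -= delta
    plan.reverse()
    return revenue, plan

rev, plan = generation_shifting([1.0, 5.0], capacity=2, max_step=2)
print(rev, plan)  # buys 2 units at price 1, sells them at price 5: 8.0 [2, -2]
```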


Journal ArticleDOI
TL;DR: In this paper, a point absorber WEC employing a hydraulic/electric power take-off system is formulated as an optimal control problem with a disturbance input (the sea elevation) and with both state and input constraints.

Journal ArticleDOI
TL;DR: This paper proposes a new model that efficiently segments common objects from multiple images by partitioning each image into local regions, building a digraph from local-region similarities and saliency maps, and solving the resulting co-segmentation problem as a shortest-path problem via dynamic programming.
Abstract: Segmenting common objects that have variations in color, texture and shape is a challenging problem. In this paper, we propose a new model that efficiently segments common objects from multiple images. We first segment each original image into a number of local regions. Then, we construct a digraph based on local region similarities and saliency maps. Finally, we formulate the co-segmentation problem as the shortest path problem, and we use the dynamic programming method to solve the problem. The experimental results demonstrate that the proposed model can efficiently segment the common objects from a group of images with generally lower error rate than many existing and conventional co-segmentation methods.
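The shortest-path reduction can be sketched with the standard DP over a topological order of the digraph; the node indices and edge weights below are hypothetical stand-ins for the paper's region-similarity graph:

```python
from collections import deque

def dag_shortest_path(n, edges, src, dst):
    """Shortest path in a DAG by dynamic programming: relax edges in
    topological order, then backtrack predecessors to recover the path."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    order = deque(i for i in range(n) if indeg[i] == 0)
    topo = []
    while order:                      # Kahn's algorithm for the topo order
        u = order.popleft()
        topo.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    INF = float("inf")
    dist = [INF] * n
    prev = [None] * n
    dist[src] = 0
    for u in topo:                    # the DP recurrence, one pass suffices
        if dist[u] == INF:
            continue
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return dist[dst], path[::-1]

d, path = dag_shortest_path(4, [(0, 1, 1), (0, 2, 5), (1, 2, 1),
                                (1, 3, 6), (2, 3, 1)], 0, 3)
print(d, path)  # → 3 [0, 1, 2, 3]
```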

Journal ArticleDOI
TL;DR: Experimental results obtained by the simulation of different traffic scenarios show that the AIM based on ACS outperforms the traditional traffic lights and other recent traffic control strategies.
Abstract: Autonomous intersection management (AIM) is an innovative concept for directing vehicles through the intersections. AIM assumes that the vehicles negotiate the right-of-way. This assumption makes the problem of the intersection management significantly different from the usually studied ones such as the optimization of the cycle time, splits, and offsets. The main difficulty is to define a strategy that improves the traffic efficiency. Indeed, due to the fact that each vehicle is considered individually, AIM faces a combinatorial optimization problem that needs quick and efficient solutions for a real time application. This paper proposes a strategy that evacuates vehicles as soon as possible for each sequence of vehicle arrivals. The dynamic programming (DP) that gives the optimal solution is shown to be greedy. A combinatorial explosion is observed if the number of lanes rises. After evaluating the time complexity of the DP, the paper proposes an ant colony system (ACS) to solve the control problem for large number of vehicles and lanes. The complete investigation shows that the proposed ACS algorithm is robust and efficient. Experimental results obtained by the simulation of different traffic scenarios show that the AIM based on ACS outperforms the traditional traffic lights and other recent traffic control strategies.

Journal ArticleDOI
TL;DR: A Dantzig-Wolfe decomposition approach is proposed, which enables the uncertainty of the data to be encapsulated in the column generation subproblem, and a dynamic programming algorithm is proposed to solve the subproblem with data uncertainty.
Abstract: In this article, we investigate the vehicle routing problem with deadlines, whose goal is to satisfy the requirements of a given number of customers with minimum travel distances while respecting both of the deadlines of the customers and vehicle capacity. It is assumed that the travel time between any two customers and the demands of the customer are uncertain. Two types of uncertainty sets with adjustable parameters are considered for the possible realizations of travel time and demand. The robustness of a solution against the uncertain data can be achieved by making the solution feasible for any travel time and demand defined in the uncertainty sets. We propose a Dantzig-Wolfe decomposition approach, which enables the uncertainty of the data to be encapsulated in the column generation subproblem. A dynamic programming algorithm is proposed to solve the subproblem with data uncertainty. The results of computational experiments involving two well-known test problems show that the robustness of the solution can be greatly improved.

Journal ArticleDOI
TL;DR: An iterative control algorithm is given to devise a decentralized optimal controller that globally asymptotically stabilizes the system in question and is demonstrated via the online learning control of multimachine power systems with governor controllers.
Abstract: This brief presents a new approach to decentralized control design of complex systems with unknown parameters and dynamic uncertainties. A key strategy is to use the theory of robust adaptive dynamic programming and the policy iteration technique. An iterative control algorithm is given to devise a decentralized optimal controller that globally asymptotically stabilizes the system in question. Stability analysis is accomplished by means of the small-gain theorem. The effectiveness of the proposed computational control algorithm is demonstrated via the online learning control of multimachine power systems with governor controllers.

Journal ArticleDOI
TL;DR: This paper considers high-speed control of constrained linear parameter-varying systems using model predictive control, and gathers previous developments and provides new material such as a proof for the optimality of the solution, or, in the case of close-to-optimal solutions, a procedure to determine a bound on the suboptimality ofThe solution.
Abstract: This paper considers high-speed control of constrained linear parameter-varying systems using model predictive control. Existing model predictive control schemes for control of constrained linear parameter-varying systems typically require the solution of a semi-definite program at each sampling instance. Recently, variants of explicit model predictive control were proposed for linear parameter-varying systems with polytopic representation, decreasing the online computational effort by orders of magnitude. Depending on the mathematical structure of the underlying system, the constrained finite-time optimal control problem can be solved optimally, or close-to-optimal solutions can be computed. Constraint satisfaction, recursive feasibility and asymptotic stability can be guaranteed a priori by an appropriate selection of the terminal state constraints and terminal cost. The paper at hand gathers previous developments and provides new material such as a proof for the optimality of the solution, or, in the case of close-to-optimal solutions, a procedure to determine a bound on the suboptimality of the solution.

Posted Content
TL;DR: Memory-Bounded Dynamic Programming is generalized and its scalability is improved by reducing the complexity with respect to the number of observations from exponential to polynomial, and error bounds on solution quality are derived.
Abstract: Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well.

Journal ArticleDOI
TL;DR: This paper describes two control strategies for a fuel-cell-based hybrid electric vehicle (FCHEV): an offline strategy based on dynamic programming and an online strategy based on an optimized fuzzy logic controller.
Abstract: This paper describes two different control strategies for a fuel-cell-based hybrid electric vehicle (FCHEV). The offline strategy is based on dynamic programming, and the online strategy is based on an optimized fuzzy logic controller. These two strategies are then compared. Finally, the fuzzy logic controller is validated using a real FCHEV.

Posted Content
TL;DR: In this article, the authors consider time-critical influence maximization, in which one wants to maximize influence spread within a given deadline, and extend the Independent Cascade (IC) model and the Linear Threshold (LT) model to incorporate the time delay aspect of influence diffusion among individuals in social networks.
Abstract: Influence maximization is a problem of finding a small set of highly influential users, also known as seeds, in a social network such that the spread of influence under certain propagation models is maximized. In this paper, we consider time-critical influence maximization, in which one wants to maximize influence spread within a given deadline. Since timing is considered in the optimization, we also extend the Independent Cascade (IC) model and the Linear Threshold (LT) model to incorporate the time delay aspect of influence diffusion among individuals in social networks. We show that time-critical influence maximization under the time-delayed IC and LT models maintains desired properties such as submodularity, which allows a greedy approximation algorithm to achieve an approximation ratio of $1-1/e$. To overcome the inefficiency of the greedy algorithm, we design two heuristic algorithms: the first one is based on a dynamic programming procedure that computes exact influence in tree structures and directed acyclic subgraphs, while the second one converts the problem to one in the original models and then applies existing fast heuristic algorithms to it. Our simulation results demonstrate that our algorithms achieve the same level of influence spread as the greedy algorithm while running a few orders of magnitude faster, and they also outperform existing fast heuristics that disregard the deadline constraint and delays in diffusion.

Journal ArticleDOI
TL;DR: A delay-aware distributed solution with the BS-DTX control at the BS controller (BSC) and the user scheduling at each cluster manager (CM) using approximate dynamic programming and distributed stochastic learning is obtained and the proposed distributed two-timescale algorithm converges almost surely.
Abstract: In this paper, we propose a two-timescale delay-optimal base station discontinuous transmission (BS-DTX) control and user scheduling for downlink coordinated MIMO systems with energy harvesting capability. To reduce the complexity and signaling overhead in practical systems, the BS-DTX control is adaptive to both the energy state information (ESI) and the data queue state information (QSI) over a longer timescale. The user scheduling is adaptive to the ESI, the QSI and the channel state information (CSI) over a shorter timescale. We show that the two-timescale delay-optimal control problem can be modeled as an infinite horizon average cost partially observed Markov decision problem (POMDP), which is well known to be a difficult problem in general. By using sample-path analysis and exploiting specific problem structure, we first obtain some structural results on the optimal control policy and derive an equivalent Bellman equation with reduced state space. To reduce the complexity and facilitate distributed implementation, we obtain a delay-aware distributed solution with the BS-DTX control at the BS controller (BSC) and the user scheduling at each cluster manager (CM) using approximate dynamic programming and distributed stochastic learning. We show that the proposed distributed two-timescale algorithm converges almost surely. Furthermore, using queueing theory, stochastic geometry, and optimization techniques, we derive sufficient conditions for the data queues to be stable in the coordinated MIMO network and discuss various design insights.

Posted Content
TL;DR: At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region, using piecewise constant and piecewise linear representations.
Abstract: We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.
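The compression idea behind the piecewise constant representation can be sketched as follows: after a backup, adjacent states whose values agree are merged into a single region. This is a tabular simplification of the paper's symbolic partitioning, and the grid and values below are hypothetical:

```python
def compress(values, xs, eps=1e-9):
    """Merge adjacent grid states with (numerically) equal values into
    regions, yielding a piecewise constant value-function representation.
    Returns (region_start, region_end, value) triples."""
    regions = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or abs(values[i] - values[start]) > eps:
            regions.append((xs[start], xs[i - 1], values[start]))
            start = i
    return regions

# A value function over five grid points collapses to two regions.
print(compress([0.0, 0.0, 0.0, 1.0, 1.0], [0.0, 0.2, 0.4, 0.6, 0.8]))
# → [(0.0, 0.4, 0.0), (0.6, 0.8, 1.0)]
```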

Journal ArticleDOI
TL;DR: A new ϵ-optimal control algorithm based on the iterative ADP approach is proposed that makes the performance index function iteratively converge to the greatest lower bound of all performance indices within an error ϵ in finite time.

Journal ArticleDOI
TL;DR: A dynamic programming algorithm is presented to compute optimal intermodal freight routes for international container logistics, together with a Weighted Constrained Shortest Path Problem (WCSPP) model.
Abstract: This paper presents a dynamic programming algorithm to compute optimal intermodal freight routes for the international logistics of container cargo for export and import. This study looks into the characteristics of intermodal transport using multiple modes, and presents a Weighted Constrained Shortest Path Problem (WCSPP) model. This study derives Pareto-optimal solutions that can simultaneously meet two objective functions by applying the label-setting algorithm, a type of dynamic programming algorithm, after setting the feasible area. To improve the algorithm performance, pruning rules have also been presented. The algorithm is applied to real transport paths from Busan to Rotterdam, as well as to large-scale cases. This study quantitatively measures the savings in both transport cost and time by comparing single transport modes with intermodal transport paths. Finally, a mathematical model and an MADM model are applied to the multiple Pareto-optimal solutions to evaluate them.
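A label-setting algorithm with dominance pruning can be sketched for a bicriteria (cost, time) shortest path. The network below is illustrative, not the paper's Busan-Rotterdam instance, and the pruning rule shown (pairwise dominance) is only the simplest of those the paper presents:

```python
import heapq

def label_setting(n, edges, src, dst, time_limit=float("inf")):
    """Label-setting DP for a (cost, time) constrained shortest path:
    labels dominated in both criteria are pruned, and the surviving
    labels at the destination form the Pareto frontier."""
    adj = [[] for _ in range(n)]
    for u, v, cost, time in edges:
        adj[u].append((v, cost, time))
    labels = [[] for _ in range(n)]   # non-dominated (cost, time) per node
    heap = [(0, 0, src)]
    while heap:
        cost, time, u = heapq.heappop(heap)
        # Pruning rule: discard labels dominated by one already kept at u.
        if any(c <= cost and t <= time for c, t in labels[u]):
            continue
        labels[u].append((cost, time))
        for v, dc, dt in adj[u]:
            nc, nt = cost + dc, time + dt
            if nt <= time_limit and not any(c <= nc and t <= nt
                                            for c, t in labels[v]):
                heapq.heappush(heap, (nc, nt, v))
    return sorted(labels[dst])

net = [(0, 1, 1, 10), (0, 1, 5, 2), (1, 2, 1, 1)]
print(label_setting(3, net, 0, 2))                # → [(2, 11), (6, 3)]
print(label_setting(3, net, 0, 2, time_limit=5))  # → [(6, 3)]
```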

Journal ArticleDOI
TL;DR: The authors model a water resources system under the assumption of diminishing marginal utility (i.e., concavity) of reservoir utility functions, an important characteristic of water resources systems.
Abstract: Diminishing marginal utility is an important characteristic of water resources systems. With the assumption of diminishing marginal utility (i.e., concavity) of reservoir utility functions,...