Showing papers on "Dynamic programming" published in 1990


Book ChapterDOI
01 Jun 1990
TL;DR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures: Dyna-PI, based on policy iteration, and Dyna-Q, based on Watkins's Q-learning, a new kind of reinforcement learning.
Abstract: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna architectures. The Dyna-PI architecture is based on dynamic programming's policy iteration method and can be related to existing AI ideas such as evaluation functions and universal plans (reactive systems). Using a navigation task, results are shown for a simple Dyna-PI system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The Dyna-Q architecture is based on Watkins's Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less familiar set of data structures than does Dyna-PI, but is arguably simpler to implement and use. We show that Dyna-Q architectures are easy to adapt for use in changing environments.
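The alternation between acting in the world and planning on a learned model can be illustrated with a minimal tabular sketch in the spirit of Dyna-Q. It is not the paper's implementation: the function name `dyna_q`, the `step(state, action)` interface, the epsilon-greedy exploration, the deterministic last-seen model, and all parameter defaults are assumptions made for illustration.

```python
import random
from collections import defaultdict


def dyna_q(step, actions, start, episodes=50, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: learn from real steps, then replay a learned model.

    step(state, action) is a hypothetical environment interface that
    returns (reward, next_state, done).
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    model = {}              # model[(state, action)] -> (reward, next_state, done)

    def greedy(state):
        # Pick a maximizing action, breaking ties at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)         # act in the real world
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # remember the last outcome (deterministic world assumed)
            for _ in range(planning_steps):  # plan by replaying remembered transitions
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q
```

Setting `planning_steps=0` reduces the sketch to plain one-step Q-learning, which makes it easy to compare learning with and without planning on the same task.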

1,592 citations


01 Jan 1990
TL;DR: In this paper, the authors apply dynamic programming to the energy-minimizing active contours optimization problem, which is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm.
Abstract: Dynamic programming is discussed as an approach to solving variational problems in vision. Dynamic programming ensures global optimality of the solution, is numerically stable, and allows for hard constraints to be enforced on the behavior of the solution within a natural and straightforward structure. As a specific example of the approach's efficacy, applying dynamic programming to the energy-minimizing active contours is described. The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm. A parallel procedure for decreasing computational costs is discussed.
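The multistage decision view can be sketched for an open chain of contour points, each with a small set of candidate positions. This is a simplified first-order illustration (energy terms couple only neighboring points), not the time-delayed algorithm of the paper, which also handles higher-order terms. The names `minimize_chain_energy`, `unary`, and `pairwise` are hypothetical.

```python
def minimize_chain_energy(unary, pairwise):
    """Globally minimize a chain energy by dynamic programming.

    unary[i][k]       -- cost of placing point i at candidate k
    pairwise(i, j, k) -- cost of candidate j at point i-1 next to candidate k at point i
    Returns (minimum energy, chosen candidate index per point).
    """
    cost = list(unary[0])  # best energy of a partial chain ending at each candidate
    back = []              # back[i-1][k]: best predecessor of candidate k at point i
    for i in range(1, len(unary)):
        new_cost, choices = [], []
        for k in range(len(unary[i])):
            j = min(range(len(cost)), key=lambda j: cost[j] + pairwise(i, j, k))
            new_cost.append(cost[j] + pairwise(i, j, k) + unary[i][k])
            choices.append(j)
        cost = new_cost
        back.append(choices)
    # Trace the optimal assignment backwards from the best final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    total, labels = cost[k], [k]
    for choices in reversed(back):
        k = choices[k]
        labels.append(k)
    return total, labels[::-1]
```

A hard constraint can be expressed in this sketch by returning `float("inf")` from `pairwise` for forbidden neighbor combinations.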

1,084 citations


Journal ArticleDOI
TL;DR: The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm, and a parallel procedure for decreasing computational costs is discussed.
Abstract: Dynamic programming is discussed as an approach to solving variational problems in vision. Dynamic programming ensures global optimality of the solution, is numerically stable, and allows for hard constraints to be enforced on the behavior of the solution within a natural and straightforward structure. As a specific example of the approach's efficacy, applying dynamic programming to the energy-minimizing active contours is described. The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm. A parallel procedure for decreasing computational costs is discussed.

1,014 citations


Journal ArticleDOI
TL;DR: In this paper, an efficient method for calculating the load flow solution of weakly meshed transmission and distribution systems is presented, which uses active and reactive powers as flow variables rather than complex currents, thus simplifying the treatment of P, V buses and reducing the related computational effort.
Abstract: An efficient method for calculating the load flow solution of weakly meshed transmission and distribution systems is presented. Its essential advantages over a previous approach are the following: (1) It uses active and reactive powers as flow variables rather than complex currents, thus simplifying the treatment of P, V buses and reducing the related computational effort to half; (2) It uses an efficient tree-labeling technique which also contributes to the computational efficiency of the procedure; (3) It uses an improved solution strategy, thereby reducing the burden of mismatch calculations which is an important component of the solution process. Results of tests with 30, 243, 1380, and 4130 bus systems are given to illustrate the performance of the proposed method.

384 citations


Journal ArticleDOI
TL;DR: In this article, a branch-and-bound procedure using Lagrangean relaxation for determining both lower bounds and feasible solutions is presented, and the relaxed problems are solved by dynamic programming.

264 citations


Journal ArticleDOI
01 Jul 1990
TL;DR: Lower bounds on the optimal cost-to-go from the information-theoretic concepts of Huffman coding and entropy are derived and have made it possible to obtain optimal test sequences to problems that are intractable with traditional dynamic programming techniques.
Abstract: The problem of constructing optimal and near-optimal test sequences to diagnose permanent faults in electronic and electromechanical systems is considered. The test sequencing problem is formulated as an optimal binary AND/OR decision tree construction problem, whose solution is known to be NP-complete. The approach used is based on integrated concepts from information theory and heuristic AND/OR graph search methods to subdue the computational explosion of the optimal test-sequencing problem. Lower bounds on the optimal cost-to-go from the information-theoretic concepts of Huffman coding and entropy are derived. These lower bounds ensure that an optimal solution is found using the heuristic AND/OR graph search algorithms; they have made it possible to obtain optimal test sequences to problems that are intractable with traditional dynamic programming techniques. In addition, a class of test-sequencing algorithms that provide a tradeoff between solution quality and complexity have been derived using the ε-optimal and limited search strategies.

237 citations


Journal ArticleDOI
TL;DR: Four branch and bound algorithms use lower bounds obtained from the Lagrangean relaxation of machine capacity constraints and from dynamic programming state-space relaxation to minimize total weighted tardiness of jobs on a single machine.

207 citations


Journal ArticleDOI
Rein Luus
TL;DR: The proposed method utilizing a relatively coarse grid followed by systematic reduction in the grid size is shown to converge to the optimal solution in a reasonable number of iterations.
Abstract: The use of dynamic programming to solve non-linear optimal control problems resistant to other methods is investigated. The proposed method utilizing a relatively coarse grid followed by systematic...

158 citations


Journal ArticleDOI
TL;DR: An effective multiplier method-based differential dynamic programming (DDP) algorithm for solving the hydroelectric generation scheduling problem (HSP) is presented and results demonstrate the efficiency and optimality of the algorithm.
Abstract: An effective multiplier method-based differential dynamic programming (DDP) algorithm for solving the hydroelectric generation scheduling problem (HSP) is presented. The algorithm is developed for solving a class of constrained dynamic optimization problems. It relaxes all constraints but the system dynamics by the multiplier method and adopts the DDP solution technique to solve the resultant unconstrained dynamic optimization problem. The authors formulate the HSP of the Taiwan power system and apply the algorithm to it. Results demonstrate the efficiency and optimality of the algorithm for this application. Computational results indicate that the growth of the algorithm's run time with respect to the problem size is moderate. CPU times of the testing cases are well within the Taiwan Power Company's desirable performance; less than 30 minutes on a VAX/780 mini-computer for a one-week scheduling.

152 citations


Journal ArticleDOI
TL;DR: Generalized DP avoids the potential pitfalls created by an absence of monotonicity, thereby guaranteeing optimality in a prototypical multicriteria DP problem, namely a multicriteria version of the shortest path problem.

141 citations


Journal ArticleDOI
TL;DR: Principles of dynamic programming and its application to discrete-utterance and connected-speech recognition are introduced and discussed, and the deterministic form, used for template matching for connected speech, is described in detail.
Abstract: Principles of dynamic programming and its application to discrete-utterance and connected-speech recognition are introduced and discussed. The deterministic form, used for template matching for connected speech, is described in detail, and a number of algorithms are examined. The Viterbi algorithm, which is a form of dynamic programming for a stochastic system, is briefly discussed.
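The deterministic, template-matching form of dynamic programming is commonly realized as dynamic time warping. The following is a minimal sketch for two one-dimensional feature sequences, assuming an absolute-difference local distance and the basic three-way step pattern. The function name `dtw_distance` and these choices are illustrative and are not taken from the article.

```python
def dtw_distance(template, utterance):
    """Dynamic time warping cost between two sequences of numbers."""
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j]: best alignment cost of template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - utterance[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # template frame repeated
                                     cost[i][j - 1],      # utterance frame repeated
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In a recognizer built on this idea, the unknown utterance would be compared against every stored template and the template with the lowest warped cost would be selected.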

Journal ArticleDOI
TL;DR: This paper presents a theoretical framework to analyze the stability of the closed-loop system resulting from the use of on-line optimization in the feedback loop and shows that a suboptimal algorithm yields asymptotically stable systems, provided that enough computation power is available to solve at each sampling interval an optimization problem considerably simpler than the original.

Journal ArticleDOI
TL;DR: Tests and computer results of four systems of different combinations of units and intervals are given to show the advantages of the techniques proposed, including a quadratic programming technique combined with a linear programming redispatch technique.
Abstract: Consideration is given to dispatch problems that involve the allocation of system generation optimally among generating units while tracking a load curve and observing the power rate limits of the units, system spinning reserve requirements, and other security constraints. Two methods are used in the solution of the problem. The first method is a quadratic programming technique combined with a linear programming redispatch technique. The latter utilizes a linear programming formulation of the dynamic dispatch problem about the base case static economic dispatch solution. The second method is based on the Dantzig-Wolfe decomposition technique. Tests and computer results of four systems of different combinations of units and intervals are given to show the advantages of the techniques proposed.

Journal ArticleDOI
TL;DR: A dynamic programming procedure for determining the optimal just-in-time (JIT) production schedule for a mixed-model facility is presented and heuristics, which have been proposed for these large problems, can now be evaluated by generating problems and comparing the schedules produced by the heuristic with the optimal schedule.

Journal ArticleDOI
TL;DR: Two fundamental classes of problems in large-scale linear and quadratic programming are described and strong properties of duality are revealed which support the development of iterative approximate techniques of solution in terms of saddlepoints.
Abstract: Two fundamental classes of problems in large-scale linear and quadratic programming are described. Multistage problems covering a wide variety of models in dynamic programming and stochastic programming are represented in a new way. Strong properties of duality are revealed which support the development of iterative approximate techniques of solution in terms of saddlepoints. Optimality conditions are derived in a form that emphasizes the possibilities of decomposition.

Journal ArticleDOI
TL;DR: A paradigm for a general class of such optimizations that yields a polyhedral description for each model in the class that provides a succinct characterization of the solutions to the underlying optimization problem expressed through an appropriate change of variables is developed.
Abstract: Many interesting combinatorial problems can be optimized efficiently using recursive computations often termed discrete dynamic programming. In this paper, we develop a paradigm for a general class of such optimizations that yields a polyhedral description for each model in the class. The elementary concept of dynamic programs as shortest path problems in acyclic graphs is generalized to one seeking a least cost solution in a directed hypergraph. Sufficient conditions are then provided for binary integrality of the associated hyperflow problem. Given a polynomially solvable dynamic program, the result is a linear program, in polynomially many variables and constraints, having the solution vectors of the dynamic program as its extreme-point optima. That is, the linear program provides a succinct characterization of the solutions to the underlying optimization problem expressed through an appropriate change of variables. We also discuss projecting this formulation to recover constraints on the original variables and illustrate how some important dynamic programming solvable models fit easily into our paradigm. A classic multiechelon lot sizing problem in production and a host of optimization problems on recursively defined classes of graphs are included.
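The elementary concept mentioned above, a dynamic program viewed as a shortest path problem in an acyclic graph, can be sketched as follows. The sketch assumes that nodes are numbered in topological order (every arc goes from a lower to a higher index). The function name and the arc-list format are hypothetical, and the hypergraph generalization developed in the paper is not covered.

```python
def dag_shortest_path(num_nodes, arcs, source, sink):
    """Shortest path in an acyclic graph whose arcs are (tail, head, cost) with tail < head."""
    inf = float("inf")
    dist = [inf] * num_nodes   # dist[v]: value of the DP state v
    pred = [None] * num_nodes  # pred[v]: decision that achieved dist[v]
    dist[source] = 0.0
    # Sorting by tail guarantees dist[tail] is final before its arcs are relaxed.
    for tail, head, cost in sorted(arcs):
        if dist[tail] + cost < dist[head]:
            dist[head] = dist[tail] + cost
            pred[head] = tail
    # Recover the optimal sequence of states by following predecessors.
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = pred[node]
    return dist[sink], path[::-1]
```

Each node plays the role of a DP state and each arc the role of a decision with its stage cost, so the single pass over the arcs is the usual forward recursion.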

Journal ArticleDOI
Rein Luus
TL;DR: By taking only accessible states for the x-grid and using an iterative procedure employing region contraction, dynamic programming can be applied with only a small number of grid points.
Abstract: In using dynamic programming, by taking only accessible states for the x-grid and using an iterative procedure employing region contraction, only a small number of grid points are required at each ...

24 Jan 1990
TL;DR: The existence of decision algorithms with low-degree polynomial running times for a number of well-studied graph layout, placement, and routing problems is nonconstructively proved using the recent Robertson–Seymour theorems on the well-partial-ordering of graphs.
Abstract: We nonconstructively prove the existence of decision algorithms with low-degree polynomial running times for a number of well-studied graph layout, placement, and routing problems. Some were not previously known to be in P at all; others were only known to be in P by way of brute force or dynamic programming formulations with unboundedly high-degree polynomial running times. Our methods include the application of the recent Robertson-Seymour theorems on the well-partial-ordering of graphs under both the minor and immersion orders. We also briefly address the complexity of search versions of these problems.

Journal ArticleDOI
David Eppstein
TL;DR: This work extends algorithms for solving the minimum-weight edit sequence problem with non-linear costs for multiple insertions and deletions to cost functions that are neither convex nor concave, but a mixture of both.


Proceedings ArticleDOI
01 Jan 1990
TL;DR: An efficient algorithm is presented for Waterman's problem, an on-line two-dimensional dynamic programming problem used for the prediction of RNA secondary structure; its time complexity is n times that of the on-line one-dimensional dynamic programming problem it uses as a module.
Abstract: An on-line problem is a problem where each input is available only after certain outputs have been calculated. The usual kind of problem, where all inputs are available at all times, is referred to as an off-line problem. We present an efficient algorithm for Waterman's problem, an on-line two-dimensional dynamic programming problem that is used for the prediction of RNA secondary structure. Our algorithm uses as a module an algorithm for solving a certain on-line one-dimensional dynamic programming problem. The time complexity of our algorithm is n times the complexity of the on-line one-dimensional dynamic programming problem. For the concave case, we present a linear time algorithm for on-line searching in totally monotone matrices which is a generalization of the on-line one-dimensional problem. This yields an optimal $O(n^2)$ time algorithm for the on-line two-dimensional concave problem. The constants in the time complexity of this algorithm are fairly small, which makes it practical. For the convex case, we use an $O(n\alpha(n))$ time algorithm for the on-line one-dimensional problem, where $\alpha(\cdot)$ is the functional inverse of Ackermann's function. This yields an $O(n^2\alpha(n))$ time algorithm for the on-line two-dimensional convex problem. Our techniques can be extended to solve the sparse version of Waterman's problem. We obtain an $O(n + h \log \min\{h, n^2/h\})$ time algorithm for the sparse concave case, and an $O(n + h\alpha(h) \log \min\{h, n^2/h\})$ time algorithm for the sparse convex case, where h is the number of possible base pairs in the RNA structure. All our algorithms improve on previously known algorithms.

Journal ArticleDOI
TL;DR: A heuristic solution for short-term thermal unit commitment is presented and produces the same unit commitment schedule as a standard dynamic programming (DP)-based algorithm in one fourth the computation time for the DP-based algorithm.
Abstract: A heuristic solution for short-term thermal unit commitment is presented. The algorithm has been developed as a FORTRAN-based production-type program. Tests of this algorithm on data provided by a midwest utility have demonstrated satisfactory results. The proposed algorithm produces the same unit commitment schedule as a standard dynamic programming (DP)-based algorithm in one fourth the computation time for the DP-based algorithm. The algorithm is nearly linear with the number of unit periods. A unit period is defined as the online duration time versus operation level for each unit. Operating system restrictions limit the prototype to 30 thermal power units as implemented on a PS/2 model 50 personal computer.

Journal ArticleDOI
TL;DR: The stockout option means that horizons can exist and permits the use of horizons to develop a forward algorithm for solving the problem, which is shown in the worst case to be asymptotically linear in computational requirements.
Abstract: In this paper, we consider the lot size model for the production and storage of a single commodity with limitations on production capacity and the possibility of not meeting demand, i.e., stockouts, at a penalty. The stockout option means that horizons can exist and permits the use of horizons to develop a forward algorithm for solving the problem. The forward algorithm is shown in the worst case to be asymptotically linear in computational requirements, in contrast to the case for the classical lot size model which has exponential computing requirements. Two versions of the model are considered: first, in which the upper bound on production is the same for every time period; and second, in which the upper bound on production is permitted to vary each time period. In the first case, the worst case computational difficulty increases in a cubic fashion initially, and then becomes linear. In the second case, the initial increase is exponential before becoming linear. Besides the forward algorithm, a number of necessary conditions are derived that reduce the computational burden of solving the integer programming problem posed by the model.

Proceedings ArticleDOI
05 Dec 1990
TL;DR: While GAMS appears to work well only for linear-quadratic optimal control problems or problems with a short horizon, the genetic algorithm applies to more general problems and appears to be competitive with search-based methods.
Abstract: The application of the genetic algorithm to discrete-time optimal control problems is studied. The numerical results obtained are compared with a system for construction and solution of large and complex mathematical programming models, GAMS. It is shown that while GAMS appears to work well only for linear-quadratic optimal control problems or problems with a short horizon, the genetic algorithm applies to more general problems and appears to be competitive with search-based methods.

Journal ArticleDOI
TL;DR: This paper introduces the concept of biconvergence, a weak and intuitive topological assumption on the utility function and the production function together, and shows that under biconvergence the true value function exists, is the unique admissible solution to Bellman's equation, and may be calculated numerically as the limit of successive approximations.
Abstract: This paper introduces the concept of biconvergence, which is a weak and intuitive topological assumption on the utility function and the production function together. Concerning recursive utility, we show that, given biconvergence, the utility function is the unique admissible solution to Koopmans' equation. Concerning dynamic programming, we show that, given biconvergence, the true value function exists, it is the unique admissible solution to Bellman's equation, and it may be calculated numerically as the limit of successive approximations. Finally, we develop an overly strong sufficient condition for biconvergence which substantially weakens the Lipschitz condition used by contraction-mapping techniques.
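Calculating the value function as the limit of successive approximations can be illustrated in the simplest setting: a finite deterministic problem with additive discounted rewards, where the iteration is also a contraction. This special case is an assumption chosen for illustration and does not exercise the weaker biconvergence condition of the paper. The names `successive_approximations`, `feasible`, and `reward` are hypothetical.

```python
def successive_approximations(states, feasible, reward, beta=0.9, tol=1e-10):
    """Iterate v <- T v for the Bellman operator T until the largest change is below tol.

    feasible(s)   -- iterable of states reachable from s in one step
    reward(s, s2) -- one-period reward for moving from s to s2
    beta          -- discount factor, assumed to lie in (0, 1)
    """
    v = {s: 0.0 for s in states}  # initial guess
    while True:
        new_v = {s: max(reward(s, s2) + beta * v[s2] for s2 in feasible(s))
                 for s in states}
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v
```

For example, with two states that can each move to either state and a reward of 1 for moving to the second state, the iterates approach the fixed point 1 / (1 - beta) in both states.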

Journal ArticleDOI
TL;DR: In this paper, a stochastic optimal control problem where the randomness is essentially concentrated in the stopping time terminating the process is formulated as an infinite-horizon optimization problem.
Abstract: This paper deals with a stochastic optimal control problem where the randomness is essentially concentrated in the stopping time terminating the process. If the stopping time is characterized by an intensity depending on the state and control variables, one can reformulate the problem equivalently as an infinite-horizon optimal control problem. Applying dynamic programming and minimum principle techniques to this associated deterministic control problem yields specific optimality conditions for the original stochastic control problem. It is also possible to characterize extremal steady states. The model is illustrated by an example related to the economics of technological innovation.

Journal ArticleDOI
Xun Yu Zhou
TL;DR: In this article, the relationship between the maximum principle and the Hamilton-Jacobi-Bellman equation was investigated in the case of deterministic, finite-dimensional systems, by employing the notions of superdifferential and subdifferential introduced by Crandall and Lions.
Abstract: Two major tools for studying optimally controlled systems are Pontryagin's maximum principle and Bellman's dynamic programming, which involve the adjoint function, the Hamiltonian function, and the value function. The relationships among these functions are investigated in this work, in the case of deterministic, finite-dimensional systems, by employing the notions of superdifferential and subdifferential introduced by Crandall and Lions. Our results are essentially non-smooth versions of the classical ones. The connection between the maximum principle and the Hamilton-Jacobi-Bellman equation (in the viscosity sense) is thereby explained by virtue of the above relationship.

Journal ArticleDOI
Byong-Hun Ahn, Jae-Ho Hyun
TL;DR: This paper develops alternatively an efficient iterative heuristic algorithm that alleviates the memory problem and establishes a simple yet powerful necessary condition for optimality, the intergroup shortest processing time (SPT) property.

Journal ArticleDOI
TL;DR: An efficient dynamic programming algorithm is developed to solve a finite horizon problem and results are presented to find decision/forecast horizons.
Abstract: This paper considers the dynamic lot sizing problem of H. M. Wagner and T. M. Whitin with the assumption that the total cost of n setups is a concave nondecreasing function of n. Such setup costs could arise from the worker learning in setups and/or technological improvements in setup methods. An efficient dynamic programming algorithm is developed to solve a finite horizon problem and results are presented to find decision/forecast horizons. Several new results presented in the paper have potential use in solving other related problems.
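For orientation, the classical Wagner-Whitin recursion that this problem builds on can be sketched as below. The sketch assumes a constant setup cost and a linear per-unit, per-period holding cost, so it does not implement the concave setup-cost structure studied in the paper. The function name and parameters are hypothetical, and the holding cost is recomputed naively, which makes this version cubic rather than quadratic in the number of periods.

```python
def wagner_whitin(demand, setup_cost, holding_cost):
    """Classical uncapacitated lot sizing; returns (minimum cost, production periods)."""
    T = len(demand)
    best = [0.0] + [float("inf")] * T  # best[t]: minimum cost to cover periods 0..t-1
    last_setup = [0] * (T + 1)         # last_setup[t]: final production period in that plan
    for t in range(1, T + 1):
        for s in range(t):
            # Produce in period s everything demanded in periods s..t-1.
            hold = sum(holding_cost * (k - s) * demand[k] for k in range(s, t))
            cost = best[s] + setup_cost + hold
            if cost < best[t]:
                best[t], last_setup[t] = cost, s
    # Walk back through the stored decisions to list the production periods.
    periods, t = [], T
    while t > 0:
        periods.append(last_setup[t])
        t = last_setup[t]
    return best[T], periods[::-1]
```

The recursion relies on the property that an optimal plan produces only when inventory is zero, so each setup covers a consecutive block of periods.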

Journal ArticleDOI
TL;DR: In this article, the dynamic lot size model with quantity discount in purchase price is studied under two cost structures, the all-units-discount cost structure and the incremental-discount cost structure, and is solved by dynamic programming algorithms of complexity $O(T^3)$ and $O(T^2)$, respectively, with T the number of periods in the planning horizon.
Abstract: This article treats the dynamic lot size model with quantity discount in purchase price. We study the problem with two different cost structures: the all-units-discount cost structure and the incremental-discount cost structure. We solve the problem under both discount cost structures by dynamic programming algorithms of complexity $O(T^3)$ and $O(T^2)$, respectively, with T the number of periods in the planning horizon.