Showing papers on "Dynamic programming" published in 1994


Journal ArticleDOI
TL;DR: In this paper, Monte Carlo integration is used to simulate the required multiple integrals at a subset of the state points, and the non-simulated values are interpolated using a regression function.
Abstract: Over the past decade, a substantial literature on methods for the estimation of discrete choice dynamic programming (DDP) models of behavior has developed. However, the implementation of these methods can impose major computational burdens because solving for agents' decision rules often involves high dimensional integrations that must be performed at each point in the state space. In this paper we develop an approximate solution method that consists of: (1) using Monte Carlo integration to simulate the required multiple integrals at a subset of the state points, and (2) interpolating the non-simulated values using a regression function. The overall performance of this approximation method appears to be excellent. Copyright 1994 by MIT Press.

425 citations


Posted Content
TL;DR: In this article, Monte Carlo integration is used to simulate the required multiple integrals at a subset of the state points, and the non-simulated values are interpolated using a regression function.
Abstract: Over the past decade, a substantial literature on the estimation of discrete choice dynamic programming (DC-DP) models of behavior has developed. However, this literature now faces major computational barriers. Specifically, in order to solve the dynamic programming (DP) problems that generate agents' decision rules in DC-DP models, high dimensional integrations must be performed at each point in the state space of the DP problem. In this paper we explore the performance of approximate solutions to DP problems. Our approximation method consists of: 1) using Monte Carlo integration to simulate the required multiple integrals at a subset of the state points, and 2) interpolating the non-simulated values using a regression function. The overall performance of this approximation method appears to be excellent, both in terms of the degree to which it mimics the exact solution, and in terms of the parameter estimates it generates when embedded in an estimation algorithm.

424 citations
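
The two-step approximation described in the abstracts above (simulate the expected maximum at a subset of state points, then interpolate the rest by regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the iid standard normal shocks, the single regressor (the gap between the best and worst alternative), and the names `simulate_emax`, `fit_line`, and `approximate_emax` are all hypothetical choices made for the example.

```python
import random


def simulate_emax(v, draws, rng):
    """Monte Carlo estimate of E[max_k (v[k] + eps_k)] with iid N(0, 1) shocks."""
    total = 0.0
    for _ in range(draws):
        total += max(vk + rng.gauss(0.0, 1.0) for vk in v)
    return total / draws


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def approximate_emax(values, subset, draws=2000, seed=0):
    """values[s] holds the deterministic payoffs of each alternative in state s.

    Step 1: simulate Emax only at the states listed in `subset`.
    Step 2: regress (Emax - max payoff) on the payoff gap and use the fitted
            line to fill in every state that was not simulated.
    """
    rng = random.Random(seed)
    gap = [max(v) - min(v) for v in values]
    sim = {s: simulate_emax(values[s], draws, rng) for s in subset}
    a, b = fit_line([gap[s] for s in subset],
                    [sim[s] - max(values[s]) for s in subset])
    return [sim[s] if s in sim else max(values[s]) + a + b * gap[s]
            for s in range(len(values))]


# Hypothetical example: 21 states, two alternatives, every 4th state simulated.
values = [(0.05 * s, 0.5) for s in range(21)]
emax = approximate_emax(values, subset=range(0, 21, 4))
```

In a full solution method this step would run inside backward recursion, once per period; the sketch shows a single period only.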


Proceedings ArticleDOI
01 Jun 1994
TL;DR: Stability and convergence results are presented for dynamic programming-based reinforcement learning applied to linear quadratic regulation (LQR); the algorithm analyzed is based on Q-learning and is proven to converge to an optimal controller.
Abstract: In this paper we present the stability and convergence results for dynamic programming-based reinforcement learning applied to linear quadratic regulation (LQR). The specific algorithm we analyze is based on Q-learning and it is proven to converge to an optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first convergence result for DP-based reinforcement learning algorithms for a continuous problem.

394 citations


Proceedings Article
01 Jan 1994
TL;DR: This work proposes algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems, and demonstrates these algorithms by applying them to the problem of determining the optimal control for a simple queueing system.
Abstract: Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.

328 citations


Journal ArticleDOI
01 Jul 1994
TL;DR: In this article, two GA solutions to the economic dispatch problem are presented, which do not impose any convexity restrictions on the generator cost functions and can be coded to work on parallel machines.
Abstract: Two genetic algorithm (GA) solutions to the economic dispatch problem are presented. An advantage of the GA solutions is that they do not impose any convexity restrictions on the generator cost functions. Another advantage is that GAs can be very effectively coded to work on parallel machines. Test results with systems of up to 72 generating units with nonconvex cost functions show that both GAs outperform the dynamic programming solution to the economic dispatch problem. Furthermore, the execution time of the second GA solution increases almost linearly with the number of generators.

314 citations
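
For context, the dynamic programming baseline that the abstract above compares against can be sketched as a recursion over generating units and cumulative output. This is a minimal sketch, assuming integer MW output steps and that every unit stays committed within its limits; the function name `dispatch` and the cost tables are hypothetical. Because each unit's cost is only evaluated point by point, nonconvex cost functions pose no difficulty for the recursion itself, though the table grows with the demand resolution.

```python
import math


def dispatch(units, demand):
    """units: list of (p_min, p_max, cost_fn), with outputs in integer MW.

    Returns (minimum total cost, list of unit outputs), or (inf, None)
    if the demand cannot be met exactly.
    """
    best = [0.0] + [math.inf] * demand   # best[p]: min cost to supply p MW so far
    choices = []                         # per unit: output chosen for each total
    for p_min, p_max, cost in units:
        new = [math.inf] * (demand + 1)
        pick = [None] * (demand + 1)
        for p in range(demand + 1):
            if best[p] == math.inf:
                continue
            for out in range(p_min, min(p_max, demand - p) + 1):
                cand = best[p] + cost(out)
                if cand < new[p + out]:
                    new[p + out], pick[p + out] = cand, out
        best = new
        choices.append(pick)
    if best[demand] == math.inf:
        return math.inf, None
    outputs, p = [], demand
    for pick in reversed(choices):       # walk back through the stored choices
        outputs.append(pick[p])
        p -= pick[p]
    return best[demand], outputs[::-1]


# Hypothetical two-unit example; the first unit has a nonconvex cost table.
unit_a = (0, 3, lambda p: [0, 5, 6, 12][p])
unit_b = (0, 3, lambda p: 3 * p)
total_cost, schedule = dispatch([unit_a, unit_b], demand=4)
```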


Journal ArticleDOI
TL;DR: The small noise limit is interpreted as a deterministic partially observed dynamic game, and new insights into the optimal solution of such game problems are obtained.
Abstract: Solves a finite-horizon partially observed risk-sensitive stochastic optimal control problem for discrete-time nonlinear systems and obtains small noise and small risk limits. The small noise limit is interpreted as a deterministic partially observed dynamic game, and new insights into the optimal solution of such game problems are obtained. Both the risk-sensitive stochastic control problem and the deterministic dynamic game problem are solved using information states, dynamic programming, and associated separated policies. A certainty equivalence principle is also discussed. The authors' results have implications for the nonlinear robust stabilization problem. The small risk limit is a standard partially observed risk-neutral stochastic optimal control problem.

265 citations


Journal ArticleDOI
TL;DR: A heuristic algorithm based on Lagrangian optimization using an operational rate-distortion framework that, with computing complexity reduced by an order of magnitude, approaches the optimally achievable performance.
Abstract: The authors formalize the description of the buffer-constrained adaptive quantization problem. For a given set of admissible quantizers used to code a discrete nonstationary signal sequence in a buffer-constrained environment, they formulate the optimal solution. They also develop slightly suboptimal but much faster approximations. These solutions are valid for any globally minimum distortion criterion, which is additive over the individual elements of the sequence. As a first step, they define the problem as one of constrained, discrete optimization and establish its equivalence to some of the problems studied in the field of integer programming. Forward dynamic programming using the Viterbi algorithm is shown to provide a way of computing the optimal solution. Then, they provide a heuristic algorithm based on Lagrangian optimization using an operational rate-distortion framework that, with computing complexity reduced by an order of magnitude, approaches the optimally achievable performance. The algorithms can serve as a benchmark for assessing the performance of buffer control strategies and are useful for applications such as multimedia workstation displays, video encoding for CD-ROMs, and buffered JPEG coding environments, where processing delay is not a concern but decoding buffer size has to be minimized.

259 citations
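
The forward dynamic programming idea in the abstract above can be illustrated with a small Viterbi-style search whose states are buffer occupancy levels. This is a sketch under simplifying assumptions that are not taken from the paper: integer rates, a constant channel rate per block, a buffer that empties at zero, and an additive distortion measure. The name `best_quantizers` and the rate and distortion figures are hypothetical.

```python
def best_quantizers(options, channel_rate, buffer_size):
    """options[i] lists the (rate, distortion) pairs available for block i.

    Returns (total distortion, chosen quantizer index per block) for the
    minimum-distortion sequence that never overflows the buffer, or None.
    """
    states = {0: (0.0, [])}          # buffer level -> (distortion so far, path)
    for block in options:
        nxt = {}
        for level, (dist, path) in states.items():
            for q, (rate, d) in enumerate(block):
                new_level = max(0, level + rate - channel_rate)
                if new_level > buffer_size:
                    continue         # this branch would overflow: prune it
                cand = dist + d
                # Keep only the best survivor for each buffer level.
                if new_level not in nxt or cand < nxt[new_level][0]:
                    nxt[new_level] = (cand, path + [q])
        states = nxt
    if not states:
        return None
    return min(states.values(), key=lambda t: t[0])


# Hypothetical example: three blocks, each with the same three quantizers.
block = [(4, 1.0), (2, 3.0), (1, 6.0)]   # (rate, distortion)
result = best_quantizers([block] * 3, channel_rate=2, buffer_size=2)
```

Keeping one survivor per buffer level is what bounds the search: the number of states depends on the buffer size, not on the number of quantizer sequences.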


Journal ArticleDOI
TL;DR: A dynamic programming algorithm for computing a best global alignment of two sequences that is robust in identifying any of several global relationships between two sequences and a multiple alignment algorithm based on the pairwise algorithm.
Abstract: We present a dynamic programming algorithm for computing a best global alignment of two sequences. The proposed algorithm is robust in identifying any of several global relationships between two sequences. The algorithm delivers a best alignment of two sequences in linear space and quadratic time. We also describe a multiple alignment algorithm based on the pairwise algorithm. Both algorithms have been implemented as portable C programs. Experimental results indicate that for a commonly used set of gap penalties, the new programs produce more satisfactory alignments on sequences of various lengths than some existing pairwise and multiple programs based on the dynamic programming algorithm of Needleman and Wunsch.

257 citations
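
As a point of reference for the abstract above, the classical Needleman-Wunsch recurrence can compute the optimal global alignment score in quadratic time while storing only two rows of the table. This sketch is the textbook recurrence with a linear gap penalty, not the robust algorithm proposed in the paper; the scoring parameters are hypothetical. Recovering the alignment itself in linear space requires an additional divide-and-conquer step (Hirschberg's technique), which is omitted here.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Uses O(len(b)) memory by keeping only the previous and current rows.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap]                           # aligning a[:i] against nothing
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,     # align a[i-1] with b[j-1]
                           prev[j] + gap,         # gap in b
                           cur[j - 1] + gap))     # gap in a
        prev = cur
    return prev[-1]


score = global_alignment_score("ACGT", "AGT")
```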


Book
01 Apr 1994
TL;DR: Presentation of water systems modelling and simulation state and parameter estimation demand prediction mathematical models in operational control problems optimal scheduling of supply systems and application of dynamic programming operational control of water retention systems.
Abstract: Presentation of water systems modelling and simulation state and parameter estimation demand prediction mathematical models in operational control problems optimal scheduling of supply systems direct application of nonlinear programming techniques optimal scheduling of combined supply and distribution systems application of dynamic programming operational control of water retention systems.

210 citations


Journal ArticleDOI
TL;DR: In this paper, a methodology for determining optimal pump operation schedules for water distribution systems is presented. In addition to minimizing the energy consumption cost, the model includes a constraint to l...
Abstract: A methodology for determining optimal pump operation schedules for water distribution systems is presented. In addition to minimizing the energy consumption cost, the model includes a constraint to l...

181 citations


Journal ArticleDOI
TL;DR: In this article, a general method for constructing high-order approximation schemes for Hamilton-Jacobi-Bellman equations is given, based on a discrete version of the Dynamic Programming Principle.
Abstract: A general method for constructing high-order approximation schemes for Hamilton-Jacobi-Bellman equations is given. The method is based on a discrete version of the Dynamic Programming Principle. We prove a general convergence result for this class of approximation schemes also obtaining, under more restrictive assumptions, an estimate in $L^\infty$ of the order of convergence and of the local truncation error. The schemes can be applied, in particular, to the stationary linear first order equation in ${\Bbb R}^n$ . We present several examples of schemes belonging to this class and with fast convergence to the solution.

Journal ArticleDOI
TL;DR: In this article, the grey dynamic programming (GDP) method is introduced for decision making under uncertainty, which allows uncertain information to be directly communicated into the optimization process and resulting solutions, such that decision alternatives could be generated through the interpretation and analysis of the grey solutions according to projected applicable system conditions.
Abstract: The present paper introduces a grey dynamic programming (GDP) method by incorporating concepts of grey systems and grey decisions within a dynamic programming framework as a means for decision making under uncertainty. The grey dynamic programming approach improves upon previous dynamic programming methods by allowing uncertain information to be directly communicated into the optimization process and resulting solutions, such that decision alternatives could be generated through the interpretation and analysis of the grey solutions according to projected applicable system conditions. The method also does not lead to more complicated intermediate models, and thus has reasonable computational requirements and is applicable to practical problems. Application of the method to a hypothetical problem of waste management facility expansion/use planning within a municipal-solid-waste management system indicates that reasonable solutions have been generated. Comparisons between grey dynamic programming and other ...

Book
31 Oct 1994
TL;DR: This book covers generalized gradients and optimality (including the Ekeland Variational Principle), optimal control of ordinary differential systems, the dynamic programming method, and optimal control of parameter distributed systems, including the $H^\infty$-control problem.
Abstract: Preface. Symbols and Notations. I: Generalized Gradients and Optimality. 1. Fundamentals of Convex Analysis. 2. Generalized Gradients. 3. The Ekeland Variational Principle. II: Optimal Control of Ordinary Differential Systems. 1. Formulation of the Problem and Existence. 2. The Maximum Principle. 3. Applications of the Maximum Principle. III: The Dynamic Programming Method. 1. The Dynamic Programming Equation. 2. Variational and Viscosity Solutions to the Equation of Dynamic Programming. 3. Constructive Approaches to Synthesis Problem. IV: Optimal Control of Parameter Distributed Systems. 1. General Description of Parameter Distributed Systems. 2. Optimal Convex Control Problems. 3. The $H^\infty$-Control Problem. 4. Optimal Control of Nonlinear Parameter Distributed Systems. Subject Index.

Journal ArticleDOI
TL;DR: This paper uses simulated annealing as the basis for an efficient multiple sequence alignment algorithm called MSASA, which can use natural gap costs that can generate better solutions, align more sequences, and take less computation time than dynamic programming.
Abstract: Multiple sequence alignment is a useful technique for studying molecular evolution and analyzing structure-sequence relationships. Dynamic programming of multiple sequence alignment has been widely used to find an optimal alignment. However, dynamic programming does not allow for certain types of gap costs, and it limits the number of sequences that can be aligned due to its high computational complexity. The focus of this paper is to use simulated annealing as the basis for developing an efficient multiple sequence alignment algorithm. An algorithm called Multiple Sequence Alignment using Simulated Annealing (MSASA) has been developed. The computational complexity of MSASA is significantly reduced by replacing the high-temperature phase of the annealing process by a fast heuristic algorithm. This heuristic algorithm helps minimize the solution set of the low-temperature phase of the annealing process. Compared to the dynamic programming approach, MSASA can (i) use natural gap costs which can generate better solutions, (ii) align more sequences and (iii) take less computation time.

Journal ArticleDOI
TL;DR: In this paper, an optimization-based method for scheduling hydrothermal systems based on the Lagrangian relaxation technique is presented, where the problem is converted into the scheduling of individual units.
Abstract: This paper presents an optimization-based method for scheduling hydrothermal systems based on the Lagrangian relaxation technique. After system-wide constraints are relaxed by Lagrange multipliers, the problem is converted into the scheduling of individual units. This paper concentrates on the solution methodology for pumped-storage units. There are many constraints limiting the operation of a pumped-storage unit, such as pond level dynamics and constraints, and discontinuous generation and pumping regions. The most challenging issue in solving pumped-storage subproblems within the Lagrangian relaxation framework is the integrated consideration of these constraints. The basic idea of the method is to relax the pond level dynamics and constraints by using another set of multipliers. The subproblem is then converted into the optimization of generation or pumping levels for each operating state at individual hours, and the optimization of operating states across hours. The optimal generation or pumping level for a particular operating state at each hour can be obtained by optimizing a single variable function without discretizing pond levels. Dynamic programming is then used to optimize operating states across hours with only a small number of states and transitions. A subgradient algorithm is used to update the pond level Lagrangian multipliers. This method provides an efficient way to solve a class of subproblems involving continuous dynamics and constraints, discontinuous operating regions, and discrete operating states.

Journal ArticleDOI
TL;DR: This work describes a machine scheduling model for the problem of selecting and scheduling projects to maximize the scientific, military or commercial value of a space mission, and describes two upper bounding procedures, one based upon a preemptive relaxation of the problem and one upon the use of Lagrangian relaxation.

Journal ArticleDOI
TL;DR: A new method for predicting RNA secondary structure based on a genetic algorithm designed to run on a massively parallel SIMD computer; the program also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm.
Abstract: We present a new method for predicting RNA secondary structure based on a genetic algorithm. The algorithm is designed to run on a massively parallel SIMD computer. Statistical analysis shows that the program performs well when compared to a dynamic programming algorithm used to solve the same problem. The program has also pointed out a long-standing simplification in the implementation of the original dynamic programming algorithm that sometimes causes it not to find the optimal secondary structure.
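
To make the dynamic programming side of this comparison concrete, the sketch below uses the simplest DP formulation of secondary structure prediction, base-pair maximization in the style of Nussinov. It is an illustrative stand-in chosen for brevity: the dynamic programming algorithm the abstract refers to is not specified here and energy-based methods are considerably more involved. The function name `max_base_pairs`, the allowed pair set, and the minimum hairpin loop length of 3 are assumptions of this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    A pair (k, j) is allowed only if at least `min_loop` unpaired bases
    lie between k and j.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]    # best[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]        # case 1: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    # case 2: j pairs with k, splitting the interval in two
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


count = max_base_pairs("GGGAAAUCC")
```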

Book ChapterDOI
10 Jul 1994
TL;DR: A novel incremental algorithm, Q(λ)-learning, combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process typically used in actor-critic learning, which leads to faster learning and helps alleviate the non-Markovian effect of coarse state-space quantization.
Abstract: This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic programming-based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm is demonstrated through computer simulations of the standard benchmark control problem of learning to balance a pole on a cart.

Patent
Eugene J. Shekita, Honesty C. Young
30 Dec 1994
TL;DR: In this article, a query optimizer for optimizing join queries in a relational database system by iterative application of dynamic programming (DP) to select optimal subgraph join execution plans is presented.
Abstract: A query optimizer for optimizing join queries in a relational database system by iterative application of dynamic programming (DP) to select optimal subgraph join execution plans. Unlike traditional DP optimization methods, bounds on search space time and space complexity can be established and adjusted by imposing a subgraph threshold. Each bounded subgraph is selected using a greedy heuristic (GH) hill-climbing procedure or other similarly useful technique to build a low-cost execution plan. The low-cost GH subgraph execution plan is then discarded in favor of an optimal DP subgraph execution plan selected by a dynamic programming optimizer for each subgraph identified by the bounded GH optimization process. The complexity bound may be dynamically tuned to improve execution plan quality responsive to changes in query complexity.

Journal ArticleDOI
TL;DR: A new optimizing approach for resource leveling based on non-serial dynamic programming that permits a marked reduction of the complexity of the problem as it checks for only the feasible time subsets which are far less numerous than feasible sequences.

Journal ArticleDOI
TL;DR: This work presents a unifying framework for the parallel computation of dynamic programming recurrences with more than O(1) dependency, and uses two well-known methods, the closure method and the matrix product method, as general paradigms for developing parallel algorithms.

Journal ArticleDOI
TL;DR: A new dynamic programming method for the single item capacitated dynamic lot size model with non-negative demands and no backlogging is developed, which builds the optimal value function in piecewise linear segments.
Abstract: We develop a new dynamic programming method for the single item capacitated dynamic lot size model with non-negative demands and no backlogging. This approach builds the optimal value function in piecewise linear segments. It works very well on the test problems, requiring less than 0.3 seconds to solve problems with 48 periods on a VAX 8600. Problems with the time horizon up to 768 periods are solved. Empirically, the computing effort increases only at a quadratic rate relative to the number of periods in the time horizon.
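
For orientation, the model itself can be solved by a plain dynamic program over integer inventory levels, sketched below. This is the straightforward recursion, not the piecewise linear construction the abstract describes, and it is far less efficient. The cost structure (a fixed setup cost, a linear production cost, and a linear holding cost on end-of-period inventory) and the name `lot_size` are assumptions of this example.

```python
import math


def lot_size(demands, capacity, setup, unit_cost, holding):
    """Minimum cost of meeting integer demands without backlogging.

    Production in each period is an integer between 0 and `capacity`.
    Returns math.inf if the demands cannot be met.
    """
    total = sum(demands)
    cost = [0.0] + [math.inf] * total     # cost[inv]: best cost ending with inv units
    for d in demands:
        new = [math.inf] * (total + 1)
        for inv, so_far in enumerate(cost):
            if so_far == math.inf:
                continue
            for q in range(capacity + 1):
                end = inv + q - d
                if end < 0 or end > total:
                    continue              # no backlogging, no pointless overstock
                cand = (so_far + (setup if q > 0 else 0)
                        + unit_cost * q + holding * end)
                if cand < new[end]:
                    new[end] = cand
        cost = new
    return cost[0]                        # finish the horizon with empty stock


# Hypothetical three-period instance.
best_cost = lot_size([2, 3, 2], capacity=4, setup=10, unit_cost=1, holding=1)
```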

Journal ArticleDOI
TL;DR: A genetic algorithm is used to search for a near optimal solution of the rigid-body superposition of two whole protein structures, using a least-squares fitting algorithm to optimize the fit between the final set of equivalences.
Abstract: We introduce a completely automatic and objective procedure for the comparison of protein structures. A genetic algorithm is used to search for a near optimal solution of the rigid-body superposition of two whole protein structures. The specification of an initial set of equivalences is not required. Topological equivalences in the final structural alignment are defined by a conventional dynamic programming routine, which is commonly used to compare protein sequences. A least-squares fitting algorithm is then used to optimize the fit between the final set of equivalences. We have applied our method to the comparison of ribonucleic acid structures, as well as protein structures. The structural alignments are generally consistent with those previously published. In fact, on most occasions our method defines at least the same number of topological equivalences as other procedures, but always with a lower r.m.s. distance between them.

Journal ArticleDOI
TL;DR: Computational results on problem instances of up to n = 35 exhibit a clear superiority of this SSDP approach over the original dynamic programming recursion.

Journal ArticleDOI
01 Jan 1994
TL;DR: The developed grey fuzzy dynamic programming model improves upon previous DP methods by allowing uncertain input information to be directly communicated into the optimization process and solutions through the use of different α-cut levels of fuzzy numbers for the input fuzzy information.
Abstract: This paper integrates the concepts of grey systems and fuzzy sets into optimization analysis by dynamic programming as a means of accounting for system uncertainty. The developed grey fuzzy dynamic programming (GFDP) model improves upon previous DP methods by allowing uncertain input information to be directly communicated into the optimization process and solutions through the use of different α-cut levels of fuzzy numbers for the input fuzzy information, and the use of a grey fuzzy linear programming (GFLP) method for an embedded LP problem. The modelling approach is applied to a hypothetical problem for the planning of waste flow allocation and treatment/disposal facility expansion within a municipal solid waste management system. The solutions of the GFDP model corresponding to different α-cut levels provide optimal decisions regarding different development alternatives in a multi-period, multi-facility and multi-scale context, as well as the upper and lower limits of waste flow allocation. T...

Journal ArticleDOI
01 Feb 1994-Infor
TL;DR: In this paper, a new bounding procedure and a new optimal algorithm based on dynamic programming are presented for the Traveling Salesman Problem with Precedence Constraints, which is NP-hard and arises in practical transportation and sequencing problems.
Abstract: The Traveling Salesman Problem with Precedence Constraints is to find a Hamiltonian tour of minimum cost in a graph G = (X,A) of n vertices, starting from vertex 1, visiting every vertex that must precede i before i (i = 2,3,..., n) and returning to vertex 1. This problem is NP-hard and arises in practical transportation and sequencing problems. In this paper we describe a new bounding procedure and a new optimal algorithm, based on dynamic programming. Computational results are given for randomly generated test problems, including the dial-a-ride problem with the classical TSP objective function.
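
The problem statement above maps naturally onto a subset-based dynamic program in the style of Held and Karp, with precedence constraints acting as a filter on which vertex may be visited next. The sketch below is that generic recursion, offered as an illustration only; it is not the bounding procedure or the algorithm of the paper. Vertices are numbered from 0 here (the depot is vertex 0), and the name `tsp_precedence` and the cost matrix are hypothetical.

```python
import math


def tsp_precedence(cost, before):
    """Minimum tour cost starting and ending at vertex 0.

    cost[i][j] is the arc cost from i to j; before[j] is the set of
    vertices that must be visited before j. Returns math.inf if no
    feasible tour exists.
    """
    n = len(cost)
    full = (1 << n) - 1
    need = [sum(1 << i for i in before.get(j, ())) for j in range(n)]
    best = {(1, 0): 0.0}                  # (visited set as bitmask, last vertex)
    for mask in range(1, full + 1):
        if not mask & 1:
            continue                      # every partial tour contains the depot
        for last in range(n):
            cur = best.get((mask, last))
            if cur is None:
                continue
            for nxt in range(1, n):
                bit = 1 << nxt
                # Skip visited vertices and those with unvisited predecessors.
                if mask & bit or need[nxt] & ~mask:
                    continue
                key = (mask | bit, nxt)
                cand = cur + cost[last][nxt]
                if cand < best.get(key, math.inf):
                    best[key] = cand
    return min((best[(full, j)] + cost[j][0]
                for j in range(n) if (full, j) in best),
               default=math.inf)


# Hypothetical 4-vertex instance: vertices 1 and 3 must both precede vertex 2.
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 1],
     [3, 5, 1, 0]]
tour_cost = tsp_precedence(c, {2: {1, 3}})
```

Precedence constraints shrink the set of reachable states, which is one reason dynamic programming can be attractive for tightly constrained instances.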

Journal ArticleDOI
TL;DR: In this paper, the authors present two methods for approximating the optimal groundwater pumping policy for several interrelated aquifers in a stochastic setting that also involves conjunctive use of surface water.
Abstract: This paper presents two methods for approximating the optimal groundwater pumping policy for several interrelated aquifers in a stochastic setting that also involves conjunctive use of surface water. The first method employs a policy iteration dynamic programming (DP) algorithm where the value function is estimated by Monte Carlo simulation combined with curve-fitting techniques. The second method uses a Taylor series approximation to the functional equation of DP which reduces the problem, for a given observed state, to solving a system of equations equal in number to the aquifers. The methods are compared using a four-state variable, stochastic dynamic programming model of Madera County, California. The two methods yield nearly identical estimates of the optimal pumping policy, as well as the steady state pumping depth, suggesting that either method can be used in similar applications.

Proceedings ArticleDOI
21 Jun 1994
TL;DR: The matching is obtained as a piecewise parametric function, with no discretization involved and no parameterized deformation assumed; as a side result, a measure of the deformation of the matched contours is obtained, yielding information on the quality of the match.
Abstract: This paper presents a subpixel contour matching algorithm using a novel dynamic programming scheme. Unlike classical dynamic programming methods, where a discrete path is searched for across a graph, our approach allows the optimal continuous path to be determined. The matching is obtained as a piecewise parametric function, and no discretization is involved, nor any parameterized deformation assumed. As a side result, a measure of the deformation of the matched contours is obtained, yielding information on the quality of the match. The algorithm has been tested with different types of images, demonstrating its ability to deal with chains of contour segments as well as chains of contour edges, since the discretization of the contours no longer limits the precision of the matches.

Proceedings ArticleDOI
08 May 1994
TL;DR: The method can solve difficult high-dimensional path planning problems without using any problem-specific heuristics and an extension of VDP can solve manipulator planning problems of unprecedented complexity.
Abstract: This paper presents a novel approach to path planning. It is a variational technique, consisting of iteratively improving an initial path possibly colliding with obstacles. At each iteration, the path is improved by performing a dynamic programming search in a sub-manifold of the configuration space containing the current path. We call this method variational dynamic programming (VDP). The method can solve difficult high-dimensional path planning problems without using any problem-specific heuristics. More importantly, an extension of VDP can solve manipulator planning problems of unprecedented complexity.

Journal ArticleDOI
TL;DR: A hybrid artificial neural network (ANN) dynamic programming (DP) method for optimal feeder capacitor scheduling is presented; the execution time of scheduling is greatly reduced, while the cost is almost the same as the optimal one derived from full DP.
Abstract: A hybrid artificial neural network (ANN) dynamic programming (DP) method for optimal feeder capacitor scheduling is presented in this paper. To reduce the computation time of the full dynamic programming method, a strategy of ANN assisted partial DP is proposed. In this method, the DP procedures are performed on historical load data offline. The results are managed and valuable knowledge is extracted by using clustering algorithms. With the assistance of the extracted knowledge, a partial DP of reduced size is then performed online to give the optimal schedule for the forecasted load. Two types of clustering algorithms, hard clustering by Euclidean algorithm and soft clustering by an unsupervised learning neural network, are studied and compared in the paper. The effectiveness of the proposed algorithm is demonstrated by a typical feeder in Taipei City with its 365 days' load records. It is found that execution time of scheduling is greatly reduced, while the cost is almost the same as the optimal one derived from full DP.