Showing papers on "Dynamic programming" published in 1995


Book
01 May 1995
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.

10,834 citations
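
As a concrete illustration of the Bellman recursion at the core of the methodology this book covers, the following is a minimal sketch of value iteration for a finite Markovian decision problem. It is not taken from the book: the `value_iteration` function, the dictionary layout of `transitions`, and the default discount factor are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-10):
    """Return (values, greedy_policy) for a finite discounted MDP."""
    values = {state: 0.0 for state in transitions}

    def backup(outcomes):
        # Expected one-step reward plus discounted value of the successor.
        return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)

    while True:
        delta = 0.0
        for state, actions in transitions.items():
            best = max(backup(outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break

    policy = {
        state: max(actions, key=lambda a: backup(actions[a]))
        for state, actions in transitions.items()
    }
    return values, policy
```

Every state that appears as a successor must also appear as a key of `transitions` with at least one action; the sketch does no validation beyond that assumption.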


Journal Article
TL;DR: The information provided by the user's selected points is exploited by an optimal contour-detection method based on dynamic programming (DP), which allows a segmentation of the image and applies to a wide variety of shapes.
Abstract: The problem of segmenting an image into separate regions and tracking them over time is one of the most significant problems in vision. Terzopoulos et al. (1987) proposed an approach to detect the contour regions of complex shapes, assuming a user-selected initial contour not very far from the desired solution. We propose to further explore the information provided by the user's selected points and apply an optimal method to detect contours which allows a segmentation of the image. The method is based on dynamic programming (DP), and applies to a wide variety of shapes. It is exact and not iterative. We also consider a multiscale approach capable of speeding up the algorithm by a factor of 20, although at the expense of losing the guaranteed optimality characteristic. The problem of tracking and matching these contours is addressed. For tracking, the final contour obtained at one frame is sampled and used as initial points for the next frame. Then, the same DP process is applied. For matching, a novel strategy is proposed where the solution is a smooth displacement field in which unmatched regions are allowed while cross vectors are not. The algorithm is again based on DP and the optimal solution is guaranteed. We have demonstrated the algorithms on natural objects in a large spectrum of applications, including interactive segmentation and automatic tracking of the regions of interest in medical images.

512 citations


Journal Article
TL;DR: The augmented Lagrangian relaxation method enhanced by the decomposition and coordination techniques avoids oscillations associated with piece-wise linear cost functions and is fast and efficient in dealing with numerous power system constraints.
Abstract: This paper proposes a new approach based on augmented Lagrangian relaxation for short-term generation scheduling problems with transmission and environmental constraints. In this method, the power system constraints, e.g. load demand, spinning reserve, transmission capacity and environmental constraints, are relaxed by using Lagrangian multipliers, and quadratic penalty terms associated with power system load demand balance are added to the Lagrangian objective function. Then, the decomposition and coordination technique is used, and nonseparable quadratic penalty terms are replaced by linearization around the solution obtained from the previous iteration. In order to improve the convergence property, exactly convex quadratic terms of the decision variables are added to the objective function as strongly convex, differentiable and separable auxiliary functions. The overall problem is decomposed into N subproblems, multipliers and penalty coefficients are updated in the dual problem, and power system constraints are satisfied iteratively. The corresponding unit commitment subproblems are solved by dynamic programming, and the economic dispatch with transmission and environmental constraints is solved by an efficient network flow programming algorithm. The augmented Lagrangian relaxation method enhanced by the decomposition and coordination techniques avoids oscillations associated with piece-wise linear cost functions. Numerical results indicate that the proposed approach is fast and efficient in dealing with numerous power system constraints.

484 citations


Journal Article
TL;DR: In this paper, the authors present new elimination tests which greatly enhance the performance of a relatively well established dynamic programming approach and its application to the minimization of the total traveling cost for the traveling salesman problem with time windows.
Abstract: This paper presents the development of new elimination tests which greatly enhance the performance of a relatively well established dynamic programming approach and its application to the minimization of the total traveling cost for the traveling salesman problem with time windows. The tests take advantage of the time window constraints to significantly reduce the state space and the number of state transitions. These reductions are performed both a priori and during the execution of the algorithm. The approach does not experience problems stemming from increasing problem size, wider or overlapping time windows, or an increasing number of states nearly as rapidly as other methods. Our computational results indicate that the algorithm was successful in solving problems with up to 200 nodes and fairly wide time windows. When the density of the nodes in the geographical region was kept constant as the problem size was increased, the algorithm was capable of solving problems with up to 800 nodes. For these problems, the CPU time increased linearly with problem size. These problem sizes are much larger than those of problems previously reported in the literature.

318 citations
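
To make the idea of a time-window elimination test concrete, here is a small sketch of a subset dynamic program for the traveling salesman problem with time windows. It is a simplification and not the paper's algorithm: it minimizes the return time to the depot rather than total traveling cost, it applies only the most basic test (discard a state as soon as a deadline is missed), and the function name and input layout are assumptions for this example.

```python
from typing import List, Optional, Tuple


def tsptw_min_return_time(travel: List[List[float]],
                          windows: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest return time to depot 0 after visiting every node in its window.

    travel[i][j] is the travel time from i to j; windows[i] = (earliest, latest)
    service start at node i. Waiting for a window to open is allowed.
    """
    n = len(travel)
    # frontier[(visited_mask, last)] = earliest service start time at `last`
    frontier = {(1, 0): max(0.0, windows[0][0])}
    for _ in range(n - 1):
        extended = {}
        for (mask, last), time in frontier.items():
            for nxt in range(1, n):
                if mask & (1 << nxt):
                    continue  # already visited
                arrive = max(time + travel[last][nxt], windows[nxt][0])
                if arrive > windows[nxt][1]:
                    continue  # elimination test: the deadline is missed
                key = (mask | (1 << nxt), nxt)
                if arrive < extended.get(key, float("inf")):
                    extended[key] = arrive  # keep only the dominant label
        frontier = extended
    finishes = [time + travel[last][0] for (_, last), time in frontier.items()]
    finishes = [f for f in finishes if f <= windows[0][1]]
    return min(finishes) if finishes else None
```

Because an earlier arrival at the same (visited set, last node) state is never worse when waiting is free, one label per state suffices here; minimizing travel cost instead would require keeping several (time, cost) labels per state.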


Journal Article
TL;DR: A simple O(n) partitioning algorithm for deriving the optimal linear solution to the Multiple-Choice Knapsack Problem is presented, and it is shown how it may be incorporated in a dynamic programming algorithm such that a minimal number of classes are enumerated, sorted and reduced.

280 citations
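
For orientation, the Multiple-Choice Knapsack Problem asks for exactly one item from each class so that total profit is maximal and total weight stays within the capacity. The sketch below is the plain textbook dynamic program over classes and capacities, not the paper's partitioning algorithm or its minimal enumeration of classes; the function name and the `(weight, profit)` layout are assumptions for this example.

```python
from typing import List, Optional, Tuple


def mckp_max_profit(classes: List[List[Tuple[int, int]]],
                    capacity: int) -> Optional[int]:
    """Best total profit picking exactly one (weight, profit) item per class.

    Weights are assumed to be non-negative integers. Returns None when no
    selection fits within the capacity.
    """
    unreachable = float("-inf")
    # best[w] = max profit using total weight exactly w over the classes so far
    best = [unreachable] * (capacity + 1)
    best[0] = 0
    for items in classes:
        updated = [unreachable] * (capacity + 1)
        for used, profit_so_far in enumerate(best):
            if profit_so_far == unreachable:
                continue
            for weight, profit in items:
                total = used + weight
                if total <= capacity and profit_so_far + profit > updated[total]:
                    updated[total] = profit_so_far + profit
        best = updated
    top = max(best)
    return None if top == unreachable else top
```

The running time of this sketch is proportional to the capacity times the total number of items, which is exactly the cost that reduction and core-based techniques aim to cut down.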


Journal Article
TL;DR: Heuristics, based on the properties of the optimal solutions, are developed to find "good" solutions for the general problem and derive upper bounds which are useful when evaluating the performance of the heuristics.
Abstract: In this paper we study optimal strategies for renting hotel rooms when there is a stochastic and dynamic arrival of customers from different market segments. We formulate the problem as a stochastic and dynamic programming model and characterize the optimal policies as functions of the capacity and the time left until the end of the planning horizon. We consider three features that enrich the problem: we make no assumptions concerning the particular order between the arrivals of different classes of customers; we allow for multiple types of rooms and downgrading; and we consider requests for multiple nights. We also consider implementations of the optimal policy. The properties we derive for the optimal solution significantly reduce the computational effort needed to solve the problem, yet in the multiple product and/or multiple night case this is often not enough. Therefore, heuristics, based on the properties of the optimal solutions, are developed to find "good" solutions for the general problem. We also derive upper bounds which are useful when evaluating the performance of the heuristics. Computational experiments show a satisfactory performance of the heuristics in a variety of scenarios using real data from a medium size hotel.

242 citations
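
The dependence of the optimal policy on remaining capacity and remaining time can be illustrated with a much simpler model than the paper's. The sketch below assumes a single room type, single-night requests, and at most one request per period, with class `j` arriving with probability `probs[j]` and paying `fares[j]`; none of these names or simplifications come from the paper.

```python
from typing import List


def booking_values(fares: List[float], probs: List[float],
                   capacity: int, periods: int) -> List[List[float]]:
    """V[t][x]: expected revenue from period t onward with x rooms left."""
    V = [[0.0] * (capacity + 1) for _ in range(periods + 1)]
    for t in range(periods - 1, -1, -1):
        for x in range(1, capacity + 1):
            # Opportunity cost of giving up one room now.
            marginal = V[t + 1][x] - V[t + 1][x - 1]
            gain = sum(p * max(0.0, fare - marginal)
                       for fare, p in zip(fares, probs))
            V[t][x] = V[t + 1][x] + gain
    return V


def accept(V: List[List[float]], fare: float, t: int, x: int) -> bool:
    """Accept a request in period t iff its fare covers the opportunity cost."""
    return x > 0 and fare >= V[t + 1][x] - V[t + 1][x - 1]
```

In this simplified setting the accept rule is a threshold on the fare that depends only on the pair (time left, rooms left), which mirrors the kind of structural property the abstract describes.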


Book
30 Oct 1995
TL;DR: This book treats semi-regenerative decision models: a basic decision model with rigorous definitions and assumptions, examples of controlled queues, optimization problems, renewal kernels of the decision model, special classes of strategies, and sufficiency of Markov strategies. It then covers dynamic programming (discounting in continuous time, the dynamic programming equation, Bellman functions, and the finite-horizon, infinite-horizon discounted-cost, and random-horizon problems) and the average cost criterion, with preliminaries on weak topology, limit passages, taboo probabilities, and limit theorems for Markov renewal processes.
Abstract: Semi-Regenerative Decision Models Description of Basic Decision Model Rigorous Definitions and Assumptions Examples of Controlled Queues Optimization Problems Renewal Kernels of the Decision Model Special Classes of Strategies Sufficiency of Markov Strategies Dynamic Programming Discounting in Continuous Time Dynamic Programming Equation Bellman Functions Finite-Horizon Problem Infinite-Horizon Discounted-Cost Problem Random-Horizon Problem Average Cost Criterion Preliminaries: Weak Topology, Limit Passages Preliminaries: Taboo Probabilities, Limit Theorems for Markov Renewal Processes Notation, Recurrence-Communication Assumptions, Examples Existence of Optimal Policies Existence of Optimal Strategies: General Criterion Existence of Optimal Strategies: Sufficient Conditions Optimality Equation Constrained Average-Cost Problem Average-Cost Optimality as Limiting Case of Discounted-Cost Optimality Continuously Controlled Markov Jump Processes Facts About Measurability of Stochastic Processes Marked Point Processes and Random Measures The Predictable s-Algebra Dual Predictable Projections of Random Measures Definition of Controlled Markov Jump Process An M/M/1 Queue With Controllable Input and Service Rate Dynamic Programming Optimization Problems Structured Optimization Problems for Decision Processes Convex Regularization Submodular and Supermodular Functions Existence of Monotone Solutions for Optimization Problems Processes with Bounded Drift Birth and Death Processes Control of Arrivals The Model Description Finite-Horizon Discounted-Cost Problem Cost Functionals Infinite-Horizon Case with and without Discounting Optimal Dynamic Pricing Policy: Model Results Control of Service Mechanism Description of the System Static Optimization Problem Optimal Policies for the Queueing Process Service System with Two Interacting Servers Analysis of Optimality Equation Optimal Control in Models with Several Classes of Customers Description of Models and Processes 
Associated Controlled Processes Existence of Optimal Simple Strategies for the Systems with Alternating Priority Existence of Optimal Simple Strategy for the System with Feedback Equations for Stationary Distributions Stationary Characteristics of the Systems with Alternating Priority Stationary Characteristics of the System with Feedback Models with Alternating Priority: Linear Programming Problem Linear Programming Problem in the Model with Feedback Model with Periods of Idleness and Discounted-Cost Criterion Basic Formulas Construction of Optimal Modified Priority Discipline Bibliography Index Each chapter also includes an Introduction, and a Remarks and Exercises section

177 citations


Journal Article
TL;DR: In this paper, the authors examine whether there is a substantial additional payoff to be derived from using mathematical optimization techniques to globally define a set of mini-clusters, and present a new approximate method for mini-clustering that involves solving a multi-vehicle pick-up and delivery problem with time windows by column generation.
Abstract: This paper examines whether there is a substantial additional payoff to be derived from using mathematical optimization techniques to globally define a set of mini-clusters. Specifically, we present a new approximate method to mini-clustering that involves solving a multi-vehicle pick-up and delivery problem with time windows by column generation. To solve this problem we have enhanced an existing optimal algorithm in several ways. First, we present an original network design based on lists of neighboring transportation requests. Second, we have developed a specialized initialization procedure which reduces the processing time by nearly 40%. Third, the algorithm was easily generalized to multi-dimensional capacity. Finally, we have also developed a heuristic to reduce the size of the network, while incurring only small losses in solution quality. This allows the application of our approach to much larger problems. To be able to compare the results of optimization-based and local heuristic mini-clustering,...

173 citations


Journal Article
TL;DR: In this article, the authors proposed the use of absolute error penalty functions (AEPF) in handling constrained optimal control problems in chemical engineering by posing the problem as a nonsmooth dynamic optimization problem.

166 citations



Journal Article
TL;DR: In this article, a mixed-integer hydroelectric power model for short-term generation planning is presented. The advantage of this model is that the schedules only include points with good efficiency.
Abstract: In this paper, a mixed-integer hydroelectric power model for short-term generation planning is presented. The advantage of this model is that the schedules only include points with good efficiency. The planning problem is decomposed into a subproblem for each hydroelectric power plant. In order to obtain smooth schedules, the model includes start-up costs for hydro aggregates. The main mathematical methods used in this work are Lagrange relaxation, dynamic programming and network programming. The model is illustrated by a numerical example from a part of the Swedish power system.

Journal Article
TL;DR: In this article, the authors considered the case where there is a maximum number of times that a piece may be used in a cutting pattern and presented a tree search algorithm for this problem in which the size of the tree search is limited by using a bound derived from a state space relaxation of a dynamic programming formulation of the problem.

Journal Article
TL;DR: Performance is shown to improve remarkably when using a tree-based iterative method, which iteratively refines an alignment whenever two subalignments are merged in a tree-based way.
Abstract: Multiple sequence alignment is an important problem in the biosciences. To date, most multiple alignment systems have employed a tree-based algorithm, which combines the results of two-way dynamic programming in a tree-like order of sequence similarity. The alignment quality is not, however, high enough when the sequence similarity is low. Once an error occurs in the alignment process, that error can never be corrected. Recently, an effective new class of algorithms has been developed. These algorithms iteratively apply dynamic programming to partially aligned sequences to improve their alignment quality. The iteration corrects any errors that may have occurred in the alignment process. Such an iterative strategy requires heuristic search methods to solve practical alignment problems. Incorporating such methods yields various iterative algorithms. This paper reports our comprehensive comparison of iterative algorithms. We proved that performance improves remarkably when using a tree-based iterative method, which iteratively refines an alignment whenever two subalignments are merged in a tree-based way. We propose a tree-dependent, restricted partitioning technique to efficiently reduce the execution time of iterative algorithms.
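
The "two-way dynamic programming" that tree-based and iterative multiple aligners build on is pairwise global alignment. The sketch below is a standard Needleman-Wunsch implementation with a linear gap penalty; it illustrates only that building block, not the iterative refinement strategies the paper compares, and the scoring parameters are arbitrary assumptions.

```python
from typing import Tuple


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

A progressive aligner applies this step to groups of already-aligned sequences in guide-tree order, which is why an early mistake cannot be undone without a later refinement pass.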

Journal Article
TL;DR: A decomposition methodology is developed to generate cost-effective expansion plans for one major component of the network hierarchy, the local access network, and the possibility of effectively combining decomposition methods and polyhedral approaches is illustrated.
Abstract: Growing demand, increasing diversity of services, and advances in transmission and switching technologies are prompting telecommunication companies to rapidly expand and modernize their networks. This paper develops and tests a decomposition methodology to generate cost-effective expansion plans, with performance guarantees, for one major component of the network hierarchy: the local access network. The model captures economies of scale in facility costs and tradeoffs between installing concentrators and expanding cables to accommodate demand growth. Our solution method exploits the special tree and routing structure of the expansion planning problem to incorporate valid inequalities, obtained by studying the problem's polyhedral structure, in a dynamic program which solves an uncapacitated version of the problem. Computational results for three realistic test networks demonstrate that our enhanced dynamic programming algorithm, when embedded in a Lagrangian relaxation scheme with problem preprocessing and local improvement, is very effective in generating good upper and lower bounds: implemented on a personal computer, the method generates solutions within 1.2-7.0% of optimality. In addition to developing a successful solution methodology for a practical problem, this paper illustrates the possibility of effectively combining decomposition methods and polyhedral approaches.

01 Nov 1995
TL;DR: A new algorithm, called the witness algorithm, is offered that can compute updated value functions efficiently on a restricted class of POMDPs in which the number of linear facets is not too great; it is found to be the fastest algorithm over a wide range of POMDP sizes.
Abstract: We examine the problem of performing exact dynamic-programming updates in partially observable Markov decision processes (POMDPs) from a computational complexity viewpoint. Dynamic-programming updates are a crucial operation in a wide range of POMDP solution methods and we find that it is intractable to perform these updates on piecewise-linear convex value functions for general POMDPs. We offer a new algorithm, called the witness algorithm, which can compute updated value functions efficiently on a restricted class of POMDPs in which the number of linear facets is not too great. We compare the witness algorithm to existing algorithms analytically and empirically and find that it is the fastest algorithm over a wide range of POMDP sizes.

Proceedings Article
01 Aug 1995
TL;DR: This paper addresses the problem of optimizing throughput in task pipelines and presents two new solution algorithms, the first of which is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P^4 k^2) time.
Abstract: Many applications in a variety of domains including digital signal processing, image processing and computer vision are composed of a sequence of tasks that act on a stream of input data sets in a pipelined manner. Recent research has established that these applications are best mapped to a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire problem of mapping, which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P^4 k^2) time. We also present a heuristic algorithm that is linear in the number of processors and establish with theoretical and practical results that the solutions obtained are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.

Proceedings Article
20 Sep 1995
TL;DR: The robust link grammar parser as discussed by the authors uses the notion of a null link in order to allow a connection between any pair of adjacent words, regardless of their dictionary definitions, to parse the Switchboard corpus of conversational English.
Abstract: In this paper we present a robust parsing algorithm based on the link grammar formalism for parsing natural languages. Our algorithm is a natural extension of the original dynamic programming recognition algorithm, which recursively counts the number of linkages between two words in the input sentence. The modified algorithm uses the notion of a null link in order to allow a connection between any pair of adjacent words, regardless of their dictionary definitions. The algorithm proceeds by making three dynamic programming passes. In the first pass, the input is parsed using the original algorithm, which enforces the constraints on links to ensure grammaticality. In the second pass, the total cost of each substring of words is computed, where cost is determined by the number of null links necessary to parse the substring. The final pass counts the total number of parses with minimal cost. All of the original pruning techniques have natural counterparts in the robust algorithm. When used together with memoization, these techniques enable the algorithm to run efficiently with cubic worst-case complexity. We have implemented these ideas and tested them by parsing the Switchboard corpus of conversational English. This corpus is comprised of approximately three million words of text, corresponding to more than 150 hours of transcribed speech collected from telephone conversations restricted to 70 different topics. Although only a small fraction of the sentences in this corpus are “grammatical” by standard criteria, the robust link grammar parser is able to extract relevant structure for a large portion of the sentences. We present the results of our experiments using this system, including the analyses of selected and random sentences from the corpus. We placed a version of the robust parser on the World Wide Web for experimentation. It can be reached at URL http://wwwcscmuedu/afs/esemuedu/project/link/www/robusthtml. In this version there are some limitations, such as the maximum length of a sentence in words and the maximum amount of memory the parser can use.

Journal Article
TL;DR: This paper addresses short-term scheduling of hydrothermal systems using extended differential dynamic programming and mixed coordination, and extends the method to handle unpredictable changes in natural inflow, including a quick estimate of the impact of such a change on total cost.
Abstract: This paper addresses short-term scheduling of hydrothermal systems by using extended differential dynamic programming and mixed coordination. The problem is first decomposed into a thermal subproblem and a hydro subproblem by relaxing the supply-demand constraints. The thermal subproblem is solved analytically. The hydro subproblem is further decomposed into a set of smaller problems that can be solved in parallel. Extended differential dynamic programming and mixed coordination are used to solve the hydro subproblem. Two problems are tested and the results show that the new approach performs well under a simulated parallel processing environment, and high speedup is obtained. The method is then extended to handle unpredictable changes in natural inflow by utilizing the variational feedback nature of the control strategy. A quick estimate on the impact of an unpredictable change on total cost is also obtained. Numerical results show that estimates are accurate, and unpredictable change in natural inflow can be quickly and effectively handled.

Proceedings Article
Ralph Neuneier
27 Nov 1995
TL;DR: Asset allocation is formalized as a Markovian Decision Problem which can be optimized by applying dynamic programming or reinforcement learning based algorithms and is shown to be equivalent to a policy computed by dynamic programming.
Abstract: In recent years, the interest of investors has shifted to computerized asset allocation (portfolio management) to exploit the growing dynamics of the capital markets. In this paper, asset allocation is formalized as a Markovian Decision Problem which can be optimized by applying dynamic programming or reinforcement learning based algorithms. Using an artificial exchange rate, the asset allocation strategy optimized with reinforcement learning (Q-Learning) is shown to be equivalent to a policy computed by dynamic programming. The approach is then tested on the task to invest liquid capital in the German stock market. Here, neural networks are used as value function approximators. The resulting asset allocation strategy is superior to a heuristic benchmark policy. This is a further example which demonstrates the applicability of neural network based reinforcement learning to a problem setting with a high dimensional state space.

Journal Article
TL;DR: This work investigates the problem of evaluating Fortran 90-style array expressions on massively parallel distributed-memory machines and presents algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics: multidimensional grids and rings, hypercubes, fat-trees, and the discrete metric.
Abstract: We investigate the problem of evaluating Fortran 90-style array expressions on massively parallel distributed-memory machines. On such a machine, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression tree. The choice of where to perform the operation then affects this cost. We describe the communication cost of the parallel machine theoretically as a metric space; we model the alignment problem as that of finding a minimum-cost embedding of the expression tree into this space. We present algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics: multidimensional grids and rings, hypercubes, fat-trees, and the discrete metric. We also extend our approach to handle operations that change the shape of the arrays.

Journal Article
TL;DR: A deterministic optimal control problem is obtained that is equivalent to the stochastic production planning problem under consideration, and the optimal feedback control policy is derived in terms of the directional derivatives of the value function.

Journal Article
TL;DR: A straightforward and practical dynamic programming algorithm is developed that solves the problem of partitioning a sequence of n real numbers into p intervals in time O(p(n-p)), which is an improvement of a factor of log p compared to the previous best algorithm.
Abstract: We consider the problem of partitioning a sequence of n real numbers into p intervals such that the cost of the most expensive interval, measured with a cost function f, is minimized. This problem is of importance for the scheduling of jobs both in parallel and pipelined environments. We develop a straightforward and practical dynamic programming algorithm that solves this problem in time O(p(n-p)), which is an improvement of a factor of log p compared to the previous best algorithm. A number of variants of the problem are also considered.
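
The partitioning problem itself can be stated as a short recurrence. The sketch below is a naive O(p n^2) dynamic program, not the O(p(n-p)) algorithm the paper develops, and it fixes the cost function f to the sum of an interval's elements, which is an assumption made for this example.

```python
from typing import List


def min_max_partition(values: List[float], p: int) -> float:
    """Minimal possible cost of the most expensive of p contiguous intervals.

    The cost of an interval is taken to be the sum of its elements.
    """
    n = len(values)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= len(values)")
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    inf = float("inf")
    # best[k][i] = minimal bottleneck when the first i elements form k intervals
    best = [[inf] * (n + 1) for _ in range(p + 1)]
    best[0][0] = float("-inf")  # no intervals yet, so no cost to exceed
    for k in range(1, p + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # last interval is values[j:i]
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
    return best[p][n]
```

For example, splitting the numbers 1 through 9 into three intervals gives the partition (1..5), (6, 7), (8, 9) with bottleneck 17.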

Journal Article
TL;DR: An O(kn^2 √m log m √(k log k) + k^2 n^2) algorithm that combines convolutions with dynamic programming is shown for approximate two-dimensional matching of half-rectangular patterns; at its heart are the Smaller Matching Problem and the k-Aligned Ones with Location Problem.
Abstract: Efficient algorithms exist for the approximate two dimensional matching problem for rectangles. This is the problem of finding all occurrences of an m × m pattern in an n × n text with no more than k mismatch, insertion, and deletion errors. In computer vision it is important to generalize this problem to non-rectangular figures. We make progress towards this goal by defining half-rectangular figures of height m and area a. The approximate two dimensional matching problem for half-rectangular patterns can be solved using a dynamic programming approach in time O(an^2). We show an O(kn^2 √m log m √(k log k) + k^2 n^2) algorithm which combines convolutions with dynamic programming. Note that our algorithm is superior to previous known solutions for k ≤ m^(1/3). At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. These are interesting problems in their own right. Efficient algorithms to solve both these problems are presented.

Journal Article
TL;DR: It has been concluded that solutions obtained by this approach are always efficient, hence an “optimal” compromise solution can be introduced.

Journal Article
01 Aug 1995
TL;DR: A prototype GENEtic algorithms based decision support SYStem (GENESYS) is designed for the product design problem, which helps the decision maker to avoid those solutions which are caught in local maxima.
Abstract: Often complex decision problems requiring decision aids, such as a Decision Support System (DSS), do not have solution procedures that can generate an optimal solution in a realistic time period. This has led to the specification of heuristic solution procedures. However, the quality of the solution obtained using a heuristic in specific instances can be uncertain and may be open to debate. One approach to increase the confidence in the quality of the obtained solution is to use the triangulation approach recommended and often used in the social sciences. Thus, the result obtained with a specific heuristic can be considered 'good' (i.e., close to optimal) if that result is in the ball park of the result obtained through a maximally different method. In other words, using very different solution techniques helps provide benchmarks and thus enables the decision maker to avoid those solutions which are caught in local maxima. Based on this notion we have designed a prototype GENEtic algorithms based decision support SYStem (GENESYS) for the product design problem. The DSS provides three different solution techniques, specifically, complete enumeration (optimal solution) for small problems, heuristic dynamic programming and genetic algorithms, to address the product design problems.

Book Chapter
13 Dec 1995
TL;DR: A methodological framework is developed, along with algorithms that employ two types of feature-based compact representations, that is, representations that involve feature extraction and a relatively simple approximation architecture.
Abstract: Summary form only given. We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

02 Jan 1995
TL;DR: This dissertation expands the theoretical and empirical understanding of IDP algorithms and increases their domain of practical application, and gives the first proof of convergence of a DP-based reinforcement learning algorithm to the optimal policy in a continuous domain (Linear Quadratic Regulation).
Abstract: Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs, but the links to DP have only been noted and understood very recently. This dissertation expands the theoretical and empirical understanding of IDP algorithms and increases their domain of practical application. We address a number of issues concerning the use of IDP algorithms for on-line adaptive optimal control. We present a new algorithm, Real-Time Dynamic Programming, that generalizes Korf's Learning Real-Time A* to a stochastic domain, and show that it has computational advantages over conventional DP approaches to such problems. We then describe several new IDP algorithms based on the theory of Least Squares function approximation. Finally, we begin the extension of IDP theory to continuous domains by considering the problem of Linear Quadratic Regulation. We present an algorithm based on Policy Iteration and Watkins' Q-functions and prove convergence of the algorithm (under the appropriate conditions) to the optimal policy. This is the first result proving convergence of a DP-based reinforcement learning algorithm to the optimal policy for any continuous domain. 
We also demonstrate that IDP algorithms cannot be applied blindly to problems from continuous domains, even such simple domains as Linear Quadratic Regulation.
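
As a rough illustration of the Real-Time Dynamic Programming idea mentioned in the abstract above (Bellman backups applied only to the states actually visited along greedy trajectories, rather than sweeping the whole state space), here is a minimal sketch. It is not the dissertation's code: the five-state shortest-path problem, the action names `safe` and `risky`, the transition probabilities, and the function names are all assumptions made up for this example.

```python
import random
from typing import List, Tuple

GOAL = 4                      # hypothetical goal state; states are 0..4
ACTIONS = ("safe", "risky")   # hypothetical action names; every action costs 1


def outcomes(state: int, action: str) -> List[Tuple[float, int]]:
    """Return (probability, next_state) pairs for the toy problem."""
    if action == "safe":
        return [(1.0, min(state + 1, GOAL))]
    # "risky": jump two states with probability 0.6, otherwise stay put.
    return [(0.6, min(state + 2, GOAL)), (0.4, state)]


def q_value(values: List[float], state: int, action: str) -> float:
    """One-step lookahead cost: unit cost plus expected cost-to-go."""
    return 1.0 + sum(p * values[nxt] for p, nxt in outcomes(state, action))


def rtdp(trials: int = 2000, seed: int = 0, max_steps: int = 100) -> List[float]:
    """Run greedy trials from state 0, backing up only the visited states."""
    rng = random.Random(seed)
    values = [0.0] * (GOAL + 1)   # zero is an optimistic (admissible) start
    for _ in range(trials):
        state = 0
        for _ in range(max_steps):
            if state == GOAL:
                break
            # Greedy action with respect to the current value estimates.
            action = min(ACTIONS, key=lambda a: q_value(values, state, a))
            # Bellman backup at the current state only.
            values[state] = q_value(values, state, action)
            # Sample the next state from the chosen action's distribution.
            r, cumulative = rng.random(), 0.0
            for p, nxt in outcomes(state, action):
                cumulative += p
                if r < cumulative:
                    break
            state = nxt
    return values


if __name__ == "__main__":
    print(rtdp())
```

For this toy problem the optimal expected cost from state 0 works out by hand to 10/3, so the estimate for state 0 should approach that value as the number of trials grows.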

Journal ArticleDOI
TL;DR: Dynamic programming and branch-and-bound methodologies are combined to produce a hybrid algorithm for the multiple-choice knapsack problem that is faster than the best published algorithm and is simpler to code.
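
For orientation, the dynamic programming half of such an approach can be sketched with the textbook recurrence for the multiple-choice knapsack problem, in which exactly one item must be chosen from each class. This is a plain DP over classes and capacity, not the hybrid algorithm of the paper; the function name `mckp_dp` and the `(weight, profit)` data layout are assumptions for this example, and weights are assumed to be non-negative integers.

```python
from typing import List, Optional, Tuple

Item = Tuple[int, int]  # (weight, profit); layout assumed for this sketch


def mckp_dp(classes: List[List[Item]], capacity: int) -> Optional[Tuple[int, List[int]]]:
    """Pick exactly one item per class, maximizing profit within capacity.

    Returns (best_profit, chosen_index_per_class), or None if infeasible.
    """
    NEG = float("-inf")
    # best[c]: max profit over the classes processed so far with weight <= c.
    best = [0] * (capacity + 1)
    choices: List[List[int]] = []
    for items in classes:
        new_best = [NEG] * (capacity + 1)
        picked = [-1] * (capacity + 1)
        for c in range(capacity + 1):
            for j, (w, p) in enumerate(items):
                if w <= c and best[c - w] != NEG and best[c - w] + p > new_best[c]:
                    new_best[c] = best[c - w] + p
                    picked[c] = j
        best = new_best
        choices.append(picked)
    if best[capacity] == NEG:
        return None
    # Walk back through the classes to recover which item was chosen in each.
    selection: List[int] = []
    c = capacity
    for items, picked in zip(reversed(classes), reversed(choices)):
        j = picked[c]
        selection.append(j)
        c -= items[j][0]
    selection.reverse()
    return best[capacity], selection


if __name__ == "__main__":
    demo = [[(2, 3), (3, 5)], [(1, 2), (4, 7)], [(2, 4), (3, 6)]]
    print(mckp_dp(demo, 7))
```

The table has one row per class and one column per capacity value, so the running time is proportional to the capacity times the total number of items. A branch-and-bound layer would typically be used to prune parts of that table.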

Journal ArticleDOI
TL;DR: An algorithm for the automatic generation of full-stacked layouts in CMOS analog circuits is described, and the quality of results is comparable to that of hand-made circuits.
Abstract: An algorithm for the automatic generation of full-stacked layouts in CMOS analog circuits is described in this paper. The set of stacks obtained is optimum with respect to a cost function which accounts for critical parasitics and device area minimization. Device interleaving and common-centroid patterns are automatically introduced when possible, and all symmetry and matching constraints are enforced. The algorithm is based on operations performed on a graph representation of circuit connectivity, exploiting the equivalence between stack generation and path partitioning in the circuit graph. Path partitioning is carried out in two phases: in the first phase, all paths are generated by a dynamic programming procedure. In the second phase, the optimum partition is selected by solving a clique problem. Original heuristics have been introduced, which preserve the optimality of the solution, while effectively improving the computational efficiency of the algorithm. The algorithm has been implemented in the "C" programming language. Many test cases have been run, and the quality of results is comparable to that of hand-made circuits. Results also demonstrate the effectiveness of the heuristics employed, even for relatively complex circuits.

Journal ArticleDOI
TL;DR: A brief account is given of the methods being developed by reinforcement learning researchers, what is novel about them, and what their advantages might be over classical applications of dynamic programming to large-scale stochastic optimal control problems.