
Showing papers on "Dynamic programming" published in 2001


Journal ArticleDOI
TL;DR: It is shown that LAO* can be used to solve Markov decision problems and that it shares the advantage heuristic search has over dynamic programming for other classes of problems.

431 citations


Journal ArticleDOI
TL;DR: It is demonstrated how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs.
Abstract: We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

396 citations


Proceedings Article
04 Aug 2001
TL;DR: This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions and produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the value function and policy.
Abstract: We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.

262 citations


Journal ArticleDOI
TL;DR: In this paper, a greedy randomized adaptive search procedure (GRASP) is applied to solve the transmission network expansion problem, and the best solution over all GRASP iterations is chosen as the result.
Abstract: A greedy randomized adaptive search procedure (GRASP) is a heuristic method that has been shown to be very powerful in solving combinatorial problems. In this paper we apply GRASP to solve the transmission network expansion problem. This procedure is an expert iterative sampling technique that has two phases for each iteration. The first, a construction phase, finds a feasible solution for the problem. The second, a local search phase, seeks improvements on the construction phase solution. The best solution over all GRASP iterations is chosen as the result.

238 citations
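
The two-phase GRASP loop described in the abstract above can be illustrated with a minimal sketch. It does not model transmission network expansion; as an assumption made purely for illustration, it applies GRASP to a toy set-cover instance. The function name `grasp_set_cover` and the parameters `alpha`, `iterations`, and `seed` are hypothetical.

```python
import random


def grasp_set_cover(universe, subsets, iterations=50, alpha=0.3, seed=0):
    """GRASP on a toy set-cover instance (illustrative assumption, not the paper's model).

    Returns the sorted indices of the smallest cover found over all iterations.
    """
    rng = random.Random(seed)
    universe = set(universe)
    best = None
    for _ in range(iterations):
        # Construction phase: build a feasible cover with a greedy randomized rule.
        uncovered, chosen = set(universe), []
        while uncovered:
            gains = {i: len(s & uncovered) for i, s in enumerate(subsets)
                     if i not in chosen and s & uncovered}
            if not gains:
                raise ValueError("subsets do not cover the universe")
            top, low = max(gains.values()), min(gains.values())
            # Restricted candidate list: candidates close enough to the best gain.
            rcl = [i for i, g in gains.items() if g >= top - alpha * (top - low)]
            pick = rng.choice(rcl)
            chosen.append(pick)
            uncovered -= subsets[pick]
        # Local search phase: drop any subset whose removal keeps the cover feasible.
        improved = True
        while improved:
            improved = False
            for i in list(chosen):
                rest = [j for j in chosen if j != i]
                if set().union(*(subsets[j] for j in rest)) >= universe:
                    chosen, improved = rest, True
                    break
        # Keep the best solution over all GRASP iterations.
        if best is None or len(chosen) < len(best):
            best = sorted(chosen)
    return best


subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, {1, 5}]
print(grasp_set_cover({1, 2, 3, 4, 5}, subsets))
```

Setting `alpha=0` makes the construction purely greedy, while `alpha=1` makes it purely random among useful candidates; intermediate values trade solution quality per iteration against diversity across iterations.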


Journal ArticleDOI
TL;DR: This paper gives the first rigorous convergence analysis of analogues of Watkins's Q-learning algorithm, applied to average cost control of finite-state Markov chains, using the ODE method.
Abstract: This paper gives the first rigorous convergence analysis of analogues of Watkins's Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of the average cost problem: the traditional relative value iteration (RVI) algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and analyzed using the ODE method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two-time-scale stochastic approximation.

208 citations
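
For orientation, the classical relative value iteration (RVI) scheme that the abstract above takes as its starting point can be sketched as follows. This is the deterministic, model-based recursion, not the paper's Q-learning analogue, and it assumes a finite, unichain, aperiodic MDP so that the iteration converges. The function name, the array layout `P[a][s][t]` and `c[s][a]`, and the reference state `ref` are assumptions chosen for the example.

```python
def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Classical RVI for an average-cost MDP (sketch under stated assumptions).

    P[a][s][t]: probability of moving from state s to state t under action a.
    c[s][a]:    one-step cost of taking action a in state s.
    Returns (average cost estimate, relative values h with h[ref] == 0).
    """
    n_states, n_actions = len(c), len(P)
    h = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman backup for every state.
        backed_up = [
            min(c[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
                for a in range(n_actions))
            for s in range(n_states)
        ]
        # Subtract the value at the reference state to keep the iterates bounded.
        gain = backed_up[ref]
        new_h = [v - gain for v in backed_up]
        if max(abs(x - y) for x, y in zip(new_h, h)) < tol:
            return gain, new_h
        h = new_h
    raise RuntimeError("RVI did not converge; check the unichain/aperiodicity assumptions")


# Two states, two actions: action 1 costs 0.1 more but steers toward the cheap state 0.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.9, 0.1]],  # action 1
]
c = [[1.0, 1.1], [3.0, 3.1]]
gain, h = relative_value_iteration(P, c)
print(gain, h)
```

In this toy instance the policy that always takes action 1 has stationary distribution (0.9, 0.1) and average cost 0.9 × 1.1 + 0.1 × 3.1 = 1.3, which is the value the iteration settles on.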


Journal ArticleDOI
TL;DR: The proposed algorithm is particularly effective when the facility reopening and closing costs are relatively significant in the multi-period problem, and can be implemented to solve the composite problem.

168 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an automated and practical optimization model for repetitive construction projects such as highways, high-rise buildings, and housing projects, which utilizes dynamic programming formulation and incorporates a scheduling algorithm and an interruption algorithm so as to automate the generation of interruptions during scheduling.
Abstract: Optimizing resource utilization can lead to significant reduction in the duration and cost of repetitive construction projects such as highways, high-rise buildings, and housing projects. This can be achieved by identifying an optimum crew size and interruption strategy for each activity in the project. Available dynamic programming formulations can be applied to provide solutions for this optimization problem; however, their application is limited, as they require planners to specify an arbitrary and an unbounded set of interruption options prior to scheduling. Such a requirement is not practical and may render the optimization problem infeasible. To circumvent the limitations of available formulations, this paper presents an automated and practical optimization model. The model utilizes dynamic programming formulation and incorporates a scheduling algorithm and an interruption algorithm so as to automate the generation of interruptions during scheduling. This transforms the consideration of interruption options, in optimizing resource utilization, from an unbounded and impractical problem to a bounded and feasible one. A numerical example from the literature is analyzed to illustrate the use and capabilities of the model.

158 citations


Journal ArticleDOI
TL;DR: In this paper, a new algorithm for the optimal feeder routing problem using the dynamic programming technique and geographical information systems (GIS) facilities is proposed, where all practical issues, such as cost parameters (investments, line losses, reliability), technical constraints (voltage drop and thermal limits), as well as physical routing constraints (obstacles, high-cost passages, existing line sections) are taken into consideration.
Abstract: Optimal feeder routing is an important part of general optimal distribution network planning. This paper proposes a new algorithm for the optimal feeder routing problem using the dynamic programming technique and geographical information systems (GIS) facilities. All practical issues, such as cost parameters (investments, line losses, reliability) and technical constraints (voltage drop and thermal limits), as well as physical routing constraints (obstacles, high-cost passages, existing line sections) are taken into consideration. The algorithm developed is validated by comparing its results for a simplified study case with those obtained by an established solver. The effectiveness of the algorithm is further illustrated for a "real-world" study case.

152 citations


Journal ArticleDOI
TL;DR: In this article, the authors use dynamic programming techniques to describe reach sets and related problems of forward and backward reachability, which are reformulated in terms of optimization problems solved through the Hamilton-Jacobi-Bellman Equations.
Abstract: This paper uses dynamic programming techniques to describe reach sets and related problems of forward and backward reachability. The original problems do not involve optimization criteria and are reformulated in terms of optimization problems solved through the Hamilton–Jacobi–Bellman equations. The reach sets are the level sets of the value function solutions to these equations. Explicit solutions for linear systems with hard bounds are obtained. Approximate solutions are introduced and illustrated for linear systems and for a nonlinear system similar to that of the Lotka–Volterra type.

152 citations


Journal ArticleDOI
TL;DR: This paper analyzes the characteristics of existing techniques for meta-level control of anytime algorithms and develops a new framework for monitoring and control that handles effectively the uncertainty associated with the algorithm’s performance profile, the uncertainty associated with the domain of operation, and the cost of monitoring progress.

122 citations


Journal ArticleDOI
TL;DR: A fuzzy decision making approach has been developed to solve the unit commitment problem and to train the artificial neural network to incorporate the changes due to the addition of new constraints automatically, an expert system (ES).

Journal ArticleDOI
TL;DR: In this article, a new method of moving force identification is developed by making use of the dynamic programming technique to overcome the weakness of having large fluctuations in the identified results, and results from the simulation studies and laboratory work show great improvements over existing methods in the accuracy of identification.

Book ChapterDOI
10 Jun 2001
TL;DR: In this article, the authors consider linear multistage stochastic integer programs and study their functional and dynamic programming formulations as well as conditions for optimality and stability of solutions.
Abstract: We consider linear multistage stochastic integer programs and study their functional and dynamic programming formulations as well as conditions for optimality and stability of solutions. Furthermore, we study the application of the Rockafellar-Wets dualization approach as well as the structure and algorithmic potential of corresponding dual problems. For discrete underlying probability distributions we discuss possible large scale mixed-integer linear programming formulations and three dual decomposition approaches, namely, scenario, component and nodal decomposition.

Journal ArticleDOI
TL;DR: The problem of sequential vector quantization of a stationary Markov source is cast as an equivalent stochastic control problem with partial observations, leading to a characterization of optimal encoding schemes.
Abstract: The problem of sequential vector quantization of a stationary Markov source is cast as an equivalent stochastic control problem with partial observations. This problem is analyzed using the techniques of dynamic programming, leading to a characterization of optimal encoding schemes.

Journal ArticleDOI
28 Jun 2001
TL;DR: This paper generalizes the Kalman filter to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction, and establishes convergence of the algorithm and explores efficiency gains through computational experiments involving optimal stopping and queueing problems.
Abstract: The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins-Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.

Journal ArticleDOI
TL;DR: This work casts the progressive hedging algorithm (PHA) of Rockafellar and Wets in a meta-heuristic framework with the sub-problems generated for each scenario solved heuristically, and uses an algorithm for sub-problems that is exact in its usual context but serves as a heuristic for the meta-heuristic.

Journal ArticleDOI
01 Oct 2001
TL;DR: The paper presents a new genetic algorithm (GA)-based discrete dynamic programming (DDP) approach for generating static schedules in a flexible manufacturing system (FMS) environment that is capable of identifying locally optimized partial schedules and shares the computation efficiency of dynamic programming.
Abstract: The paper presents a new genetic algorithm (GA)-based discrete dynamic programming (DDP) approach for generating static schedules in a flexible manufacturing system (FMS) environment. This GA-DDP approach adopts a sequence-dependent schedule generation strategy, where a GA is employed to generate feasible job sequences and a series of discrete dynamic programs are constructed to generate legal schedules for a given sequence of jobs. In formulating the GA, different performance criteria could be easily included. The developed DDP algorithm is capable of identifying locally optimized partial schedules and shares the computation efficiency of dynamic programming. The algorithm is designed in such a way that it does not suffer from the state explosion problem inherent in pure dynamic programming approaches in FMS scheduling. Numerical examples are reported to illustrate the approach.

Journal ArticleDOI
TL;DR: This work describes an on-line algorithm that greedily acknowledges exactly when the cost for an acknowledgment is less than the latency cost incurred by not acknowledging, and shows that for each objective function, at least one of the algorithms is optimal.
Abstract: We study an on-line problem that is motivated by the networking problem of dynamically adjusting the delay of acknowledgments in the Transmission Control Protocol (TCP). We provide a theoretical model for this problem in which the goal is to send acks at times that minimize a linear combination of the cost for the number of acknowledgments sent and the cost for the additional latency introduced by delaying acknowledgments. To study the usefulness of applying packet arrival time prediction to this problem, we assume there is an oracle that provides the algorithm with the times of the next L arrivals, for some L ≥ 0. We give two different objective functions for measuring the cost of a solution, each with its own measure of latency cost. For each objective function we first give an O(n^2)-time dynamic programming algorithm for optimally solving the off-line problem. Then we describe an on-line algorithm that greedily acknowledges exactly when the cost for an acknowledgment is less than the latency cost incurred by not acknowledging. We show that for this algorithm there is a sequence of n packet arrivals for which it is O (***)-competitive for the first objective function, 2-competitive for the second function for L = 0, and 1-competitive for the second function for L = 1. Next we present a second on-line algorithm which is a slight modification of the first, and we prove that it is 2-competitive for both objective functions for all L. We also give lower bounds on the competitive ratio for any deterministic on-line algorithm. These results show that for each objective function, at least one of our algorithms is optimal. Finally, we give some initial empirical results using arrival sequences from real network traffic where we compare the two methods used in TCP for acknowledgment delay with our two on-line algorithms. In all cases we examine performance with L = 0 and L = 1.
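
The quadratic-time off-line dynamic program mentioned in the abstract above can be sketched under explicit assumptions, since the listing does not spell out either objective function. The sketch assumes the cost is `eta` per acknowledgment plus `(1 - eta)` times the summed delay of every packet (ack time minus arrival time), that acks are sent at packet arrival instants, and that the last packet must be acknowledged. The name `min_ack_cost` and the weight `eta` are hypothetical.

```python
def min_ack_cost(arrivals, eta=0.5):
    """O(n^2) off-line DP for delayed acknowledgments (assumed sum-of-latencies cost).

    arrivals: packet arrival times in non-decreasing order.
    Returns (minimum cost, 0-based indices of packets at whose arrival an ack is sent).
    """
    n = len(arrivals)
    # prefix[k] = sum of the first k arrival times, for O(1) latency sums.
    prefix = [0.0]
    for t in arrivals:
        prefix.append(prefix[-1] + t)

    best = [0.0] + [float("inf")] * n  # best[j]: min cost to acknowledge the first j packets
    prev = [0] * (n + 1)               # prev[j]: packets already acknowledged before the last ack
    for j in range(1, n + 1):
        t_ack = arrivals[j - 1]
        for i in range(j):
            # A single ack at t_ack covers packets i+1 .. j (1-based).
            latency = (j - i) * t_ack - (prefix[j] - prefix[i])
            cost = best[i] + eta + (1 - eta) * latency
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk the predecessor links back to recover the ack positions.
    acks, j = [], n
    while j > 0:
        acks.append(j - 1)
        j = prev[j]
    return best[n], acks[::-1]


print(min_ack_cost([0.0, 1.0, 10.0], eta=0.8))
```

With arrivals at 0, 1, and 10 and `eta = 0.8`, acknowledging the first two packets together at time 1 and the third at time 10 costs 2 × 0.8 + 0.2 × 1 = 1.8, which beats three separate acks (2.4) and a single ack at time 10 (4.6).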

Journal ArticleDOI
TL;DR: Results show that this adaptive-critic based systematic approach holds promise for obtaining the optimal control design of both linear and nonlinear distributed parameter systems.

Book ChapterDOI
01 Jan 2001
TL;DR: This work shows how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function, and presents several creatures from Bellman’s Zoo.
Abstract: The stochastic optimal control uses the differential equation of Bellman and its solution, the Bellman function. We show how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function. Then we present several creatures from Bellman’s Zoo: a function that proves the inverse Hölder inequality, as well as several other harmonic analysis Bellman functions and their corresponding Bellman PDE’s. Finally we translate the approach of Burkholder to the language of “our” Bellman function.

Journal ArticleDOI
TL;DR: This paper proposes three energy saving strategies, namely (i) assigning live variables to registers, (ii) avoiding repetitive address computations, and (iii) minimizing memory accesses, and concludes that by suitably choosing an algorithm for a problem and applying the energy saving techniques, energy savings can be achieved.
Abstract: A variety of systems with possibly embedded computing power, such as small portable robots, hand-held computers, and automated vehicles, have power supply constraints. Their batteries generally last only for a few hours before being replaced or recharged. It is important that all design efforts are made to conserve power in those systems. Energy consumption in a system can be reduced using a number of techniques, such as low-power electronics, architecture-level power reduction, and compiler techniques, to name just a few. However, energy conservation at the application software level has not yet been explored. In this paper, we show the impact of various software implementation techniques on energy saving. Based on the observation that different instructions of a processor cost different amounts of energy, we propose three energy saving strategies, namely (i) assigning live variables to registers, (ii) avoiding repetitive address computations, and (iii) minimizing memory accesses. We also study how a variety of algorithm design and implementation techniques affect energy consumption. In particular, we focus on the following aspects: (i) recursive versus iterative (with stacks and without stacks), (ii) different representations of the same algorithm, (iii) different algorithms, with identical asymptotic complexity, for the same problem, and (iv) different input representations. We demonstrate the energy saving capabilities of these approaches by studying a variety of applications related to power-conscious systems, such as sorting, pattern matching, matrix operations, depth-first search, and dynamic programming. From our experimental results, we conclude that by suitably choosing an algorithm for a problem and applying the energy saving techniques, energy savings in excess of 60% can be achieved.

Journal ArticleDOI
Mhand Hifi
TL;DR: Two exact algorithms are proposed for solving both two-staged and three-staged unconstrained (un)weighted cutting problems, and their performance is evaluated on some problem instances of the literature and other hard randomly-generated problem instances.
Abstract: In this paper we propose two exact algorithms for solving both two-staged and three-staged unconstrained (un)weighted cutting problems. The two-staged problem is solved by applying a dynamic programming procedure originally developed by Gilmore and Gomory [Gilmore and Gomory, Operations Research, vol. 13, pp. 94–119, 1965]. The three-staged problem is solved by using a top-down approach combined with a dynamic programming procedure. The performance of the exact algorithms is evaluated on some problem instances of the literature and other hard randomly-generated problem instances (a total of 53 problem instances). A parallel implementation is an important feature of the algorithm used for solving the three-staged version.

Journal ArticleDOI
TL;DR: The efficiency and the speed of this multiscale optimization strategy is demonstrated in the difficult context of the minimization of a region-based contour energy function ensuring the boundary detection of anatomical structures in ultrasound medical imagery.

Journal ArticleDOI
TL;DR: This paper examines a model that incorporates a fundamental cause of the efficiency/timeliness conflict in practice and proposes solution methodologies and properties of an optimal solution for the purpose of exposing insights that may ultimately be useful in research on more complex models.

Journal ArticleDOI
TL;DR: This work improves classical optimal control techniques for problems of interest to the authors by introducing a simplicial complex representation and proposing a novel interpolation scheme that reduces a key bottleneck in the techniques from O(2^n) running time to O(n lg n), in which n is the state space dimension.
Abstract: The authors address the problem of computing a navigation function that serves as a feedback motion strategy for problems that involve generic differential constraints, nonconvex collision constraints, and the optimization of a specified criterion. The determination of analytical solutions to such problems is well beyond the state of the art; therefore, the authors focus on obtaining numerical solutions that are based on discretization of the state space (although they do not force trajectories to visit discretized points). This work improves classical optimal control techniques for problems of interest to the authors. By introducing a simplicial complex representation, the authors propose a novel interpolation scheme that reduces a key bottleneck in the techniques from O(2^n) running time to O(n lg n), in which n is the state space dimension. By exploiting local structure in the differential constraints, the authors present a progressive series of three improved algorithms that use dynamic programming con...

Journal ArticleDOI
TL;DR: Two algorithms for solving both unweighted and weighted constrained two-dimensional two-staged cutting stock problems with good lower and upper bounds which lead to significant branching cuts are proposed.
Abstract: In this paper we propose two algorithms for solving both unweighted and weighted constrained two-dimensional two-staged cutting stock problems. The problem is called a two-staged cutting problem because each produced (sub)optimal cutting pattern is realized by using two cut-phases. In the first cut-phase, the current stock rectangle is slit down its width (resp. length) into a set of vertical (resp. horizontal) strips and, in the second cut-phase, each of these strips is taken individually and chopped across its length (resp. width). First, we develop an approximate algorithm for the problem. The original problem is reduced to a series of single bounded knapsack problems and solved by applying a dynamic programming procedure. Second, we propose an exact algorithm tailored especially for the constrained two-staged cutting problem. The algorithm starts with an initial (feasible) lower bound computed by applying the proposed approximate algorithm. Then, by exploiting dynamic programming properties, we obtain good lower and upper bounds which lead to significant branching cuts. Extensive computational testing on problem instances from the literature shows the effectiveness of the proposed approximate and exact approaches.
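
The bounded knapsack subproblem that the abstract above reduces to can be illustrated with a textbook dynamic program. This is a generic sketch, not the authors' procedure: it assumes integer piece lengths, treats each allowed copy of a piece as a separate 0/1 item, and fills a single strip. The function name `bounded_knapsack` and its parameters are hypothetical.

```python
def bounded_knapsack(capacity, lengths, profits, bounds):
    """Maximum profit from packing pieces along one strip of integer length `capacity`.

    lengths[i], profits[i]: positive integer length and profit of piece type i.
    bounds[i]:              maximum number of copies of piece type i (the demand bound).
    """
    best = [0] * (capacity + 1)  # best[cap]: max profit using at most `cap` units of length
    for length, profit, bound in zip(lengths, profits, bounds):
        # Each allowed copy is handled as its own 0/1 item.
        for _ in range(bound):
            # Descending capacities ensure this copy is counted at most once.
            for cap in range(capacity, length - 1, -1):
                best[cap] = max(best[cap], best[cap - length] + profit)
    return best[capacity]


# Strip of length 10; up to two pieces of length 3 are allowed.
print(bounded_knapsack(10, [3, 4, 5], [4, 6, 7], [2, 1, 1]))
```

Here the best packing uses two pieces of length 3 and one of length 4 for a profit of 14; tightening the first bound to one copy drops the optimum to 13 (the pieces of length 4 and 5). The copy-by-copy expansion runs in time proportional to the capacity times the total number of allowed copies, which is adequate for a sketch but not tuned for large demands.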

Journal ArticleDOI
TL;DR: A simple polynomial-time heuristic to order tests is developed, based on criteria that offer local optimality, that is able to generate an improved order for large problem sizes when the optimal algorithm is not able to do so.
Abstract: As tester complexity and cost increase, reducing test time is an important manufacturing priority. Test time can be reduced by ordering tests so as to fail defective units early in the test process. Algorithms to order tests that guarantee optimality require execution time that is exponential in the number of tests applied. We develop a simple polynomial-time heuristic to order tests. The heuristic, based on criteria that offer local optimality, offers globally optimal solutions in many cases. An ordering algorithm requires information on the ability of tests to detect defective units. One way to obtain this information is by simulation. We obtain it by applying all possible tests to a small subset of manufactured units and assuming the information obtained from this subset is representative. The ordering heuristic was applied to manufactured digital and analog integrated circuits (ICs) tested with commercial testers. When both approaches work, the orders generated by the heuristic are optimal. More importantly, the heuristic is able to generate an improved order for large problem sizes when the optimal algorithm is not able to do so. The new test orders result in a significant reduction, as high as a factor of four, in the time needed to identify defective units. We also assess the validity of using such sampling techniques to order tests.

01 Jan 2001
TL;DR: In this article, the authors derive the precommitment solution for optimal monetary policy in a dynamic game, obtaining the functional form of the optimal response function from the optimization calculus instead of stating it a priori, and demonstrate that optimal control and dynamic programming are equivalent solution methods to the problem at hand.
Abstract: Recent research on rules and discretion in monetary policy emphasizes the empirically well-based property of persistence in output and employment, making optimal monetary policy the solution to a dynamic rather than a repeated game. As in the static case, the precommitment solution serves as a reference for comparing alternative solutions. Methodologically, existing literature is not fully convincing in deriving the precommitment solution. Firstly, elements of Optimal Control (Lagrange method) are used within an otherwise Dynamic Programming framework (Bellman method). Secondly, the optimal rationally expected inflation is not derived explicitly. Thirdly, the functional form of the optimal response function is stated a priori instead of resulting from the optimization calculus. The present paper overcomes these shortcomings and demonstrates that Optimal Control and Dynamic Programming are equivalent solution methods to the problem at hand.

Journal ArticleDOI
TL;DR: A class of terminating Markov decision processes with an exponential risk-averse objective function and compact constraint sets is considered, establishing the existence of a real-valued optimal cost function which can be achieved by a stationary policy.

Proceedings ArticleDOI
04 Nov 2001
TL;DR: This paper studies the problem of constructing routing trees with simultaneous buffer insertion and wire sizing in the presence of routing and buffer obstacles, and proposes a hierarchical approach to construct routing trees for a large number of sinks.
Abstract: Buffer insertion and wire sizing are critical in deep submicron VLSI design. This paper studies the problem of constructing routing trees with simultaneous buffer insertion and wire sizing in the presence of routing and buffer obstacles. No previous algorithms consider all these factors simultaneously. A previous dynamic programming based algorithm is first extended to solve the problem. However, as the size of the routing graph increases and wire sizing is taken into account, the time and space requirements increase enormously. Then a new approach is proposed to formulate the problem as a series of graph problems. The routing tree solution is obtained by finding shortest paths in a series of graphs. In the new approach, wire sizing can be handled almost without any additional time and space requirement. Moreover, the time and space requirement is only polynomial in terms of the size of the routing graph. Our algorithm differs from traditional dynamic programming, and is capable of addressing the problem of inverter insertion and sink polarity. Both theoretical and experimental results show that the graph-based algorithm outperforms the DP-based algorithm by a large margin. We also propose a hierarchical approach to construct routing trees for a large number of sinks.