
Showing papers on "Dynamic programming published in 2010"


Proceedings ArticleDOI
25 Jul 2010
TL;DR: Empirical studies on a large real-world mobile social network show that this algorithm is more than an order of magnitude faster than the state-of-the-art Greedy algorithm for finding top-K influential nodes, and that the error of the approximate algorithm is small.
Abstract: With the proliferation of mobile devices and wireless technologies, mobile social network systems are increasingly available. A mobile social network plays an essential role in the spread of information and influence in the form of "word-of-mouth". It is a fundamental issue to find a subset of influential individuals in a mobile social network such that targeting them initially (e.g., to adopt a new product) will maximize the spread of the influence (further adoptions of the new product). The problem of finding the most influential nodes is unfortunately NP-hard. It has been shown that a Greedy algorithm with provable approximation guarantees can give a good approximation; however, it is computationally expensive, if not prohibitive, to run the greedy algorithm on a large mobile network. In this paper we propose a new algorithm called the Community-based Greedy algorithm for mining top-K influential nodes. The proposed algorithm encompasses two components: 1) an algorithm for detecting communities in a social network by taking into account information diffusion; and 2) a dynamic programming algorithm for selecting communities to find influential nodes. We also provide provable approximation guarantees for our algorithm. Empirical studies on a large real-world mobile social network show that our algorithm is more than an order of magnitude faster than the state-of-the-art Greedy algorithm for finding top-K influential nodes, and that the error of our approximate algorithm is small.
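The greedy baseline that the Community-based Greedy algorithm accelerates repeatedly adds the node with the largest marginal gain in influence spread. A minimal sketch of that greedy skeleton is below; the one-hop coverage "spread" function and the node names are toy stand-ins, not the paper's diffusion model or data:

```python
def greedy_top_k(nodes, neighbors, k):
    """Plain greedy seed selection for influence maximization.

    `neighbors` maps each node to its out-neighbours. The spread of a
    seed set is approximated here as the number of nodes reachable in
    one hop (a toy placeholder for a diffusion-model simulation).
    """
    def spread(seeds):
        covered = set(seeds)
        for s in seeds:
            covered |= neighbors.get(s, set())
        return len(covered)

    seeds = []
    for _ in range(k):
        # add the node with the largest marginal gain in spread
        best = max((n for n in nodes if n not in seeds),
                   key=lambda n: spread(seeds + [n]))
        seeds.append(best)
    return seeds
```

Each greedy step re-evaluates the spread of every candidate, which is exactly the cost that makes the plain algorithm expensive on large networks.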

521 citations


Journal Article
TL;DR: The framework of stochastic optimal control with path integrals is used to derive a novel approach to RL with parameterized policies; the resulting algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition as to why the slightly heuristically motivated probability matching approach can actually perform well.
Abstract: With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities, as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition as to why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.

520 citations


Book
08 Nov 2010
TL;DR: An introduction to exact exponential algorithms for NP-hard problems, explaining the most common algorithmic techniques for solving hard problems significantly faster than exhaustive brute-force search.
Abstract: Today most computer scientists believe that NP-hard problems cannot be solved by polynomial-time algorithms. From the polynomial-time perspective, all NP-complete problems are equivalent, but their exponential-time properties vary widely. Why do some NP-hard problems appear to be easier than others? Are there algorithmic techniques for solving hard problems that are significantly faster than the exhaustive, brute-force methods? The algorithms that address these questions are known as exact exponential algorithms. The history of exact exponential algorithms for NP-hard problems dates back to the 1960s. The two classical examples are Bellman, Held, and Karp's dynamic programming algorithm for the traveling salesman problem and Ryser's inclusion-exclusion formula for the permanent of a matrix. The design and analysis of exact algorithms leads to a better understanding of hard problems and initiates interesting new combinatorial and algorithmic challenges. The last decade has witnessed a rapid development of the area, with many new algorithmic techniques discovered. This has transformed exact algorithms into a very active research field. This book provides an introduction to the area and explains the most common algorithmic techniques, and the text is supported throughout with exercises and detailed notes for further reading. The book is intended for advanced students and researchers in computer science, operations research, optimization, and combinatorics.
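The Bellman-Held-Karp dynamic program mentioned above solves the traveling salesman problem in O(2^n n^2) time, exponential but dramatically better than the O(n!) brute force. A minimal sketch (illustrative, not the book's presentation):

```python
from itertools import combinations

def held_karp(dist):
    """Bellman-Held-Karp dynamic program for the TSP.

    dist[i][j] is the travel cost from city i to city j; returns the
    cost of the cheapest tour that starts and ends at city 0.
    """
    n = len(dist)
    # C[(S, j)] = cheapest cost of a path that starts at city 0,
    # visits exactly the cities in frozenset S, and ends at j (j in S).
    C = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j]
                                for k in S if k != j)
    full = frozenset(range(1, n))
    # close the tour by returning to city 0
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```

The table over subsets is exactly the source of the 2^n factor; the inner minimization contributes the n^2.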

494 citations


Journal ArticleDOI
TL;DR: An improved practical version of this method is provided by combining it with a reduced version of the dynamic programming algorithm and it is proved that, in an appropriate asymptotic framework, this method provides consistent estimators of the change points with an almost optimal rate.
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-square criterion with a l1-type penalty for this purpose. We explain how to implement this method in practice by using the LARS / LASSO algorithm. We then prove that, in an appropriate asymptotic framework, this method provides consistent estimators of the change points with an almost optimal rate. We finally provide an improved practical version of this method by combining it with a reduced version of the dynamic programming algorithm and we successfully compare it with classical methods.

329 citations


Journal ArticleDOI
TL;DR: In this article, the authors develop a theory for stochastic control problems which are time inconsistent in the sense that they do not admit a Bellman optimality principle, and attack these problems by viewing them within a game-theoretic framework, looking for Nash subgame perfect equilibrium points.
Abstract: We develop a theory for stochastic control problems which, in various ways, are time inconsistent in the sense that they do not admit a Bellman optimality principle. We attack these problems by viewing them within a game-theoretic framework, and we look for Nash subgame perfect equilibrium points. For a general controlled Markov process and a fairly general objective functional, we derive an extension of the standard Hamilton-Jacobi-Bellman equation, in the form of a system of non-linear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. All known examples of time inconsistency in the literature are easily seen to be special cases of the present theory. We also prove that for every time inconsistent problem, there exists an associated time consistent problem such that the optimal control and the optimal value function for the consistent problem coincide with the equilibrium control and value function, respectively, for the time inconsistent problem. We also study some concrete examples.

315 citations


Proceedings Article
11 Jul 2010
TL;DR: It is shown that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging "equivalent" stacks based on feature values, and the final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging "equivalent" stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.

253 citations


Journal ArticleDOI
TL;DR: The proposed method not only enhances the accuracy of the final global optimum but also allows for a reduction of the state-space resolution with maintained accuracy, which substantially reduces the computational effort to calculate the global optimum.
Abstract: In this paper we present issues related to the implementation of dynamic programming for optimal control of a one-dimensional dynamic model, such as the hybrid electric vehicle energy management problem. A study on the resolution of the discretized state space emphasizes the need for careful implementation. A new method is presented to treat numerical issues appropriately. In particular, the method deals with numerical problems that arise due to high gradients in the optimal cost-to-go function. These gradients mainly occur on the border of the feasible state region. The proposed method not only enhances the accuracy of the final global optimum but also allows for a reduction of the state-space resolution with maintained accuracy. The latter substantially reduces the computational effort to calculate the global optimum. This allows for further applications of dynamic programming for hybrid electric vehicles, such as extensive parameter studies.
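Dynamic programming for a one-dimensional model of this kind proceeds by backward induction over a discretized state grid, with infeasible transitions assigned infinite cost; the steep cost-to-go gradients the paper discusses arise precisely at the border of the feasible region. A toy sketch with illustrative dynamics and cost (not the paper's vehicle model):

```python
import math

def backward_dp(n_steps, states, controls, step_cost, next_state):
    """Backward induction over a discretized 1-D state grid.

    Returns the cost-to-go table J[t][s]; transitions that leave the
    feasible region (next_state returns None) are skipped, so states
    with no feasible control keep cost math.inf.
    """
    n_s = len(states)
    J = [[math.inf] * n_s for _ in range(n_steps + 1)]
    J[n_steps] = [0.0] * n_s  # free terminal state
    for t in range(n_steps - 1, -1, -1):
        for si, x in enumerate(states):
            for u in controls:
                xn = next_state(x, u)
                if xn is None:  # infeasible: left the state region
                    continue
                # nearest-neighbour interpolation back onto the grid;
                # the interpolation scheme is where the numerical
                # issues near steep gradients show up
                ni = min(range(n_s), key=lambda i: abs(states[i] - xn))
                c = step_cost(x, u) + J[t + 1][ni]
                if c < J[t][si]:
                    J[t][si] = c
    return J
```

With a toy quadratic cost pulling the state toward a target value, the table reproduces the expected behaviour: zero cost when starting at the target, and a small control penalty for moving one grid cell toward it.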

222 citations


Journal ArticleDOI
TL;DR: In this article, the optimal operation of railway systems minimizing total energy consumption is discussed, and three methods of solving the formulation, dynamic programming (DP), gradient method, and sequential quadratic programming (SQP), are introduced.
Abstract: The optimal operation of railway systems minimizing total energy consumption is discussed in this paper. Firstly, some measures of finding energy-saving train speed profiles are outlined. After the characteristics that should be considered in optimizing train operation are clarified, complete optimization based on optimal control theory is reviewed. The basic formulations are summarized taking into account most of the difficult characteristics peculiar to railway systems. Three methods of solving the formulation, dynamic programming (DP), the gradient method, and sequential quadratic programming (SQP), are introduced. The last two methods can also control the state of charge (SOC) of the energy storage devices. By showing some numerical results of simulations, the significance of solving not only for optimal speed profiles but also for optimal SOC profiles of the energy storage is emphasized, because the numerical results go beyond the conventional qualitative studies. Future scope for applying the methods to real-time optimal control is also mentioned. Copyright © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

217 citations


Journal ArticleDOI
TL;DR: A novel and tractable approximate dynamic programming method is developed that, coupled with Monte Carlo simulation, computes lower and upper bounds on the value of storage and finds that these heuristics are extremely fast to execute but significantly suboptimal compared to the upper bound.
Abstract: The valuation of the real option to store natural gas is a practically important problem that entails dynamic optimization of inventory trading decisions with capacity constraints in the face of uncertain natural gas price dynamics. Stochastic dynamic programming is a natural approach to this valuation problem, but it does not seem to be widely used in practice because it is at odds with the high-dimensional natural gas price evolution models that are widespread among traders. According to the practice-based literature, practitioners typically value natural gas storage heuristically. The effectiveness of the heuristics discussed in this literature is currently unknown because good upper bounds on the value of storage are not available. We develop a novel and tractable approximate dynamic programming method that, coupled with Monte Carlo simulation, computes lower and upper bounds on the value of storage, which we use to benchmark these heuristics on a set of realistic instances. We find that these heuristics are extremely fast to execute but significantly suboptimal compared to our upper bound, which appears to be fairly tight and much tighter than a simpler perfect information upper bound; computing our lower bound takes more time than using these heuristics, but our lower bound substantially outperforms them in terms of valuation. Moreover, with periodic reoptimizations embedded in Monte Carlo simulation, the practice-based heuristics become nearly optimal, with one exception, at the expense of higher computational effort. Our lower bound with reoptimization is also nearly optimal, but exhibits a higher computational requirement than these heuristics. Besides natural gas storage, our results are potentially relevant for the valuation of the real option to store other commodities, such as metals, oil, and petroleum products.

186 citations


Proceedings Article
31 Mar 2010
TL;DR: In this article, the authors propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data, which is viewed as an inference problem where the variables specify the choice of parents for each node in the graph.
Abstract: We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive with or faster than alternative exact methods based on dynamic programming.

186 citations


Journal ArticleDOI
TL;DR: A truncated linear replenishment policy (TLRP) is proposed, which is piecewise linear with respect to demand history, improves upon static and linear policies, and achieves objective values that are reasonably close to optimal.
Abstract: We propose a robust optimization approach to address a multiperiod inventory control problem under ambiguous demands, that is, with only limited information on the demand distributions such as mean, support, and some measures of deviation. Our framework extends to correlated demands and is developed around a factor-based model, which has the ability to incorporate business factors as well as time-series forecast effects of trend, seasonality, and cyclic variations. We can obtain the parameters of the replenishment policies by solving a tractable deterministic optimization problem in the form of a second-order cone optimization problem (SOCP), with a solution time that, unlike that of dynamic programming approaches, is polynomial and independent of parameters such as replenishment lead time, demand variability, and correlations. The proposed truncated linear replenishment policy (TLRP), which is piecewise linear with respect to demand history, improves upon static and linear policies, and achieves objective values that are reasonably close to optimal.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: A new efficient approximation algorithm, called Distributed Max-Contribution (DMC) that performs greedy scheduling, routing and replication based only on locally and contemporarily available information is presented.
Abstract: This is, to our knowledge, the first paper considering joint optimization of link scheduling, routing and replication for disruption-tolerant networks (DTNs). The optimization problems for resource allocation in DTNs are typically solved using dynamic programming, which requires knowledge of future events such as meeting schedules and durations. This paper defines a new notion of optimality for DTNs, called snapshot optimality, where nodes are not clairvoyant, i.e., cannot look ahead into future events, and thus decisions are made using only contemporarily available knowledge. Unfortunately, the optimal solution for snapshot optimality still requires solving an NP-hard maximum weight independent set problem and global knowledge of who currently owns a copy and what their delivery probabilities are. This paper presents a new efficient approximation algorithm, called Distributed Max-Contribution (DMC), that performs greedy scheduling, routing and replication based only on locally and contemporarily available information. In a simulation study based on real GPS traces tracking over 4,000 taxis for about 30 days in a large city, DMC outperforms existing heuristically engineered resource allocation algorithms for DTNs.

Journal ArticleDOI
TL;DR: This work points out the difficulties brought by the representation of the road network as a weighted complete graph for general vehicle routing problems, introduces the so-called fixed sequence arc selection problem (FSASP), and proposes a dynamic programming solution method for this problem.

Journal ArticleDOI
TL;DR: In this paper, the equivalent static loads (ESLs) method is proposed and applied to nonlinear dynamic response optimization, where the ESLs are made from the results of nonlinear dynamic analysis and used as external forces in linear static response optimization.

Journal ArticleDOI
TL;DR: This paper presents an exact algorithm for the identical parallel machine scheduling problem over a formulation where each variable is indexed by a pair of jobs and a completion time; it is the first time that medium-sized instances of the P||∑wjTj problem have been solved to optimality.
Abstract: This paper presents an exact algorithm for the identical parallel machine scheduling problem over a formulation where each variable is indexed by a pair of jobs and a completion time. We show that such a formulation can be handled, in spite of its huge number of variables, through a branch-cut-and-price algorithm enhanced by a number of practical techniques, including a dynamic programming procedure to fix variables by Lagrangean bounds and dual stabilization. The resulting method permits the solution of many instances of the P||∑wjTj problem with up to 100 jobs and 2 or 4 machines. This is the first time that medium-sized instances of the P||∑wjTj problem have been solved to optimality.

Posted Content
06 Apr 2010
TL;DR: The proposed pruned DP algorithm is able to process a million points in a matter of minutes, compared to several days with the classical DP algorithm; it can be extended to other convex losses, and since it processes one observation after another it could be adapted for on-line problems.
Abstract: Multiple change-point detection models assume that the observed data is a realization of an independent random process affected by K-1 abrupt changes, called change-points, at some unknown positions. For off-line detection, a dynamic programming (DP) algorithm retrieves the K-1 change-points minimizing the quadratic loss and reduces the complexity from \Theta(n^K) to \Theta(Kn^2), where n is the number of observations. The quadratic complexity in n still restricts the use of such an algorithm to small or intermediate values of n. We propose a pruned DP algorithm that recovers the optimal solution. We demonstrate that at worst the complexity is in O(Kn^2) time and O(Kn) space and is therefore at worst equivalent to the classical DP algorithm. We show empirically that the run-time of our proposed algorithm is drastically reduced compared to the classical DP algorithm. More precisely, our algorithm is able to process a million points in a matter of minutes compared to several days with the classical DP algorithm. Moreover, the principle of the proposed algorithm can be extended to other convex losses (for example the Poisson loss), and as the algorithm processes one observation after the other, it could be adapted for on-line problems.
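For context, the classical \Theta(Kn^2) dynamic program that the pruned algorithm accelerates minimizes the quadratic loss over all segmentations into K segments. A minimal sketch of that baseline (variable names are illustrative):

```python
import math

def segment(y, K):
    """Classical O(K n^2) DP for multiple change-point detection:
    split y into K segments minimizing total squared error.
    Returns (best_cost, change_point_positions)."""
    n = len(y)
    # Prefix sums give any segment's squared error in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):  # squared error of y[i:j] around its mean
        m = (s[j] - s[i]) / (j - i)
        return s2[j] - s2[i] - (j - i) * m * m

    # D[k][j] = best cost of splitting y[0:j] into k segments.
    D = [[math.inf] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = D[k - 1][i] + cost(i, j)
                if c < D[k][j]:
                    D[k][j], back[k][j] = c, i
    # Backtrack to recover the change-point positions.
    cps, j = [], n
    for k in range(K, 0, -1):
        j = back[k][j]
        cps.append(j)
    return D[K][n], sorted(cps)[1:]  # drop the leading 0
```

The triple loop over (k, j, i) is the quadratic-in-n bottleneck; the paper's pruning discards candidate split points i that can never become optimal.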

Proceedings ArticleDOI
13 Jun 2010
TL;DR: An algorithm for recovering the globally optimal 2D human figure detection using a loopy graph model by recycling the dynamic programming tables associated with the tree model to look up the tree based lower bound rather than recomputing the lower bound from scratch.
Abstract: This paper presents an algorithm for recovering the globally optimal 2D human figure detection using a loopy graph model. This is computationally challenging because the time complexity scales exponentially in the size of the largest clique in the graph. The proposed algorithm uses Branch and Bound (BB) to search for the globally optimal solution. The algorithm converges rapidly in practice and this is due to a novel method for quickly computing tree based lower bounds. The key idea is to recycle the dynamic programming (DP) tables associated with the tree model to look up the tree based lower bound rather than recomputing the lower bound from scratch. This technique is further sped up using Range Minimum Query data structures to provide O(1) cost for computing the lower bound for most iterations of the BB algorithm. The algorithm is evaluated on the Iterative Parsing dataset and it is shown to run fast empirically.
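A standard Range Minimum Query structure that delivers the O(1) queries mentioned above is the sparse table, with O(n log n) preprocessing. A generic sketch (not the paper's implementation):

```python
class RangeMinQuery:
    """Sparse-table RMQ: O(n log n) preprocessing, O(1) per query."""

    def __init__(self, a):
        n = len(a)
        # log[i] = floor(log2(i)), filled incrementally
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        # t[j][i] = min of a[i : i + 2**j]
        self.t = [list(a)]
        j = 1
        while (1 << j) <= n:
            prev = self.t[j - 1]
            half = 1 << (j - 1)
            self.t.append([min(prev[i], prev[i + half])
                           for i in range(n - (1 << j) + 1)])
            j += 1

    def query(self, l, r):
        """Minimum of a[l:r] (r exclusive), via two overlapping blocks."""
        j = self.log[r - l]
        return min(self.t[j][l], self.t[j][r - (1 << j)])
```

Because min is idempotent, the two power-of-two blocks may overlap, which is what makes the constant-time query possible.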

Journal ArticleDOI
TL;DR: This paper develops sequential spectrum sensing algorithms which explicitly take into account the sensing time overhead, and optimize a performance metric capturing the effective average data rate of CR transmitters.
Abstract: Effective spectrum sensing is a critical prerequisite for multi-channel cognitive radio (CR) networks, where multiple spectrum bands are sensed to identify transmission opportunities, while preventing interference to the primary users. The present paper develops sequential spectrum sensing algorithms which explicitly take into account the sensing time overhead, and optimize a performance metric capturing the effective average data rate of CR transmitters. A constrained dynamic programming problem is formulated to obtain the policy that chooses the best time to stop taking measurements and the best set of channels to access for data transmission, while adhering to hard “collision” constraints imposed to protect primary links. Given the associated Lagrange multipliers, the optimal access policy is obtained in closed form, and the subsequent problem reduces to an optimal stopping problem. A basis expansion-based sub-optimal strategy is employed to mitigate the prohibitive computational complexity of the optimal stopping policy. A novel on-line implementation based on the recursive least-squares (RLS) algorithm along with a stochastic dual update procedure is then developed to obviate the lengthy training phase of the batch scheme. Cooperative sequential sensing generalizations are also provided with either raw or quantized measurements collected at a central processing unit. The numerical results presented verify the efficacy of the proposed algorithms.

Journal ArticleDOI
TL;DR: A new dynamic programming decomposition method is proposed for the network revenue management problem with customer choice behavior; it chooses the revenue allocations by solving an auxiliary optimization problem that takes the probabilistic nature of the customer choices into consideration.
Abstract: In this paper, we propose a new dynamic programming decomposition method for the network revenue management problem with customer choice behavior. The fundamental idea behind our dynamic programming decomposition method is to allocate the revenue associated with an itinerary among the different flight legs and to solve a single-leg revenue management problem for each flight leg in the airline network. The novel aspect of our approach is that it chooses the revenue allocations by solving an auxiliary optimization problem that takes the probabilistic nature of the customer choices into consideration. We compare our approach with two standard benchmark methods. The first benchmark method uses a deterministic linear programming formulation. The second benchmark method is a dynamic programming decomposition idea that is similar to our approach, but it chooses the revenue allocations in an ad hoc manner. We establish that our approach provides an upper bound on the optimal total expected revenue, and this upper bound is tighter than the ones obtained by the two benchmark methods. Computational experiments indicate that our approach provides significant improvements over the performances of the benchmark methods. © 2009 Production and Operations Management Society.

Journal ArticleDOI
TL;DR: A stochastic dynamic programming based motion planning framework is developed by modifying the discrete version of an infinite-horizon partially observable Markov decision process algorithm; sample trajectories highlight its effectiveness in crowded scenes and its flexibility.
Abstract: Automated particle transport using optical tweezers requires the use of motion planning to move the particle while avoiding collisions with randomly moving obstacles. This paper describes a stochastic dynamic programming based motion planning framework developed by modifying the discrete version of an infinite-horizon partially observable Markov decision process algorithm. Sample trajectories generated by this algorithm are presented to highlight effectiveness in crowded scenes and flexibility. The algorithm is tested using silica beads in a holographic tweezer set-up and data obtained from the physical experiments are reported to validate various aspects of the planning simulation framework. This framework is then used to evaluate the performance of the algorithm under a variety of operating conditions.

Journal ArticleDOI
TL;DR: A Petri Net decomposition approach is proposed for the optimization of route planning problems for automated guided vehicles (AGVs) in semiconductor fabrication bays, and an augmented PN is developed to model the concurrent dynamics of multiple AGVs.
Abstract: In this paper, we propose a Petri Net (PN) decomposition approach to the optimization of route planning problems for automated guided vehicles (AGVs) in semiconductor fabrication bays. An augmented PN is developed to model the concurrent dynamics of multiple AGVs. The route planning problem to minimize the total transportation time is formulated as an optimal transition firing sequence problem for the PN. The PN is decomposed into several subnets such that the subnets are made independent by removing the original shared places and creating a separate set of resource places for each subnet with the appropriate connections. The partial solution derived at each subnet does not usually constitute a feasible solution for the entire PN. The penalty function algorithm is used to integrate the solutions derived at the decomposed subnets. The optimal solution for each subnet is repeatedly generated by using the shortest-path algorithm in polynomial time, with a penalty function embedded in the objective function. The effectiveness of the proposed method is demonstrated through computational experiments on a practical-sized route planning problem in a semiconductor fabrication bay.

Journal ArticleDOI
TL;DR: In this paper, the optimal bidding strategy for a hybrid system of renewable power generation and energy storage is formulated as a continuous-state Markov decision process, and a solution based on approximate dynamic programming is presented.
Abstract: A renewable power producer who trades on a day-ahead market sells electricity under supply and price uncertainty. Investments in energy storage mitigate the associated financial risks and allow for decoupling the timing of supply and delivery. This paper introduces a model of the optimal bidding strategy for a hybrid system of renewable power generation and energy storage. We formulate the problem as a continuous-state Markov decision process and present a solution based on approximate dynamic programming. We propose an algorithm that combines approximate policy iteration with Least Squares Policy Evaluation (LSPE) which is used to estimate the weights of a polynomial value function approximation. We find that the approximate policies produce significantly better results for the continuous state space than an optimal discrete policy obtained by linear programming. A numerical analysis of the response surface of rewards on model parameters reveals that supply uncertainty, imbalance costs and a negative correlation of market price and supplies are the main drivers for investments in energy storage. Supply and price autocorrelation, on the other hand, have a negative effect on the value of storage.

Journal ArticleDOI
TL;DR: In this paper, the authors address the bicriterion routing and scheduling problem arising in hazardous materials distribution planning under the assumption that the cost and risk attributes of each arc of the underlying transportation network are time-dependent.
Abstract: Hazardous materials routing and scheduling decisions involve the determination of the minimum cost and/or risk routes for servicing the demand of a given set of customers. This paper addresses the bicriterion routing and scheduling problem arising in hazardous materials distribution planning. Under the assumption that the cost and risk attributes of each arc of the underlying transportation network are time-dependent, the proposed routing and scheduling problem pertains to the determination of the non-dominated time-dependent paths for servicing a given and fixed sequence of customers (intermediate stops) within specified time windows. Due to the heavy computational burden for solving this bicriterion problem, an alternative algorithm is proposed that determines the k-shortest time-dependent paths. Moreover an algorithm is provided for solving the bicriterion problem. The proximity of the solutions of the k-shortest time-dependent path problem with the non-dominated solutions is assessed on a set of problems developed by the authors.

Journal ArticleDOI
TL;DR: It is proved that the natural extension of NFL theorems, for the current formalization of probability, does not hold, but that a weaker form of NFL does hold, by stating the existence of non-trivial distributions of fitness leading to equal performances for all search heuristics.
Abstract: This paper analyses extensions of No-Free-Lunch (NFL) theorems to countably infinite and uncountable infinite domains and investigates the design of optimal optimization algorithms. The original NFL theorem due to Wolpert and Macready states that, for finite search domains, all search heuristics have the same performance when averaged over the uniform distribution over all possible functions. For infinite domains the extension of the concept of distribution over all possible functions involves measurability issues and stochastic process theory. For countably infinite domains, we prove that the natural extension of NFL theorems, for the current formalization of probability, does not hold, but that a weaker form of NFL does hold, by stating the existence of non-trivial distributions of fitness leading to equal performances for all search heuristics. Our main result is that for continuous domains, NFL does not hold. This free-lunch theorem is based on the formalization of the concept of random fitness functions by means of random fields. We also consider the design of optimal optimization algorithms for a given random field, in a black-box setting, namely, a complexity measure based solely on the number of requests to the fitness function. We derive an optimal algorithm based on Bellman’s decomposition principle, for a given number of iterates and a given distribution of fitness functions. We also approximate this algorithm thanks to a Monte-Carlo planning algorithm close to the UCT (Upper Confidence Trees) algorithm, and provide experimental results.

Journal ArticleDOI
TL;DR: The problem of managing an Agile Earth Observing Satellite consists of selecting and scheduling a subset of photographs among a set of candidate ones that satisfy imperative constraints and maximize a gain function and a tabu search algorithm is proposed to solve this NP-hard problem.
Abstract: The problem of managing an Agile Earth Observing Satellite consists of selecting and scheduling a subset of photographs among a set of candidate ones that satisfy imperative constraints and maximize a gain function. We propose a tabu search algorithm to solve this NP-hard problem, which is formulated as a constrained optimization problem involving stereoscopic and time-window visibility constraints, together with a convex evaluation function that increases its hardness. To obtain a wide-ranging and efficient exploration of the search space, we sample it with consistent and saturated configurations. Our algorithm is also hybridized with a systematic search that uses partial enumerations. To increase solution quality, we introduce and solve a secondary problem: the minimization of the sum of the transition durations between acquisitions. Upper bounds are also calculated by a dynamic programming algorithm on a relaxed problem. The obtained results show the efficiency of our approach.
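The tabu search machinery described above can be sketched on a toy selection problem (maximize total gain under a capacity budget, with bit-flip moves, a recency-based tabu list, and an aspiration criterion). The satellite-specific stereoscopic and time-window constraints, the hybridization with partial enumerations, and the convex objective are all omitted, so this is only an illustrative skeleton.

```python
def tabu_search(gain, size, capacity, n_iter=200, tenure=5):
    """Tabu search sketch: select a subset of candidates maximizing total
    gain subject to a capacity budget.  Moves flip one candidate in/out;
    a reversed move stays tabu for `tenure` iterations unless it improves
    on the best solution found so far (aspiration criterion)."""
    n = len(gain)
    sol = [False] * n                          # start from the empty selection
    best_sol, best_val, cur_val = sol[:], 0, 0
    tabu = {}                                  # item -> first iteration it is free again

    def value(s):
        return sum(g for g, keep in zip(gain, s) if keep)

    def feasible(s):
        return sum(sz for sz, keep in zip(size, s) if keep) <= capacity

    for it in range(n_iter):
        best_move, best_move_val = None, None
        for i in range(n):                     # evaluate all bit-flip neighbours
            sol[i] = not sol[i]
            if feasible(sol):
                v = value(sol)
                aspiration = v > best_val      # always accept a new global best
                if (tabu.get(i, 0) <= it or aspiration) and \
                   (best_move_val is None or v > best_move_val):
                    best_move, best_move_val = i, v
            sol[i] = not sol[i]                # undo the tentative flip
        if best_move is None:
            break                              # all moves tabu or infeasible
        sol[best_move] = not sol[best_move]
        cur_val = best_move_val
        tabu[best_move] = it + 1 + tenure      # forbid reversing the move for a while
        if cur_val > best_val:
            best_sol, best_val = sol[:], cur_val
    return best_sol, best_val
```

Note that the best admissible neighbour is accepted even when it worsens the current value; the tabu list is what prevents the search from immediately cycling back.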

Journal ArticleDOI
TL;DR: The experimental results show that the proposed metaheuristic techniques were successfully employed in solving the scheduling problems of parallel batch-processing machines with makespan criterion.
Abstract: This study addresses the scheduling problem of parallel batch-processing machines encountered in different manufacturing environments, such as the burn-in operation in the manufacture of semiconductors and the aging test operation in the manufacture of thin film transistor-liquid crystal displays (TFT-LCDs). Each machine simultaneously processes several jobs in a batch, as long as the total size of all jobs in the batch does not exceed machine capacity. The processing time of a batch is the longest processing time among the jobs in the batch. For this problem, a mixed integer programming (MIP) model is provided, and metaheuristics based on simulated annealing (SA) and genetic algorithm (GA) are proposed. In the proposed GA and SA, a string with (n+m-1) numbers is used to assign jobs to machines. The multi-stage dynamic programming (MSDP) algorithm, adapted from the dynamic programming (DP) algorithm of Chou (2007), is then applied to group the jobs into batches for each machine. The experimental results show that the proposed metaheuristic techniques can be successfully employed to solve the scheduling problem of parallel batch-processing machines under the makespan criterion.
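The batching step can be sketched as a prefix dynamic program: with the jobs assigned to one machine sorted by processing time and batched as consecutive runs (exact when all jobs have equal size, a heuristic otherwise), the minimum makespan obeys a simple recursion over prefixes. This is an illustrative reconstruction of this style of DP, not the paper's MSDP algorithm.

```python
def batch_makespan(jobs, capacity):
    """Minimum makespan for batching jobs on ONE batch-processing machine.
    jobs: list of (processing_time, size); a batch's processing time is its
    longest job, and the total size per batch must not exceed `capacity`.
    Batches are restricted to consecutive jobs in processing-time order
    (exact for equal-size jobs, a heuristic for arbitrary sizes)."""
    jobs = sorted(jobs)                        # ascending processing time
    n = len(jobs)
    INF = float("inf")
    C = [INF] * (n + 1)                        # C[j]: best makespan for first j jobs
    C[0] = 0.0
    for j in range(1, n + 1):
        batch_size = 0
        for i in range(j, 0, -1):              # candidate batch = jobs i..j (1-based)
            batch_size += jobs[i - 1][1]
            if batch_size > capacity:
                break                          # larger batches only get bigger
            # batch time = longest job in it = jobs[j-1][0] (sorted order)
            C[j] = min(C[j], C[i - 1] + jobs[j - 1][0])
    return C[n]
```

For three unit-size jobs with times 2, 3, 5 and capacity 2, the DP batches {2} and {3, 5} for a makespan of 7; raising the capacity to 3 allows a single batch of makespan 5.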

Proceedings Article
Scott Sanner1, Kristian Kersting
11 Jul 2010
TL;DR: This work shows that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs.
Abstract: Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.
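The dynamic programming backup in a flat (ground) POMDP is driven by the Bayesian belief update below. This is the standard textbook operation, included only to ground the discussion of what the lifted relational solution abstracts over; the tiger-style numbers in the usage note are illustrative.

```python
def belief_update(b, a, o, T, O):
    """Bayesian belief update underlying POMDP dynamic programming:
        b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b[s]
    b: list of state probabilities; T[a][s][s2]: transition model;
    O[a][s2][o]: observation model."""
    n = len(b)
    nb = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
          for s2 in range(n)]
    z = sum(nb)                    # prior probability of observing o
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / z for x in nb]
```

For a tiger-style problem with two states, a "listen" action that leaves the state unchanged, and an observation that identifies the correct state with probability 0.85, one listen from the uniform belief yields [0.85, 0.15].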

Reference EntryDOI
15 May 2010
TL;DR: In this paper, the authors outline the structure of stochastic control problems that appear in insurance models and describe the procedure of the solution via the dynamic programming principle and list the recent advances in this area.
Abstract: We outline the structure of stochastic control problems that appear in insurance models. These are classified according to the type of modeling used, the nature of the controls involved, and the objectives, such as ruin probability minimization or dividend optimization. We describe the procedure of the solution via the dynamic programming principle and list the recent advances in this area. Keywords: stochastic control; reinsurance; dynamic programming; Hamilton–Jacobi–Bellman equation; singular control; impulse control
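The dynamic programming principle mentioned above can be made concrete on a toy discrete-time dividend problem, a heavily simplified de Finetti-style model that is not taken from the entry itself: value iteration on the Bellman equation for expected discounted dividends until ruin. All model parameters (unit premium, claim of size 2, surplus capped at N for a finite state space) are illustrative assumptions.

```python
def dividend_value_iteration(N, p, beta, tol=1e-10, max_iter=100000):
    """Value iteration for a toy discrete-time dividend-optimization model:
    surplus x in {0..N}; each period the insurer pays a dividend d <= x,
    collects premium 1, and suffers a claim of size 2 with probability p.
    Ruin (surplus < 0) is absorbing with value 0; the surplus is capped
    at N to keep the state space finite.  Returns (V, policy)."""
    V = [0.0] * (N + 1)
    for _ in range(max_iter):
        newV, policy = [0.0] * (N + 1), [0] * (N + 1)
        for x in range(N + 1):
            best, arg = -1.0, 0
            for d in range(x + 1):
                y = x - d + 1                  # post-dividend, post-premium surplus
                up = min(y, N)                 # no claim this period
                down = y - 2                   # claim of size 2
                cont = (1 - p) * V[up] + p * (V[down] if down >= 0 else 0.0)
                val = d + beta * cont
                if val > best:
                    best, arg = val, d
            newV[x], policy[x] = best, arg
        if max(abs(a - b) for a, b in zip(newV, V)) < tol:
            V = newV
            break
        V = newV
    return V, policy
```

A quick sanity check: with no claims (p = 0) the optimal policy pays out the full surplus each period, and the fixed point gives V(1) = 1/(1 - beta), since paying the single unit now and restarting from surplus 1 is optimal.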

Journal ArticleDOI
TL;DR: In this paper, an optimal control scheme, based on a new iterative adaptive dynamic programming (ADP) algorithm, is proposed for a class of nonlinear systems with time delays in both state and control variables, with respect to a quadratic performance index function.

Journal ArticleDOI
TL;DR: It is shown how a stochastic viability kernel and viable feedbacks relying on probability (or chance) constraints can be defined and computed by a dynamic programming equation.
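The dynamic programming equation referred to in this TL;DR can be sketched on a discretized toy system (an illustrative discretization, not the authors' formulation): a backward recursion computes, for each state, the maximal probability of keeping the system inside a constraint set K over a finite horizon, and thresholding that probability yields a stochastic viability kernel.

```python
def viability_probability(states, controls, noises, noise_prob, f, K, T):
    """Backward DP for the maximal probability of remaining in the
    constraint set K for T steps:
        P_T(x) = 1[x in K]
        P_t(x) = 1[x in K] * max_u sum_w prob(w) * P_{t+1}(f(x, u, w))
    The stochastic viability kernel at confidence p is {x : P_0(x) >= p}.
    `states` must be closed under f (finite discretized dynamics)."""
    P = {x: (1.0 if x in K else 0.0) for x in states}
    for _ in range(T):
        P = {x: (max(sum(noise_prob[w] * P[f(x, u, w)] for w in noises)
                     for u in controls)
                 if x in K else 0.0)
             for x in states}
    return P
```

For a clipped random walk on {0..4} with K = {1, 2, 3}, additive controls in {-1, 0, 1}, and symmetric noise in {-1, +1}, the interior state 2 is viable with probability 1 (choosing u = 0 keeps the next state in K surely), while states outside K have probability 0.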