
Showing papers on "Dynamic programming published in 2013"


Posted Content
TL;DR: This work proposes and examines a new value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies, assuming an ADD input representation of the MDP.
Abstract: Markov decision processes (MDPs) are becoming increasingly popular as models of decision-theoretic planning. While traditional dynamic programming methods perform well for problems with small state spaces, structured methods are needed for large problems. We propose and examine a value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies. An MDP is represented using Bayesian networks and ADDs, and dynamic programming is applied directly to these ADDs. We demonstrate our method on large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions).

416 citations
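
For readers new to the area, the flat Bellman recursion that the ADD representation compactly encodes can be sketched in a few lines. The toy three-state MDP below is invented for illustration; the paper's contribution is performing exactly this backup on ADD-structured value functions instead of dense tables.

```python
import numpy as np

# Toy MDP, invented for illustration: P[a, s, t] is the probability of
# moving from state s to t under action a; R[s, a] is the reward.
P = np.array([[[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 1.0]])
gamma, tol = 0.95, 1e-8

V = np.zeros(3)
while True:
    # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < tol:
        break
    V = V_new

print("V* =", V.round(3), "greedy policy =", Q.argmax(axis=1))
```

For the 63-million-state problems reported above, `V` cannot be stored as a dense vector; the point of the ADD machinery is that identical sub-values are shared inside the decision diagram instead.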


Journal ArticleDOI
TL;DR: A general model of decentralized stochastic control called partial history sharing information structure is presented, and the optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory.
Abstract: A general model of decentralized stochastic control called partial history sharing information structure is presented. In this model, at each step the controllers share part of their observation and control history with each other. This general model subsumes several existing models of information sharing as special cases. Based on the information commonly known to all the controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides 1) structural results for optimal strategies and 2) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad hoc approaches taken in the literature. In addition, the structural results on optimal control strategies obtained by the proposed approach cannot be obtained by the existing generic approach (the person-by-person approach) for obtaining structural results in decentralized problems, and the dynamic program obtained by the proposed approach is simpler than that obtained by the existing generic approach (the designer's approach) for obtaining dynamic programs in decentralized problems.

319 citations


Journal ArticleDOI
TL;DR: A new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors, and it is shown that the iterative performance index functions can converge to a finite neighborhood of the greatest lower bound of all performance index functions.
Abstract: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. The idea is to use an iterative ADP algorithm to obtain an iterative control law that drives the iterative performance index function to the optimum. Convergence conditions of the iterative ADP algorithm are established for the case where the iterative control law and the iterative performance index function cannot be obtained exactly in each iteration. When these conditions are satisfied, it is shown that the iterative performance index functions converge to a finite neighborhood of the greatest lower bound of all performance index functions under some mild assumptions. Two neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the proposed method.

263 citations


Journal ArticleDOI
TL;DR: This paper proposes a distance-based train trajectory searching model, upon which three optimization algorithms are applied to search for the optimum train speed trajectory; the ant colony optimization (ACO) algorithm is found to achieve a better balance between stability and solution quality than the genetic algorithm (GA).
Abstract: An energy-efficient train trajectory describing the motion of a single train can be used as an input to a driver guidance system or to an automatic train control system. The solution for the best trajectory is subject to certain operational, geographic, and physical constraints. There are two types of strategies commonly applied to obtain the energy-efficient trajectory. One is to allow the train to coast, thus using its available time margin to save energy. The other is to control the speed dynamically while maintaining the required journey time. This paper proposes a distance-based train trajectory searching model, upon which three optimization algorithms are applied to search for the optimum train speed trajectory. Instead of searching for a detailed, complicated control input for the train traction system, this model tries to obtain the speed level at each preset position along the journey. Three commonly adopted algorithms are extensively studied in a comparative style. It is found that the ant colony optimization (ACO) algorithm achieves a better balance between stability and the quality of the results than the genetic algorithm (GA). For offline applications, the additional computational effort required by dynamic programming (DP) is outweighed by the quality of its solution. It is recommended that multiple algorithms be used to identify the optimum single-train trajectory and to improve the robustness of the search results.

262 citations


Posted Content
TL;DR: The Fast Marching Tree algorithm (FMT*) as mentioned in this paper is a sampling-based motion planning algorithm for high-dimensional configuration spaces that is proven to be asymptotically optimal and converges to an optimal solution faster than its state-of-the-art counterparts.
Abstract: In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a "lazy" dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds--the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order $O(n^{-1/d+\rho})$, where $n$ is the number of sampled points, $d$ is the dimension of the configuration space, and $\rho$ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius. Numerical experiments over a range of dimensions and obstacle configurations confirm our theoretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive.

254 citations
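
The core of FMT* is the lazy dynamic programming step: each unvisited sample is connected to the cheapest node in the open set within radius r, and only that locally optimal edge is collision-checked. The following is a heavily simplified, unoptimized sketch of that loop in 2-D with a single disc obstacle; the scene, sample count, and radius are invented, and none of the paper's asymptotic tuning rules are applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 400, 0.15                                  # samples, connection radius
obstacles = [(np.array([0.5, 0.5]), 0.2)]         # (centre, radius) discs
start, goal = np.array([0.05, 0.05]), np.array([0.95, 0.95])

def collision_free(a, b, steps=20):
    # Check interpolated points along the segment a-b against each disc.
    return all(np.linalg.norm((1 - t) * a + t * b - c) > rad
               for t in np.linspace(0.0, 1.0, steps)
               for c, rad in obstacles)

pts = np.vstack([start, rng.random((n, 2)), goal])
cost = np.full(len(pts), np.inf); cost[0] = 0.0
parent = np.full(len(pts), -1)                    # lets the path be backtracked
open_set, unvisited = {0}, set(range(1, len(pts)))

while open_set:
    z = min(open_set, key=lambda i: cost[i])      # lowest cost-to-arrive
    if z == len(pts) - 1:                         # goal sample expanded
        break
    for x in [i for i in unvisited
              if np.linalg.norm(pts[i] - pts[z]) <= r]:
        # Lazy DP step: pick the cheapest open parent, then check only
        # that edge for collision (z itself guarantees a candidate).
        y = min((y for y in open_set
                 if np.linalg.norm(pts[y] - pts[x]) <= r),
                key=lambda y: cost[y] + np.linalg.norm(pts[y] - pts[x]))
        if collision_free(pts[y], pts[x]):
            cost[x] = cost[y] + np.linalg.norm(pts[y] - pts[x])
            parent[x] = y
            open_set.add(x); unvisited.discard(x)
    open_set.discard(z)

print("cost-to-arrive at goal:", round(float(cost[-1]), 3))
```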


Journal ArticleDOI
TL;DR: Risk neutral and risk averse approaches to multistage (linear) stochastic programming problems based on the Stochastic Dual Dynamic Programming (SDDP) method are discussed.

235 citations


Journal ArticleDOI
22 Apr 2013-Energies
TL;DR: In this article, the authors compared two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT) and applied Dynamic Programming and Pontryagin's Minimum Principle (PMP) to obtain the optimal solutions.
Abstract: This paper compares two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT). A control-oriented model of the powertrain and vehicle dynamics is built first. The energy management is formulated as a typical optimal control problem to trade off the fuel consumption and gear shifting frequency under admissible constraints. Dynamic Programming (DP) and Pontryagin’s Minimum Principle (PMP) are applied to obtain the optimal solutions. With appropriately tuned co-states, the PMP solution is found to be very close to that from DP. The solution for the gear shifting in PMP has an algebraic expression associated with the vehicular velocity and can be implemented more efficiently in the control algorithm. The computation time of PMP is significantly less than that of DP.

209 citations
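
The practical appeal of PMP here is that, with a constant co-state, the optimal control at each instant minimizes a pointwise Hamiltonian, so the whole horizon reduces to a one-dimensional search over the co-state. The sketch below illustrates that tuning loop on an invented power-demand trace and a crude fuel map; all constants and the charge-sustaining target are hypothetical stand-ins, not the paper's vehicle model.

```python
import numpy as np

rng = np.random.default_rng(1)
P_dem = np.clip(rng.normal(20, 10, 600), 0, None)   # demand [kW], 1 s steps
P_batt_grid = np.linspace(-15, 15, 61)              # battery power candidates

def fuel_rate(p_eng):                # g/s; crude affine engine-map stand-in
    return np.where(p_eng > 0, 0.3 + 0.08 * p_eng, 0.0)

def final_soc(lam):
    soc = 0.6
    for p in P_dem:
        # PMP: minimise the pointwise Hamiltonian fuel + lam * (SOC drain)
        H = fuel_rate(p - P_batt_grid) + lam * P_batt_grid
        soc -= P_batt_grid[np.argmin(H)] / 3600.0   # toy 1 kWh pack
    return soc

lo, hi = 0.0, 1.0                    # bisection on the constant co-state
for _ in range(40):
    lam = 0.5 * (lo + hi)
    if final_soc(lam) < 0.6:         # battery over-used: penalise it more
        lo = lam
    else:
        hi = lam
print(f"tuned co-state {lam:.4f}, final SOC {final_soc(lam):.3f}")
```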


Journal ArticleDOI
TL;DR: In this paper, the authors considered a remote estimation problem with an energy harvesting sensor and a remote estimator, where the sensor observes the state of a discrete-time source which may be a finite state Markov chain or a multidimensional linear Gaussian system.
Abstract: We consider a remote estimation problem with an energy harvesting sensor and a remote estimator. The sensor observes the state of a discrete-time source which may be a finite state Markov chain or a multidimensional linear Gaussian system. It harvests energy from its environment (say, for example, through a solar cell) and uses this energy for the purpose of communicating with the estimator. Due to randomness of the energy available for communication, the sensor may not be able to communicate all of the time. The sensor may also want to save its energy for future communications. The estimator relies on messages communicated by the sensor to produce real-time estimates of the source state. We consider the problem of finding a communication scheduling strategy for the sensor and an estimation strategy for the estimator that jointly minimizes the expected sum of communication and distortion costs over a finite time horizon. Our goal of joint optimization leads to a decentralized decision-making problem. By viewing the problem from the estimator's perspective, we obtain a dynamic programming characterization for the decentralized decision-making problem that involves optimization over functions. Under some symmetry assumptions on the source statistics and the distortion metric, we show that an optimal communication strategy is described by easily computable thresholds and that the optimal estimate is a simple function of the most recently received sensor observation.

185 citations
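
The flavor of the threshold result is easy to simulate: the sensor transmits only when it has energy and the estimator's prediction error exceeds a threshold, and the estimator otherwise extrapolates. The scalar Gauss-Markov source, threshold, and battery model below are invented for illustration; the paper derives the optimal thresholds rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(2)
T, a, sigma = 200, 0.9, 1.0                   # horizon, source gain, noise std
threshold, battery, harvest_p = 2.0, 3, 0.3   # illustrative values

x, xhat = 0.0, 0.0
dist_cost = comm_cost = 0.0
for t in range(T):
    x = a * x + sigma * rng.normal()          # Gauss-Markov source
    battery = min(battery + (rng.random() < harvest_p), 5)  # unit harvests
    xhat_pred = a * xhat                      # estimator's prediction
    if battery >= 1 and abs(x - xhat_pred) > threshold:
        xhat = x                              # transmit the exact state
        battery -= 1
        comm_cost += 1.0
    else:
        xhat = xhat_pred                      # estimator extrapolates
    dist_cost += (x - xhat) ** 2

print(f"comm cost {comm_cost:.0f}, distortion {dist_cost:.1f}")
```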


Journal ArticleDOI
TL;DR: This paper focuses on a decentralized optimal control algorithm for the distribution management system that treats the distribution network as coupled microgrids; a coordinated dynamic programming algorithm is used to solve the problem by introducing a look-ahead dual multiplier mechanism as decentralized control signals derived from centralized information.
Abstract: This paper focuses on a decentralized optimal control algorithm for the distribution management system, treating the distribution network as coupled microgrids. Based on the autonomous control model of each single microgrid, coordinated information and strategies among different microgrids are used to decrease the operating cost of distributed generation, improve the efficiency of distributed storage utilization, and reduce the complexity of distribution network operation. The optimal control problem of microgrids is modeled as a decentralized partially observable Markov decision process (DEC-POMDP), and a coordinated dynamic programming algorithm is used to solve the problem by introducing a look-ahead dual multiplier mechanism as decentralized control signals derived from centralized information. The performance of this algorithm and the impacts of different coordinated information are discussed through several case studies at the end of this paper.

184 citations


Journal ArticleDOI
TL;DR: In this article, a pseudospectral method is used to solve the problem of train optimal control under constraints and fixed arrival time, where the objective function is a trade-off between the energy consumption and the riding comfort.
Abstract: The optimal trajectory planning problem for train operations under constraints and fixed arrival time is considered. The varying line resistance, variable speed restrictions, and varying maximum traction force are included in the problem definition. The objective function is a trade-off between the energy consumption and the riding comfort. Two approaches are proposed to solve this optimal control problem. First, we propose to use the pseudospectral method, a state-of-the-art method for optimal control problems, which has not been used for train optimal control before. In the pseudospectral method, the optimal trajectory planning problem is recast into a multiple-phase optimal control problem, which is then transformed into a nonlinear programming problem. However, the calculation time for the pseudospectral method is too long for real-time application in an automatic train operation system. To shorten the computation time, the optimal trajectory planning problem is reformulated as a mixed-integer linear programming (MILP) problem by approximating the nonlinear terms in the problem by piecewise affine functions. The MILP problem can be solved efficiently by existing solvers that are guaranteed to return the global optimum for the proposed MILP problem. Simulation results comparing the pseudospectral method, the new MILP approach, and a discrete dynamic programming approach show that the pseudospectral method has the best control performance, but that if the required computation time is also taken into consideration, the MILP approach yields the best overall performance. More specifically, for the given case study the control performance of the pseudospectral approach is about 10% better than that of the MILP approach, and the computation time of the MILP approach is two to three orders of magnitude smaller than that of the pseudospectral method and the discrete dynamic programming approach.

183 citations


Journal ArticleDOI
TL;DR: In this paper, a decision maker is responsible for dynamically collecting observations so as to enhance his information about an underlying phenomenon of interest in a speedy manner while accounting for the penalty of wrong declaration.
Abstract: Consider a decision maker who is responsible for dynamically collecting observations so as to enhance his information about an underlying phenomenon of interest in a speedy manner while accounting for the penalty of wrong declaration. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most “informative” sensing action among the available ones. In this paper, using results in dynamic programming, lower bounds for the optimal total cost are established. The lower bounds characterize the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability. Moreover, upper bounds are obtained via an analysis of two heuristic policies for dynamic selection of actions. It is shown that the first proposed heuristic achieves asymptotic optimality, where the notion of asymptotic optimality, due to Chernoff, implies that the relative difference between the total cost achieved by the proposed policy and the optimal total cost approaches zero as the penalty of wrong declaration (hence the number of collected samples) increases. The second heuristic is shown to achieve asymptotic optimality only in a limited setting such as the problem of a noisy dynamic search. However, by considering the dependency on the number of hypotheses, under a technical condition, this second heuristic is shown to achieve a nonzero information acquisition rate, establishing a lower bound for the maximum achievable rate and error exponent. In the case of a noisy dynamic search with size-independent noise, the obtained nonzero rate and error exponent are shown to be maximum.
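
A minimal instance of this setting is noisy dynamic search, which the paper treats as a special case. The sketch below runs a Bayesian update over M candidate cells and greedily picks the action with the largest expected information gain; the observation model, stopping rule, and greedy policy are illustrative simplifications, not the paper's asymptotically optimal heuristics.

```python
import numpy as np

rng = np.random.default_rng(3)
# Noisy dynamic search: hypothesis h = index of the target cell; action
# u = which cell to inspect; the observation is Bernoulli with hit rate
# p1 if the inspected cell holds the target, false-alarm rate p0 otherwise.
M, p1, p0, eps = 4, 0.8, 0.2, 1e-3
truth = 2
belief = np.full(M, 1.0 / M)

def likelihood(u, y):
    return np.where(np.arange(M) == u,
                    p1 if y else 1 - p1,
                    p0 if y else 1 - p0)

def exp_info_gain(u, b):
    # Expected KL divergence between posterior and prior for action u.
    gain = 0.0
    for y in (0, 1):
        like = likelihood(u, y)
        py = like @ b
        post = like * b / py
        gain += py * np.sum(post * np.log(np.maximum(post, 1e-12) / b))
    return gain

n = 0
while belief.max() < 1 - eps and n < 500:
    u = max(range(M), key=lambda u: exp_info_gain(u, belief))
    y = rng.random() < (p1 if u == truth else p0)   # noisy inspection
    like = likelihood(u, y)
    belief = like * belief / (like @ belief)        # Bayes update
    n += 1
print(f"declared cell {belief.argmax()} after {n} samples")
```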

Journal ArticleDOI
TL;DR: This paper addresses the robust vehicle routing problem with time windows by proposing two new formulations for the robust problem, each based on a different robust approach, and by developing a new cutting plane technique for robust combinatorial optimization problems with complicated constraints.

Journal ArticleDOI
TL;DR: A novel procedure for multi-frame detection in radar systems that combines a pre-processing stage, which extracts candidate alarms (or plots) from the raw data, with a track-before-detect (TBD) processor, which jointly elaborates observations from multiple scans (or frames) and confirms reliable plots.
Abstract: In this paper we present a novel procedure for multi-frame detection in radar systems. The proposed architecture consists of a pre-processing stage, which extracts a set of candidate alarms (or plots) from the raw data measurements (e.g., this can be the Detector and Plot-Extractor of common radar systems), and a track-before-detect (TBD) processor, which jointly elaborates observations from multiple scans (or frames) and confirms reliable plots. A computationally efficient dynamic programming algorithm for the TBD processor is derived, which does not require a discretization of the state space and operates directly on the input plot-lists. Finally, a simple algorithm to solve possible data association problems arising at the track-formation step is given, and a thorough complexity and performance analysis is provided, showing that large detection gains with respect to the standard radar processing are achievable with negligible complexity increase.

Journal ArticleDOI
TL;DR: The proposed method allows the use of a substantially lower level of discretization while achieving the same accuracy, and the evaluation time was reduced by a factor of about 300, while the accuracy of the solution was maintained.
Abstract: Many optimal control problems include a continuous nonlinear dynamic system, state and control constraints, and final state constraints. When using dynamic programming to solve such a problem, the solution space typically needs to be discretized and interpolation is used to evaluate the cost-to-go function between the grid points. When implementing such an algorithm, it is important to treat numerical issues appropriately. Otherwise, the accuracy of the found solution will deteriorate and global optimality can be restored only by increasing the level of discretization. Unfortunately, this will also increase the computational effort needed to calculate the solution. A known problem is the treatment of states in the time-state space from which the final state constraint cannot be met within the given final time. In this brief, a novel method to handle this problem is presented. The new method guarantees global optimality of the found solution, while it is not restricted to a specific class of problems. In contrast, previously proposed methods either sacrifice global optimality or are applicable to a specific class of problems only. Compared to the basic implementation, the proposed method allows the use of a substantially lower level of discretization while achieving the same accuracy. As an example, an academic optimal control problem is analyzed. With the new method, the evaluation time was reduced by a factor of about 300, while the accuracy of the solution was maintained.
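
A basic implementation of the setting being improved here looks like the sketch below: a backward recursion over a state grid, linear interpolation of the cost-to-go between grid points, and a large penalty on states from which the terminal constraint is unreachable. The toy integrator problem and the crude BIG-penalty boundary treatment are mine; the brief's contribution is precisely a sharper handling of that infeasible boundary.

```python
import numpy as np

# Backward DP with interpolated cost-to-go for the toy problem
#   x_{k+1} = x_k + u_k,  |u_k| <= 1,  J = sum u_k^2,  x_N = 0 required.
# States that cannot reach x_N = 0 keep the BIG penalty, which crudely
# marks the infeasible region the paper treats more carefully.
BIG = 1e9
N = 10
x_grid = np.linspace(-5.0, 5.0, 101)
u_grid = np.linspace(-1.0, 1.0, 21)

J = np.where(np.abs(x_grid) < 1e-9, 0.0, BIG)     # terminal constraint
for k in range(N):
    J_next, J = J, np.full_like(x_grid, BIG)
    for i, x in enumerate(x_grid):
        x_next = x + u_grid
        ok = (x_next >= x_grid[0]) & (x_next <= x_grid[-1])
        if ok.any():
            # Linear interpolation of the cost-to-go between grid points.
            Jn = np.interp(x_next[ok], x_grid, J_next)
            J[i] = min(J[i], float(np.min(u_grid[ok] ** 2 + Jn)))

print("cost-to-go at x0 = 3:", np.interp(3.0, x_grid, J))  # ~0.9 expected
```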

Journal ArticleDOI
TL;DR: This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems by relaxing the problem of solving the Hamilton-Jacobi-Bellman equation to an optimization problem, which is solved via a new policy iteration method.
Abstract: This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method is distinguished from previously known nonlinear ADP methods in that the neural network approximation is avoided, giving rise to significant computational improvement. Instead of being semiglobally or locally stabilizing, the resultant control policy is globally stabilizing for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing the current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A greedy heuristic dynamic programming iteration algorithm is developed to solve zero-sum game problems, which can be used to solve the Hamilton-Jacobi-Isaacs equation associated with H∞ optimal regulation control problems.

Journal ArticleDOI
TL;DR: The results indicate that the proposed new approach to optimizing operations of hydro storage systems with multiple connected reservoirs is tractable for a real-world application and that the gap between a theoretical upper bound and a simulated lower bound decreases sufficiently fast.
Abstract: We propose a new approach to optimize operations of hydro storage systems with multiple connected reservoirs whose operators participate in wholesale electricity markets. Our formulation integrates short-term intraday with long-term interday decisions. The intraday problem considers bidding decisions as well as storage operation during the day and is formulated as a stochastic program. The interday problem is modeled as a Markov decision process of managing storage operation over time, for which we propose integrating stochastic dual dynamic programming with approximate dynamic programming. We show that the approximate solution converges towards an upper bound of the optimal solution. To demonstrate the efficiency of the solution approach, we fit an econometric model to actual price and inflow data and apply the approach to a case study of an existing hydro storage system. Our results indicate that the approach is tractable for a real-world application and that the gap between the theoretical upper and a simulated lower bound decreases sufficiently fast.

Journal ArticleDOI
TL;DR: A general computational approach based on dynamic programming is derived that can be shown to converge to an optimal policy; an inner approximation to future cost functions yields an upper bound on the cost of an optimal policy, while an outer approximation delivers a lower bound.
Abstract: We consider a class of multistage stochastic linear programs in which at each stage a coherent risk measure of future costs is to be minimized. A general computational approach based on dynamic programming is derived that can be shown to converge to an optimal policy. By computing an inner approximation to future cost functions, we can evaluate an upper bound on the cost of an optimal policy, and an outer approximation delivers a lower bound. The approach we describe is particularly useful in sampling-based algorithms, and a numerical example is provided to show the efficacy of the methodology when used in conjunction with stochastic dual dynamic programming.
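
The bounding mechanism can be seen on a toy convex cost-to-go. Below, subgradient cuts build an outer (lower-bounding) approximation of Q(x) = E|x − D|, and evaluating Q at the current trial point gives an upper bound; in SDDP-style schemes Q is only accessible through sampled stage subproblems, and coherent risk measures replace the plain expectation, but the cut bookkeeping is the same. Everything here is an invented illustration, not the paper's algorithm.

```python
import numpy as np

D = np.array([1.0, 2.0, 3.0])                 # toy uncertainty

def Q(x):                                     # exact cost-to-go, for reference
    return np.mean(np.abs(x - D))

def subgrad(x):                               # a subgradient of Q at x
    return np.mean(np.sign(x - D))

cuts, x = [], 0.0                             # (intercept, slope) pairs
grid = np.linspace(0.0, 4.0, 401)
for it in range(6):
    g = subgrad(x)
    cuts.append((Q(x) - g * x, g))            # supporting cut at the trial x
    # Outer approximation: pointwise max over cuts, minimised on a grid.
    lower = np.max([a + b * grid for a, b in cuts], axis=0)
    x = grid[np.argmin(lower)]                # next trial point
    print(f"it {it}: lower {lower.min():.3f}, upper {Q(x):.3f} at x={x:.2f}")
```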

Journal ArticleDOI
TL;DR: By factorizing the joint posterior density using the structure of MTT, an efficient DP-TBD algorithm is developed to approximately solve the joint maximization in a fast but accurate manner and can accurately estimate the number of targets and reliably track multiple targets even when targets are in proximity.
Abstract: This paper considers the multi-target tracking (MTT) problem through the use of dynamic programming based track-before-detect (DP-TBD) methods. The usual solution of this problem is to adopt a multi-target state, which is the concatenation of individual target states, then search for the estimate in the expanded multi-target state space. However, this solution involves a high-dimensional joint maximization which is computationally intractable for most realistic problems. Additionally, the dimension of the multi-target state has to be determined before implementing the DP search. This is problematic when the number of targets is unknown. We make two contributions towards addressing these problems. Firstly, by factorizing the joint posterior density using the structure of MTT, an efficient DP-TBD algorithm is developed to approximately solve the joint maximization in a fast but accurate manner. Secondly, we propose a novel detection procedure such that the dimension of the multi-target state no longer needs to be pre-determined before the DP search. Our analysis indicates that the proposed algorithm achieves a computational complexity which is almost linear in the number of processed frames and independent of the number of targets. Simulation results show that this algorithm can accurately estimate the number of targets and reliably track multiple targets even when targets are in proximity.
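
For a single target the underlying DP-TBD machinery is a Viterbi-style recursion: per-frame energies are accumulated along feasible state transitions, and a threshold test is applied to the accumulated merit. The 1-D grid, drift model, and threshold below are invented to keep the sketch short; the paper's contribution is the factorization and detection procedure that extend this to an unknown number of targets.

```python
import numpy as np

rng = np.random.default_rng(4)
K, Ncells, snr = 8, 50, 2.0                    # frames, cells, target energy
true_path = 10 + np.arange(K)                  # target drifts one cell/frame

frames = rng.normal(0.0, 1.0, (K, Ncells))     # noise background
frames[np.arange(K), true_path] += snr         # embed weak target energy

merit = frames[0].copy()
back = np.zeros((K, Ncells), dtype=int)
for k in range(1, K):
    for c in range(Ncells):
        lo, hi = max(0, c - 1), min(Ncells, c + 2)   # +-1 cell per frame
        back[k, c] = lo + int(np.argmax(merit[lo:hi]))
    merit = frames[k] + merit[back[k]]         # accumulate best-predecessor merit

c = int(np.argmax(merit))
if merit[c] > K * 1.0:                         # illustrative threshold
    path = [c]
    for k in range(K - 1, 0, -1):              # backtrack the best track
        path.append(int(back[k, path[-1]]))
    print("detected track (last -> first):", path)
else:
    print("no detection")
```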

Journal ArticleDOI
TL;DR: The adaptive dynamic programming approach is employed for designing an optimal controller of unknown discrete-time nonlinear systems with control constraints, and a neural network is constructed for identifying the unknown dynamical system, with a stability proof.

Journal ArticleDOI
TL;DR: Experimental results indicate that CDDE_Ar can enjoy a statistically superior performance on a wide range of DOPs in comparison to some of the best known dynamic evolutionary optimizers.
Abstract: This paper presents a Cluster-based Dynamic Differential Evolution with external Archive (CDDE_Ar) for global optimization in dynamic fitness landscapes. The algorithm uses a multipopulation method where the entire population is partitioned into several clusters according to the spatial locations of the trial solutions. The clusters are evolved separately using a standard differential evolution algorithm. The number of clusters is an adaptive parameter, and its value is updated after a certain number of iterations. Accordingly, the total population is redistributed into a new number of clusters. In this way, a certain sharing of information occurs periodically during the optimization process. The performance of CDDE_Ar is compared with six state-of-the-art dynamic optimizers over the moving peaks benchmark problems and dynamic optimization problem (DOP) benchmarks generated with the generalized-dynamic-benchmark-generator system for the competition and special session on dynamic optimization held under the 2009 IEEE Congress on Evolutionary Computation. Experimental results indicate that CDDE_Ar can enjoy a statistically superior performance on a wide range of DOPs in comparison to some of the best known dynamic evolutionary optimizers.
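
Inside each cluster, CDDE_Ar evolves the subpopulation with standard differential evolution. The sketch below shows the classic DE/rand/1/bin generation step on a stand-in static landscape; the clustering, external archive, and adaptive cluster count that define CDDE_Ar are omitted.

```python
import numpy as np

rng = np.random.default_rng(7)

def de_step(pop, fitness, f=0.5, cr=0.9):
    """One generation of DE/rand/1/bin on a subpopulation."""
    n, d = pop.shape
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3,
                                 replace=False)
        mutant = pop[r1] + f * (pop[r2] - pop[r3])   # differential mutation
        cross = rng.random(d) < cr                   # binomial crossover mask
        cross[rng.integers(d)] = True                # force at least one gene
        trial = np.where(cross, mutant, pop[i])
        if fitness(trial) < fitness(pop[i]):         # greedy selection
            pop[i] = trial
    return pop

def sphere(x):                                       # stand-in landscape
    return float(np.sum(x ** 2))

pop = rng.uniform(-5, 5, (20, 3))
for _ in range(100):
    pop = de_step(pop, sphere)
print("best fitness:", min(sphere(x) for x in pop))
```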

Book
02 Dec 2013
TL;DR: This article describes algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each, and empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance.
Abstract: A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs. This article reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approximate dynamic programming methods such as trajectory-based value iteration, and finally moves to reinforcement learning methods such as Q-Learning, SARSA, and least-squares policy iteration. We describe algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each. Empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance.
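
As a taste of the model-free end of that spectrum, the following is a minimal tabular Q-Learning loop on an invented five-state chain, the classic exploration problem of passing up a small immediate reward for a delayed payoff:

```python
import numpy as np

rng = np.random.default_rng(5)
nS, nA, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.1

def step(s, a):
    # Toy chain: action 1 moves right (reward 1 at the end, then restart);
    # action 0 resets to state 0 for a small immediate reward.
    if a == 0:
        return 0, 0.1
    if s == nS - 1:
        return 0, 1.0
    return s + 1, 0.0

Q = np.zeros((nS, nA))
s = 0
for _ in range(20000):
    # epsilon-greedy exploration
    a = rng.integers(nA) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    # Q-Learning update: bootstrap off the greedy value at s2
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.round(Q, 2))   # greedy policy: always move right
```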

Journal ArticleDOI
TL;DR: A dynamic programming algorithm for the one-dimensional Fused Lasso Signal Approximator (FLSA) has a linear running time in the worst case, and simulations indicate substantial performance improvement over existing algorithms.
Abstract: We propose a dynamic programming algorithm for the one-dimensional Fused Lasso Signal Approximator (FLSA). The proposed algorithm has a linear running time in the worst case. A similar approach is developed for the task of least squares segmentation, and simulations indicate substantial performance improvement over existing algorithms. Examples of R and C implementations are provided in the online Supplementary materials, posted on the journal web site.
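
The paper's linear-time FLSA recursion is intricate, but the related least-squares segmentation task it discusses has a textbook O(k n²) dynamic program that makes the recursive structure easy to see: the best k-part split of a prefix extends the best (k−1)-part split of a shorter prefix. The sketch below implements that generic DP, not the paper's algorithm.

```python
import numpy as np

def segment(y, k):
    """Optimal split of y into k contiguous segments, each fit by its mean."""
    n = len(y)
    s1 = np.insert(np.cumsum(y), 0, 0.0)            # prefix sums
    s2 = np.insert(np.cumsum(np.square(y)), 0, 0.0)

    def sse(i, j):  # squared error of the mean fit on y[i:j]
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    C = np.full((k + 1, n + 1), np.inf)   # C[m, j]: best cost for y[:j], m parts
    arg = np.zeros((k + 1, n + 1), dtype=int)
    C[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            costs = [C[m - 1, i] + sse(i, j) for i in range(m - 1, j)]
            i_best = m - 1 + int(np.argmin(costs))
            C[m, j], arg[m, j] = costs[i_best - (m - 1)], i_best
    bps, j = [], n                        # backtrack the breakpoints
    for m in range(k, 0, -1):
        bps.append(arg[m, j]); j = arg[m, j]
    return sorted(bps)[1:], C[k, n]       # drop the leading 0

y = (np.concatenate([np.full(20, 0.0), np.full(20, 3.0)])
     + 0.1 * np.random.default_rng(6).normal(size=40))
print(segment(y, 2))   # expected breakpoint near index 20
```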

Journal ArticleDOI
TL;DR: A mixed integer programming (MIP) model is constructed first to solve the lot-sizing problem with multiple suppliers, multiple periods, and quantity discounts, and an efficient Genetic Algorithm (GA) is proposed to tackle the problem when it becomes too complex to solve exactly.

Journal ArticleDOI
TL;DR: This paper proposes methods to generate security strategies that achieve the maximal overall security strength while meeting the real-time constraint, including an optimal algorithm, Integer Linear Programming Security Optimization (ILP-SOP), based on the SEAT graph approach.

Journal ArticleDOI
TL;DR: This brief presents a novel framework of robust adaptive dynamic programming (robust-ADP) aimed at computing globally stabilizing and suboptimal control policies in the presence of dynamic uncertainties.
Abstract: This brief presents a novel framework of robust adaptive dynamic programming (robust-ADP) aimed at computing globally stabilizing and suboptimal control policies in the presence of dynamic uncertainties. A key strategy is to integrate ADP theory with techniques in modern nonlinear control, with the objective of filling a gap in the past ADP literature, which did not take dynamic uncertainties into account. Neither the system dynamics nor the system order are required to be precisely known. As an illustrative example, the computational algorithm is applied to the controller design of a two-machine power system.

Journal ArticleDOI
TL;DR: In this article, a dynamic programming based planning algorithm for optimization of the feeder routes and branch conductor sizes is proposed, and a set of Pareto solutions is obtained using a weighted aggregation of the two objectives with different weight settings.

Journal ArticleDOI
TL;DR: This paper gives a review of ADP organized around variations on the structure of the ADP scheme, the development of ADP algorithms, and applications, aiming to introduce the reader to this novel field of optimization technology.

Journal ArticleDOI
TL;DR: In this paper, an extended reduced dynamic programming algorithm (RDPA) was proposed to solve the problem of optimal operation scheduling of a pumping station with multiple pumps, where both the energy cost and maintenance cost were considered in the performance function of the optimization problem.

Proceedings ArticleDOI
01 Oct 2013
TL;DR: A multi-stage dynamic programming tool is suggested that uses recursive trajectory generation, similar to least-cost path-finding algorithms, to optimize the upstream speed profile while comparing discretized downstream cases.
Abstract: Researchers have attempted to compute a fuel-optimal vehicle trajectory by receiving traffic signal phasing and timing information. This problem, however, is complex when microscopic models are used to compute the objective function. This paper suggests the use of a multi-stage dynamic programming tool that not only provides outputs that are closer to the optimum, but is also computationally much faster. It uses recursive trajectory generation, similar to least-cost path-finding algorithms, that optimizes the upstream profile while comparing discretized downstream cases. Since dynamic programming is faster than traditional computational methods, the algorithm can afford to use microscopic models and thereby be sensitive to a multitude of inputs such as grade and weather. Agent-based simulations suggest fuel savings in the range of 19 percent and travel-time savings of 32 percent in the vicinity of intersections. This research also showed potential benefits to vehicles following a vehicle that uses the proposed logic.
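
In least-cost-path form, the state is a discrete speed level at each preset position and an edge cost models fuel use between positions. The sketch below uses an invented two-term fuel model and ignores signal windows, which a real tool would add as constraints on the arrival time at the stop line.

```python
import numpy as np

speeds = np.arange(5.0, 20.0, 2.5)        # candidate speed levels [m/s]
stages, ds = 10, 100.0                    # position steps of 100 m

def edge_cost(v0, v1):
    # Crude fuel model, invented for illustration: an idle term plus a
    # penalty on positive acceleration over the 100 m step.
    accel = (v1 ** 2 - v0 ** 2) / (2 * ds)
    return 0.05 * ds + 0.4 * max(accel, 0.0) * ds

# cost[j] = least modelled fuel to reach the current stage at speed level j
cost = np.full(len(speeds), np.inf)
cost[len(speeds) // 2] = 0.0              # launch from a mid speed
for _ in range(stages):
    cost = np.array([min(cost[i] + edge_cost(speeds[i], speeds[j])
                         for i in range(len(speeds)))
                     for j in range(len(speeds))])

print("minimum modelled fuel use:", round(float(cost.min()), 1))
```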