Open Access · Journal ArticleDOI

The Linear Programming Approach to Approximate Dynamic Programming

TLDR
In this article, an efficient method based on linear programming for approximating solutions to large-scale stochastic control problems is proposed, and experiments on queueing network control provide empirical support for the approach.
Abstract
The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and "state-relevance weights" that influence the quality of the approximation. Experimental results in the domain of queueing network control provide empirical support for the methodology.
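
To make the method concrete, the approximate linear program implied by this description can be sketched as follows (a reconstruction from the abstract, with assumed notation: Φ the matrix of pre-selected basis functions, r the weight vector, c the state-relevance weights, g the per-stage cost, P_a the transition probabilities under action a, and α the discount factor):

```latex
% Sketch of the approximate LP: fit J \approx \Phi r by maximizing the
% c-weighted value subject to the Bellman inequality at every state-action pair.
\begin{align*}
\max_{r} \quad & c^{\top} \Phi r \\
\text{s.t.} \quad & g(x,a) + \alpha \sum_{y} P_{a}(x,y)\,(\Phi r)(y) \;\ge\; (\Phi r)(x)
    \qquad \text{for all states } x \text{ and actions } a.
\end{align*}
```

Because r typically has far fewer components than there are states, the LP has a tractable number of variables, though the number of constraints still scales with the size of the state space.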


Citations
Journal ArticleDOI

Reinforcement learning and adaptive dynamic programming for feedback control

TL;DR: This work describes mathematical formulations of reinforcement learning and a practical implementation method known as adaptive dynamic programming; together, these give insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.
Book

Algorithms for Reinforcement Learning

TL;DR: This book focuses on reinforcement learning algorithms that build on the powerful theory of dynamic programming; it gives a fairly comprehensive catalog of learning problems, describes the core ideas, and discusses their theoretical properties and limitations.
Proceedings Article

Relative entropy policy search

TL;DR: The Relative Entropy Policy Search (REPS) method is proposed; it differs significantly from previous policy gradient approaches, yields an exact update step, and works well on typical reinforcement learning benchmark problems.
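
As a rough sketch of the "exact update step" (the closed-form solution standard in the REPS literature, with assumed notation: q the old sampling distribution, δ the Bellman error, and η > 0 a temperature obtained from the dual problem):

```latex
% KL-constrained policy improvement yields a soft-greedy reweighting of q.
\pi_{\text{new}}(a \mid s) \;\propto\; q(a \mid s)\,
    \exp\!\left( \frac{\delta(s,a)}{\eta} \right)
```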
Dissertation

On the Sample Complexity of Reinforcement Learning

TL;DR: Novel algorithms with more restricted guarantees are proposed; their sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but with only a polynomial dependence on the horizon time.
Journal ArticleDOI

Robust Dynamic Programming

TL;DR: It is proved that when the set of transition measures has a certain "rectangularity" property, all of the main results for finite- and infinite-horizon DP extend to natural robust counterparts.
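
Rectangularity roughly means that nature may choose a worst-case transition law independently at each state-action pair, so the robust recursion decouples into a Bellman-like equation (a sketch with assumed notation, where \mathcal{P}(x,a) is the ambiguity set of transition measures):

```latex
% Robust Bellman equation under a rectangular ambiguity set \mathcal{P}(x,a).
J(x) \;=\; \min_{a} \; \max_{p \,\in\, \mathcal{P}(x,a)}
    \left[ g(x,a) + \alpha \sum_{y} p(y)\, J(y) \right]
```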
References
Journal ArticleDOI

Valuing American Options by Simulation: A Simple Least-Squares Approach

TL;DR: In this paper, a new approach for approximating the value of American options by simulation is presented, using least squares to estimate the conditional expected payoff to the optionholder from continuation.
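
A minimal sketch of this least-squares Monte Carlo idea for an American put, assuming geometric Brownian motion dynamics and a quadratic polynomial basis (all parameter values below are illustrative, not taken from the paper):

```python
# Minimal sketch of least-squares Monte Carlo in the spirit of this approach:
# simulate paths, then regress discounted continuation payoffs on a basis of
# the current price to estimate the value of continuing.
import numpy as np

def american_put_lsm(s0=36.0, strike=40.0, r=0.06, sigma=0.2,
                     T=1.0, n_steps=50, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)

    # Simulate geometric Brownian motion price paths; s[:, t] is step t + 1.
    z = rng.standard_normal((n_paths, n_steps))
    s = s0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))

    # Cash flows if never exercised before maturity.
    cash = np.maximum(strike - s[:, -1], 0.0)
    for t in range(n_steps - 2, -1, -1):
        cash *= disc                          # discount one step back
        itm = strike - s[:, t] > 0.0          # regress on in-the-money paths only
        if not itm.any():
            continue
        x = s[itm, t]
        basis = np.column_stack([np.ones_like(x), x, x**2])
        coef, *_ = np.linalg.lstsq(basis, cash[itm], rcond=None)
        continuation = basis @ coef           # estimated value of continuing
        exercise = strike - x                 # immediate exercise payoff
        idx = np.flatnonzero(itm)[exercise > continuation]
        cash[idx] = exercise[exercise > continuation]
    return disc * cash.mean()                 # one more discount to time zero

print("LSM American put value:", round(american_put_lsm(), 3))
```

Regressing only on in-the-money paths, as above, follows the commonly cited recommendation for this method, since the exercise decision only matters where immediate exercise has positive value.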
Journal ArticleDOI

Temporal difference learning and TD-Gammon

TL;DR: The domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning.
Journal ArticleDOI

An analysis of temporal-difference learning with function approximation

TL;DR: In this article, the temporal difference learning algorithm is applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain with a finite or infinite state space.
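
The setting analyzed there can be illustrated with a minimal TD(0) sketch using a linear function approximator; the random-walk chain, features, costs, and step sizes below are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch of TD(0) with a linear function approximator: policy
# evaluation of the cost-to-go of a fixed Markov chain.
import numpy as np

def td0_linear(n_states=10, discount=0.95, lr=0.05, n_steps=50_000, seed=0):
    rng = np.random.default_rng(seed)

    def features(x):
        # Two simple features of the state index: a bias and a linear term.
        return np.array([1.0, x / (n_states - 1)])

    def cost(x):
        # Unit per-stage cost at the rightmost state, zero elsewhere.
        return 1.0 if x == n_states - 1 else 0.0

    theta = np.zeros(2)   # weights of the linear cost-to-go approximation
    x = 0
    for _ in range(n_steps):
        # Reflecting random walk on {0, ..., n_states - 1}.
        y = min(max(x + rng.choice([-1, 1]), 0), n_states - 1)
        # TD(0): move theta along the feature vector, scaled by the TD error.
        td_error = cost(x) + discount * features(y) @ theta - features(x) @ theta
        theta += lr * td_error * features(x)
        x = y
    return theta

print("learned weights theta:", td0_linear())
```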
Journal ArticleDOI

Regression methods for pricing complex American-style options

TL;DR: A simulation-based approximate dynamic programming method for pricing complex American-style options with a possibly high-dimensional underlying state space is developed, along with a related method that uses a single (parameterized) value function of the time-state pair.
Proceedings Article

Improving Elevator Performance Using Reinforcement Learning

TL;DR: Results in simulation surpass the best of the heuristic elevator control algorithms of which the authors are aware and demonstrate the power of RL on a very large-scale stochastic dynamic optimization problem of practical utility.