Open Access Proceedings Article

Non-parametric Approximate Dynamic Programming via the Kernel Method

TLDR
A novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees and can serve as a viable alternative to state-of-the-art parametric ADP algorithms.
Abstract
This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.
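The kernel-based mathematical program itself is not reproduced on this page. As a rough illustration of the non-parametric flavor the abstract describes, the sketch below runs fitted value iteration with kernel ridge regression on a small random MDP; the MDP, the RBF kernel and bandwidth, and the ridge weight are all assumptions made for illustration, not the authors' formulation.

```python
# Illustrative sketch only: the paper develops a kernel-based mathematical
# program for ADP, which is not reproduced here. This toy example instead
# runs fitted value iteration with kernel ridge regression on a small random
# MDP -- the same non-parametric flavor, but not the authors' formulation.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 50, 3, 0.9

# Random MDP: transition probabilities P[a, s, s'] and costs c[s, a] (assumed).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Embed states in R^1 for the kernel (hypothetical feature: normalized index).
x = np.linspace(0.0, 1.0, n_states)[:, None]

def rbf_gram(a, b, bandwidth=0.1):
    """RBF kernel Gram matrix between state embeddings a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

K = rbf_gram(x, x)
reg = 1e-3                      # ridge weight (assumed)
J = np.zeros(n_states)          # cost-to-go estimate

for _ in range(200):
    # Bellman backup targets at the sampled states (minimizing cost).
    targets = np.min(c + gamma * np.einsum("asx,x->sa", P, J), axis=1)
    # Kernel ridge fit: J(s) = sum_i alpha_i k(x_i, x_s).
    alpha = np.linalg.solve(K + reg * np.eye(n_states), targets)
    J = K @ alpha

policy = np.argmin(c + gamma * np.einsum("asx,x->sa", P, J), axis=1)
print("approximate cost-to-go (first 5 states):", np.round(J[:5], 3))
print("greedy policy (first 5 states):", policy[:5])
```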



Citations
Posted Content

Q-learning with Nearest Neighbors

TL;DR: In this article, the authors consider model-free reinforcement learning for infinite-horizon discounted Markov decision processes (MDPs) with a continuous state space and unknown transition kernel, and provide a tight finite-sample analysis of the convergence rate.
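The cited paper's exact algorithm and finite-sample analysis are not shown here; the following is a minimal sketch of Q-learning with nearest-neighbor state lookup on a toy 1-D continuous state space, where the environment, discretization, and learning constants are all assumed for illustration.

```python
# Minimal sketch of nearest-neighbor Q-value updates on a continuous state
# space, in the spirit of the cited paper; the toy dynamics, anchor grid,
# exploration rate, and step size are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
gamma, lr, n_actions = 0.95, 0.1, 2

# Discretize [0, 1] with anchor points; each anchor holds a Q estimate.
anchors = np.linspace(0.0, 1.0, 21)
Q = np.zeros((len(anchors), n_actions))

def nearest(s):
    """Index of the anchor closest to state s."""
    return int(np.argmin(np.abs(anchors - s)))

def step(s, a):
    """Toy dynamics: action 0 drifts left, action 1 drifts right, plus noise.
    Reward is highest near the middle of the interval."""
    s_next = np.clip(s + (0.05 if a == 1 else -0.05)
                     + 0.02 * rng.standard_normal(), 0.0, 1.0)
    reward = 1.0 - abs(s_next - 0.5)
    return s_next, reward

s = rng.uniform()
for t in range(20_000):
    i = nearest(s)
    a = rng.integers(n_actions) if rng.uniform() < 0.1 else int(np.argmax(Q[i]))
    s_next, r = step(s, a)
    j = nearest(s_next)
    # Q-learning update applied at the nearest anchor of the visited state.
    Q[i, a] += lr * (r + gamma * np.max(Q[j]) - Q[i, a])
    s = s_next

print("greedy action per anchor:", np.argmax(Q, axis=1))
```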
Journal Article

Practical kernel-based reinforcement learning

TL;DR: An algorithm that turns KBRL into a practical reinforcement learning tool; it significantly outperforms other state-of-the-art reinforcement learning algorithms on the tasks studied, and upper bounds are derived for the distance between the value functions computed by KBRL and KBSF using the same data.
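As a hedged illustration of the KBRL construction referenced above (the KBSF speed-up itself is not reproduced), the sketch below builds a kernel-weighted finite MDP from sampled transitions and solves it by value iteration; the toy dynamics, Gaussian kernel, and bandwidth are assumptions.

```python
# Rough sketch of the KBRL idea (not the KBSF speed-up from the cited paper):
# sampled transitions define a finite kernel-weighted MDP whose solution
# yields Q-values at arbitrary states. Dynamics and kernel are assumed.
import numpy as np

rng = np.random.default_rng(2)
gamma, bandwidth, n_actions, n_samples = 0.9, 0.1, 2, 100

# Sampled transitions (s, r, s') per action on a 1-D state space:
# action 0 drifts left, action 1 drifts right; reward peaks at the middle.
S = rng.uniform(0, 1, size=(n_actions, n_samples))
drift = np.array([-0.05, 0.05])[:, None]
S_next = np.clip(S + drift + 0.02 * rng.standard_normal(S.shape), 0, 1)
R = 1.0 - np.abs(S_next - 0.5)

def kernel_weights(query, centers):
    """Normalized Gaussian kernel weights of query points over sample centers."""
    w = np.exp(-((query[:, None] - centers[None, :]) ** 2) / (2 * bandwidth ** 2))
    return w / w.sum(axis=1, keepdims=True)

# Value iteration on the finite MDP induced by the sampled next-states.
all_next = S_next.reshape(-1)
V = np.zeros(all_next.size)
for _ in range(200):
    V_next = V.reshape(n_actions, n_samples)   # V at each action's next-states
    Q = np.stack([kernel_weights(all_next, S[a]) @ (R[a] + gamma * V_next[a])
                  for a in range(n_actions)], axis=1)
    V = Q.max(axis=1)

def greedy_action(s):
    """Greedy action at an arbitrary state s under the kernel-based Q-values."""
    V_next = V.reshape(n_actions, n_samples)
    q = [(kernel_weights(np.array([s]), S[a]) @ (R[a] + gamma * V_next[a]))[0]
         for a in range(n_actions)]
    return int(np.argmax(q))

print("greedy action at s=0.2:", greedy_action(0.2))   # expect 1 (move right)
print("greedy action at s=0.8:", greedy_action(0.8))   # expect 0 (move left)
```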
Journal Article

A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems

TL;DR: This paper adapts MCTS and RHO to two problems – one inspired by tactical wildfire management and a classical problem involving the control of queueing networks – and undertakes an extensive computational study comparing the two methods on large-scale instances of both problems, in terms of both the state and the action spaces.
Journal Article

Multi-period portfolio selection using kernel-based control policy with dimensionality reduction

TL;DR: Numerical experiments show that the nonlinear control policy implemented in this paper not only reduces computation time but also improves out-of-sample investment performance.
Journal Article

Shape Constraints in Economics and Operations Research

TL;DR: This paper briefly reviews an illustrative set of research utilizing shape constraints in the economics and operations research literature, highlighting methodological innovations and applications with particular emphasis on utility functions, production economics, and sequential decision making.
References
Proceedings Article

Bayes meets bellman: the Gaussian process approach to temporal difference learning

TL;DR: A novel Bayesian approach to value function estimation in continuous state spaces is presented, imposing a Gaussian prior over value functions and assuming a Gaussian noise model.
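A minimal sketch of the GPTD-style construction this entry refers to: a Gaussian process prior on the value function is conditioned on observed rewards through the relation r_t = V(x_t) - gamma * V(x_{t+1}) + noise. The trajectory, kernel, and noise level below are illustrative assumptions, not the paper's experiments.

```python
# Sketch of Gaussian-process value estimation from a single trajectory.
# The posterior mean follows from standard linear-Gaussian conditioning on
# r = H v + noise; the chain environment, kernel, and noise level are assumed.
import numpy as np

rng = np.random.default_rng(3)
gamma, noise_var, bandwidth = 0.9, 0.05, 0.2

# One trajectory on a 1-D state space drifting toward 1.0; reward grows with x.
T = 60
x = np.empty(T + 1)
x[0] = 0.0
for t in range(T):
    x[t + 1] = np.clip(x[t] + 0.05 + 0.02 * rng.standard_normal(), 0.0, 1.0)
rewards = x[1:]            # r_t observed on the transition from x_t to x_{t+1}

def rbf(a, b):
    """Squared-exponential kernel Gram matrix."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bandwidth ** 2))

# Linear operator linking values to rewards: (H v)_t = V(x_t) - gamma V(x_{t+1}).
H = np.zeros((T, T + 1))
H[np.arange(T), np.arange(T)] = 1.0
H[np.arange(T), np.arange(T) + 1] = -gamma

K = rbf(x, x)
G = H @ K @ H.T + noise_var * np.eye(T)
weights = np.linalg.solve(G, rewards)

def value_posterior_mean(query):
    """GP posterior mean of the value function at the query states."""
    return rbf(query, x) @ H.T @ weights

print(np.round(value_posterior_mean(np.linspace(0, 1, 5)), 2))
```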
Proceedings Article

Regularization and feature selection in least-squares temporal difference learning

TL;DR: This paper proposes a regularization framework for the LSTD algorithm, which is robust to irrelevant features and also serves as a method for feature selection, and presents an algorithm similar to the Least Angle Regression algorithm that can efficiently compute the optimal solution.
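For illustration of the L1-regularized LSTD fixed point this entry describes (the paper's LARS-style solver is not reproduced), the sketch below approximates the same fixed point with a simpler two-loop iteration: fix the TD targets, solve the resulting lasso by ISTA, and repeat. The chain data, noise features, and penalty weight are assumptions.

```python
# Sketch of L1-regularized LSTD on data with irrelevant features. The cited
# paper solves this with a LARS-style homotopy algorithm; here the same L1
# fixed point is approximated by a plain iteration (fix targets, run ISTA,
# repeat), purely for illustration.
import numpy as np

rng = np.random.default_rng(4)
gamma, beta = 0.9, 0.05          # discount and L1 penalty (assumed)

# Samples from a 20-state chain under a fixed "move right" policy; reward 1
# when reaching the last state. Features: position, position^2, plus 8
# pure-noise features that are irrelevant to the value function.
n, k_noise = 2000, 8
s = rng.integers(0, 19, size=n)
s_next = np.minimum(s + 1, 19)
r = (s_next == 19).astype(float)

def features(states):
    pos = states / 19.0
    relevant = np.column_stack([pos, pos ** 2])
    noise = rng.standard_normal((states.size, k_noise))   # irrelevant features
    return np.column_stack([relevant, noise])

Phi, Phi_next = features(s), features(s_next)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

eta = 1.0 / np.linalg.eigvalsh(Phi.T @ Phi).max()   # ISTA step size
w = np.zeros(Phi.shape[1])
for _ in range(50):                                  # outer fixed-point loop
    y = r + gamma * (Phi_next @ w)                   # TD targets under current w
    u = w.copy()
    for _ in range(200):                             # inner ISTA for the lasso
        u = soft_threshold(u - eta * Phi.T @ (Phi @ u - y), eta * beta)
    w = u

print("weights (first two relevant, rest noise):", np.round(w, 3))
```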
Journal Article

Scheduling networks of queues: heavy traffic analysis of a simple open network

TL;DR: It is shown via simulation that the relative difference between the performance of the proposed policy and the pathwise lower bound becomes small as the load on the network is increased toward the heavy traffic limit.
Proceedings Article

Regularized Policy Iteration

TL;DR: This paper proposes two novel regularized policy iteration algorithms by adding L2-regularization to two widely-used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD).
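A minimal sketch of the L2-regularized LSTD policy-evaluation step that this entry refers to: ordinary LSTD solves A w = b with A = Phi^T (Phi - gamma Phi') and b = Phi^T r, and the regularized variant adds a ridge term. The chain environment, feature map, and regularization weight below are assumed for illustration.

```python
# Sketch of L2-regularized LSTD policy evaluation, in the spirit of the cited
# regularized policy iteration paper; environment and features are assumed.
import numpy as np

rng = np.random.default_rng(5)
gamma, lam = 0.95, 1.0

# Trajectory samples of a fixed policy on a 30-state random-walk chain with
# reward only at the right end.
n_states, n = 30, 5000
s = rng.integers(0, n_states, size=n)
step = rng.choice([-1, 1], size=n, p=[0.4, 0.6])
s_next = np.clip(s + step, 0, n_states - 1)
r = (s_next == n_states - 1).astype(float)

def features(states):
    """Simple polynomial features of the normalized state (assumed)."""
    pos = states / (n_states - 1)
    return np.column_stack([np.ones_like(pos), pos, pos ** 2, pos ** 3])

Phi, Phi_next = features(s), features(s_next)

A = Phi.T @ (Phi - gamma * Phi_next)
b = Phi.T @ r
w = np.linalg.solve(A + lam * np.eye(A.shape[1]), b)   # L2-regularized LSTD

V_hat = features(np.arange(n_states)) @ w
print("estimated values at states 0, 15, 29:", np.round(V_hat[[0, 15, 29]], 3))
```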
Journal Article

Performance Loss Bounds for Approximate Value Iteration with State Aggregation

TL;DR: This work considers approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant; it establishes performance loss bounds for policies derived from approximations associated with fixed points.
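A minimal sketch of approximate value iteration with state aggregation as described above: the cost-to-go is held constant on each block of a partition, and each backup is projected back onto piecewise-constant functions by averaging within blocks. The random MDP and equal-width partition are assumptions; the paper's performance-loss bounds are not reproduced.

```python
# Sketch of approximate value iteration with state aggregation: one constant
# cost-to-go value per block of the partition. MDP and partition are assumed.
import numpy as np

rng = np.random.default_rng(6)
n_states, n_actions, gamma, n_blocks = 60, 2, 0.9, 6

# Random MDP: transition probabilities P[a, s, s'] and costs c[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Partition the states into contiguous blocks of equal size.
block_of = np.repeat(np.arange(n_blocks), n_states // n_blocks)

theta = np.zeros(n_blocks)               # one constant cost-to-go per block
for _ in range(500):
    J = theta[block_of]                  # piecewise-constant cost-to-go
    backup = np.min(c + gamma * np.einsum("asx,x->sa", P, J), axis=1)
    # Projection onto piecewise-constant functions: average within each block.
    theta = np.array([backup[block_of == k].mean() for k in range(n_blocks)])

J = theta[block_of]
policy = np.argmin(c + gamma * np.einsum("asx,x->sa", P, J), axis=1)
print("per-block cost-to-go:", np.round(theta, 3))
```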