Non-parametric Approximate Dynamic Programming via the Kernel Method

Proceedings Article•

Nikhil Bhat¹, Vivek F. Farias², Ciamac C. Moallemi¹•Institutions (2)

Columbia University¹, Massachusetts Institute of Technology²

03 Dec 2012-Vol. 25, pp 386-394

TL;DR: A novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees and can serve as a viable alternative to state-of-the-art parametric ADP algorithms.

Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.
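
For orientation, the linear programming view of dynamic programming that a kernel-based program of this kind builds on can be written as below. This is the standard textbook formulation for a discounted-cost problem, not the paper's exact program, which is not reproduced on this page. The symbols $g$, $p$, $\alpha$, $\nu$, $b$, $\lambda_i$, and $K$ are notation assumed here for illustration.

```latex
% Exact LP for a discounted-cost MDP (standard form; notation assumed here).
\begin{align*}
\max_{J} \quad & \sum_{x} \nu(x)\, J(x) \\
\text{s.t.} \quad & J(x) \le g(x,a) + \alpha \sum_{x'} p(x' \mid x, a)\, J(x')
  \quad \forall\, x,\ a,
\end{align*}
% with state-relevance weights \nu(x) > 0 and discount factor \alpha \in (0,1).
% A non-parametric approach replaces a fixed basis with a kernel expansion
% over sampled states x_1, \dots, x_N (generic form, assumed):
\[
J(x) \approx b + \sum_{i=1}^{N} \lambda_i\, K(x_i, x).
\]
```

In a parametric approach the designer fixes a set of basis functions in advance; in the kernel form the approximation is instead indexed by the sampled states, which is the sense in which the architecture need not be specified by hand.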

Citations

Posted Content•

Q-learning with Nearest Neighbors

Devavrat Shah¹, Qiaomin Xie²•Institutions (2)

Massachusetts Institute of Technology¹, University of Illinois at Urbana–Champaign²

12 Feb 2018-arXiv: Learning

TL;DR: The authors consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and an unknown transition kernel, and provide a tight finite-sample analysis of the convergence rate.

Abstract: We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a $d$-dimensional state space and the discounted factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $ L $, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $ \tilde{O}\big(1/\varepsilon^d\big),$ so the sample complexity scales as $\tilde{O}\big(1/\varepsilon^{d+3}\big).$ Indeed, we establish a lower bound that argues that the dependence of $ \tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$ is necessary.
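
The arithmetic behind the last two bounds can be made explicit. Substituting the covering time of the purely random policy into the general bound gives the stated rate; treating the $(1-\gamma)$ factor as a constant is an assumption made here to match how the abstract states the result.

```latex
\[
\tilde{O}\!\left(\frac{L}{\varepsilon^{3}(1-\gamma)^{7}}\right)
\quad\text{with}\quad L = \tilde{O}\!\left(\frac{1}{\varepsilon^{d}}\right)
\quad\Longrightarrow\quad
\tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}(1-\gamma)^{7}}\right)
= \tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}}\right)
\ \text{for fixed } \gamma .
\]
% Compared with the lower bound \tilde{\Omega}(1/\varepsilon^{d+2}),
% the upper bound is loose by a factor of order 1/\varepsilon.
```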

43 citations

Journal Article•DOI•

Practical kernel-based reinforcement learning

Andre Barreto, Doina Precup¹, Joelle Pineau¹•Institutions (1)

McGill University¹

01 Jan 2016-Journal of Machine Learning Research

TL;DR: This paper introduces KBSF, an algorithm that turns KBRL into a practical reinforcement learning tool that significantly outperforms other state-of-the-art reinforcement learning algorithms on the tasks studied, and derives upper bounds for the distance between the value functions computed by KBRL and KBSF using the same data.

Abstract: Kernel-based reinforcement learning (KBRL) stands out among approximate reinforcement learning algorithms for its strong theoretical guarantees. By casting the learning problem as a local kernel approximation, KBRL provides a way of computing a decision policy which converges to a unique solution and is statistically consistent. Unfortunately, the model constructed by KBRL grows with the number of sample transitions, resulting in a computational cost that precludes its application to large-scale or on-line domains. In this paper we introduce an algorithm that turns KBRL into a practical reinforcement learning tool. Kernel-based stochastic factorization (KBSF) builds on a simple idea: when a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix, potentially much smaller than the original, which retains some fundamental properties of its precursor. KBSF exploits such an insight to compress the information contained in KBRL's model into an approximator of fixed size. This makes it possible to build an approximation considering both the difficulty of the problem and the associated computational cost. KBSF's computational complexity is linear in the number of sample transitions, which is the best one can do without discarding data. Moreover, the algorithm's simple mechanics allow for a fully incremental implementation that makes the amount of memory used independent of the number of sample transitions. The result is a kernel-based reinforcement learning algorithm that can be applied to large-scale problems in both off-line and on-line regimes. We derive upper bounds for the distance between the value functions computed by KBRL and KBSF using the same data. 
We also prove that it is possible to control the magnitude of the variables appearing in our bounds, which means that, given enough computational resources, we can make KBSF's value function as close as desired to the value function that would be computed by KBRL using the same set of sample transitions. The potential of our algorithm is demonstrated in an extensive empirical study in which KBSF is applied to difficult tasks based on real-world data. Not only does KBSF solve problems that had never been solved before, but it also significantly outperforms other state-of-the-art reinforcement learning algorithms on the tasks studied.
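
The factor-swapping idea described in the abstract can be checked numerically on a toy example. The sketch below is not the KBSF algorithm itself; it only illustrates that if a transition matrix is the product of two row-stochastic matrices, the swapped product is again row-stochastic and can be smaller. The matrices `D` and `K` and their entries are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_row_stochastic(m, tol=1e-9):
    """True if every entry is non-negative and every row sums to one."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in m)


# Hypothetical factors (illustrative numbers, not taken from the paper).
D = [[1.0, 0.0],
     [0.5, 0.5],
     [0.2, 0.8],
     [0.0, 1.0]]              # 4 states mapped to 2 representative states
K = [[0.4, 0.3, 0.2, 0.1],
     [0.1, 0.2, 0.3, 0.4]]    # 2 representative states mapped back to 4

P = matmul(D, K)       # 4 x 4 transition matrix
P_bar = matmul(K, D)   # swapped product: 2 x 2 transition matrix

print(len(P), "x", len(P[0]), is_row_stochastic(P))
print(len(P_bar), "x", len(P_bar[0]), is_row_stochastic(P_bar))
```

The size of the swapped matrix depends only on the inner dimension of the factorization, which is what allows a fixed-size model regardless of the number of sample transitions.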

37 citations

Cites background from "Non-parametric Approximate Dynamic ..."

...Following a slightly different line of work, Bhat et al. (2012) propose to kernelize the linear programming formulation of dynamic programming....
[...]

Journal Article•DOI•

A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems

Dimitris Bertsimas¹, J. Daniel Griffith¹, Vishal Gupta², Mykel J. Kochenderfer³, Velibor V. Mišić⁴•Institutions (4)

Massachusetts Institute of Technology¹, University of Southern California², Stanford University³, University of California, Los Angeles⁴

01 Dec 2017-European Journal of Operational Research

TL;DR: This paper adapts MCTS and RHO to two problems – a problem inspired by tactical wildfire management and a classical problem involving the control of queueing networks – and undertakes an extensive computational study comparing the two methods on large-scale instances of both problems in terms of both the state and the action spaces.

29 citations

Journal Article•DOI•

Multi-period portfolio selection using kernel-based control policy with dimensionality reduction

Yuichi Takano¹, Jun-ya Gotoh²•Institutions (2)

Tokyo Institute of Technology¹, Chuo University²

01 Jun 2014-Expert Systems With Applications

TL;DR: Numerical experiments show that the nonlinear control policy implemented in this paper works not only to reduce the computation time, but also to improve out-of-sample investment performance.

Abstract: This paper studies a nonlinear control policy for multi-period investment. The nonlinear strategy we implement is categorized as a kernel method, but solving large-scale instances of the resulting optimization problem in a direct manner is computationally intractable in the literature. In order to overcome this difficulty, we employ a dimensionality reduction technique which is often used in principal component analysis. Numerical experiments show that our strategy works not only to reduce the computation time, but also to improve out-of-sample investment performance.

21 citations

Journal Article•DOI•

Shape Constraints in Economics and Operations Research

Andrew L. Johnson, Daniel R. Jiang

01 Nov 2018-Statistical Science

TL;DR: This paper briefly reviews an illustrative set of research utilizing shape constraints in the economics and operations research literature and highlights the methodological innovations and applications with a particular emphasis on utility functions, production economics and sequential decision making applications.

Abstract: Shape constraints, motivated by either application-specific assumptions or existing theory, can be imposed during model estimation to restrict the feasible region of the parameters. Although such restrictions may not provide any benefits in an asymptotic analysis, they often improve finite sample performance of statistical estimators and the computational efficiency of finding near-optimal control policies. This paper briefly reviews an illustrative set of research utilizing shape constraints in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics and sequential decision making applications.

21 citations

Cites methods from "Non-parametric Approximate Dynamic ..."

...…Farias and Van Roy, 2000; Tsitsiklis and Roy, 1996; Tsitsiklis and Van Roy, 1999; Geramifard et al., 2013), approximate linear programming (De Farias and Van Roy, 2003; De Farias and Van Roy, 2004; Desai et al., 2012a), and nonparametric methods are used (Ormoneit and Sen, 2002; Bhat et al., 2012)....
[...]

References

Book•

Dynamic Programming and Optimal Control

Dimitri P. Bertsekas¹•Institutions (1)

Massachusetts Institute of Technology¹

01 May 1995

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.

10,834 citations

Book•DOI•

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf¹, Alexander J. Smola•Institutions (1)

Max Planck Society¹

01 Dec 2001

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs (kernels) for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

"Non-parametric Approximate Dynamic ..." refers background in this paper

...For certain sets S, Mercer’s theorem provides another important construction of such a Hilbert space....
[...]
...The Gaussian kernel is known to be full-dimensional (see, e.g., Theorem 2.18, Scholkopf and Smola, 2001), so that employing such a kernel in our setting would correspond to working with an infinite dimensional approximation architecture....
[...]
...more examples can be found in the text of Scholkopf and Smola (2001)....
[...]
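
The full-dimensionality remark in the excerpt above can be illustrated on a finite sample: for distinct points, the Gaussian Gram matrix is positive definite, so its rank grows with the number of points. The sketch below checks this through the pivots of a Cholesky factorization. The one-dimensional points, the bandwidth, and the function names are assumptions chosen for illustration, and a numerical check on one sample is not a proof of the cited theorem.

```python
import math


def gaussian_gram(points, bandwidth):
    """Gram matrix G[i][j] = exp(-(x_i - x_j)^2 / (2 * bandwidth^2))."""
    return [[math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))
             for y in points] for x in points]


def cholesky_pivots(g):
    """Return the squared diagonal pivots of a Cholesky factorization.

    In exact arithmetic, all pivots are positive exactly when the
    symmetric matrix g is positive definite. Stops at the first
    non-positive pivot.
    """
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    pivots = []
    for j in range(n):
        d = g[j][j] - sum(low[j][k] ** 2 for k in range(j))
        pivots.append(d)
        if d <= 0.0:
            break
        low[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (g[i][j] - s) / low[j][j]
    return pivots


# Hypothetical sample of distinct points on the real line.
points = [0.0, 0.5, 1.0, 1.5, 2.0]
pivots = cholesky_pivots(gaussian_gram(points, bandwidth=0.5))
print(len(pivots), all(p > 0 for p in pivots))
```

Very closely spaced points or a large bandwidth make the matrix nearly singular in floating point, so the check is only reliable for reasonably separated samples.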

Book•

Optimization by Vector Space Methods

David G. Luenberger

01 Jan 1968

TL;DR: This book shows engineers how to use optimization theory to solve complex problems with a minimum of mathematics and unifies the large field of optimization with a few geometric principles.

Abstract: From the Publisher: Engineers must make decisions regarding the distribution of expensive resources in a manner that will be economically beneficial. This problem can be realistically formulated and logically analyzed with optimization theory. This book shows engineers how to use optimization theory to solve complex problems. Unifies the large field of optimization with a few geometric principles. Covers functional analysis with a minimum of mathematics. Contains problems that relate to the applications in the book.

5,667 citations

Journal Article•DOI•

Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks

Leandros Tassiulas, Anthony Ephremides¹•Institutions (1)

University of Maryland, College Park¹

01 Dec 1992-IEEE Transactions on Automatic Control

TL;DR: The stability of a queueing network with interdependent servers is considered, and a policy is obtained that is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy; this stability region is also characterized.

Abstract: The stability of a queueing network with interdependent servers is considered. The dependency among the servers is described by the definition of their subsets that can be activated simultaneously. Multihop radio networks provide a motivation for the consideration of this system. The problem of scheduling the server activation under the constraints imposed by the dependency among servers is studied. The performance criterion of a scheduling policy is its throughput that is characterized by its stability region, that is, the set of vectors of arrival and service rates for which the system is stable. A policy is obtained which is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy, and this stability region is characterized. The behavior of the network is studied for arrival rates that lie outside the stability region. Implications of the results in certain types of concurrent database and parallel processing systems are discussed.
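
A simplified, single-hop version of a max-weight scheduling rule can be sketched as follows: among the subsets of servers that may be activated simultaneously, pick the one with the largest sum of queue length times service rate. This is a minimal sketch under that simplification, not the multihop policy analyzed in the paper; the function name and all data are hypothetical.

```python
def max_weight_schedule(queue_lengths, service_rates, feasible_sets):
    """Pick the feasible activation set with the largest total weight.

    The weight of a set is the sum, over its servers i, of
    queue_lengths[i] * service_rates[i]. Ties go to the first set listed.
    """
    def weight(active):
        return sum(queue_lengths[i] * service_rates[i] for i in active)

    return max(feasible_sets, key=weight)


# Hypothetical three-queue example: servers 0 and 2 may run together,
# while server 1 conflicts with both (illustrative data only).
feasible = [(0, 2), (1,)]
rates = [1.0, 0.5, 1.0]

choice = max_weight_schedule([3, 12, 2], rates, feasible)
print(choice)
```

With queue lengths `[3, 12, 2]` the long queue at server 1 outweighs the other two combined, so the rule serves it alone; as the other queues grow, the choice flips to the pair `(0, 2)`.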

3,018 citations

"Non-parametric Approximate Dynamic ..." refers methods in this paper

...Max-Weight (Tassiulas and Ephremides, 1992)....
[...]
...We prepare the ground for the proof by developing appropriate uniform concentration guarantees for appropriate function classes....
[...]

Book Chapter•DOI•

Rademacher and gaussian complexities: risk bounds and structural results

Peter L. Bartlett¹, Shahar Mendelson¹•Institutions (1)

Australian National University¹

01 Mar 2003

TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.

Abstract: We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.
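
To make the notion of a data-dependent complexity estimate concrete, the sketch below computes a Monte Carlo estimate of an empirical Rademacher complexity for a small finite function class on a fixed sample. It assumes the convention E_sigma[max_f (1/n) sum_i sigma_i f(x_i)], which may differ from the paper's definition in scaling and in the use of an absolute value. The threshold-function class, the sample, and the function name are hypothetical.

```python
import random


def empirical_rademacher(function_values, num_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[max_f (1/n) * sum_i sigma_i * f(x_i)].

    function_values[k][i] holds f_k(x_i) for a finite class of functions
    evaluated on a fixed sample x_1, ..., x_n. The sigma_i are independent
    uniform random signs.
    """
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(num_draws):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += max(sum(s * v for s, v in zip(sigma, values)) / n
                     for values in function_values)
    return total / num_draws


# Hypothetical class: threshold functions with values +/-1, plus negations.
sample = [0.1, 0.4, 0.5, 0.8, 0.9, 1.3]
thresholds = [0.0, 0.45, 0.85, 2.0]
base = [[1.0 if x > t else -1.0 for x in sample] for t in thresholds]
function_class = base + [[-v for v in values] for values in base]

estimate = empirical_rademacher(function_class)
print(round(estimate, 3))
```

A richer class can correlate better with random signs and so yields a larger estimate, while a class with a single function yields an estimate near zero.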

2,535 citations