Non-parametric Approximate Dynamic Programming via the Kernel Method

Proceedings Article

Nikhil Bhat¹, Vivek F. Farias², Ciamac C. Moallemi¹

Columbia University¹, Massachusetts Institute of Technology²

03 Dec 2012-Vol. 25, pp 386-394

TL;DR: A novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees and can serve as a viable alternative to state-of-the-art parametric ADP algorithms.

Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.

Citations

Posted Content

A Linearly Relaxed Approximate Linear Program for Markov Decision Processes

Chandrashekar Lakshminarayanan¹, Shalabh Bhatnagar², Csaba Szepesvári¹

University of Alberta¹, Indian Institute of Science²

09 Apr 2017-arXiv: Systems and Control

TL;DR: This paper defines a linearly relaxed approximate linear program (LRALP) with a tractable number of constraints, obtained as positive linear combinations of the original ALP constraints.

Abstract: Approximate linear programming (ALP) and its variants have been widely applied to Markov Decision Processes (MDPs) with a large number of states. A serious limitation of ALP is that it has an intractable number of constraints, as a result of which constraint approximations are of interest. In this paper, we define a linearly relaxed approximation linear program (LRALP) that has a tractable number of constraints, obtained as positive linear combinations of the original constraints of the ALP. The main contribution is a novel performance bound for LRALP.

18 citations

Proceedings Article

Transaction costs-aware portfolio optimization via fast Löwner-John ellipsoid approximation

Weiwei Shen¹, Jun Wang²

General Electric¹, Alibaba Group²

25 Jan 2015

TL;DR: An approximate dynamic programming method is developed that combines the Löwner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy, cutting computational costs by up to a factor of five hundred.

Abstract: Merton's portfolio optimization problem in the presence of transaction costs for multiple assets has been an important and challenging problem in both theory and practice. Most existing work suffers from the curse of dimensionality and has difficulty generalizing. In this paper, we develop an approximate dynamic programming method that synergistically combines the Löwner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy. By constructing Löwner-John ellipsoids to parameterize the optimal policy and taking Euclidean projections onto the constructed ellipsoids to implement the trading policy, the proposed algorithm cuts computational costs by up to a factor of five hundred while achieving near-optimal risk-adjusted returns across both synthetic and real-world market datasets.

17 citations

Journal Article

A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs

William B. Haskell¹, Rahul Jain², Hiteshi Sharma², Pengqian Yu¹

National University of Singapore¹, University of Southern California²

01 Jan 2020-IEEE Transactions on Automatic Control

TL;DR: This work proposes universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes using a random operator framework with techniques from the theory of stochastic dominance.

Abstract: We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The “empirical” nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a nonparametric method for function approximation using a parametric function space and a reproducing kernel Hilbert space respectively are then combined with EVL. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach.

14 citations

Journal Article

Approximate linear programming for networks

Michael H. Veatch¹

Gordon College¹

01 Nov 2015-Computers & Operations Research

TL;DR: This paper uses approximate linear programming (ALP) to compute average cost bounds for queueing network control problems and finds that the ALPs offer more accurate bounds than other methods and the simplicity of just solving an LP.

12 citations

Posted Content

Q-Learning for Mean-Field Controls

Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu

10 Feb 2020

TL;DR: This paper develops a model-free kernel-based Q-learning algorithm (CDD-Q) and shows that its convergence rate and sample complexity are independent of the number of agents, and can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.

Abstract: Multi-agent reinforcement learning (MARL) has been applied to many challenging problems including two-team computer games, autonomous driving, and real-time bidding. Despite the empirical success, there is a conspicuous absence of theoretical study of different MARL algorithms: this is mainly due to the curse of dimensionality caused by the exponential growth of the joint state-action space as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to $N$-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate strong performance of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.

12 citations

Cites methods from "Non-parametric Approximate Dynamic ..."

...The idea of non-parametric kernel regression has also been used in the context of discrete state-space problems [4]....

References

Book

Dynamic Programming and Optimal Control

Dimitri P. Bertsekas¹

Massachusetts Institute of Technology¹

01 May 1995

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes, and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.

10,834 citations

Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf¹, Alexander J. Smola

Max Planck Society¹

01 Dec 2001

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, namely kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

"Non-parametric Approximate Dynamic ..." refers background in this paper

...For certain sets S, Mercer’s theorem provides another important construction of such a Hilbert space. More examples can be found in the text of Schölkopf and Smola (2001)...
...The Gaussian kernel is known to be full-dimensional (see, e.g., Theorem 2.18, Schölkopf and Smola, 2001), so that employing such a kernel in our setting would correspond to working with an infinite dimensional approximation architecture...
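
The full-dimensionality claim in the excerpt above has a finite-sample consequence that is easy to check numerically: the Gram matrix of a Gaussian kernel on distinct points should have full rank. The following is a minimal Python sketch, assuming NumPy. The function name `gaussian_gram`, the bandwidth parametrization, and the small grid of states are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_gram(points, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    The bandwidth parametrization is an assumption made for this sketch.
    """
    points = np.asarray(points, dtype=float)
    sq_norms = np.sum(points ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

# Hypothetical sample: 16 distinct states on a 4 x 4 integer grid.
states = np.array([[i, j] for i in range(4) for j in range(4)])
K = gaussian_gram(states, bandwidth=1.0)

rank = np.linalg.matrix_rank(K)          # numerical rank of the Gram matrix
min_eig = np.linalg.eigvalsh(K).min()    # smallest eigenvalue (K is symmetric)
print(rank == len(states), min_eig > 0)
```

With well-separated points the rank equals the number of points, so each additional sampled state adds a new direction to the span of kernel functions. For very large or very small bandwidths the matrix can become numerically ill-conditioned even though it remains positive definite in exact arithmetic.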

Book

Optimization by Vector Space Methods

David G. Luenberger

01 Jan 1968

TL;DR: This book shows engineers how to use optimization theory to solve complex problems with a minimum of mathematics and unifies the large field of optimization with a few geometric principles.

Abstract: From the Publisher: Engineers must make decisions regarding the distribution of expensive resources in a manner that will be economically beneficial. This problem can be realistically formulated and logically analyzed with optimization theory. This book shows engineers how to use optimization theory to solve complex problems. Unifies the large field of optimization with a few geometric principles. Covers functional analysis with a minimum of mathematics. Contains problems that relate to the applications in the book.

5,667 citations

Journal Article

Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks

Leandros Tassiulas, Anthony Ephremides¹

University of Maryland, College Park¹

01 Dec 1992-IEEE Transactions on Automatic Control

TL;DR: The stability of a queueing network with interdependent servers is considered and a policy is obtained which is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy, and this stability region is characterized.

Abstract: The stability of a queueing network with interdependent servers is considered. The dependency among the servers is described by the definition of their subsets that can be activated simultaneously. Multihop radio networks provide a motivation for the consideration of this system. The problem of scheduling the server activation under the constraints imposed by the dependency among servers is studied. The performance criterion of a scheduling policy is its throughput that is characterized by its stability region, that is, the set of vectors of arrival and service rates for which the system is stable. A policy is obtained which is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy, and this stability region is characterized. The behavior of the network is studied for arrival rates that lie outside the stability region. Implications of the results in certain types of concurrent database and parallel processing systems are discussed.

3,018 citations

"Non-parametric Approximate Dynamic ..." refers methods in this paper

...Max-Weight (Tassiulas and Ephremides, 1992)...
...We prepare the ground for the proof by developing appropriate uniform concentration guarantees for appropriate function classes...
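
For readers unfamiliar with Max-Weight, the rule is commonly stated as follows: among the server activations that can be used simultaneously, pick the one that maximizes the sum of queue length times service rate over the queues it serves. The sketch below is a simplified single-hop version in Python. The function name `max_weight_action`, the queue data, and the set of feasible activations are hypothetical, and this is not the benchmark implementation used in the paper.

```python
def max_weight_action(queue_lengths, service_rates, feasible_actions):
    """Return the feasible action with the largest sum of q_i * mu_i.

    Each action is a tuple of queue indices that are served together.
    Ties are broken in favor of the action listed first.
    """
    def weight(action):
        return sum(queue_lengths[i] * service_rates[i] for i in action)

    return max(feasible_actions, key=weight)

# Hypothetical network: server A serves queue 0 or 1, server B serves queue 2 or 3.
queue_lengths = [5, 2, 1, 4]
service_rates = [0.3, 0.9, 0.5, 0.25]
feasible_actions = [(0, 2), (0, 3), (1, 2), (1, 3)]

chosen = max_weight_action(queue_lengths, service_rates, feasible_actions)
print(chosen)
```

Here the per-queue weights are 1.5, 1.8, 0.5 and 1.0, so serving queues 1 and 3 gives the largest total. In a multihop network the weights are usually based on backlog differences between neighboring nodes rather than raw queue lengths.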

Book Chapter

Rademacher and gaussian complexities: risk bounds and structural results

Peter L. Bartlett¹, Shahar Mendelson¹

Australian National University¹

01 Mar 2003

TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.

Abstract: We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.

2,535 citations