Proceedings Article

Non-parametric Approximate Dynamic Programming via the Kernel Method

TL;DR: A novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees and can serve as a viable alternative to state-of-the-art parametric ADP algorithms.
Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.


Citations
Posted Content
TL;DR: Defines a linearly relaxed approximation linear program (LRALP) with a tractable number of constraints, obtained as positive linear combinations of the original constraints of the ALP.
Abstract: Approximate linear programming (ALP) and its variants have been widely applied to Markov Decision Processes (MDPs) with a large number of states. A serious limitation of ALP is that it has an intractable number of constraints, as a result of which constraint approximations are of interest. In this paper, we define a linearly relaxed approximation linear program (LRALP) that has a tractable number of constraints, obtained as positive linear combinations of the original constraints of the ALP. The main contribution is a novel performance bound for LRALP.
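
The constraint-aggregation idea in this abstract can be illustrated with a minimal sketch. It uses generic random linear inequalities `A x >= b`, not the ALP of any particular MDP, and the names `satisfies`, `aggregate`, and `W` are hypothetical. It only shows why nonnegative combinations of constraints give a relaxation: any point feasible for the original system stays feasible for the aggregated one.

```python
import random

def satisfies(A, b, x, tol=1e-9):
    """Return True if A x >= b holds for every row (up to a small tolerance)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i - tol
        for row, b_i in zip(A, b)
    )

def aggregate(A, b, W):
    """Build the relaxed system (W^T A) x >= W^T b.

    W has one row per original constraint and one column per aggregated
    constraint. Its entries must be nonnegative (assumption of this sketch).
    """
    if any(w < 0 for row in W for w in row):
        raise ValueError("weights must be nonnegative")
    m, n, k = len(A), len(A[0]), len(W[0])
    A_agg = [[sum(W[i][j] * A[i][c] for i in range(m)) for c in range(n)]
             for j in range(k)]
    b_agg = [sum(W[i][j] * b[i] for i in range(m)) for j in range(k)]
    return A_agg, b_agg

rng = random.Random(0)
m, n, k = 50, 3, 4  # 50 original constraints, 3 variables, 4 aggregated constraints
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
x_feasible = [rng.uniform(-1, 1) for _ in range(n)]
# Choose b so that x_feasible satisfies every original constraint with some slack.
b = [sum(a * x for a, x in zip(row, x_feasible)) - rng.uniform(0, 1) for row in A]
W = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(m)]

A_agg, b_agg = aggregate(A, b, W)
print(len(A), "constraints reduced to", len(A_agg))
print("still feasible:", satisfies(A_agg, b_agg, x_feasible))
```

The converse does not hold in general: a point feasible for the aggregated system may violate some original constraints, which is why a performance bound for the relaxation is of interest.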

18 citations

Proceedings Article
25 Jan 2015
TL;DR: An approximate dynamic programming method is developed that combines the Lowner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy and cut computational costs by up to a factor of five hundred.
Abstract: Merton's portfolio optimization problem in the presence of transaction costs for multiple assets has been an important and challenging problem in both theory and practice. Most existing work suffers from the curse of dimensionality and is difficult to generalize. In this paper, we develop an approximate dynamic programming method that combines the Lowner-John ellipsoid approximation with conventional value function iteration to quantify the associated optimal trading policy. By constructing Lowner-John ellipsoids to parameterize the optimal policy and taking Euclidean projections onto the constructed ellipsoids to implement the trading policy, the proposed algorithm cuts computational costs by up to a factor of five hundred while achieving near-optimal risk-adjusted returns across both synthetic and real-world market datasets.

17 citations

Journal ArticleDOI
TL;DR: This work proposes universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes using a random operator framework with techniques from the theory of stochastic dominance.
Abstract: We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The “empirical” nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a nonparametric method for function approximation using a parametric function space and a reproducing kernel Hilbert space respectively are then combined with EVL. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach.

14 citations

Journal ArticleDOI
Michael H. Veatch1
TL;DR: This paper uses approximate linear programming (ALP) to compute average cost bounds for queueing network control problems and finds that the ALPs offer more accurate bounds than other methods and the simplicity of just solving an LP.

12 citations

Posted Content
10 Feb 2020
TL;DR: This paper develops a model-free kernel-based Q-learning algorithm (CDD-Q) and shows that its convergence rate and sample complexity are independent of the number of agents, and can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.
Abstract: Multi-agent reinforcement learning (MARL) has been applied to many challenging problems including two-team computer games, autonomous driving, and real-time bidding. Despite the empirical success, there is a conspicuous absence of theoretical study of different MARL algorithms: this is mainly due to the curse of dimensionality caused by the exponential growth of the joint state-action space as the number of agents increases. Mean-field controls (MFC) with infinitely many agents and deterministic flows, meanwhile, provide good approximations to $N$-agent collaborative games in terms of both game values and optimal strategies. In this paper, we study collaborative MARL under an MFC approximation framework: we develop a model-free kernel-based Q-learning algorithm (CDD-Q) and show that its convergence rate and sample complexity are independent of the number of agents. Our empirical studies on MFC examples demonstrate strong performance of CDD-Q. Moreover, the CDD-Q algorithm can be applied to a general class of Markov decision problems (MDPs) with deterministic dynamics and continuous state-action space.

12 citations


Cites methods from "Non-parametric Approximate Dynamic ..."

  • ...The idea of non-parametric kernel regression has also been used in the context of discrete state-space problems [4]....

    [...]

References
Book
01 May 1995
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Abstract: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning.

10,834 citations

BookDOI
01 Dec 2001
TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations


"Non-parametric Approximate Dynamic ..." refers background in this paper

  • ...For certain sets S, Mercer’s theorem provides another important construction of such a Hilbert space. More examples can be found in the text of Scholkopf and Smola (2001)....

    [...]

  • ...The Gaussian kernel is known to be full-dimensional (see, e.g., Theorem 2.18, Scholkopf and Smola, 2001), so that employing such a kernel in our setting would correspond to working with an infinite dimensional approximation architecture....

    [...]

  • ...more examples can be found in the text of Scholkopf and Smola (2001)....

    [...]
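
One finite-sample consequence of the "full-dimensional" remark quoted above is that the Gaussian Gram matrix on distinct points is strictly positive definite, hence of full rank. The following is a minimal sketch of that check, assuming one-dimensional points and a bandwidth of 1; the helper names (`gaussian_kernel`, `gram_matrix`, `cholesky_pivots`) are hypothetical and not taken from the paper. A Cholesky factorization succeeds with strictly positive pivots exactly when the matrix is positive definite.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel exp(-(x - y)^2 / (2 * bandwidth^2)) on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def gram_matrix(points, bandwidth=1.0):
    """Gram matrix K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(p, q, bandwidth) for q in points] for p in points]

def cholesky_pivots(K):
    """Run a Cholesky factorization and return the squared diagonal pivots.

    Stops early if a pivot is not strictly positive, so the returned list
    has full length only when K is (numerically) positive definite.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    pivots = []
    for j in range(n):
        d = K[j][j] - sum(L[j][k] ** 2 for k in range(j))
        pivots.append(d)
        if d <= 0.0:
            return pivots
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (K[i][j] - s) / L[j][j]
    return pivots

points = [0.0, 1.0, 2.0, 3.0, 4.0]  # distinct sample points (illustrative)
pivots = cholesky_pivots(gram_matrix(points))
print("all pivots positive:", len(pivots) == len(points) and all(p > 0 for p in pivots))
```

Adding more distinct points keeps the Gram matrix positive definite in exact arithmetic, although closely spaced points or a large bandwidth can make it numerically ill-conditioned.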

Book
01 Jan 1968
TL;DR: This book shows engineers how to use optimization theory to solve complex problems with a minimum of mathematics and unifies the large field of optimization with a few geometric principles.
Abstract: From the Publisher: Engineers must make decisions regarding the distribution of expensive resources in a manner that will be economically beneficial. This problem can be realistically formulated and logically analyzed with optimization theory. This book shows engineers how to use optimization theory to solve complex problems. Unifies the large field of optimization with a few geometric principles. Covers functional analysis with a minimum of mathematics. Contains problems that relate to the applications in the book.

5,667 citations

Journal ArticleDOI
TL;DR: The stability of a queueing network with interdependent servers is considered and a policy is obtained which is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy, and this stability region is characterized.
Abstract: The stability of a queueing network with interdependent servers is considered. The dependency among the servers is described by the definition of their subsets that can be activated simultaneously. Multihop radio networks provide a motivation for the consideration of this system. The problem of scheduling the server activation under the constraints imposed by the dependency among servers is studied. The performance criterion of a scheduling policy is its throughput that is characterized by its stability region, that is, the set of vectors of arrival and service rates for which the system is stable. A policy is obtained which is optimal in the sense that its stability region is a superset of the stability region of every other scheduling policy, and this stability region is characterized. The behavior of the network is studied for arrival rates that lie outside the stability region. Implications of the results in certain types of concurrent database and parallel processing systems are discussed.

3,018 citations


"Non-parametric Approximate Dynamic ..." refers methods in this paper

  • ...Max-Weight (Tassiulas and Ephremides, 1992)....

    [...]

  • ...We prepare the ground for the proof by developing appropriate uniform concentration guarantees for appropriate function classes....

    [...]
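
The Max-Weight policy mentioned in the excerpts above can be sketched as follows. This is a simplified single-hop illustration, not the exact policy of the cited paper: at each decision epoch it picks, among the server subsets that may be activated simultaneously, the one maximizing the sum of queue length times service rate. The function name `max_weight` and all numbers are hypothetical.

```python
def max_weight(queue_lengths, service_rates, activation_sets):
    """Pick the activation set with the largest total weight.

    The weight of a set is the sum over its servers i of
    queue_lengths[i] * service_rates[i]. Ties go to the first set listed.
    """
    best_set, best_weight = None, float("-inf")
    for candidate in activation_sets:
        weight = sum(queue_lengths[i] * service_rates[i] for i in candidate)
        if weight > best_weight:
            best_set, best_weight = candidate, weight
    return best_set, best_weight

# Illustrative instance: three queues, and three allowed activation sets.
queue_lengths = [5, 1, 4]
service_rates = [1.0, 2.0, 1.0]
activation_sets = [(0,), (1, 2), (2,)]

chosen, weight = max_weight(queue_lengths, service_rates, activation_sets)
print(chosen, weight)  # weights are 5.0, 6.0, and 4.0, so (1, 2) is chosen
```

Enumerating the activation sets is only practical when there are few of them; larger networks would need a combinatorial solver for the maximization step.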

Book ChapterDOI
01 Mar 2003
TL;DR: In this paper, the authors investigate the use of data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities, in a decision theoretic setting and prove general risk bounds in terms of these complexities.
Abstract: We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.

2,535 citations