Open Access · Posted Content

Finite-Sample Analysis of Proximal Gradient TD Algorithms

TL;DR
A theoretical analysis of the gradient TD (GTD) family of reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to its linear computational complexity.
Abstract
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence and, to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been little work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to the existing LSTD methods in off-policy learning scenarios.
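To make the saddle-point formulation concrete, here is a minimal sketch of a GTD2-style update with linear features, read as a stochastic ascent step in a dual variable y and a descent step in the primal weights theta. The step sizes, feature dimension, and function interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gtd2_step(theta, y, phi, reward, phi_next, gamma=0.95,
              alpha=1e-3, beta=1e-2):
    """One GTD2-style update, viewed as a stochastic primal-dual step on
    L(theta, y) = <b - A theta, y> - 0.5 * ||y||_M^2 (hedged sketch).
    gamma, alpha, and beta are illustrative values, not the paper's."""
    # TD error; delta * phi is an unbiased sample of b - A theta.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Dual ascent in y: sampled gradient of L in y is (delta - phi^T y) * phi.
    y_new = y + beta * (delta - phi @ y) * phi
    # Primal descent in theta: sampled A^T y is (phi - gamma*phi') * (phi^T y).
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ y)
    return theta_new, y_new

# Illustrative call on a single transition (phi, reward, phi_next).
d = 4
theta, y = np.zeros(d), np.zeros(d)
theta, y = gtd2_step(theta, y, np.eye(d)[0], 1.0, np.eye(d)[1])
```

As the abstract notes, the projected variant would additionally project the iterates back onto bounded convex sets after each step.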


Citations
Book Chapter

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

TL;DR: This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.
Proceedings Article

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation

TL;DR: The authors reformulate the Bellman optimality equation into a primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation, and develop a new algorithm, called Smoothed Bellman Error Embedding (SBEED), to solve this optimization problem.
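For context, the smoothing step can be sketched in one equation (notation mine, not necessarily the authors'): entropy regularization with weight lambda > 0 replaces the hard max in the Bellman optimality equation by its log-sum-exp (Legendre-Fenchel) counterpart.

```latex
% Hedged sketch of an entropy-smoothed Bellman optimality equation.
% As \lambda \to 0 this recovers V^*(s) = \max_a [R(s,a) + \gamma\,\mathbb{E}[V^*(s')]].
\begin{equation*}
  V_\lambda(s) \;=\; \lambda \log \sum_{a} \exp\!\left(
    \frac{R(s,a) \;+\; \gamma\, \mathbb{E}\!\left[ V_\lambda(s') \,\middle|\, s, a \right]}{\lambda}
  \right)
\end{equation*}
```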
Journal Article

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

TL;DR: Temporal difference learning (TD) is a simple iterative algorithm widely used for policy evaluation in Markov reward processes; Bhandari et al. prove finite-time convergence rates for TD learning with linear function approximation.
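Since the TD(0) update itself is short, a hedged sketch with linear features may help; the constants and interface are illustrative, not taken from the cited analysis.

```python
import numpy as np

def td0_step(theta, phi, reward, phi_next, gamma=0.95, alpha=0.01):
    """One TD(0) semi-gradient update for a linear value estimate
    V(s) = phi(s)^T theta. gamma and alpha are illustrative constants."""
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    return theta + alpha * delta * phi  # semi-gradient step

# Illustrative call on one transition.
theta = td0_step(np.zeros(2), np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]))
```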
Posted Content

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

TL;DR: This work gives the first convergence-rate results for nonlinear two-timescale stochastic approximation (TTSA) algorithms on this class of bilevel optimization problems, and shows that a two-timescale actor-critic proximal policy optimization algorithm can be viewed as a special case of the framework.
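As a rough illustration of the two-timescale idea, the sketch below runs an inner (critic-like) variable on a faster step-size schedule than the outer (actor-like) variable; the step-size exponents and the toy gradients are assumptions for illustration only.

```python
def ttsa_step(x, y, grad_outer, grad_inner, t):
    """One generic two-timescale stochastic approximation (TTSA) step:
    the inner variable y moves on a faster schedule than the outer x.
    The step-size exponents are illustrative, not the paper's choices."""
    beta = (t + 1) ** -0.6   # fast timescale (inner problem)
    alpha = (t + 1) ** -0.9  # slow timescale (outer problem)
    y_new = y - beta * grad_inner(x, y)
    x_new = x - alpha * grad_outer(x, y_new)
    return x_new, y_new

# Toy bilevel example where the inner solution tracks y*(x) = x.
grad_inner = lambda x, y: y - x
grad_outer = lambda x, y: x + 0.5 * y   # illustrative outer gradient
x, y = 1.0, 0.0
for t in range(1000):
    x, y = ttsa_step(x, y, grad_outer, grad_inner, t)
```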

Safe Reinforcement Learning

References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

Book

Convex Analysis and Monotone Operator Theory in Hilbert Spaces

TL;DR: This book provides a largely self-contained account of the main results of convex analysis and optimization in Hilbert space, and a concise exposition of related constructive fixed point theory that allows for a wide range of algorithms to construct solutions to problems in optimization, equilibrium theory, monotone inclusions, variational inequalities, and convex feasibility.
Book

Neuro-dynamic programming

TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.