Open Access · Posted Content

Finite-Sample Analysis of Proximal Gradient TD Algorithms

TL;DR
A theoretical analysis of the gradient TD (GTD) family of reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to its linear computational complexity.
Abstract
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence and, to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been little work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to the existing LSTD methods in off-policy learning scenarios.
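To make the saddle-point formulation concrete, here is a minimal sketch of a GTD2-style update with linear features, read as a stochastic ascent step in a dual variable y and a descent step in the primal weights theta. The step sizes, feature dimension, and function interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gtd2_step(theta, y, phi, reward, phi_next, gamma=0.95,
              alpha=1e-3, beta=1e-2):
    """One GTD2-style update, viewed as a stochastic primal-dual step on
    L(theta, y) = <b - A theta, y> - 0.5 * ||y||_M^2 (hedged sketch).
    gamma, alpha, and beta are illustrative values, not the paper's."""
    # TD error; delta * phi is an unbiased sample of b - A theta.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Dual ascent in y: sampled gradient of L in y is (delta - phi^T y) * phi.
    y_new = y + beta * (delta - phi @ y) * phi
    # Primal descent in theta: sampled A^T y is (phi - gamma*phi') * (phi^T y).
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ y)
    return theta_new, y_new

# Illustrative call on a single transition (phi, reward, phi_next).
d = 4
theta, y = np.zeros(d), np.zeros(d)
theta, y = gtd2_step(theta, y, np.eye(d)[0], 1.0, np.eye(d)[1])
```

As the abstract notes, the projected variant would additionally project the iterates back onto bounded convex sets after each step.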


Citations
Book Chapter

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

TL;DR: This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.
Proceedings Article

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation

TL;DR: The authors reformulate the Bellman optimality equation into a primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation, and develop a new algorithm, called Smoothed Bellman Error Embedding (SBEED), to solve this optimization problem.
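For context, the smoothing step can be sketched in one equation (notation mine, not necessarily the authors'): entropy regularization with weight lambda > 0 replaces the hard max in the Bellman optimality equation by its log-sum-exp (Legendre-Fenchel) counterpart.

```latex
% Hedged sketch of an entropy-smoothed Bellman optimality equation.
% As \lambda \to 0 this recovers V^*(s) = \max_a [R(s,a) + \gamma\,\mathbb{E}[V^*(s')]].
\begin{equation*}
  V_\lambda(s) \;=\; \lambda \log \sum_{a} \exp\!\left(
    \frac{R(s,a) \;+\; \gamma\, \mathbb{E}\!\left[ V_\lambda(s') \,\middle|\, s, a \right]}{\lambda}
  \right)
\end{equation*}
```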
Journal Article

A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation

TL;DR: Temporal difference learning (TD) is a simple iterative algorithm widely used for policy evaluation in Markov reward processes; Bhandari et al. prove finite-time convergence rates for TD learning with linear function approximation.
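Since the TD(0) update itself is short, a hedged sketch with linear features may help; the constants and interface are illustrative, not taken from the cited analysis.

```python
import numpy as np

def td0_step(theta, phi, reward, phi_next, gamma=0.95, alpha=0.01):
    """One TD(0) semi-gradient update for a linear value estimate
    V(s) = phi(s)^T theta. gamma and alpha are illustrative constants."""
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    return theta + alpha * delta * phi  # semi-gradient step

# Illustrative call on one transition.
theta = td0_step(np.zeros(2), np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]))
```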
Posted Content

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

TL;DR: This work gives the first convergence-rate results for nonlinear two-timescale stochastic approximation (TTSA) algorithms on this class of bilevel optimization problems, and shows that a two-timescale actor-critic proximal policy optimization algorithm can be viewed as a special case of the framework.
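As a rough illustration of the two-timescale idea, the sketch below runs an inner (critic-like) variable on a faster step-size schedule than the outer (actor-like) variable; the step-size exponents and the toy gradients are assumptions for illustration only.

```python
def ttsa_step(x, y, grad_outer, grad_inner, t):
    """One generic two-timescale stochastic approximation (TTSA) step:
    the inner variable y moves on a faster schedule than the outer x.
    The step-size exponents are illustrative, not the paper's choices."""
    beta = (t + 1) ** -0.6   # fast timescale (inner problem)
    alpha = (t + 1) ** -0.9  # slow timescale (outer problem)
    y_new = y - beta * grad_inner(x, y)
    x_new = x - alpha * grad_outer(x, y_new)
    return x_new, y_new

# Toy bilevel example where the inner solution tracks y*(x) = x.
grad_inner = lambda x, y: y - x
grad_outer = lambda x, y: x + 0.5 * y   # illustrative outer gradient
x, y = 1.0, 0.0
for t in range(1000):
    x, y = ttsa_step(x, y, grad_outer, grad_inner, t)
```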

Safe Reinforcement Learning

References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

Book

Convex Analysis and Monotone Operator Theory in Hilbert Spaces

TL;DR: This book provides a largely self-contained account of the main results of convex analysis and optimization in Hilbert space, and a concise exposition of related constructive fixed point theory that allows for a wide range of algorithms to construct solutions to problems in optimization, equilibrium theory, monotone inclusions, variational inequalities, and convex feasibility.
Book

Neuro-dynamic programming

TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.