Open Access · Posted Content
Finite-Sample Analysis of Proximal Gradient TD Algorithms
TLDR
Theoretical analysis of gradient TD (GTD) reinforcement learning methods shows that the GTD family of algorithms is comparable to, and may be preferred over, existing least-squares TD methods for off-policy learning, due to their linear computational complexity.
Abstract:
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence and, to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been little work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to existing LSTD methods in off-policy learning scenarios.
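The abstract formulates GTD methods as stochastic gradient steps on a primal-dual saddle-point objective. As a rough illustration, the following is a minimal sketch of the plain two-timescale GTD2 update on a synthetic zero-reward problem; the i.i.d. features, step sizes, and problem setup are illustrative assumptions, and the paper's projected GTD2 and GTD2-MP variants additionally use a projection step and a mirror-prox-style update, respectively.

```python
import numpy as np

# Minimal sketch of the plain GTD2 update viewed as primal-dual SGD.
# theta: primal value-function weights; w: dual/correction weights.
# Step sizes and the synthetic data below are illustrative assumptions.
def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    delta = reward + gamma * (phi_next @ theta) - phi @ theta          # TD error
    theta_new = theta + alpha * (phi @ w) * (phi - gamma * phi_next)   # primal step
    w_new = w + beta * (delta - phi @ w) * phi                         # dual step
    return theta_new, w_new

rng = np.random.default_rng(0)
d = 3
theta, w = np.ones(d), np.zeros(d)
for _ in range(20000):
    phi = rng.standard_normal(d)        # i.i.d. features: E[phi phi^T] = I
    phi_next = rng.standard_normal(d)
    theta, w = gtd2_update(theta, w, phi, phi_next, reward=0.0,
                           gamma=0.9, alpha=0.01, beta=0.1)
# With zero rewards the TD fixed point is theta = 0, so theta should shrink.
```

The two step sizes reflect the two-timescale structure: the dual variable `w` tracks a fast estimate of the correction term, while `theta` follows the slower primal gradient.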
Citations
Book ChapterDOI
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
TL;DR: This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.
Proceedings Article
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
TL;DR: The authors reformulate the Bellman optimality equation into a primal-dual optimization problem using Nesterov smoothing technique and the Legendre-Fenchel transformation, and develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem.
Journal ArticleDOI
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
TL;DR: Temporal difference learning (TD) is a simple iterative algorithm widely used for policy evaluation in Markov reward processes; Bhandari et al. prove finite-time convergence rates for TD learning with linear function approximation.
Posted Content
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
TL;DR: These are the first convergence-rate results for nonlinear TTSA algorithms on this class of bilevel optimization problems, and it is shown that a two-timescale actor-critic proximal policy optimization algorithm can be viewed as a special case of the framework.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, covering the field from its intellectual foundations to the most recent developments and applications.
Book
Introduction to Reinforcement Learning
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Neuro-Dynamic Programming.
TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Book
Convex Analysis and Monotone Operator Theory in Hilbert Spaces
TL;DR: This book provides a largely self-contained account of the main results of convex analysis and optimization in Hilbert space, and a concise exposition of related constructive fixed point theory that allows for a wide range of algorithms to construct solutions to problems in optimization, equilibrium theory, monotone inclusions, variational inequalities, and convex feasibility.