Asynchronous Stochastic Approximation and Q-Learning

doi:10.1023/A:1022689125041

Open Access Journal Article

John N. Tsitsiklis

Machine Learning, Vol. 16, Iss. 3, pp. 185-202, 01 Sep 1994

TLDR

The Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, is studied to establish its convergence under conditions more general than previously available.

Abstract:

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
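
For orientation, the tabular Q-learning update that the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's formulation or proof. The toy two-state problem, the names `q_learning`, `P`, `R`, and `visits`, the uniform sampling of state-action pairs, and the step size 1/n are all assumptions made for the example. Only one entry of the table is updated per step, which is the asynchronous flavour of the iteration.

```python
import random


def q_learning(P, R, gamma, steps, seed=0):
    """Tabular Q-learning on a small finite Markov decision problem.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    reward for taking action a in state s (both hypothetical inputs).
    Exactly one (state, action) entry is updated per step.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        # Sample a pair uniformly so every entry keeps being updated.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        probs, nexts = zip(*P[s][a])
        s_next = rng.choices(nexts, weights=probs)[0]
        # Per-entry step size 1/n: the sum diverges, the sum of squares is finite.
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Toy problem (an assumption): action 0 stays put, action 1 switches state,
# and only "stay in state 1" pays a reward of 1.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]

Q = q_learning(P, R, gamma=0.5, steps=40000)
print(Q)
```

With a discount factor of 0.5, solving the Bellman equations of this toy problem by hand gives optimal action values of 0.5 and 1 in state 0 and 2 and 0.5 in state 1, so the learned table can be compared against those numbers.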

Citations

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Journal ArticleDOI

Reinforcement learning: a survey

Leslie Pack Kaelbling, +2 more

- 01 Jan 1996 -

Journal of Artificial Intelligence Resea...

TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.

Proceedings Article

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih, +7 more

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Book

Foundations of Machine Learning

Mehryar Mohri, +4 more

TL;DR: This graduate-level textbook introduces fundamental concepts and methods in machine learning, provides the theoretical underpinnings of these algorithms, and illustrates key aspects of their application.

Journal ArticleDOI

A Comprehensive Survey of Multiagent Reinforcement Learning

Lucian Busoniu, +2 more

TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.

References

Journal ArticleDOI

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992 -

Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

Book

Parallel and Distributed Computation: Numerical Methods

Dimitri P. Bertsekas, +1 more

TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.

Learning from delayed rewards

Chris Watkins

Journal ArticleDOI

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton

- 01 Aug 1988 -

Machine Learning

TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.

Journal ArticleDOI

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992 -

Machine Learning

TL;DR: In this article, it is shown that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action values are represented discretely.

Related Papers (5)

Reinforcement Learning: An Introduction

Dynamic Programming and Optimal Control

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Technical Note: Q-Learning

Reinforcement learning: a survey