Technical Note: Q-Learning

C. J. C. H. Watkins
- pp 55-68
The article was published on 1993-01-01 and is currently open access. It has received 2697 citations to date.

Citations
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal Article

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, and reviews deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.
Journal Article

The free-energy principle: a unified brain theory?

TL;DR: This Review looks at some key brain theories in the biological and physical sciences from the free-energy perspective, suggesting that several global brain theories might be unified within a free-energy framework.
Journal Article

A Tutorial on the Cross-Entropy Method

TL;DR: This tutorial presents the CE methodology, the basic algorithm and its modifications, and discusses applications in combinatorial optimization and machine learning.
Book

Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

TL;DR: This overview of multiagent systems, which are systems composed of multiple interacting intelligent agents (as in online trading), offers a computer science perspective on the field while integrating ideas from operations research, game theory, economics, logic, and even philosophy and linguistics.
References
Journal Article

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton
- 01 Aug 1988
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. It proves their convergence and optimality for special cases and relates them to supervised-learning methods.
Journal Article

Learning from delayed rewards

Journal Article

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

TL;DR: This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching.
Book Chapter

Integrated architecture for learning, planning, and reacting based on approximating dynamic programming

TL;DR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures based on Watkins's Q-learning, a new kind of reinforcement learning.