Technical Note: Q-Learning

C. J. C. H. Watkins

- pp 55-68

About:

The article was published on 1993-01-01 and is currently open access. It has received 2697 citations to date.

Citations

Journal Article

Deep learning in neural networks

Jürgen Schmidhuber

- 01 Jan 2015 -

Neural Networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

Journal Article

The free-energy principle: a unified brain theory?

Karl J. Friston

- 01 Feb 2010 -

Nature Reviews Neuroscience

TL;DR: This Review looks at some key brain theories in the biological and physical sciences from the free-energy perspective, suggesting that several global brain theories might be unified within a free-energy framework.

Journal Article

A Tutorial on the Cross-Entropy Method

Pieter-Tjerk de Boer, +3 more

- 01 Jan 2005 -

Annals of Operations Research

TL;DR: This tutorial presents the CE methodology, the basic algorithm and its modifications, and discusses applications in combinatorial optimization and machine learning.

Book

Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

Yoav Shoham, +1 more

TL;DR: This overview of multiagent systems, which are online systems composed of multiple interacting intelligent agents (e.g., online trading), offers a computer science perspective on the field while integrating ideas from operations research, game theory, economics, logic, and even philosophy and linguistics.

References

Journal Article

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton

- 01 Aug 1988 -

Machine Learning

TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.

Journal Article

Learning from delayed rewards

Ben Kröse

- 01 Oct 1995 -

Robotics and Autonomous Systems

Journal Article

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching

Long-Ji Lin

- 01 May 1992 -

Machine Learning

TL;DR: This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching.

Book Chapter

Integrated architecture for learning, planning, and reacting based on approximating dynamic programming

Richard S. Sutton

TL;DR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents and shows results for two Dyna architectures, based on Watkins's Q-learning, a new kind of reinforcement learning.
