SciSpace
Open Access · Book

Reinforcement Learning: An Introduction

TL;DR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract
Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
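The temporal-difference learning mentioned in the abstract can be illustrated with a minimal sketch. The following is a hedged, illustrative implementation of tabular TD(0) prediction on a hypothetical 5-state random walk (the task, state layout, and reward scheme are assumptions chosen for illustration, not taken from the book):

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values under a random policy with TD(0).

    Hypothetical task: states 0..4, start in the middle, move left or
    right with equal probability, terminate past either end; reward 1
    only when exiting to the right.
    """
    rng = random.Random(seed)
    V = [0.0] * 5  # value estimate for each non-terminal state
    for _ in range(episodes):
        s = 2  # start in the middle state
        while True:
            s_next = s + rng.choice([-1, 1])  # equiprobable random policy
            done = s_next < 0 or s_next > 4
            r = 1.0 if s_next > 4 else 0.0    # reward at right terminal only
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])   # TD(0) update toward target
            if done:
                break
            s = s_next
    return V
```

Under this setup the true values are the probabilities of exiting right, increasing from left to right across the states, and the estimates converge toward them.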



Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications, such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Journal ArticleDOI

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Journal ArticleDOI

Mastering the game of Go with deep neural networks and tree search

TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.

Pattern Recognition and Machine Learning

TL;DR: This book presents probability distributions and linear models for regression and classification, along with a discussion of combining models in the context of machine learning.
References
Journal ArticleDOI

Actor-critic models of the basal ganglia: new anatomical and computational perspectives

TL;DR: An alternative model of the basal ganglia is described which takes into account several important, and previously neglected, anatomical and physiological characteristics of basal ganglia-thalamocortical connectivity, and suggests that the basal ganglia perform reinforcement-biased dimensionality reduction of cortical inputs.
Book

Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics

TL;DR: Paul Glimcher argues that economic theory may provide an alternative to the classical Cartesian model of the brain and behavior, and outlines what an economics-based cognitive model might look like and how one would begin to test it empirically.
Journal ArticleDOI

Activity in human ventral striatum locked to errors of reward prediction.

TL;DR: The mesolimbic dopaminergic system has long been known to be involved in the processing of rewarding stimuli, although recent evidence from animal research suggests a more specific role: signaling errors in the prediction of rewards.
Proceedings Article

Exploiting Structure in Policy Construction

TL;DR: This work presents an algorithm, called Structured Policy Iteration (SPI), that constructs optimal policies without explicit enumeration of the state space; it retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs.
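For contrast with SPI's structured approach, standard policy iteration enumerates the state space explicitly. The following is an illustrative sketch of that baseline on a tiny made-up 2-state MDP (the transition and reward numbers are assumptions for demonstration, not from the paper):

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Tabular policy iteration with explicit state enumeration.

    P[s][a] = list of (probability, next_state); R[s][a] = expected reward.
    """
    n_states, n_actions = len(P), len(P[0])
    pi = [0] * n_states   # initial policy: action 0 everywhere
    V = [0.0] * n_states
    while True:
        # Policy evaluation: sweep Bellman expectation updates to convergence.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = R[s][pi[s]] + gamma * sum(p * V[s2] for p, s2 in P[s][pi[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, V
```

On a hypothetical MDP where state 1 pays reward 1 for staying put, the greedy policy learns to move from state 0 to state 1 and stay there. SPI's contribution is to replace the per-state loops above with operations on a compact Bayesian-network representation.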
Journal ArticleDOI

Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network

TL;DR: It is demonstrated that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter λ) and low learning rate (α), suggesting that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain.
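The eligibility traces and learning rate referenced in this summary can be sketched as TD(λ) with accumulating traces. The following is an illustrative implementation on a hypothetical 5-state random walk (the task and parameter values are assumptions for demonstration, not the paper's model):

```python
import random

def td_lambda_random_walk(episodes=5000, alpha=0.05, lam=0.8, gamma=1.0, seed=0):
    """TD(lambda) prediction with accumulating eligibility traces.

    Hypothetical task: states 0..4, start in the middle, random walk,
    reward 1 only on exiting to the right.
    """
    rng = random.Random(seed)
    V = [0.0] * 5
    for _ in range(episodes):
        e = [0.0] * 5  # eligibility trace per state, reset each episode
        s = 2
        while True:
            s_next = s + rng.choice([-1, 1])
            done = s_next < 0 or s_next > 4
            r = 1.0 if s_next > 4 else 0.0
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e[s] += 1.0           # accumulate trace for the visited state
            for i in range(5):    # credit recent states via their traces
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam  # traces decay by gamma * lambda
            if done:
                break
            s = s_next
    return V
```

With λ > 0 the TD error at each step updates not only the current state but all recently visited states in proportion to their decaying traces, which is the mechanism the paper argues is needed to replicate dopamine-cell responses.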