Open Access · Posted Content

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

TLDR
In this article, the TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(lambda) but requires very little computation per action and can be used with arbitrary function representation methods.
Abstract
Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(lambda), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is simple and not new, but has apparently remained unexplored. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure yields a significant learning speedup at essentially the same cost as standard TD(0) learning.
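To make the idea concrete, here is a minimal tabular sketch in Python of the truncated lambda-return computation the abstract describes. The function names, the dictionary-based value table, and all parameter values are illustrative assumptions, not the paper's notation or recommended settings.

    from collections import deque

    def make_ttd_learner(gamma=0.95, lam=0.8, alpha=0.1, m=10):
        # Sketch of truncated TD(lambda): update V toward the lambda-return
        # computed over a sliding window of the last m transitions.
        V = {}                    # tabular value function (an assumption here)
        window = deque(maxlen=m)  # recent (state, reward, next_state) triples

        def observe(s, r, s_next):
            window.append((s, r, s_next))
            if len(window) < m:
                return
            # Backward recursion for the truncated lambda-return:
            #   z_k = r_k + gamma * ((1 - lam) * V(s_{k+1}) + lam * z_{k+1}),
            # bootstrapped with z = V(s_{t+1}) at the truncation horizon.
            z = V.get(window[-1][2], 0.0)
            for sk, rk, sk_next in reversed(window):
                z = rk + gamma * ((1.0 - lam) * V.get(sk_next, 0.0) + lam * z)
            # Only the state that entered the window m steps ago is updated,
            # so the per-action cost is O(m), independent of the state space.
            s_old = window[0][0]
            V[s_old] = V.get(s_old, 0.0) + alpha * (z - V.get(s_old, 0.0))

        return observe, V

Because each action triggers one backward pass over a fixed-length window, the per-action cost stays small and constant, and nothing in the update depends on a particular function representation, which is the combination of properties the abstract emphasizes. Episode boundaries and the treatment of states still inside the window at termination are handled in the paper and omitted here.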


Citations
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Journal Article

Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network

TL;DR: It is demonstrated that the persistent reward responses of DA cells during conditioning are accurately replicated only by a TD model with long-lasting eligibility traces (nonzero values of the parameter λ) and a low learning rate (α), suggesting that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain.
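For contrast with the TTD approximation sketched above, this is the conventional accumulating-trace TD(lambda) update that the λ and α parameters in this summary refer to; the tabular representation and the specific parameter values are illustrative assumptions.

    def td_lambda_step(V, traces, s, r, s_next,
                       gamma=0.98, lam=0.9, alpha=0.01):
        # One conventional TD(lambda) update with accumulating eligibility
        # traces; every state with a nonzero trace is touched on every step.
        delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # TD error
        traces[s] = traces.get(s, 0.0) + 1.0                    # accumulate trace
        for state, e in list(traces.items()):
            V[state] = V.get(state, 0.0) + alpha * delta * e    # credit by trace
            traces[state] = gamma * lam * e                     # decay all traces
        return delta

The inner loop over all traced states is exactly the per-action cost that the TTD procedure above avoids.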
Journal Article

Fuzzy inference system learning by reinforcement methods

Lionel Jouffe
TL;DR: Fuzzy Actor-Critic Learning (FACL) and Fuzzy Q-Learning are reinforcement learning methods based on dynamic programming (DP) principles; the genericity of these methods allows them to learn every kind of reinforcement learning problem.
Journal Article

Experiments with reinforcement learning in problems with continuous state and action spaces

TL;DR: This article proposes a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution so that the value function can be represented with higher accuracy in important regions of the state and action spaces.
Book Chapter

Machine Learning in Medical Applications

TL;DR: It is argued that the successful implementation of ML methods can help integrate computer-based systems into the healthcare environment, providing opportunities to facilitate and enhance the work of medical experts and ultimately to improve the efficiency and quality of medical care.
References
Journal Article

Neuronlike adaptive elements that can solve difficult learning control problems

TL;DR: In this article, it is shown that a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem: balancing a pole hinged to a movable cart by applying forces to the cart's base.
Book Chapter

Integrated architecture for learning, planning, and reacting based on approximating dynamic programming

TL;DR: This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents results for two Dyna architectures based on Watkins's Q-learning, a new kind of reinforcement learning.
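As background for this summary, here is a minimal sketch of one Dyna-style step built on Q-learning: a direct update from real experience followed by a few planning updates replayed from a learned model. The names, the deterministic one-step model, and the parameter values are my assumptions, not the paper's specification.

    import random

    def dyna_q_step(Q, model, s, a, r, s_next, actions,
                    gamma=0.95, alpha=0.1, n_planning=10):
        def q(s_, a_):
            return Q.get((s_, a_), 0.0)

        def backup(s_, a_, r_, s2):
            # Standard Q-learning backup toward the one-step target.
            target = r_ + gamma * max(q(s2, b) for b in actions)
            Q[(s_, a_)] = q(s_, a_) + alpha * (target - q(s_, a_))

        backup(s, a, r, s_next)      # learning from the real transition
        model[(s, a)] = (r, s_next)  # deterministic one-step model (assumption)
        for _ in range(n_planning):  # planning: replay sampled model experience
            (ss, aa), (rr, s2) = random.choice(list(model.items()))
            backup(ss, aa, rr, s2)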
Journal Article

Convergence of Stochastic Iterative Dynamic Programming Algorithms

TL;DR: A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
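For context, the stochastic-approximation view mentioned here casts Q-learning as the iterate below, convergent under the usual Robbins-Monro step-size conditions; this is the standard textbook form, not a restatement of the paper's theorem.

    \begin{aligned}
    Q_{t+1}(s_t, a_t) &= (1 - \alpha_t)\, Q_t(s_t, a_t)
        + \alpha_t \Bigl[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Bigr], \\
    &\quad \text{with } \textstyle\sum_t \alpha_t = \infty
        \text{ and } \sum_t \alpha_t^2 < \infty .
    \end{aligned}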

Reinforcement learning for robots using neural networks

Long-Ji Lin
TL;DR: This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling their application to complex robot-learning problems.