Open Access Journal Article

An Upper Bound on the Loss from Approximate Optimal-Value Functions

Satinder Singh, Richard C. Yee
01 Sep 1994 - Vol. 16, Iss. 3, pp. 227-233
TLDR
An upper bound on performance loss is derived that is slightly tighter than that in Bertsekas (1987), and the bound is extended to Q-learning; together these results provide a partial theoretical rationale for the approximation of value functions.
Abstract
Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
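For concreteness, the result summarized above is usually stated in the following form; this is a sketch in my own notation, not a quotation from the paper. With optimal value function \(V^*\), approximation \(\hat{V}\), discount factor \(\gamma \in [0,1)\), and \(\pi\) a policy that is greedy with respect to \(\hat{V}\),

\[
\|\hat{V} - V^*\|_\infty \le \varepsilon
\quad\Longrightarrow\quad
\|V^{\pi} - V^*\|_\infty \le \frac{2\gamma\varepsilon}{1-\gamma}.
\]

The Q-learning extension is analogous: if \(\|\hat{Q} - Q^*\|_\infty \le \varepsilon\) and \(\pi\) is greedy with respect to \(\hat{Q}\), then \(\|V^{\pi} - V^*\|_\infty \le 2\varepsilon/(1-\gamma)\).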


Citations
Journal Article

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Journal Article

Learning to act using real-time dynamic programming

TL;DR: An algorithm based on dynamic programming, called Real-Time DP, is introduced, by which an embedded system can improve its performance with experience; the analysis also illuminates aspects of other DP-based reinforcement learning methods such as Watkins' Q-learning algorithm.
Book

Algorithms for Reinforcement Learning

TL;DR: This book focuses on reinforcement learning algorithms that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas together with a discussion of their theoretical properties and limitations.
Journal Article

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms

TL;DR: This paper examines the convergence of single-step on-policy RL algorithms for control with both decaying exploration and persistent exploration, and provides examples of exploration strategies that result in convergence to both optimal values and optimal policies.
Dissertation

On the Sample Complexity of Reinforcement Learning

TL;DR: Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
References
Journal Article

Technical Note: Q-Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely (a minimal illustrative sketch of the tabular update appears after this reference list).
Journal Article

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton
01 Aug 1988
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
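For readers who want to see the tabular Q-learning update referred to in the Technical Note entry above, here is a minimal sketch in Python. It is illustrative only: the step(s, a) -> (next_state, reward, done) interface, the epsilon-greedy exploration, and all hyperparameters are assumptions made for this sketch, not code or settings from any of the papers listed here.

import random

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, horizon=100):
    """Return a table Q[s][a] learned by one-step tabular Q-learning.

    `step(s, a)` must return `(next_state, reward, done)`; this interface is
    an assumption for the sketch, not something fixed by the papers above.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = random.randrange(n_states)            # random starts help ensure all
        for _ in range(horizon):                  # states and actions keep being sampled
            if random.random() < epsilon:         # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s_next, reward, done = step(s, a)
            # One-step Q-learning target: r + gamma * max_a' Q(s', a')
            target = reward + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s_next
    return Q

With persistent exploration and a sufficiently small learning rate, the table approaches the optimal action-values on a small finite MDP, which is the setting in which the convergence guarantee quoted above applies.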