An Upper Bound on the Loss from Approximate Optimal-Value Functions
Satinder Singh, Richard C. Yee
TLDR
An upper bound on performance loss is derived that is slightly tighter than that in Bertsekas (1987), and the extension of the bound to Q-learning is shown to provide a partial theoretical rationale for the approximation of value functions.

Abstract
Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
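For context, the bound discussed in the abstract is usually stated as follows (a sketch based on standard statements of the result, with notation chosen here rather than copied from the paper): let γ be the discount factor, V* the optimal value function, and V̂ an approximation with maximum-norm error ε.

```latex
% If the approximation error is bounded,
%   \|\hat{V} - V^{*}\|_{\infty} \le \varepsilon,
% then the greedy policy \pi with respect to \hat{V} satisfies
V^{*}(s) - V^{\pi}(s) \;\le\; \frac{2\gamma\varepsilon}{1-\gamma}
\qquad \text{for all states } s.
% The Q-learning analogue: \|\hat{Q} - Q^{*}\|_{\infty} \le \varepsilon
% implies a performance loss of at most 2\varepsilon/(1-\gamma)
% for the policy that is greedy with respect to \hat{Q}.
```

The key point is that the loss is linear in the approximation error ε, though the 1/(1−γ) factor can make the bound loose when γ is close to 1.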
Citations
Journal Article
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Journal Article
Learning to act using real-time dynamic programming
TL;DR: An algorithm based on dynamic programming, called Real-Time DP, is introduced, by which an embedded system can improve its performance with experience; the results also illuminate aspects of other DP-based reinforcement-learning methods, such as Watkins' Q-learning algorithm.
Book
Algorithms for Reinforcement Learning
TL;DR: This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, and gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by the discussion of their theoretical properties and limitations.
Journal Article
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
TL;DR: This paper examines the convergence of single-step on-policy RL algorithms for control with both decaying exploration and persistent exploration and provides examples of exploration strategies that result in convergence to both optimal values and optimal policies.
Dissertation
On the Sample Complexity of Reinforcement Learning
TL;DR: Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
References
Journal Article
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Journal Article
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
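The convergence conditions in this summary (every state–action pair sampled repeatedly, action-values stored discretely) can be illustrated with a minimal tabular Q-learning sweep. The two-state MDP, step size, and sweep count below are illustrative assumptions, not taken from the paper:

```python
import itertools

# Hypothetical deterministic MDP: states {0, 1}, actions {0, 1}.
# Taking action a moves to state a and yields reward 1 if a == 1, else 0.
gamma = 0.9   # discount factor
alpha = 0.5   # step size
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # discrete (tabular) Q

for _ in range(200):  # repeatedly sample every (state, action) pair
    for s, a in itertools.product((0, 1), (0, 1)):
        s_next, r = a, float(a == 1)
        target = r + gamma * max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # one-step Q-learning update

# Analytically, Q*(s, 1) = 1 / (1 - gamma) = 10 and Q*(s, 0) = gamma * 10 = 9,
# and the tabular updates converge toward these values.
```

Because the dynamics here are deterministic and every pair is swept, the iteration behaves like value iteration and converges quickly; the theorem covers the harder stochastic case with decaying step sizes.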
Journal Article
Learning to Predict by the Methods of Temporal Differences
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.