Addressing Function Approximation Error in Actor-Critic Methods

Open Access · Proceedings Article

Scott Fujimoto, +2 more

- Vol. 80, pp 1587-1596

TLDR

In this paper, the authors show that the overestimation bias persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic.
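
A toy simulation can make the overestimation claim concrete. The sketch below is an illustration written for this summary, not code from the paper: it assumes ten actions whose true values are all zero and models each critic's error as independent zero-mean Gaussian noise. Choosing the greedy action with one noisy critic and then evaluating it with that same critic gives a positive average even though the true maximum is 0. Evaluating the chosen action with a second, independent critic (the Double Q-learning idea) brings the average back to about 0 in this idealized setting, and taking the minimum of the pair, as the abstract describes, pushes it slightly below 0. The function name simulate_bias and every parameter value are hypothetical.

```python
import random
import statistics


def simulate_bias(num_actions=10, noise_std=1.0, trials=20_000, seed=0):
    """Average target estimate under three rules when every true action value is 0.

    Because the true maximum is 0, any non-zero average is pure estimation bias.
    Critic errors are modeled as independent zero-mean Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    single, double, clipped = [], [], []
    for _ in range(trials):
        # Two independently noisy critics estimating the same all-zero values.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        greedy = max(range(num_actions), key=q1.__getitem__)  # action chosen by critic 1
        single.append(q1[greedy])                    # select and evaluate with the same critic
        double.append(q2[greedy])                    # Double Q-style: evaluate with the other critic
        clipped.append(min(q1[greedy], q2[greedy]))  # clipped: minimum over the pair of critics
    return statistics.mean(single), statistics.mean(double), statistics.mean(clipped)


if __name__ == "__main__":
    s, d, c = simulate_bias()
    print(f"single: {s:+.3f}  double: {d:+.3f}  clipped: {c:+.3f}")
```

Under these assumptions the single-critic estimate averages roughly +1.5 (the expected maximum of ten standard normal draws), while the clipped estimate lands slightly below zero: the minimum trades overestimation for a mild underestimation.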

Abstract:

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
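
The three mechanisms named in the abstract (a minimum over a pair of critics, target networks, and delayed policy updates) can be sketched without any deep learning framework. This is a minimal sketch under stated assumptions, not the authors' implementation: networks are reduced to plain numbers, weight lists, and callbacks; the names clipped_double_q_target, soft_update, and run_schedule are hypothetical; and the soft (Polyak) target-network update and the values gamma=0.99, tau=0.005, and policy_delay=2 are illustrative choices, not figures taken from this page.

```python
from collections import Counter
from typing import Callable, List


def clipped_double_q_target(reward: float, done: float, next_q1: float,
                            next_q2: float, gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')).

    next_q1 and next_q2 are the two target critics' values at the next state s'
    and the action a' proposed by the target actor; taking their minimum limits
    overestimation. done is 1.0 for terminal transitions (no bootstrapping).
    """
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)


def soft_update(target: List[float], online: List[float],
                tau: float = 0.005) -> List[float]:
    """Assumed Polyak averaging of target weights: w' <- tau * w + (1 - tau) * w'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


def run_schedule(total_steps: int, policy_delay: int,
                 update_critics: Callable[[], None],
                 update_actor: Callable[[], None],
                 update_targets: Callable[[], None]) -> None:
    """Delayed policy updates: critics every step, actor and targets every policy_delay steps."""
    for step in range(1, total_steps + 1):
        update_critics()
        if step % policy_delay == 0:
            update_actor()
            update_targets()


if __name__ == "__main__":
    print(clipped_double_q_target(reward=1.0, done=0.0, next_q1=5.0, next_q2=3.0, gamma=0.5))
    print(soft_update(target=[0.0, 10.0], online=[1.0, 0.0], tau=0.5))
    counts = Counter()
    run_schedule(10, 2,
                 lambda: counts.update(["critic"]),
                 lambda: counts.update(["actor"]),
                 lambda: counts.update(["target"]))
    print(counts)
```

With policy_delay=2 the actor and the target networks are updated half as often as the critics, which is the sense in which policy updates are delayed; the critics therefore see more updates between successive changes to the policy.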

Citations

Proceedings Article

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, +3 more

TL;DR: This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.

Posted Content

Soft Actor-Critic Algorithms and Applications

Tuomas Haarnoja, +10 more

- 13 Dec 2018 -

arXiv: Learning

TL;DR: Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.

Proceedings Article

Off-Policy Deep Reinforcement Learning without Exploration

Scott Fujimoto, +2 more

TL;DR: This paper introduces a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data.

Posted Content

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, +3 more

- 08 Jun 2020 -

arXiv: Learning

TL;DR: Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.

Journal Article

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

Justin Fu, +4 more

- 04 May 2021 -

arXiv: Learning

TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Human-level control through deep reinforcement learning

Book

Dynamic Programming and Optimal Control

Dimitri P. Bertsekas

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

Posted Content

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

TL;DR: This work proposes a new family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.

Related Papers (5)

Human-level control through deep reinforcement learning

Proximal Policy Optimization Algorithms

John Schulman, +4 more

- 20 Jul 2017 -

arXiv: Learning

Reinforcement Learning: An Introduction

Asynchronous methods for deep reinforcement learning

Adam: A Method for Stochastic Optimization