Open Access · Proceedings Article

Prioritized Experience Replay

TL;DR
Prioritized experience replay is a framework for replaying important transitions more frequently, so that the agent learns more efficiently; applied to DQN, an algorithm that reached human-level performance across many Atari games, it yields a new state of the art.
Abstract
Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
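
The mechanism the abstract sketches, sampling transitions in proportion to a priority derived from their TD error and correcting the resulting bias with importance-sampling weights, is compact enough to show directly. Below is a minimal, illustrative Python sketch of the proportional variant; the class name and the flat O(N) sampling are assumptions for brevity (the paper uses a sum-tree for O(log N) sampling), and the alpha/beta defaults are only indicative.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal sketch of proportional prioritization.

    Stores p_i^alpha with p_i = |td_error| + eps and samples index i with
    probability proportional to the stored value, i.e.
    P(i) = p_i^alpha / sum_k p_k^alpha. Importance-sampling weights
    w_i = (N * P(i))^(-beta), normalized by max_i w_i, correct the bias
    introduced by non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data = []
        self.priorities = np.zeros(capacity)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so that each one
        # is replayed at least once before its TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[: len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()  # normalize so weights only scale the loss down
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = (np.abs(td_errors) + self.eps) ** self.alpha
```

In a DQN training loop one would sample a batch, scale each sample's TD loss by its weight, and feed the new TD errors back via update_priorities; the paper also anneals beta toward 1 over the course of training.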


Citations
Proceedings Article

Asynchronous methods for deep reinforcement learning

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers; an asynchronous actor-critic variant succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.
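
As a rough illustration of the asynchronous idea only, not of the full actor-critic algorithm, here is a toy Python sketch in which several worker threads apply lock-free gradient updates to shared parameters; the quadratic objective, worker and step counts, and all names are invented for the example.

```python
import threading
import numpy as np

# Toy, hypothetical sketch: each worker computes gradients on its own
# data stream and applies them, without synchronization, to a shared
# parameter vector. Python's GIL serializes these toy updates, so the
# point is the shared-parameter pattern, not speed.

shared_theta = np.zeros(4)  # parameters shared by all workers
true_target = np.ones(4)    # optimum of the toy loss ||theta - target||^2
lr = 0.01

def worker(theta, steps):
    rng = np.random.default_rng()
    for _ in range(steps):
        noisy = true_target + rng.normal(scale=0.1, size=4)  # worker-local sample
        grad = 2.0 * (theta - noisy)  # gradient of the toy quadratic loss
        theta -= lr * grad            # in-place update of the shared array

threads = [threading.Thread(target=worker, args=(shared_theta, 2000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_theta)  # ends up near [1, 1, 1, 1] despite unsynchronized updates
```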
Journal Article

Building machines that learn and think like people.

TL;DR: A review of recent progress in cognitive science suggests that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it.
Posted Content

Dueling Network Architectures for Deep Reinforcement Learning

TL;DR: This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
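
The dueling decomposition this summary refers to is itself only a few lines: the network splits into a state-value stream V(s) and an advantage stream A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a') so the two streams are identifiable. A minimal PyTorch sketch, with invented class name and layer sizes:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Illustrative dueling head: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').

    Subtracting the mean advantage makes the V and A streams identifiable;
    the class name and layer sizes here are invented for the sketch.
    """

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s,a)

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)    # combined Q-values
```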
Posted Content

Addressing Function Approximation Error in Actor-Critic Methods

TL;DR: This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws a connection between target networks and overestimation bias.
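
The clipped double-Q target described above can be stated in a few lines. Below is an illustrative PyTorch sketch; the function name and signature are assumptions, but the min-over-two-critics bootstrap matches the idea the summary names.

```python
import torch

def clipped_double_q_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Illustrative TD target using the minimum of two target critics.

    Taking min(Q1', Q2') at the target action yields a pessimistic value
    estimate, limiting the overestimation bias that a single max-based
    bootstrap target tends to accumulate.
    """
    min_q = torch.min(next_q1, next_q2)            # pessimistic of the pair
    return reward + gamma * (1.0 - done) * min_q   # standard bootstrap target
```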
References
Journal Article

Dopaminergic neurons promote hippocampal reactivation and spatial memory persistence.

TL;DR: Findings reveal that midbrain dopaminergic neurons promote hippocampal network dynamics associated with memory persistence, improving later recall of neural representations of space and stabilizing memory performance.
Book Chapter

To recognize shapes, first learn to generate images.

TL;DR: This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.
Proceedings Article

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

TL;DR: The central idea is to use slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents built on this idea are proposed and shown to outperform DQN.
Journal Article

Rewarded Outcomes Enhance Reactivation of Experience in the Hippocampus

TL;DR: It is shown that rat hippocampal CA3 principal cells are significantly more active during sharp-wave ripples (SWRs) following receipt of reward, and that this enhanced reactivation could be a mechanism to bind rewarding outcomes to the experiences that precede them.
Journal Article

Hippocampal place cells construct reward related sequences through unexplored space

TL;DR: It is reported that viewing the delivery of food to an unvisited portion of an environment leads to off-line pre-activation of place-cell sequences corresponding to that space, suggesting that goal-biased preplay may support preparation for future experiences in novel environments.