Open Access Proceedings Article
Prioritized Experience Replay
TL;DR
Prioritized experience replay, as introduced in this paper, is a framework for prioritizing experience so as to replay important transitions more frequently and therefore learn more efficiently; combined with DQN, it outperforms uniform replay across many Atari games.
Abstract
Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state of the art, outperforming DQN with uniform replay on 41 out of 49 games.
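The prioritization scheme in the abstract can be sketched in a few lines. The following is a minimal, illustrative implementation of the proportional variant: each transition gets priority p_i = |δ_i| + ε (δ is the TD error), sampling probability P(i) ∝ p_i^α, and an importance-sampling weight w_i = (N·P(i))^(−β) normalized by its maximum. Class and parameter names are illustrative, not taken from the paper's code; a production version would use a sum-tree rather than this O(N) scan.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal sketch of proportional prioritized replay (illustrative names)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly to prioritize (0 = uniform sampling)
        self.beta = beta      # strength of importance-sampling correction
        self.eps = eps        # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next write position in the circular buffer

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta), normalized
        # by the maximum weight for stability.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Proportional variant: p_i = |delta_i| + eps
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In use, the agent samples a batch, scales each transition's loss by its importance-sampling weight, and then writes the new TD errors back with `update_priorities`.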
Citations
Proceedings Article
Asynchronous methods for deep reinforcement learning
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, Koray Kavukcuoglu +7 more
TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Journal Article
Building machines that learn and think like people.
TL;DR: In this article, a review of recent progress in cognitive science suggests that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it.
Posted Content
Dueling Network Architectures for Deep Reinforcement Learning
TL;DR: This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state of the art on the Atari 2600 domain.
Posted Content
Addressing Function Approximation Error in Actor-Critic Methods
TL;DR: This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
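The "minimum value between a pair of critics" idea summarized above reduces to a one-line change in the bootstrap target. The sketch below is illustrative (function and parameter names are assumptions, not taken from that paper); it shows only the target computation, not the full actor-critic loop.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrap target using the minimum of two critic estimates.

    Taking min(Q1, Q2) is a pessimistic estimate of the next-state value,
    which limits the overestimation bias that a single critic accumulates.
    """
    q_min = min(q1_next, q2_next)
    return reward + gamma * (0.0 if done else q_min)
```

Both critics are then regressed toward this shared target, so neither critic's individual overestimates propagate through bootstrapping.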
References
Journal Article
Dopaminergic neurons promote hippocampal reactivation and spatial memory persistence.
TL;DR: Findings reveal that midbrain dopaminergic neurons promote hippocampal network dynamics associated with memory persistence, improving the later recall of neural representations of space and stabilizing memory performance.
Book Chapter
To recognize shapes, first learn to generate images.
TL;DR: This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.
Proceedings Article
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
TL;DR: The central idea is to use slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.
Journal Article
Rewarded Outcomes Enhance Reactivation of Experience in the Hippocampus
TL;DR: It is shown that rat hippocampal CA3 principal cells are significantly more active during SWRs following receipt of reward and this enhanced reactivation in response to reward could be a mechanism to bind rewarding outcomes to the experiences that precede them.
Journal Article
Hippocampal place cells construct reward related sequences through unexplored space
TL;DR: It is reported that viewing the delivery of food to an unvisited portion of an environment leads to off-line pre-activation of place-cell sequences corresponding to that space, suggesting goal-biased preplay may support preparation for future experiences in novel environments.
Related Papers (5)
Human-level control through deep reinforcement learning
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis