Deep Recurrent Q-Learning for Partially Observable MDPs

Proceedings Article (Open Access)

pp. 29-37

TLDR

Deep Recurrent Q-Network (DRQN) replaces the first post-convolutional fully connected layer of a Deep Q-Network (DQN) with a recurrent LSTM layer. The LSTM integrates information through time, and DRQN matches DQN's performance both on standard Atari games and on partially observed variants with flickering game screens.
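The architectural change can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: a hypothetical precomputed feature vector stands in for the output of DQN's convolutional layers, the weights are random rather than learned, and the names `LSTMCell`, `DRQNHead`, and all sizes are illustrative assumptions. What it shows is that the LSTM state `(h, c)` is carried from one timestep to the next, so the Q-values at a given step can depend on frames seen earlier even though the network receives only one frame per step.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    # Multiply a matrix (a list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LSTMCell:
    """Textbook LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n_cat = n_in + n_hidden  # each gate reads the concatenation [x, h_prev]
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n_cat)]
                      for _ in range(n_hidden)] for k in self.GATES}
        self.b = {k: [0.0] * n_hidden for k in self.GATES}

    def zero_state(self):
        return [0.0] * self.n_hidden, [0.0] * self.n_hidden

    def step(self, x, state):
        h_prev, c_prev = state
        z = list(x) + list(h_prev)
        pre = {k: [wx + bias for wx, bias in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # The cell state c is what carries information across timesteps.
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, (h, c)


class DRQNHead:
    """Hypothetical sketch: LSTM in place of DQN's first fully connected layer, then linear Q-values."""

    def __init__(self, n_features, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(n_features, n_hidden, rng)
        self.W_q = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                    for _ in range(n_actions)]

    def step(self, features, state):
        # `features` stands in for the flattened output of DQN's convolutional layers.
        h, state = self.lstm.step(features, state)
        return matvec(self.W_q, h), state


# Usage: one visible frame followed by a fully blanked one.
net = DRQNHead(n_features=8, n_hidden=4, n_actions=3)
state = net.lstm.zero_state()
for feats in ([1.0] * 8, [0.0] * 8):
    q_after, state = net.step(feats, state)

# For comparison: the same blank frame seen with no history.
q_fresh, _ = net.step([0.0] * 8, net.lstm.zero_state())
print(q_after)  # nonzero: the earlier visible frame still influences the Q-values
print(q_fresh)  # all zeros, since input, state, and biases are all zero
```

In this sketch the recurrent state is the only path by which earlier frames reach the Q-values; nothing is stacked in the input.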

Abstract:

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although it sees only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens. Additionally, when DRQN is trained with partial observations and evaluated with incrementally more complete observations, its performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer. While recurrency confers no systematic advantage when learning to play the game, the recurrent network can adapt better at evaluation time if the quality of observations changes.
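The two ways of giving the agent a history, stacking frames in the input versus carrying a recurrent state, can be contrasted with a small sketch. It assumes, as a simplification, that a "flickering" screen means each frame is independently replaced by an all-zero frame with probability `p`, and that observability is varied by changing `p`. The names `flicker`, `FrameStack`, and `visible_fraction` are hypothetical, and the history length `k = 4` follows the common DQN frame-stacking convention rather than anything stated in this abstract.

```python
import random
from collections import deque


def flicker(frame, p, rng):
    """Hypothetical flicker model: blank the whole frame with probability p."""
    if rng.random() < p:
        return [0.0] * len(frame)
    return list(frame)


class FrameStack:
    """DQN-style input: the last k frames, concatenated oldest to newest."""

    def __init__(self, k, frame_size):
        # Start with k blank frames so the input length is fixed from step 0.
        self.frames = deque(([0.0] * frame_size for _ in range(k)), maxlen=k)

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return [v for f in self.frames for v in f]


def visible_fraction(p, steps, seed=0):
    """Empirical share of frames left visible at flicker probability p."""
    rng = random.Random(seed)
    return sum(flicker([1.0], p, rng)[0] for _ in range(steps)) / steps


# Usage: the same flickering stream fed to both kinds of agent.
rng = random.Random(0)
stack = FrameStack(k=4, frame_size=2)
for t in range(6):
    obs = flicker([float(t), float(t)], p=0.5, rng=rng)
    dqn_input = stack.push(obs)  # DQN: one vector holding k frames
    drqn_input = obs             # DRQN: only the current frame; history lives in the LSTM state

# Sweeping p mimics evaluating under more or less complete observations.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, visible_fraction(p, steps=10_000))
```

The design difference is where history lives. For DQN it is a fixed window in the input, so a blanked frame simply occupies one slot of that window; for DRQN it lives in the recurrent state, which is consistent with the abstract's observation that the recurrent network copes better when observation quality changes between training and evaluation.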