Open Access · Posted Content

Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning

TLDR
The authors find that a range of design decisions often considered crucial, such as the use of latent spaces, have little effect on task performance, and that image prediction accuracy, somewhat surprisingly, correlates more strongly with downstream task performance than reward prediction accuracy.
Abstract
Model-based reinforcement learning (MBRL) methods have shown strong sample efficiency and performance across a variety of tasks, including when faced with high-dimensional visual observations. These methods learn to predict the environment dynamics and expected reward from interaction and use this predictive model to plan and perform the task. However, MBRL methods vary in their fundamental design choices, and there is no strong consensus in the literature on how these design decisions affect performance. In this paper, we study a number of design decisions for the predictive model in visual MBRL algorithms, focusing specifically on methods that use a predictive model for planning. We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance. A big exception to this finding is that predicting future observations (i.e., images) leads to significant task performance improvement compared to only predicting rewards. We also empirically find that image prediction accuracy, somewhat surprisingly, correlates more strongly with downstream task performance than reward prediction accuracy. We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks (that require exploration) will perform the same as the best-performing models when trained on the same training data. Simultaneously, in the absence of exploration, models that fit the data better usually perform better on the downstream task as well, but surprisingly, these are often not the same models that perform the best when learning and exploring from scratch. These findings suggest that performance and exploration place important and potentially contradictory requirements on the model.
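To make the planning procedure described above concrete, the following is a minimal sketch of a model-based planning loop: a learned model predicts future (latent) states and rewards, and a simple random-shooting planner picks the action sequence with the highest predicted return. The component names, the linear placeholder models, and the choice of random shooting are illustrative assumptions, not the specific design studied in the paper.

```python
import numpy as np

# Illustrative stand-ins for learned components (assumptions, not the paper's
# architecture): in practice these are neural networks trained on interaction.
LATENT_DIM, ACTION_DIM = 8, 2

def encoder(obs):
    """Map an image observation to a latent state (placeholder)."""
    return obs.reshape(-1)[:LATENT_DIM]

def dynamics_model(z, a):
    """Predict the next latent state (placeholder linear dynamics)."""
    return 0.9 * z + 0.1 * np.pad(a, (0, LATENT_DIM - ACTION_DIM))

def reward_model(z, a):
    """Predict the expected reward (placeholder)."""
    return -np.sum(z ** 2)

def plan_action(obs, horizon=12, n_candidates=1000, seed=0):
    """Random-shooting planner: sample candidate action sequences, roll them
    out in the learned model, and return the first action of the best one."""
    rng = np.random.default_rng(seed)
    z0 = encoder(obs)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, ACTION_DIM))
    returns = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        z = z0
        for a in actions:
            returns[i] += reward_model(z, a)
            z = dynamics_model(z, a)
    return candidates[np.argmax(returns), 0]  # execute only the first action (MPC-style)

action = plan_action(np.zeros((16, 16)))  # toy 16x16 "image" observation
```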


Citations
Proceedings Article (DOI)

Pathdreamer: A World Model for Indoor Navigation

TL;DR: It is shown that planning ahead with Pathdreamer yields about half the benefit of looking ahead at actual observations from unobserved parts of the environment, which will help unlock model-based approaches to challenging embodied navigation tasks such as navigating to specified objects and vision-and-language navigation (VLN).
Proceedings Article (DOI)

Replay Overshooting: Learning Stochastic Latent Dynamics with the Extended Kalman Filter

TL;DR: In this article, replay overshooting (RO) is used to learn nonlinear stochastic latent dynamics models suitable for long-horizon prediction; the resulting models outperform several other prediction models on both quantitative and qualitative metrics.
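For context, the extended Kalman filter machinery mentioned in this summary amounts to a linearized predict/update cycle over the latent state. The sketch below is a generic EKF step with assumed function names, not the replay-overshooting training procedure itself.

```python
import numpy as np

def ekf_step(mu, P, u, y, f, h, F_jac, H_jac, Q, R):
    """One generic EKF cycle for latent dynamics z' = f(z, u) + noise and
    observation y = h(z) + noise. Jacobians F_jac/H_jac are supplied by the
    caller (e.g., via automatic differentiation)."""
    # Predict: propagate the mean and covariance through the dynamics.
    mu_pred = f(mu, u)
    F = F_jac(mu, u)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the observation.
    H = H_jac(mu_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_pred + K @ (y - h(mu_pred))
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new
```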
Posted Content

Pathdreamer: A World Model for Indoor Navigation

TL;DR: Pathdreamer, as discussed by the authors, generates plausible high-resolution 360° visual observations (RGB, semantic segmentation, and depth) for viewpoints that have not been visited, in buildings not seen during training.
Posted Content

FitVid: Overfitting in Pixel-Level Video Prediction.

TL;DR: The authors propose a new architecture, named FitVid, which is capable of severely overfitting the common benchmarks while having a parameter count similar to the current state-of-the-art models.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
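For context, the objective maximized by this framework is the standard evidence lower bound (ELBO), reproduced here only as a reminder:

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
    - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
  \;\le\; \log p_\theta(x)
```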
Posted Content

Proximal Policy Optimization Algorithms

TL;DR: A new family of policy gradient methods for reinforcement learning is proposed, which alternates between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
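For context, the "surrogate" objective referred to here is, in its widely used clipped form (standard notation, shown only as a reminder):

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```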
Book

Clinical Prediction Models

TL;DR: This book presents a case study on survival analysis (prediction of secondary cardiovascular events) and lessons from case studies on overfitting and optimism in prediction models.
Journal Article

End-to-end training of deep visuomotor policies

TL;DR: In this article, a guided policy search method is used to map raw image observations directly to torques at the robot's motors, with supervision provided by a simple trajectory-centric reinforcement learning method.