Open Access · Posted Content
Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Danijar Hafner, Harini Kannan, Chelsea Finn, Sergey Levine, Dumitru Erhan
TLDR
A range of design decisions that are often considered crucial, such as the use of latent spaces, is found to have little effect on task performance; somewhat surprisingly, image prediction accuracy correlates more strongly with downstream task performance than reward prediction accuracy.
Abstract:
Model-based reinforcement learning (MBRL) methods have shown strong sample efficiency and performance across a variety of tasks, including when faced with high-dimensional visual observations. These methods learn to predict the environment dynamics and expected reward from interaction and use this predictive model to plan and perform the task. However, MBRL methods vary in their fundamental design choices, and there is no strong consensus in the literature on how these design decisions affect performance. In this paper, we study a number of design decisions for the predictive model in visual MBRL algorithms, focusing specifically on methods that use a predictive model for planning. We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance. A big exception to this finding is that predicting future observations (i.e., images) leads to significant task performance improvement compared to only predicting rewards. We also empirically find that image prediction accuracy, somewhat surprisingly, correlates more strongly with downstream task performance than reward prediction accuracy. We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks (that require exploration) will perform the same as the best-performing models when trained on the same training data. Simultaneously, in the absence of exploration, models that fit the data better usually perform better on the downstream task as well, but surprisingly, these are often not the same models that perform the best when learning and exploring from scratch. These findings suggest that performance and exploration place important and potentially contradictory requirements on the model.
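The planning loop the abstract describes (predict dynamics and expected reward, then plan with the model) can be illustrated with a simple random-shooting planner. This is a minimal sketch, not the paper's implementation: `plan_action` and `toy_model` are hypothetical names, and the scalar toy model stands in for a learned visual dynamics-and-reward model.

```python
import numpy as np

def plan_action(model, state, horizon=10, n_candidates=200, seed=0):
    """Random-shooting planner: sample candidate action sequences,
    roll each one out with the learned model, and return the first
    action of the sequence with the highest predicted return."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        s = state
        for a in actions:
            s, r = model(s, a)  # predicted next state and reward
            returns[i] += r
    return candidates[np.argmax(returns), 0]

# Toy stand-in for a learned model: the state drifts by the action,
# and the reward penalizes distance from the origin.
def toy_model(s, a):
    s_next = s + a
    return s_next, -s_next**2
```

Calling `plan_action(toy_model, 1.0)` selects a first action by predicted return alone, which is the defining feature of the planning-based MBRL methods the paper studies.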
Citations
Proceedings Article · DOI
Pathdreamer: A World Model for Indoor Navigation
TL;DR: It is shown that planning ahead with Pathdreamer brings about half the benefit of looking ahead at actual observations from unobserved parts of the environment, which will help unlock model-based approaches to challenging embodied navigation tasks such as navigating to specified objects and VLN.
Proceedings Article · DOI
Replay Overshooting: Learning Stochastic Latent Dynamics with the Extended Kalman Filter
TL;DR: In this article, replay overshooting (RO), a method for learning nonlinear stochastic latent dynamics models suitable for long-horizon prediction, is proposed and shown to outperform several other prediction models on both quantitative and qualitative metrics.
Posted Content
Pathdreamer: A World Model for Indoor Navigation
TL;DR: Pathdreamer as discussed by the authors generates plausible high-resolution 360 visual observations (RGB, semantic segmentation and depth) for viewpoints that have not been visited, in buildings not seen during training.
Posted Content
FitVid: Overfitting in Pixel-Level Video Prediction.
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan
TL;DR: FitVid as discussed by the authors proposes a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks while having similar parameter count as the current state-of-the-art models.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Proceedings Article
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
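Two ingredients of this method can be sketched in a few lines: the reparameterization trick (which makes sampling differentiable) and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. A minimal NumPy illustration; the function names are chosen for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so gradients
    can flow through mu and log_var (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian:
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

When the posterior equals the prior (mu = 0, log_var = 0), the KL term is exactly zero, which is a handy sanity check when implementing the ELBO.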
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
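The "surrogate" objective referred to here is PPO's clipped objective, which limits how far the new policy can move from the one that collected the data. A minimal NumPy sketch (illustrative function name, standing in for a full deep RL implementation):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))
```

Taking the elementwise minimum means the objective never rewards pushing the probability ratio beyond the clip range, which is what keeps each policy update conservative.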
Book
Clinical Prediction Models
TL;DR: This book presents a case study on survival analysis (prediction of secondary cardiovascular events) and lessons from case studies on overfitting and optimism in prediction models.
Journal Article
End-to-end training of deep visuomotor policies
TL;DR: In this article, a guided policy search method is used to map raw image observations directly to torques at the robot's motors, with supervision provided by a simple trajectory-centric reinforcement learning method.