Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control.

Open Access, Posted Content

Frederik Ebert, +5 more

- 03 Dec 2018 -

arXiv: Robotics

TLDR

It is demonstrated that visual MPC can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.

Abstract:

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual model-predictive control (visual MPC) can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
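
The planning loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the optimizer, so the cross-entropy method (CEM) is used here as one common sampling-based choice; `predict_pixel` is a hypothetical stand-in for the learned video-prediction model (it simply moves a designated pixel by each action); and `pixel_cost` is a toy version of the designated-pixel goal specification.

```python
import math
import random


def predict_pixel(pos, actions):
    """Hypothetical stand-in for a learned predictive model:
    the designated pixel is displaced by each (dx, dy) action."""
    x, y = pos
    for dx, dy in actions:
        x, y = x + dx, y + dy
    return (x, y)


def pixel_cost(pos, goal):
    """Toy designated-pixel cost: Euclidean distance to the goal position."""
    return math.hypot(pos[0] - goal[0], pos[1] - goal[1])


def cem_plan(pos, goal, horizon=5, samples=200, elites=20, iters=4, rng=None):
    """Cross-entropy method over action sequences (an assumed optimizer)."""
    if rng is None:
        rng = random.Random(0)
    mean = [[0.0, 0.0] for _ in range(horizon)]
    std = [[1.0, 1.0] for _ in range(horizon)]
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            # Sample one candidate action sequence from the current Gaussian.
            seq = [(rng.gauss(mean[t][0], std[t][0]),
                    rng.gauss(mean[t][1], std[t][1])) for t in range(horizon)]
            scored.append((pixel_cost(predict_pixel(pos, seq), goal), seq))
        scored.sort(key=lambda item: item[0])
        best = [seq for _, seq in scored[:elites]]
        # Refit the sampling distribution to the lowest-cost sequences.
        for t in range(horizon):
            for d in range(2):
                vals = [seq[t][d] for seq in best]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = math.sqrt(var) + 1e-3
    return [tuple(step) for step in mean]


def run_mpc(start, goal, steps=10):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(0)
    pos = start
    for _ in range(steps):
        dx, dy = cem_plan(pos, goal, rng=rng)[0]
        pos = (pos[0] + dx, pos[1] + dy)
    return pos


if __name__ == "__main__":
    goal = (5.0, -3.0)
    final = run_mpc((0.0, 0.0), goal)
    print("final position:", final, "remaining cost:", pixel_cost(final, goal))
```

In a real system the toy model would be replaced by the learned predictor operating on camera images, and the cost by one of the three goal specifications listed above; replanning after every executed action is what makes the loop model-predictive control.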

Citations

Posted Content

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, +3 more

- 04 May 2020 -

arXiv: Learning

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.

Posted Content

When to Trust Your Model: Model-Based Policy Optimization

Michael Janner, +3 more

- 19 Jun 2019 -

arXiv: Learning

TL;DR: This paper first formulates and analyzes a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step, and then demonstrates that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls.

Proceedings Article

MOPO: Model-based Offline Policy Optimization

Tianhe Yu, +7 more

TL;DR: Model-based offline policy optimization (MOPO) modifies existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics; the authors show theoretically that the algorithm maximizes a lower bound of the policy's return under the true MDP.

Posted Content

Learning Latent Dynamics for Planning from Pixels

Danijar Hafner, +6 more

- 12 Nov 2018 -

arXiv: Learning

TL;DR: The Deep Planning Network (PlaNet) learns the environment dynamics from images and chooses actions through fast online planning in latent space; using only pixel observations, it solves continuous control tasks with contact dynamics, partial observability, and sparse rewards.

Posted Content

Model-Based Reinforcement Learning for Atari

Lukasz Kaiser, +13 more

- 01 Mar 2019 -

arXiv: Learning

TL;DR: SimPLe is a model-based deep RL algorithm based on video prediction models that solves Atari games with fewer interactions than model-free methods; in the low-data regime of 100k interactions it outperforms state-of-the-art model-free algorithms in most games, in some by over an order of magnitude.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014 -

arXiv: Learning

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Posted Content

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, +6 more

- 19 Dec 2013 -

arXiv: Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. Applied to seven Atari 2600 games, it outperforms all previous approaches on six of them and surpasses a human expert on three.

Proceedings Article

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn, +2 more

TL;DR: This work proposes an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent, and that is applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.

Posted Content

Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Xingjian Shi, +5 more

- 13 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes the convolutional LSTM (ConvLSTM) and uses it to build an end-to-end trainable model for the precipitation nowcasting problem, showing that it captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm.
