Open Access · Posted Content

Continuous control with deep reinforcement learning

TL;DR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
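To make the actor-critic structure concrete, here is a minimal sketch of one DDPG-style update step in Python (PyTorch-flavored). The network modules, optimizers, replay batch, and the hyper-parameter values (gamma, tau) are illustrative assumptions, not the authors' exact implementation:

import torch
import torch.nn.functional as F

# One deterministic-policy-gradient update. `actor`, `critic`, their target
# copies, the optimizers, and a batch of (s, a, r, s2, done) tensors are
# assumed to be defined elsewhere.
def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target
    # r + gamma * Q'(s', mu'(s')) computed with the target networks.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the learned networks with the target networks.
    for net, net_targ in ((critic, critic_targ), (actor, actor_targ)):
        for p, p_targ in zip(net.parameters(), net_targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)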


Citations
Journal Article

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Posted Content

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.
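As a rough illustration of that objective, the actor loss in a maximum-entropy method trades off the critic's value estimate against the policy's log-probability. A hedged Python sketch follows, where `policy`, `critic`, and the `rsample_with_log_prob` interface are assumed names rather than the paper's API:

import torch

def soft_actor_loss(policy, critic, s, alpha=0.2):
    # Reparameterized action sample and its log-density (assumed helper).
    a, log_pi = policy.rsample_with_log_prob(s)
    # Minimizing alpha * log_pi - Q(s, a) maximizes expected value plus
    # alpha times the policy entropy, matching the stated objective.
    return (alpha * log_pi - critic(s, a)).mean()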
Posted Content

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
Posted Content

Addressing Function Approximation Error in Actor-Critic Methods

TL;DR: This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
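The minimum-over-critics idea reduces to one line when forming the bootstrap target; here is a minimal sketch (PyTorch-flavored, with illustrative names for the target networks):

import torch

def clipped_double_q_target(critic1_targ, critic2_targ, actor_targ,
                            r, s2, done, gamma=0.99):
    # Bootstrapping from the smaller of two independent critic estimates
    # limits the overestimation bias a single learned critic accumulates.
    with torch.no_grad():
        a2 = actor_targ(s2)
        q_min = torch.min(critic1_targ(s2, a2), critic2_targ(s2, a2))
        return r + gamma * (1.0 - done) * q_min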
Journal Article

End-to-end training of deep visuomotor policies

TL;DR: In this article, a guided policy search method is used to map raw image observations directly to torques at the robot's motors, with supervision provided by a simple trajectory-centric reinforcement learning method.
References
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: In this article, Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, is presented.
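The "lower-order moments" here are exponential moving averages of the gradient and of its elementwise square. A textbook-style NumPy sketch of one Adam step follows; parameter names and defaults reflect common usage, not necessarily the paper's reference code:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # First and second moment estimates (exponential moving averages).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v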
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Batch Normalization, as presented in this paper, normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
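Concretely, the normalization standardizes each feature over the mini-batch and then rescales it with learned parameters. A minimal NumPy sketch of the training-time forward pass, where gamma and beta are the learned scale and shift and the function name is illustrative:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature across the mini-batch (axis 0), then let the
    # network recover any needed scale and shift via gamma and beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta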
Posted Content

Playing Atari with Deep Reinforcement Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Proceedings Article

MuJoCo: A physics engine for model-based control

TL;DR: A new physics engine tailored to model-based control is presented; it is based on the modern velocity-stepping approach, which avoids the difficulties of spring-dampers, and it can compute both forward and inverse dynamics.