Open Access · Posted Content

Continuous control with deep reinforcement learning

TL;DR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
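To make the actor-critic structure concrete, here is a minimal sketch of one DDPG-style update step in Python (PyTorch-flavored). The network modules, optimizers, replay batch, and the hyper-parameter values (gamma, tau) are illustrative assumptions, not the authors' exact implementation:

import torch
import torch.nn.functional as F

# One deterministic-policy-gradient update. `actor`, `critic`, their target
# copies, the optimizers, and a batch of (s, a, r, s2, done) tensors are
# assumed to be defined elsewhere.
def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target
    # r + gamma * Q'(s', mu'(s')) computed with the target networks.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the learned networks with the target networks.
    for net, net_targ in ((critic, critic_targ), (actor, actor_targ)):
        for p, p_targ in zip(net.parameters(), net_targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)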


Citations
Journal Article

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Posted Content

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.
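As a rough illustration of that objective, the actor loss in a maximum-entropy method trades off the critic's value estimate against the policy's log-probability. A hedged Python sketch follows, where `policy`, `critic`, and the `rsample_with_log_prob` interface are assumed names rather than the paper's API:

import torch

def soft_actor_loss(policy, critic, s, alpha=0.2):
    # Reparameterized action sample and its log-density (assumed helper).
    a, log_pi = policy.rsample_with_log_prob(s)
    # Minimizing alpha * log_pi - Q(s, a) maximizes expected value plus
    # alpha times the policy entropy, matching the stated objective.
    return (alpha * log_pi - critic(s, a)).mean()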
Posted Content

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
Posted Content

Addressing Function Approximation Error in Actor-Critic Methods

TL;DR: This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
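The minimum-over-critics idea reduces to one line when forming the bootstrap target; here is a minimal sketch (PyTorch-flavored, with illustrative names for the target networks):

import torch

def clipped_double_q_target(critic1_targ, critic2_targ, actor_targ,
                            r, s2, done, gamma=0.99):
    # Bootstrapping from the smaller of two independent critic estimates
    # limits the overestimation bias a single learned critic accumulates.
    with torch.no_grad():
        a2 = actor_targ(s2)
        q_min = torch.min(critic1_targ(s2, a2), critic2_targ(s2, a2))
        return r + gamma * (1.0 - done) * q_min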
Journal Article

End-to-end training of deep visuomotor policies

TL;DR: In this article, a guided policy search method is used to map raw image observations directly to torques at the robot's motors, with supervision provided by a simple trajectory-centric reinforcement learning method.
References
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: In this article, Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, is presented.
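The "lower-order moments" here are exponential moving averages of the gradient and of its elementwise square. A textbook-style NumPy sketch of one Adam step follows; parameter names and defaults reflect common usage, not necessarily the paper's reference code:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # First and second moment estimates (exponential moving averages).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v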
Journal Article

Human-level control through deep reinforcement learning

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Batch Normalization, as presented in this paper, normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
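Concretely, the normalization standardizes each feature over the mini-batch and then rescales it with learned parameters. A minimal NumPy sketch of the training-time forward pass, where gamma and beta are the learned scale and shift and the function name is illustrative:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature across the mini-batch (axis 0), then let the
    # network recover any needed scale and shift via gamma and beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta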
Posted Content

Playing Atari with Deep Reinforcement Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Proceedings Article

MuJoCo: A physics engine for model-based control

TL;DR: A new physics engine tailored to model-based control is presented; it is based on the modern velocity-stepping approach, which avoids the difficulties of spring-dampers, and it can compute both forward and inverse dynamics.