
Shixiang Gu

Researcher at Google

Publications - 84
Citations - 13594

Shixiang Gu is an academic researcher from Google. The author has contributed to research in topics: Reinforcement learning & Computer science. The author has an h-index of 35 and has co-authored 62 publications receiving 9130 citations. Previous affiliations of Shixiang Gu include the Max Planck Society and the University of Cambridge.

Papers
Proceedings Article

Categorical Reparameterization with Gumbel-Softmax

TL;DR: Gumbel-Softmax, as described in this paper, replaces non-differentiable samples from a categorical distribution with differentiable samples from a novel Gumbel-Softmax distribution, which has the essential property that it can be smoothly annealed into the categorical distribution.
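
To make the sampling step concrete, here is a minimal NumPy sketch of drawing a Gumbel-Softmax sample; the logits and temperature values are illustrative, not taken from the paper.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau, rng):
    """Draw a relaxed (differentiable) sample over the categories in `logits`.

    As tau -> 0 the sample approaches a one-hot draw from the categorical
    distribution softmax(logits); larger tau gives smoother samples.
    """
    # Gumbel(0, 1) noise via the inverse-CDF transform -log(-log(U))
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Perturb the logits, then apply a temperature-scaled softmax
    y = (logits + g) / tau
    y -= y.max()                       # for numerical stability
    e = np.exp(y)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.6, 0.3]))
for tau in (5.0, 1.0, 0.1):            # annealing the temperature
    print(tau, gumbel_softmax_sample(logits, tau, rng))
```

At low temperatures the printed samples concentrate near a one-hot vector, which is the annealing property the TL;DR refers to.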
Posted Content

Categorical Reparameterization with Gumbel-Softmax

TL;DR: It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
Proceedings Article (DOI)

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

TL;DR: This article shows that a deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
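
As context for the off-policy aspect, here is a minimal sketch of a Q-learning update applied to transitions drawn from a shared replay buffer; in an asynchronous setup like the paper's, several robots append experience to such a buffer while a separate trainer consumes it. The tabular Q-function, action set, and hyperparameters below are illustrative stand-ins, not the paper's deep, NAF-based implementation.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)  # filled asynchronously by collectors
Q = {}                                 # (state, action) -> value; a neural net in the paper
actions = [0, 1]
alpha, gamma = 0.1, 0.99

def td_update(batch):
    """One off-policy Q-learning step on a minibatch of stored transitions."""
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

# Collectors push experience; the trainer samples it off-policy.
replay_buffer.append((0, 1, 1.0, 1, False))
td_update(random.sample(list(replay_buffer), k=min(32, len(replay_buffer))))
```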
Posted Content

Towards Deep Neural Network Architectures Robust to Adversarial Examples

TL;DR: The Deep Contractive Network described in this paper is a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE); the penalty increases the network's robustness to adversarial examples without a significant performance penalty.
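
For intuition, here is a minimal NumPy sketch of a CAE-style smoothness penalty on one sigmoid layer; the layer sizes and penalty weight are illustrative, and this is a sketch of the general idea rather than the paper's exact training procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian dh/dx for h = sigmoid(W @ x + b).

    Penalizing this term encourages the layer output to change slowly under
    small input perturbations, the smoothness property the TL;DR describes.
    """
    h = sigmoid(W @ x + b)
    # For sigmoid units, dh_j/dx = h_j * (1 - h_j) * W_j, so the squared
    # Frobenius norm factorizes per hidden unit.
    return np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W, b = rng.normal(size=(8, 4)), np.zeros(8)
lam = 0.1                              # illustrative penalty weight
task_loss = 0.0                        # stand-in for the ordinary training loss
total_loss = task_loss + lam * contractive_penalty(x, W, b)
```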
Posted Content

Continuous Deep Q-Learning with Model-based Acceleration

TL;DR: This paper proposes normalized advantage functions (NAF) as an alternative to the more commonly used policy gradient and actor-critic methods, to accelerate model-free reinforcement learning for continuous control tasks.
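
The key construction in NAF is a quadratic advantage term whose maximizing action is available in closed form, which is what makes Q-learning tractable with continuous actions. Below is a minimal NumPy sketch of that parameterization, Q(s, a) = V(s) - 1/2 (a - mu(s))^T P(s) (a - mu(s)) with P(s) = L(s) L(s)^T; the concrete numbers are illustrative, and in the paper mu, L, and V are outputs of a neural network.

```python
import numpy as np

def naf_q_value(a, mu, L, V):
    """Q(s, a) under the NAF parameterization: state value plus a quadratic
    advantage that is maximized exactly at a = mu(s)."""
    P = L @ L.T                        # positive semi-definite, so the
    diff = a - mu                      # advantage is never positive
    return V - 0.5 * diff @ P @ diff

mu = np.array([0.2, -0.1])             # the greedy action for this state
L = np.tril([[1.0, 0.0],
             [0.3, 0.8]])              # lower-triangular factor of P
V = 1.5
print(naf_q_value(mu, mu, L, V))       # equals V: mu is the argmax
print(naf_q_value(mu + 0.5, mu, L, V)) # any other action scores lower
```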