Top 12 papers published by Xi Chen from University of California, Berkeley in 2017

Posted Content•

Evolution Strategies as a Scalable Alternative to Reinforcement Learning.

Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever

10 Mar 2017-arXiv: Machine Learning

TL;DR: This work explores the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black box optimization technique.

Abstract: We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
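
The scalar-only communication described in this abstract can be illustrated with a minimal sketch. It assumes every worker knows every other worker's random seed, so each perturbation can be regenerated locally and only the scalar returns need to be exchanged. The function names, the toy objective, and the hyperparameters are illustrative assumptions, not the authors' implementation, and refinements the paper may use are not shown.

```python
import random


def perturbation(seed, dim):
    """Regenerate a worker's Gaussian noise vector from its seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def es_step(theta, objective, seeds, sigma=0.1, lr=0.05):
    """One ES update in which workers would only exchange scalar returns."""
    dim = len(theta)
    # Each simulated worker evaluates one perturbed parameter vector.
    returns = []
    for seed in seeds:
        eps = perturbation(seed, dim)
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        returns.append(objective(candidate))  # the only value "sent"
    # Seeds are common knowledge, so the noise is rebuilt locally
    # instead of being transmitted as full vectors.
    mean_return = sum(returns) / len(returns)
    grad = [0.0] * dim
    for seed, ret in zip(seeds, returns):
        eps = perturbation(seed, dim)
        for i in range(dim):
            grad[i] += (ret - mean_return) * eps[i]
    scale = lr / (len(seeds) * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]


def toy_objective(x):
    """Hypothetical black box reward: negative squared distance to (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


def run_es(iterations=300, workers=20):
    theta = [0.0, 0.0]
    for it in range(iterations):
        seeds = [it * workers + w for w in range(workers)]
        theta = es_step(theta, toy_objective, seeds)
    return theta
```

Calling `run_es()` should move the parameters toward the maximizer of the toy objective. In a real distributed setting each loop iteration over `seeds` would run on a separate worker.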

1,218 citations

Posted Content•

A Simple Neural Attentive Meta-Learner

Nikhil Mishra¹, Mostafa Rohaninejad, Xi Chen¹, Pieter Abbeel¹•Institutions (1)

University of California, Berkeley¹

11 Jul 2017-arXiv: Artificial Intelligence

TL;DR: This work proposes a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information.

Abstract: Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.
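
The temporal-convolution half of this architecture can be sketched as a causal, dilated 1D convolution, in which each output depends only on the current and earlier timesteps. This is a scalar-valued toy under assumed names (`causal_dilated_conv`, `kernel`, `dilation`), not the SNAIL layer itself, which operates on feature vectors and is combined with attention.

```python
def causal_dilated_conv(sequence, kernel, dilation=1):
    """Causal 1D convolution over a list of scalars.

    kernel[0] weights the current step and kernel[k] weights the step
    k * dilation positions in the past; steps before the start of the
    sequence are treated as zero.
    """
    outputs = []
    for t in range(len(sequence)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * sequence[idx]
        outputs.append(total)
    return outputs


def stacked_receptive_field(kernel_size, dilations):
    """Number of past-and-present steps visible after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Stacking layers with dilations 1, 2, 4, ... lets the receptive field grow exponentially with depth, which is one reason dilated convolutions are a common way to aggregate long histories.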

815 citations

Proceedings Article•

PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

Tim Salimans¹, Andrej Karpathy², Xi Chen³, Diederik P. Kingma¹•Institutions (3)

OpenAI¹, Stanford University², University of California, Berkeley³

01 Jan 2017

TL;DR: This work discusses an implementation of PixelCNNs, a recently proposed class of powerful generative models with tractable likelihood; the implementation contains a number of modifications to the original model that both simplify its structure and improve its performance.

Abstract: PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at this https URL. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
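
Modification 1 can be illustrated with a small sketch of a discretized logistic mixture over integer pixel values 0..255. It assumes the usual construction in which each value receives the logistic CDF mass of the bin of width 1 around it, with the two edge bins absorbing the tails. The parameterization in raw pixel units and the function names are illustrative assumptions, not the released implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discretized_logistic_mixture_pmf(x, weights, means, scales):
    """Probability of integer pixel value x in 0..255 under the mixture."""
    total = 0.0
    for w, mu, s in zip(weights, means, scales):
        # The edge bins extend to infinity so the masses sum to one.
        upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
        lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
        total += w * (upper - lower)
    return total
```

Unlike a 256-way softmax, this needs only a few parameters per mixture component, and neighbouring pixel values automatically receive similar probabilities.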

581 citations

Posted Content•

Parameter Space Noise for Exploration

Matthias Plappert¹, Rein Houthooft², Prafulla Dhariwal³, Szymon Sidor³, Richard Chen³, Xi Chen⁴, Tamim Asfour¹, Pieter Abbeel⁴, Marcin Andrychowicz³•Institutions (4)

Karlsruhe Institute of Technology¹, Ghent University², OpenAI³, University of California, Berkeley⁴

06 Jun 2017-arXiv: Learning

TL;DR: This work combines parameter noise with traditional RL methods to get the best of both worlds, and demonstrates that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.

Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.
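
The contrast between action space noise and parameter space noise can be sketched with a toy linear policy. The names `linear_policy`, `perturb_parameters`, and `action_with_action_noise` are hypothetical, and fixed Gaussian noise is an assumption; any noise scaling scheme the paper uses is not shown.

```python
import random


def linear_policy(weights, state):
    """Deterministic toy policy: the action is a dot product."""
    return sum(w * s for w, s in zip(weights, state))


def perturb_parameters(weights, sigma, rng):
    """Parameter space noise: sample one perturbed policy, e.g. per episode."""
    return [w + sigma * rng.gauss(0.0, 1.0) for w in weights]


def action_with_action_noise(weights, state, sigma, rng):
    """Action space noise: fresh noise is added at every step."""
    return linear_policy(weights, state) + sigma * rng.gauss(0.0, 1.0)
```

With a perturbed parameter vector held fixed for an episode, revisiting the same state yields the same action, which is one sense in which the exploration is more consistent. With action noise, the same state generally yields a different action on every visit.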

361 citations

Proceedings Article•

#Exploration: a study of count-based exploration for deep reinforcement learning

Haoran Tang¹, Rein Houthooft², Davis Foote¹, Adam Stooke¹, Xi Chen¹, Yan Duan¹, John Schulman³, Filip De Turck², Pieter Abbeel¹•Institutions (3)

University of California, Berkeley¹, Ghent University², OpenAI³

04 Dec 2017

TL;DR: This work shows that a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks.

Abstract: Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows to count their occurrences with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
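
The hash-then-count idea can be sketched as follows. The abstract does not name a specific hash function or bonus formula, so this sketch assumes a sign-of-random-projection hash and a bonus of beta divided by the square root of the visit count, a common form in count-based exploration. The class name and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Counts state visits through a hash table and returns a reward bonus."""

    def __init__(self, state_dim, code_bits=8, beta=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random projection; more bits give a finer granularity.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(code_bits)
        ]
        self.beta = beta
        self.counts = defaultdict(int)

    def hash_code(self, state):
        """Map a continuous state to a binary code via projection signs."""
        return tuple(
            1 if sum(a * s for a, s in zip(row, state)) >= 0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        """Record a visit and return the exploration bonus for it."""
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

The bonus would be added to the environment reward before the RL update. `code_bits` controls the granularity trade-off noted in the abstract: too few bits merge distinct states, too many make every state look novel.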

314 citations

Posted Content•

Equivalence Between Policy Gradients and Soft Q-Learning

John Schulman, Xi Chen, Pieter Abbeel

21 Apr 2017-arXiv: Learning

TL;DR: There is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning: "soft" (entropy-regularized) $Q$-learning is shown to be exactly equivalent to a policy gradient method.

Abstract: Two of the leading approaches for model-free reinforcement learning are policy gradient methods and $Q$-learning methods. $Q$-learning methods can be effective and sample-efficient when they work, however, it is not well-understood why they work, since empirically, the $Q$-values they estimate are very inaccurate. A partial explanation may be that $Q$-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) $Q$-learning is exactly equivalent to a policy gradient method. We also point out a connection between $Q$-learning methods and natural policy gradient methods. Experimentally, we explore the entropy-regularized versions of $Q$-learning and policy gradients, and we find them to perform as well as (or slightly better than) the standard variants on the Atari benchmark. We also show that the equivalence holds in practical settings by constructing a $Q$-learning method that closely matches the learning dynamics of A3C without using a target network or $\epsilon$-greedy exploration schedule.
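
The abstract does not spell out how $Q$-values and policies correspond. A standard relationship in entropy-regularized RL, assumed here with a temperature `tau` and a uniform reference policy, is the Boltzmann policy $\pi(a|s) \propto \exp(Q(s,a)/\tau)$ with soft value $V(s) = \tau \log \sum_a \exp(Q(s,a)/\tau)$. The sketch below only checks the identity $V = \mathbb{E}_\pi[Q] + \tau H(\pi)$ numerically. It does not reproduce the gradient equivalence the paper proves.

```python
import math


def soft_value(q_values, tau):
    """Soft (log-sum-exp) state value, computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def boltzmann_policy(q_values, tau):
    """Policy with pi(a) = exp((Q(a) - V) / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]


def entropy(probabilities):
    """Shannon entropy in nats, skipping zero-probability actions."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)
```

As `tau` shrinks, the policy concentrates on the greedy action and the soft value approaches the maximum $Q$-value.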

271 citations

Posted Content•

Meta-Learning with Temporal Convolutions.

Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel

11 Jul 2017

TL;DR: This work proposes a class of simple and generic meta-learner architectures, based on temporal convolutions, that is domain-agnostic and has no particular strategy or algorithm encoded into it, and that outperforms state-of-the-art methods that are less general and more complex.

Abstract: Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. Recent work in meta-learning seeks to overcome this shortcoming by training a meta-learner on a distribution of similar tasks; the goal is for the meta-learner to generalize to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, most recent approaches to meta-learning are extensively hand-designed, either using architectures that are specialized to a particular application, or hard-coding algorithmic components that tell the meta-learner how to solve the task. We propose a class of simple and generic meta-learner architectures, based on temporal convolutions, that is domain-agnostic and has no particular strategy or algorithm encoded into it. We validate our temporal-convolution-based meta-learner (TCML) through experiments pertaining to both supervised and reinforcement learning, and demonstrate that it outperforms state-of-the-art methods that are less general and more complex.

180 citations

Posted Content•

Meta Learning Shared Hierarchies

Kevin Frans¹, Jonathan Ho², Xi Chen³, Pieter Abbeel³, John Schulman⁴•Institutions (4)

Massachusetts Institute of Technology¹, Stanford University², University of California, Berkeley³, OpenAI⁴

26 Oct 2017-arXiv: Learning

TL;DR: This work develops a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives---policies that are executed for large numbers of timesteps---and provides a concrete metric for measuring the strength of such hierarchies.

Abstract: We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives---policies that are executed for large numbers of timesteps. Specifically, a set of primitives are shared within a distribution of tasks, and are switched between by task-specific policies. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.

127 citations

Proceedings Article•

Meta learning shared hierarchies

Kevin Frans¹, Jonathan Ho², Xi Chen³, Pieter Abbeel³, John Schulman⁴•Institutions (4)

Massachusetts Institute of Technology¹, Stanford University², University of California, Berkeley³, OpenAI⁴

26 Oct 2017

TL;DR: This work shares a set of primitives within a distribution of tasks, switches between them with task-specific policies, and formulates an optimization problem for quickly reaching high reward on unseen tasks.

Abstract: We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives---policies that are executed for large numbers of timesteps. Specifically, a set of primitives are shared within a distribution of tasks, and are switched between by task-specific policies. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.

113 citations

Posted Content•

PixelSNAIL: An Improved Autoregressive Generative Model

Xi Chen¹, Nikhil Mishra¹, Mostafa Rohaninejad, Pieter Abbeel¹•Institutions (1)

University of California, Berkeley¹

28 Dec 2017-arXiv: Learning

TL;DR: This work introduces a new generative model architecture that combines causal convolutions with self attention and presents state-of-the-art log-likelihood results on CIFAR-10 and ImageNet.

Abstract: Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at this https URL
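
The self attention half of this architecture must respect the autoregressive ordering. A minimal sketch of causally masked scaled dot-product attention is given below, assuming each position may attend to itself and to earlier positions only. It omits learned projections, multiple heads, and the convolutional blocks of the actual model, and the function name is illustrative.

```python
import math


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees positions 0..i."""
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scores are computed only for allowed (non-future) positions.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append([
            sum(w * values[j][d] for j, w in enumerate(weights)) / z
            for d in range(out_dim)
        ])
    return outputs
```

Because attention reaches every earlier position in a single step, it offers direct access to distant context, while causal convolutions summarize local neighbourhoods.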

80 citations

Posted Content•

Safer Classification by Synthesis.

William Yang Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel

22 Nov 2017-arXiv: Learning

TL;DR: This work shows that conventional discriminative methods can easily be fooled into assigning incorrect labels with very high confidence to out-of-distribution examples, posits that a generative approach is the natural remedy for this problem, and proposes a method for classification using generative models.

Abstract: The discriminative approach to classification using deep neural networks has become the de-facto standard in various fields. Complementing recent reservations about safety against adversarial examples, we show that conventional discriminative methods can easily be fooled to provide incorrect labels with very high confidence to out of distribution examples. We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models. At training time, we learn a generative model for each class, while at test time, given an example to classify, we query each generator for its most similar generation, and select the class corresponding to the most similar one. Our approach is general and can be used with expressive models such as GANs and VAEs. At test time, our method accurately "knows when it does not know," and provides resilience to out of distribution examples while maintaining competitive performance for standard examples.
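
The test-time procedure in this abstract can be sketched with toy generators. Everything concrete here is an assumption for illustration: the generators are simple functions of a scalar latent, similarity is squared Euclidean distance, the latent search is an exhaustive grid rather than whatever optimization the paper uses, and the `reject_distance` threshold is a hypothetical way to express "knowing when it does not know".

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_by_synthesis(x, generators, latent_grid, reject_distance=None):
    """Pick the class whose generator can synthesize the closest sample.

    Returns (label, distance); label is None when even the best
    generation is farther than reject_distance.
    """
    best_label, best_dist = None, float("inf")
    for label, generate in generators.items():
        # Query this class's generator for its most similar generation.
        dist = min(squared_distance(x, generate(z)) for z in latent_grid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if reject_distance is not None and best_dist > reject_distance:
        return None, best_dist
    return best_label, best_dist


# Toy stand-ins for trained per-class generative models.
TOY_GENERATORS = {
    "small_circle": lambda z: (math.cos(z), math.sin(z)),
    "large_circle": lambda z: (3.0 * math.cos(z), 3.0 * math.sin(z)),
}
LATENT_GRID = [2.0 * math.pi * k / 64 for k in range(64)]
```

A point near one of the circles is assigned to that class, while a point far from both can be rejected instead of being labeled with high confidence.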

Posted Content•

PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

Tim Salimans¹, Andrej Karpathy², Xi Chen³, Diederik P. Kingma¹•Institutions (3)

OpenAI¹, Stanford University², University of California, Berkeley³

19 Jan 2017-arXiv: Learning

TL;DR: PixelCNN++ uses a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which is found to speed up training, and conditions on whole pixels rather than R/G/B sub-pixels, which simplifies the model structure.

Abstract: PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at this https URL. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.

