Neural Architecture Search with Reinforcement Learning

Open Access · Posted Content

Barret Zoph, +1 more

- 05 Nov 2016 -

arXiv: Learning

TLDR

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.

Abstract:

Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65%, which is 0.09 percentage points better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely used LSTM cell and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
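The training loop described in the abstract (sample an architecture description, score it, and update the generator to make high-scoring descriptions more likely) can be illustrated with a minimal REINFORCE sketch. This is not the authors' implementation, and it makes several simplifying assumptions: the controller is a set of independent softmax distributions rather than a recurrent network, the search space (`filter_size`, `num_filters`) is hypothetical, and `toy_reward` is a made-up stand-in for the validation accuracy of a trained child network.

```python
import math
import random

# Hypothetical search space (assumption): each decision picks one option.
SEARCH_SPACE = {
    "filter_size": [1, 3, 5, 7],
    "num_filters": [24, 36, 48, 64],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    """Stand-in for validation accuracy (assumption, not from the paper).

    A real system would train the sampled child network and measure its
    accuracy on held-out data. This toy function peaks at
    filter_size=3, num_filters=48.
    """
    score = 1.0
    score -= 0.1 * abs(arch["filter_size"] - 3)
    score -= 0.01 * abs(arch["num_filters"] - 48)
    return score


def sample_architecture(logits, rng):
    """Sample one option per decision from the current policy."""
    arch, picks = {}, {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(logits[name])
        idx = rng.choices(range(len(options)), weights=probs)[0]
        arch[name] = options[idx]
        picks[name] = idx
    return arch, picks


def train_controller(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline over the toy reward."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = 0.0
    for _ in range(steps):
        arch, picks = sample_architecture(logits, rng)
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for name, idx in picks.items():
            probs = softmax(logits[name])
            for j in range(len(probs)):
                # Gradient of log softmax probability of the sampled index.
                grad = (1.0 if j == idx else 0.0) - probs[j]
                logits[name][j] += lr * advantage * grad
    return logits


def best_architecture(logits):
    """Return the highest-logit option for each decision."""
    return {
        name: opts[max(range(len(opts)), key=lambda j: logits[name][j])]
        for name, opts in SEARCH_SPACE.items()
    }
```

Calling `best_architecture(train_controller())` returns the option the learned policy favors for each decision. Replacing `toy_reward` with a function that trains and evaluates a real child network would be the expensive step in practice.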

Citations

Journal Article

A survey on Image Data Augmentation for Deep Learning

Connor Shorten, +1 more

- 06 Jul 2019 -

Journal of Big Data

TL;DR: This survey presents existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation, a data-space solution to the problem of limited data.

Book Chapter

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Ningning Ma, +3 more

TL;DR: ShuffleNet V2 proposes evaluating the direct metric on the target platform rather than considering only FLOPs and, based on a series of controlled experiments, derives several practical guidelines for efficient network design.

Proceedings Article

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.

Zihang Dai, +5 more

TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu, +7 more

- 01 Feb 2020 -

International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

Journal Article

A brief survey of deep reinforcement learning

Kai Arulkumaran, +3 more

- 09 Nov 2017 -

arXiv: Learning

TL;DR: This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.

References

Posted Content

How to Construct Deep Recurrent Neural Networks

Razvan Pascanu, +3 more

- 20 Dec 2013 -

arXiv: Neural and Evolutionary Computing

TL;DR: In this article, the authors explore different ways to extend a recurrent neural network (RNN) to a deep RNN by carefully analyzing and understanding the architecture of an RNN.

Posted Content

Sequence Level Training with Recurrent Neural Networks

Marc'Aurelio Ranzato, +3 more

- 20 Nov 2015 -

arXiv: Learning

TL;DR: This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.

Posted Content

Pointer Sentinel Mixture Models

Stephen Merity, +3 more

- 26 Sep 2016 -

arXiv: Computation and Language

TL;DR: The authors introduced the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word using a standard softmax classifier.

Posted Content

Scalable Bayesian Optimization Using Deep Neural Networks

Jasper Snoek, +9 more

- 19 Feb 2015 -

arXiv: Machine Learning

TL;DR: In this article, the authors explore the use of neural networks as an alternative to GPs to model distributions over functions, and show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically.

Posted Content

Character-Aware Neural Language Models

Yoon Kim, +3 more

- 26 Aug 2015 -

arXiv: Computation and Language

TL;DR: This article used a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM).

Related Papers (5)

Deep Residual Learning for Image Recognition

Learning Multiple Layers of Features from Tiny Images

Very Deep Convolutional Networks for Large-Scale Image Recognition

Densely Connected Convolutional Networks

Going deeper with convolutions