Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph1, Quoc V. Le1
05 Nov 2016 - arXiv: Learning
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Abstract: Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
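To make the method concrete, here is a minimal sketch of the controller-training loop the abstract describes, in the spirit of REINFORCE: sample an architecture description, measure its (here simulated) validation accuracy, and push up the probability of the sampled choices in proportion to the reward. The toy search space, the evaluate_architecture stand-in, and the independent per-step logits (the paper uses an RNN controller) are illustrative assumptions, not the paper's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy search space: at each of 4 decision steps the controller picks one of 3 options
    # (e.g. a filter height, a number of filters, ...). The real controller is an RNN whose
    # softmax outputs depend on earlier choices; independent per-step logits keep the
    # REINFORCE update easy to see.
    NUM_STEPS, NUM_CHOICES = 4, 3
    logits = np.zeros((NUM_STEPS, NUM_CHOICES))     # controller parameters

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def sample_architecture():
        """Sample one architecture as a list of per-step choices."""
        return [int(rng.choice(NUM_CHOICES, p=softmax(logits[t]))) for t in range(NUM_STEPS)]

    def evaluate_architecture(arch):
        """Stand-in for 'train the child network and return its validation accuracy'."""
        target = [1, 2, 0, 1]                       # pretend this is the best architecture
        return sum(a == t for a, t in zip(arch, target)) / NUM_STEPS

    baseline, lr = 0.0, 0.5
    for _ in range(200):
        arch = sample_architecture()
        reward = evaluate_architecture(arch)
        baseline = 0.9 * baseline + 0.1 * reward    # moving-average baseline reduces variance
        advantage = reward - baseline
        for t, a in enumerate(arch):                # REINFORCE: gradient of log p(a) w.r.t. logits
            grad = -softmax(logits[t])
            grad[a] += 1.0
            logits[t] += lr * advantage * grad

    print("most likely architecture:", logits.argmax(axis=1))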
Citations
Journal ArticleDOI
TL;DR: A swarm intelligence algorithm is proposed to search for novel architectures without human intervention, achieving performance comparable to that of human-designed architectures with a comparatively simple approach.
Abstract: Neural architectures have accelerated the advancement in various domains by enabling automatic pattern detection, image classification, audio recognition, and face recognition. However, they are computationally expensive to design and require expert knowledge in various domains. In this paper, a swarm intelligence algorithm is proposed to search for novel architectures without human intervention that can achieve performance comparable to that of human-designed architectures. This work is inspired by current neural architecture search approaches based on reinforcement learning and genetic algorithms; however, little attention has been paid to swarm intelligence metaheuristics for neural architecture search. A framework is proposed for automatically designing neural architectures based on a swarm intelligence metaheuristic, the Crow Search Algorithm. First, the Crow Search Algorithm is integrated with a binary network representation. To make it compatible with neural architecture search, the original distance metric is replaced with a Hamming distance-based similarity measure. Second, the tuning parameters of the Crow Search Algorithm are reduced by replacing the static flight length parameter with our dynamic flight length distribution algorithm. Third, the target selection method (random selection) is replaced by a tournament selection method. The proposed framework is used to search for architectures on the MNIST, CIFAR10, and CIFAR100 datasets, achieving 0.18%, 3.48%, and 15.64% test error, respectively. Furthermore, small-scale transfer experiments are conducted to search architectures for Tiny ImageNet, achieving 34.43% test error. Nonparametric statistical analysis is performed to validate the impact of each modification in improving the quality of search space exploration. The proposed framework achieves performance comparable to state-of-the-art approaches, with a comparatively simple approach and minimal human intervention, and can be used to develop fully automated systems for designing architectures for various data-driven classification applications.
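As a rough illustration of the ingredients described above (binary encoding, following another crow's memory along the bits where the two differ, a dynamic flight length, and tournament selection), here is a simplified sketch; the bit-string fitness function and all constants are placeholders for "decode and train an architecture", not the authors' implementation.

    import random

    random.seed(0)

    N_BITS, N_CROWS, ITERS = 16, 8, 50
    AWARENESS_PROB = 0.1

    def fitness(bits):
        """Placeholder for 'decode the bit-string into an architecture and return its accuracy'."""
        return sum(bits) / N_BITS                   # toy objective: maximise the number of 1-bits

    def tournament(memories, k=3):
        """Tournament selection of the crow whose memory will be followed."""
        contenders = random.sample(range(N_CROWS), k)
        return max(contenders, key=lambda i: fitness(memories[i]))

    positions = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(N_CROWS)]
    memories = [p[:] for p in positions]            # each crow remembers its best position so far

    for _ in range(ITERS):
        for i in range(N_CROWS):
            j = tournament(memories)
            if random.random() > AWARENESS_PROB:
                # Move towards crow j's memory: flip some of the bits where the two differ.
                # The flipped fraction plays the role of a (dynamic) flight length.
                diff = [b for b in range(N_BITS) if positions[i][b] != memories[j][b]]
                flight = random.uniform(0.3, 1.0)
                for b in random.sample(diff, int(flight * len(diff))):
                    positions[i][b] = memories[j][b]
            else:
                positions[i] = [random.randint(0, 1) for _ in range(N_BITS)]
            if fitness(positions[i]) > fitness(memories[i]):
                memories[i] = positions[i][:]

    best = max(memories, key=fitness)
    print("best bit-string:", best, "fitness:", fitness(best))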

9 citations


Cites background or methods from "Neural Architecture Search with Reinforcement Learning"

  • ...Among Reinforcement learning based methods, NAS using reinforcement learning [14] and NASNet [34] are popular methods....

  • ...In order to reduce the required computation, NASNet introduces a new search space which also allows transferability from one dataset to another....

  • ...BCSA can be used with other more sophisticated search spaces such as NASNet or DAG based search spaces....

  • ...They apply a modified evolutionary algorithm on NASNet [14] search space....

  • ...In [14], it is proposed to use an RNN as a controller which can design a string to specify architectures, however, this requires extensive computational power....

Posted Content
TL;DR: This thesis addresses several key questions in some aspects of intelligence and studies the phase transitions in the two-term tradeoff between task performance and complexity, using strategies and tools from physics and information theory.
Abstract: How can we enable machines to make sense of the world, and become better at learning? To approach this goal, I believe viewing intelligence in terms of many integral aspects, and also a universal two-term tradeoff between task performance and complexity, provides two feasible perspectives. In this thesis, I address several key questions in some aspects of intelligence, and study the phase transitions in the two-term tradeoff, using strategies and tools from physics and information theory. Firstly, how can we make learning models more flexible and efficient, so that agents can learn quickly with fewer examples? Inspired by how physicists model the world, we introduce a paradigm and an AI Physicist agent for simultaneously learning many small specialized models (theories) and the domains in which they are accurate, which can then be simplified, unified and stored, facilitating few-shot learning in a continual way. Secondly, for representation learning, when can we learn a good representation, and how does learning depend on the structure of the dataset? We approach this question by studying phase transitions when tuning the tradeoff hyperparameter. In the information bottleneck, we theoretically show that these phase transitions are predictable and reveal structure in the relationships between the data, the model, the learned representation and the loss landscape. Thirdly, how can agents discover causality from observations? We address part of this question by introducing an algorithm that combines prediction with minimizing information from the input, for exploratory causal discovery from observational time series. Fourthly, to make models more robust to label noise, we introduce Rank Pruning, a robust algorithm for classification with noisy labels. I believe that building on the work of this thesis we will be one step closer to enabling more intelligent machines that can make sense of the world.

9 citations

Posted Content
TL;DR: ScaleNet learns the neuron allocation for aggregating multi-scale information in different building blocks of a deep network, and is constructed by repeating a scale aggregation (SA) block that concatenates feature maps at a wide range of scales.
Abstract: Successful visual recognition networks benefit from aggregating information spanning from a wide range of scales. Previous research has investigated information fusion of connected layers or multiple branches in a block, seeking to strengthen the power of multi-scale representations. Despite their great successes, existing practices often allocate the neurons for each scale manually, and keep the same ratio in all aggregation blocks of an entire network, rendering suboptimal performance. In this paper, we propose to learn the neuron allocation for aggregating multi-scale information in different building blocks of a deep network. The most informative output neurons in each block are preserved while others are discarded, and thus neurons for multiple scales are competitively and adaptively allocated. Our scale aggregation network (ScaleNet) is constructed by repeating a scale aggregation (SA) block that concatenates feature maps at a wide range of scales. Feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. The data-driven neuron allocation and SA block achieve strong representational power at the cost of considerably low computational complexity. The proposed ScaleNet, by replacing all 3x3 convolutions in ResNet with our SA blocks, achieves better performance than ResNet and its outstanding variants like ResNeXt and SE-ResNet, in the same computational complexity. On ImageNet classification, ScaleNets absolutely reduce the top-1 error rate of ResNets by 1.12 (101 layers) and 1.82 (50 layers). On COCO object detection, ScaleNets absolutely improve the mmAP with backbone of ResNets by 3.6 (101 layers) and 4.6 (50 layers) on Faster RCNN, respectively. Code and models are released at this https URL.
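A rough sketch of what a scale aggregation block might look like, assuming a PyTorch-style implementation: feature maps are pooled to several scales, convolved, upsampled back, and concatenated. The fixed per-scale channel counts are a simplification (the paper learns the allocation), and nothing here is taken from the released code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaleAggregationBlock(nn.Module):
        """Downsample, convolve and upsample feature maps at several scales, then concatenate."""

        def __init__(self, in_ch, out_ch_per_scale=16, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales
            self.convs = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch_per_scale, kernel_size=3, padding=1) for _ in scales
            )

        def forward(self, x):
            h, w = x.shape[-2:]
            outs = []
            for s, conv in zip(self.scales, self.convs):
                y = F.avg_pool2d(x, s) if s > 1 else x                     # downsample by factor s
                y = F.relu(conv(y))                                        # convolve at this scale
                if s > 1:
                    y = F.interpolate(y, size=(h, w), mode="nearest")      # upsample back
                outs.append(y)
            return torch.cat(outs, dim=1)                                  # concatenate feature maps

    block = ScaleAggregationBlock(in_ch=3)
    print(block(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 48, 32, 32])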

9 citations

Journal ArticleDOI
TL;DR: A machine learning model that infers rotorcraft component vibration spectra from a few flight condition indicators, using a deep neural network of fully connected layers (DNN) that performs high-dimensional, non-linear multivariate regression to reconstruct raw accelerometer data.
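For context, the kind of network the TL;DR describes could be sketched as below; the input and output sizes, layer widths and MSE objective are illustrative assumptions, since the paper's actual configuration is not given here.

    import torch
    import torch.nn as nn

    # A generic fully connected regression network: a handful of flight condition indicators in,
    # a vibration-spectrum-sized vector out. Dimensions are placeholders, not the paper's values.
    n_indicators, n_spectrum_bins = 8, 256
    model = nn.Sequential(
        nn.Linear(n_indicators, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, n_spectrum_bins),
    )
    loss_fn = nn.MSELoss()                       # regression objective
    pred = model(torch.randn(32, n_indicators))
    print(pred.shape)                            # torch.Size([32, 256])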

9 citations

Book ChapterDOI
30 Jun 2019
TL;DR: This chapter presents an overview of the state-of-the-art efforts in tackling the challenges of machine learning automation, providing comprehensive coverage of the various tools and frameworks that have been introduced in this domain.
Abstract: Nowadays, machine learning techniques and algorithms are employed in almost every application domain (e.g., financial applications, advertising, recommendation systems, user behavior analytics). In practice, they are playing a crucial role in harnessing the power of massive amounts of data which we are currently producing every day in our digital world. In general, the process of building a high-quality machine learning model is an iterative, complex and time-consuming process that involves trying different algorithms and techniques in addition to having a good experience with effectively tuning their hyper-parameters. In particular, conducting this process efficiently requires solid knowledge and experience with the various techniques that can be employed. With the continuous and vast increase of the amount of data in our digital world, it has been acknowledged that the number of knowledgeable data scientists can not scale to address these challenges. Thus, there was a crucial need for automating the process of building good machine learning models (AutoML). In the last few years, several techniques and frameworks have been introduced to tackle the challenge of automating the machine learning process. The main aim of these techniques is to reduce the role of humans in the loop and fill the gap for non-expert machine learning users by playing the role of the domain expert. In this chapter, we present an overview of the state-of-the-art efforts in tackling the challenges of machine learning automation. We provide a comprehensive coverage for the various tools and frameworks that have been introduced in this domain. In addition, we discuss some of the research directions and open challenges that need to be addressed in order to achieve the vision and goals of the AutoML process.
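A bare-bones illustration of the part of the pipeline that AutoML systems automate: iterating over candidate algorithms and hyper-parameter settings and keeping the best cross-validated one. The candidate list and dataset are placeholders; real frameworks search far larger spaces and also automate preprocessing and feature engineering.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    candidates = [
        ("logreg C=0.1", LogisticRegression(C=0.1, max_iter=1000)),
        ("logreg C=1.0", LogisticRegression(C=1.0, max_iter=1000)),
        ("random forest, 50 trees", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("random forest, 200 trees", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]
    # Keep the algorithm/hyper-parameter combination with the best 5-fold cross-validation score.
    best = max(candidates, key=lambda c: cross_val_score(c[1], X, y, cv=5).mean())
    print("selected model:", best[0])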

9 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
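A minimal sketch of the residual reformulation the abstract describes, assuming a PyTorch-style basic block with an identity shortcut (the full paper also uses projection shortcuts, downsampling and bottleneck blocks): the stacked layers learn F(x) and the block outputs F(x) + x.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Two 3x3 convolutions learning a residual function, added back onto the block input."""

        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)       # identity shortcut: add the input back

    x = torch.randn(1, 64, 32, 32)
    print(ResidualBlock(64)(x).shape)    # torch.Size([1, 64, 32, 32])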

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
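The update rule itself is short; the sketch below, using the commonly cited default hyper-parameters, shows the first- and second-moment estimates, bias correction, and the per-parameter scaled step on a toy quadratic objective.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update: exponential moving averages of the gradient (first moment) and of
        its square (second moment), bias-corrected, then a per-parameter scaled step."""
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Minimise f(theta) = ||theta||^2 as a toy example.
    theta = np.array([3.0, -2.0])
    m = v = np.zeros_like(theta)
    for t in range(1, 2001):
        grad = 2 * theta
        theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
    print(theta)    # close to [0, 0]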

111,197 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
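The design principle is to stack small 3x3 convolutions between pooling steps: two stacked 3x3 layers cover the receptive field of a single 5x5 one with fewer parameters and an extra non-linearity. A sketch of one such stage (channel counts are illustrative, not a full 16- or 19-layer configuration):

    import torch
    import torch.nn as nn

    # One VGG-style stage: small 3x3 convolutions stacked before a 2x2 max-pooling step.
    stage = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )
    print(stage(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 128, 16, 16])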

55,235 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, graph transformer networks (GTNs) are proposed to allow multi-module document recognition systems to be trained globally with gradient-based methods, and convolutional neural networks are shown to outperform all other techniques on a standard handwritten digit recognition task.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
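A minimal convolutional classifier in the spirit of the networks the abstract describes, with alternating convolution and subsampling layers followed by fully connected layers; the layer sizes are illustrative rather than the original LeNet-5 configuration, and the graph transformer machinery is not shown.

    import torch
    import torch.nn as nn

    # LeNet-style classifier for 28x28 single-channel digit images.
    lenet = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(), nn.AvgPool2d(2),
        nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
        nn.Linear(120, 84), nn.Tanh(),
        nn.Linear(84, 10),                       # 10 digit classes
    )
    print(lenet(torch.randn(1, 1, 28, 28)).shape)   # torch.Size([1, 10])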

42,067 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
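The stages listed in the abstract (fine-scale gradients, fine orientation binning, coarser spatial binning into cells, and contrast normalization over overlapping blocks) map directly onto the parameters of an off-the-shelf HOG implementation. A small sketch using scikit-image, assuming the 64x128 detection window used for pedestrians:

    import numpy as np
    from skimage.feature import hog

    # A random image stands in for one 64x128 detection window.
    image = np.random.rand(128, 64)

    descriptor = hog(
        image,
        orientations=9,            # fine orientation binning
        pixels_per_cell=(8, 8),    # relatively coarse spatial binning
        cells_per_block=(2, 2),    # overlapping blocks for local contrast normalization
        block_norm="L2-Hys",
    )
    print(descriptor.shape)        # (3780,) for a 64x128 window with these settings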

31,952 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....
