Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph, Quoc V. Le
05 Nov 2016 - arXiv: Learning
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Abstract: Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
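The controller loop described in the abstract can be sketched as a toy policy-gradient search. This is a minimal illustration, not the authors' implementation: it assumes a REINFORCE-style update with a moving-average baseline (the abstract only says "reinforcement learning"), replaces the recurrent controller with independent softmax logits per decision, and uses a made-up `toy_reward` in place of training a child network and measuring validation accuracy. The search space and all names are hypothetical.

```python
import math
import random

# Hypothetical search space: one list of options per controller decision.
SEARCH_SPACE = [
    [1, 3, 5, 7],      # filter size (assumed options)
    [24, 36, 48, 64],  # number of filters (assumed options)
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    """Stand-in for validation accuracy; a real search would train the child network."""
    target = (3, 48)  # arbitrary "best" architecture for this toy problem
    return sum(a == t for a, t in zip(arch, target)) / len(target)


def train_controller(steps=3000, lr=0.1, seed=0):
    """Sample architectures and nudge the logits toward higher-reward choices."""
    rng = random.Random(seed)
    logits = [[0.0] * len(options) for options in SEARCH_SPACE]
    baseline = 0.0  # moving average of past rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        actions = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        arch = tuple(opts[a] for opts, a in zip(SEARCH_SPACE, actions))
        advantage = toy_reward(arch) - baseline
        # Gradient of log softmax w.r.t. logit k is (1[k == action] - p[k]).
        for row, p, action in zip(logits, probs, actions):
            for k in range(len(row)):
                row[k] += lr * advantage * ((1.0 if k == action else 0.0) - p[k])
        baseline = 0.9 * baseline + 0.1 * toy_reward(arch)
    return logits


def most_likely_architecture(logits):
    """Pick the highest-probability option for every decision."""
    return tuple(
        opts[max(range(len(row)), key=row.__getitem__)]
        for opts, row in zip(SEARCH_SPACE, logits)
    )
```

Calling `most_likely_architecture(train_controller())` returns the option tuple the toy controller has come to prefer. In a real search the reward call is the expensive part, since every sampled architecture has to be trained before it can be scored.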
Citations
Proceedings Article
28 Jun 2021
TL;DR: Zhang et al. propose a two-category comparator-based random forest model as a surrogate to estimate the accuracy of networks, reducing the heavy network training process and greatly shortening search time.
Abstract: Neural Architecture Search (NAS) is studied to automatically design the deep neural network structure, freeing people from heavy network design tasks. Traditional NAS based on individual performance evaluation needs to train many networks generated by the search, and compare the performance of the networks according to their accuracy, which is very time-consuming. In this study, we propose to use a two-category comparator based random forest model as a surrogate to estimate the accuracy of the networks, thereby reducing the heavy network training process and greatly saving search time. Instead of directly predicting the accuracy of each network, we propose to compare the relative performance between each two networks in our proposed two-category comparator. Furthermore, we implement the modeling process of the surrogate model in the sampling space of the original training data, which further accelerates the search process of the network in the NAS. Experimental results show that our proposed NAS framework can greatly reduce the search time, while the accuracy of the obtained network is comparable to that of other state-of-the-art NAS algorithms.

5 citations
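
The pairwise idea in the abstract above, comparing two networks rather than predicting each accuracy, can be illustrated by how the training data for such a comparator might be built. This is a sketch under assumptions: the architecture encodings, the function names, and the win-count ranking are hypothetical, and the random forest classifier itself is left abstract as an `is_better` callable.

```python
from itertools import permutations


def make_pairwise_dataset(encodings, accuracies):
    """Turn (architecture, accuracy) records into binary comparison examples.

    Each example concatenates two encodings; the label is 1 when the first
    architecture has the higher measured accuracy, else 0.
    """
    features, labels = [], []
    for i, j in permutations(range(len(encodings)), 2):
        features.append(list(encodings[i]) + list(encodings[j]))
        labels.append(1 if accuracies[i] > accuracies[j] else 0)
    return features, labels


def rank_by_wins(candidates, is_better):
    """Order candidates by how many pairwise comparisons each one wins.

    `is_better(a, b)` stands in for a trained comparator (hypothetical).
    """
    wins = [
        sum(is_better(a, b) for j, b in enumerate(candidates) if j != i)
        for i, a in enumerate(candidates)
    ]
    order = sorted(range(len(candidates)), key=lambda i: wins[i], reverse=True)
    return [candidates[i] for i in order]
```

A binary classifier trained on `features` and `labels` would then play the role of `is_better` when ranking architectures that have not been trained.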

Journal Article
01 Sep 2019
TL;DR: This paper uses Auto-Keras to find the best architecture on several datasets, demonstrates several automated machine learning features, and discusses the topic in more depth.
Abstract: This paper aims at a deeper exploration of the new field named auto-machine learning, as it shows promising results in specific machine learning tasks, e.g., image classification. The article summarizes the most successful approaches now available in the A.I. community. The automated machine learning method is described only briefly here, but the concept of automated task solving seems very promising, since it can significantly reduce the level of expertise required of a person developing the machine learning model. We used Auto-Keras to find the best architecture on several datasets, demonstrated several automated machine learning features, and discussed the issue in more depth.

5 citations

Posted Content
TL;DR: In this paper, the authors proposed a CNN to classify local climate zone (LCZ) from Sentinel-2 images, Sen2LCZ-Net, which achieved state-of-the-art performance on the So2Sat LCZ42 dataset.
Abstract: As a unique classification scheme for urban forms and functions, the local climate zone (LCZ) system provides essential general information for any studies related to urban environments, especially on a large scale. Remote sensing data-based classification approaches are the key to large-scale mapping and monitoring of LCZs. The potential of deep learning-based approaches is not yet fully explored, even though advanced convolutional neural networks (CNNs) continue to push the frontiers for various computer vision tasks. One reason is that published studies are based on different datasets, usually at a regional scale, which makes it impossible to fairly and consistently compare the potential of different CNNs for real-world scenarios. This study is based on the big So2Sat LCZ42 benchmark dataset dedicated to LCZ classification. Using this dataset, we studied a range of CNNs of varying sizes. In addition, we proposed a CNN to classify LCZs from Sentinel-2 images, Sen2LCZ-Net. Using this base network, we propose fusing multi-level features using the extended Sen2LCZ-Net-MF. With this proposed simple network architecture and the highly competitive benchmark dataset, we obtain results that are better than those obtained by the state-of-the-art CNNs, while requiring less computation with fewer layers and parameters. Large-scale LCZ classification examples of completely unseen areas are presented, demonstrating the potential of our proposed Sen2LCZ-Net-MF as well as the So2Sat LCZ42 dataset. We also intensively investigated the influence of network depth and width and the effectiveness of the design choices made for Sen2LCZ-Net-MF. Our work will provide important baselines for future CNN-based algorithm developments for both LCZ classification and other urban land cover land use classification.

4 citations

Journal Article
01 Oct 2022
TL;DR: The confident learning rate (CLR) is proposed and the combination of partial channel connections and edge normalization is introduced and BNAS-v2 achieves powerful generalization ability on multiple transfer tasks, e.g., MNIST, FashionMNIST, NORB, and SVHN.
Abstract: In this article, we propose BNAS-v2 to further improve the efficiency of broad neural architecture search (BNAS), which employs a broad convolutional neural network (BCNN) as the search space. In BNAS, the single-path sampling-updating strategy of an overparameterized BCNN leads to a severe unfair training issue, which restricts the efficiency improvement. To mitigate the unfair training issue, we employ a continuous relaxation strategy to optimize all paths of the overparameterized BCNN simultaneously. However, continuous relaxation leads to a performance collapse issue that results in unsatisfactory performance of the learned BCNN. For that, we propose the confident learning rate (CLR) and introduce the combination of partial channel connections and edge normalization. Experimental results show that 1) BNAS-v2 delivers state-of-the-art search efficiency on both CIFAR-10 (0.05 GPU days, which is $4\times$ faster than BNAS) and ImageNet (0.19 GPU days) with better or competitive performance; 2) the above two solutions effectively alleviate the performance collapse issue; and 3) BNAS-v2 achieves powerful generalization ability on multiple transfer tasks, e.g., MNIST, FashionMNIST, NORB, and SVHN. The code is available at https://github.com/zixiangding/BNASv2.

4 citations

Journal Article
TL;DR: A deep Q-network (DQN) algorithm was designed, using conventional feature engineering and deep convolutional neural network methods, to extract the optimal features from electroencephalogram and electrocardiogram measurements, and the results suggest that the DQN could be applied to investigating biomarkers for physiological responses and optimizing the classification system to reduce the input resources.
Abstract: A low level of vigilance is one of the main reasons for traffic and industrial accidents. We conducted experiments to evoke the low level of vigilance and record physiological data through single-channel electroencephalogram (EEG) and electrocardiogram (ECG) measurements. In this study, a deep Q-network (DQN) algorithm was designed, using conventional feature engineering and deep convolutional neural network (CNN) methods, to extract the optimal features. The DQN yielded the optimal features: two CNN features from ECG and two conventional features from EEG. The ECG features were more significant for tracking the transitions within the alertness continuum with the DQN. The classification was performed with a small number of features, and the results were similar to those from using all of the features. This suggests that the DQN could be applied to investigating biomarkers for physiological responses and optimizing the classification system to reduce the input resources.

4 citations

References
Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
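
The residual reformulation in the abstract above, learning F(x) and outputting F(x) + x rather than learning the mapping directly, can be sketched in a few lines. This is a simplified illustration, assuming small fully connected layers in place of convolutions and omitting normalization; the helper names are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F(x) = W2 * relu(W1 * x).

    The identity shortcut adds the block input back onto the learned residual.
    """
    residual = matvec(w2, relu(matvec(w1, x)))
    return relu([r + xi for r, xi in zip(residual, x)])
```

With all-zero weights the residual term vanishes and the block passes a non-negative input through unchanged, which is one intuition for why stacking such blocks does not have to hurt optimization.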

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

111,197 citations
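
The update summarized in the abstract above, adaptive estimates of the first and second moments of the gradient, can be written out for a single scalar parameter. This is a minimal sketch: the function name and the quadratic test objective are illustrative, and the `beta1`, `beta2`, and `eps` defaults are the commonly cited ones rather than values stated in the abstract.

```python
import math


def adam_minimize(grad, x0, steps=1000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function given its gradient, using Adam-style updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

For example, `adam_minimize(lambda x: 2 * (x - 3), 0.0)` drives `x` toward the minimizer of `(x - 3) ** 2`. Because the step is divided by the root of the second-moment estimate, its size depends little on the scale of the gradient.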

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....

Journal Article
01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Proceedings Article
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....