Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph, Quoc V. Le
05 Nov 2016 - arXiv: Learning
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Abstract: Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
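As a rough illustration of the search loop the abstract describes, here is a minimal REINFORCE sketch in which independent per-decision softmax logits stand in for the RNN controller, and `train_and_evaluate` is a hypothetical placeholder for training a child network and reporting its validation accuracy:

```python
# Minimal REINFORCE sketch of a controller that samples architectures and is
# updated toward higher validation accuracy. The per-decision logits replace
# the paper's RNN controller; train_and_evaluate is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)
CHOICES = [3, 5, 7]            # e.g. candidate filter sizes per layer
NUM_DECISIONS = 4              # architectural decisions the controller makes
logits = np.zeros((NUM_DECISIONS, len(CHOICES)))   # controller parameters
baseline, lr = 0.0, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_and_evaluate(arch):
    # Hypothetical: train the child network described by arch and return
    # its validation accuracy; here a toy reward favoring small filters.
    return 1.0 - np.mean(arch) / max(CHOICES)

for step in range(200):
    probs = [softmax(row) for row in logits]
    idx = [rng.choice(len(CHOICES), p=p) for p in probs]
    arch = [CHOICES[i] for i in idx]
    reward = train_and_evaluate(arch)
    baseline = 0.9 * baseline + 0.1 * reward       # variance-reducing baseline
    advantage = reward - baseline
    for d, (p, i) in enumerate(zip(probs, idx)):
        grad = -p
        grad[i] += 1.0            # d log p(choice) / d logits = onehot - p
        logits[d] += lr * advantage * grad         # REINFORCE ascent step
```

In the actual method the controller is an RNN whose sampled token sequence describes the architecture, and many child models are trained in parallel to compute rewards.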
Citations
Proceedings ArticleDOI
01 Oct 2019
TL;DR: An evolution framework for graph data, extensible to generic graphs, is proposed; it mutates a population of neural networks to search the architecture and hyperparameter space, achieving state-of-the-art results on MUTAG protein classification from a small population of 10 networks.
Abstract: Architecture design and hyperparameter selection for deep neural networks often involve guesswork. The parameter space is too large to try all possibilities, so one often settles for a suboptimal solution. Some works have proposed automatic architecture and hyperparameter search but are constrained to image applications. We propose an evolution framework for graph data which is extensible to generic graphs. Our evolution mutates a population of neural networks to search the architecture and hyperparameter space. At each stage of the neuroevolution process, neural network layers can be added or removed, hyperparameters can be adjusted, or additional epochs of training can be applied. Mutation-selection probabilities based on recent successes help guide the search toward efficient and accurate learning. We achieve state-of-the-art results on MUTAG protein classification from a small population of 10 networks and gain interesting insight into how to build effective network architectures incrementally.
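A hedged sketch of the success-guided mutation loop this abstract describes; the genome encoding, fitness function, and mutation operators below are toy placeholders rather than the paper's implementation:

```python
# Toy neuroevolution loop: mutate a population, keep improvements, and weight
# mutation operators by their recent success. All specifics are assumptions.
import random

MUTATIONS = ["add_layer", "remove_layer", "tweak_hparam", "train_more"]

def fitness(genome):
    # Hypothetical: train/evaluate the encoded network; here a toy score.
    return -abs(len(genome["layers"]) - 4) + genome["epochs"] * 0.01

def mutate(genome, op):
    g = {"layers": list(genome["layers"]), "epochs": genome["epochs"]}
    if op == "add_layer":
        g["layers"].append(random.choice([16, 32, 64]))
    elif op == "remove_layer" and len(g["layers"]) > 1:
        g["layers"].pop(random.randrange(len(g["layers"])))
    elif op == "tweak_hparam":
        g["layers"][random.randrange(len(g["layers"]))] = random.choice([16, 32, 64])
    else:  # train_more (also the fallback when removal is impossible)
        g["epochs"] += 1
    return g

population = [{"layers": [32], "epochs": 1} for _ in range(10)]
success = {op: 1.0 for op in MUTATIONS}      # recent-success weights

for generation in range(50):
    weights = [success[op] for op in MUTATIONS]
    for i, parent in enumerate(population):
        op = random.choices(MUTATIONS, weights=weights)[0]
        child = mutate(parent, op)
        if fitness(child) > fitness(parent):
            population[i] = child
            success[op] = 0.8 * success[op] + 0.2 * 2.0   # reward the operator
        else:
            success[op] = 0.8 * success[op] + 0.2 * 0.5   # decay its weight
```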

2 citations


Cites background or methods from "Neural Architecture Search with Reinforcement Learning"

  • ...Several recent works have proposed automatic learning algorithms for image classification tasks using evolution [1], genetic algorithms [2], and reinforcement learning [3, 4]....


  • ...[3] used reinforcement learning on a deeper fixed-length architecture, adding one layer at a time....


Posted Content
TL;DR: Zhang et al. as discussed by the authors proposed an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search, which achieves joint optimization of Data augmentation policy, Hyperparameter and Architecture.
Abstract: Automated machine learning (AutoML) usually involves several crucial components, such as Data Augmentation (DA) policy, Hyper-Parameter Optimization (HPO), and Neural Architecture Search (NAS). Although many strategies have been developed for automating these components in separation, joint optimization of these components remains challenging due to the largely increased search dimension and the variant input types of each component. Meanwhile, conducting these components in a sequence often requires careful coordination by human experts and may lead to sub-optimal results. In parallel to this, the common practice of searching for the optimal architecture first and then retraining it before deployment in NAS often suffers from low performance correlation between the search and retraining stages. An end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search is desirable. In view of these, we propose DHA, which achieves joint optimization of Data augmentation policy, Hyper-parameter and Architecture. Specifically, end-to-end NAS is achieved in a differentiable manner by optimizing a compressed lower-dimensional feature space, while DA policy and HPO are updated dynamically at the same time. Experiments show that DHA achieves state-of-the-art (SOTA) results on various datasets, especially 77.4% accuracy on ImageNet with cell based search space, which is higher than current SOTA by 0.5%. To the best of our knowledge, we are the first to efficiently and jointly optimize DA policy, NAS, and HPO in an end-to-end manner without retraining.
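As a loose illustration only, the toy sketch below jointly optimizes model weights, relaxed architecture coefficients, a differentiable augmentation magnitude, and a learnable optimization scale with a single optimizer; the task and parameterization are assumptions, not DHA's actual algorithm (which, among other things, updates search parameters on a separate validation split):

```python
# A toy sketch only: model weights, relaxed architecture coefficients, a
# differentiable augmentation magnitude, and a learnable scale are updated
# jointly by one optimizer. Names and the task are hypothetical, and a real
# bi-level setup would update alpha/aug_mag/log_scale on a validation split.
import torch

torch.manual_seed(0)
x = torch.randn(256, 8)                              # toy inputs
y = (x.sum(dim=1, keepdim=True) > 0).float()         # toy binary labels

w = torch.randn(8, 1, requires_grad=True)            # model weights
alpha = torch.zeros(2, requires_grad=True)           # architecture logits (NAS)
aug_mag = torch.tensor(0.1, requires_grad=True)      # DA policy parameter
log_scale = torch.tensor(0.0, requires_grad=True)    # stand-in HPO parameter

opt = torch.optim.Adam([w, alpha, aug_mag, log_scale], lr=0.05)

for step in range(300):
    xin = x + torch.randn_like(x) * aug_mag          # differentiable augmentation
    mix = torch.softmax(alpha, dim=0)                # relaxed choice between ops
    h = mix[0] * (xin @ w) + mix[1] * torch.tanh(xin @ w)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        h * torch.exp(log_scale), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```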

2 citations

Journal ArticleDOI
01 Dec 2022-Sensors
TL;DR: Li et al. as discussed by the authors compared the ability of three CNNs, namely VGG-16, ResNet-50, and MobileNet v1, to detect defects on metal surfaces.
Abstract: Metal workpieces are indispensable in the manufacturing industry. Surface defects affect the appearance and efficiency of a workpiece and reduce the safety of manufactured products. Therefore, products must be inspected for surface defects, such as scratches, dirt, and chips. The traditional manual inspection method is time-consuming and labor-intensive, and human error is unavoidable when thousands of products require inspection. Therefore, an automated optical inspection method is often adopted. Traditional automated optical inspection algorithms are insufficient in the detection of defects on metal surfaces, but a convolutional neural network (CNN) may aid in the inspection. However, considerable time is required to select the optimal hyperparameters for a CNN through training and testing. First, we compared the ability of three CNNs, namely VGG-16, ResNet-50, and MobileNet v1, to detect defects on metal surfaces. These models were hypothetically implemented for transfer learning (TL). However, in deploying TL, the phenomenon of apparent convergence in prediction accuracy, followed by divergence in validation accuracy, may create a problem when the image pattern is not known in advance. Second, our developed automated machine-learning (AutoML) model was trained through a random search with the core layers of the network architecture of the three TL models. We developed a retraining criterion for scenarios in which the model exhibited poor training results such that a new neural network architecture and new hyperparameters could be selected for retraining when the defect accuracy criterion in the first TL was not met. Third, we used AutoKeras to execute AutoML and identify a model suitable for a metal-surface-defect dataset. The performance of TL, AutoKeras, and our designed AutoML model was compared. The results of this study were obtained using a small number of metal defect samples. Based on TL, the detection accuracy of VGG-16, ResNet-50, and MobileNet v1 was 91%, 59.00%, and 50%, respectively. Moreover, the AutoKeras model exhibited the highest accuracy of 99.83%. The accuracy of the self-designed AutoML model reached 95.50% when using a core layer module, obtained by combining the modules of VGG-16, ResNet-50, and MobileNet v1. The designed AutoML model effectively and accurately recognized defective and low-quality samples despite low training costs. The defect accuracy of the developed model was close to that of the existing AutoKeras model and thus can contribute to the development of new diagnostic technologies for smart manufacturing.
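A minimal sketch of the retraining criterion described above: keep sampling new architectures and hyperparameters until a defect-accuracy threshold is met. `build_and_train`, the search space, and the 0.95 threshold are illustrative assumptions:

```python
# Random search over core-layer modules with a retrain-until-criterion loop,
# in the spirit of the workflow above. All names and values are hypothetical.
import random

SEARCH_SPACE = {
    "core": ["vgg_block", "resnet_block", "mobilenet_block"],
    "lr": [1e-2, 1e-3, 1e-4],
    "dense_units": [64, 128, 256],
}

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def build_and_train(config):
    # Hypothetical: assemble the core-layer module, train on the defect
    # dataset, and return validation accuracy; here a random placeholder.
    return random.uniform(0.5, 1.0)

ACCURACY_CRITERION = 0.95    # assumed defect-accuracy threshold
best, best_acc = None, 0.0
for trial in range(20):
    config = sample_config()
    acc = build_and_train(config)
    if acc > best_acc:
        best, best_acc = config, acc
    if best_acc >= ACCURACY_CRITERION:
        break                # criterion met: stop resampling architectures
print(best, best_acc)
```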

2 citations

Book ChapterDOI
23 Aug 2020
TL;DR: This paper casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN, assumes a Multi-Task Gaussian Process prior, and explores the space by sampling those configurations that yield maximum information.
Abstract: We propose a novel method for neural network quantization that casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN. We perform the search assuming a Multi-Task Gaussian Process prior, which splits the problem into multiple tasks, each corresponding to a different number of training epochs, and explore the space by sampling those configurations that yield maximum information. We then show that with significantly lower precision in the last layers we achieve a minimal loss of accuracy with appreciable memory savings. We test our findings on the CIFAR-10 and ImageNet datasets using the VGG, ResNet and GoogLeNet architectures.
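A toy sketch of the maximum-information sampling idea over per-layer bit widths; a single-task Gaussian process replaces the paper's multi-task prior, and the accuracy proxy is an assumption:

```python
# Variance-maximizing search over bit assignments for a tiny 3-layer net,
# using a plain GP with an RBF kernel. A simplification of the paper's
# multi-task setup; the accuracy function is a hypothetical proxy.
import numpy as np

rng = np.random.default_rng(0)
BITS = [2, 4, 8]
configs = np.array([[a, b, c] for a in BITS for b in BITS for c in BITS], float)

def accuracy(cfg):
    # Hypothetical proxy: later layers tolerate lower precision.
    return 1.0 - (0.05 * (8 - cfg[0]) + 0.02 * (8 - cfg[1]) + 0.005 * (8 - cfg[2]))

def rbf(A, B, ls=4.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * ls ** 2))

observed = [int(rng.integers(len(configs)))]
ys = [accuracy(configs[observed[0]])]
for _ in range(8):
    X = configs[observed]
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(configs, X)
    # GP posterior variance at every candidate (prior variance is 1).
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, np.linalg.inv(K), Ks)
    var[observed] = -np.inf          # don't resample known configs
    nxt = int(np.argmax(var))        # maximum-information (variance) point
    observed.append(nxt)
    ys.append(accuracy(configs[nxt]))

best = configs[observed[int(np.argmax(ys))]]
print("best bit assignment:", best)
```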

2 citations

Posted Content
TL;DR: This paper provides a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.
Abstract: Neural architecture search (NAS) remains a challenging problem, which is attributed to the indispensable and time-consuming component of performance estimation (PE). In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space. Since searching for an optimal BPE is extremely time-consuming as it requires training a large number of networks for evaluation, we propose a Minimum Importance Pruning (MIP) approach. Given a dataset and a BPE search space, MIP estimates the importance of hyper-parameters using a random forest and subsequently prunes the minimum one from the next iteration. In this way, MIP effectively prunes less important hyper-parameters to allocate more computational resource to more important ones, thus achieving an effective exploration. By combining BPE with various search algorithms including reinforcement learning, evolution algorithm, random search, and differentiable architecture search, we achieve 1,000x NAS speed-up with a negligible performance drop compared to the SOTA.
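A hedged sketch of the Minimum Importance Pruning loop as described: fit a random forest on (hyper-parameter, score) pairs, then freeze the least important hyper-parameter each round. The hyper-parameter names, scoring function, and default values are assumptions:

```python
# Iteratively prune the minimum-importance BPE hyper-parameter using random
# forest feature importances. proxy_score is a hypothetical stand-in for
# how well a budgeted estimate predicts true architecture performance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
HPARAMS = ["epochs", "channels", "layers", "batch_size", "image_size"]

def proxy_score(x):
    # Toy function dominated by epochs and channels, plus noise.
    return 0.6 * x[0] + 0.3 * x[1] + 0.05 * x[2] + rng.normal(0, 0.02)

active = list(range(len(HPARAMS)))
frozen = {}
while len(active) > 1:
    X = rng.uniform(0, 1, size=(64, len(HPARAMS)))
    for i, v in frozen.items():
        X[:, i] = v                      # pruned dims stay at frozen values
    y = np.array([proxy_score(x) for x in X])
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    imp = forest.feature_importances_[active]
    drop = active[int(np.argmin(imp))]   # minimum-importance hyper-parameter
    frozen[drop] = 0.5                   # freeze it at an assumed default
    active.remove(drop)
    print("pruned:", HPARAMS[drop])
```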

2 citations


Cites methods from "Neural Architecture Search with Reinforcement Learning"

  • ...In this paper, we follow the widely-used cell-based architecture search space in [49, 44, 11, 31, 34, 33, 51, 52, 50]: A network consists of a pre-defined number of cells [51], which can be either norm cells or reduction cells....


  • ...For instance, reinforcement learning (RL) based methods [52, 51] search a suitable architecture on CIFAR10 by training and evaluating more than 20,000 architectures by using 500 GPUs over 4 days....


  • ...In this paper, we present a unified, fast and effective framework, termed Minimum Importance Pruning (MIP), to find an optimal BPE on a specific architecture search space such as cell-based search space [49, 44, 11, 31, 34, 33, 51, 52], as illustrated in Fig....


  • ...has shown remarkable performance over manual designs in various computer vision tasks [9, 51, 28, 52, 10, 26]....


  • ...As established by [51], cell based search space is now well adopted [49, 44, 11, 31, 34, 33, 51, 52], which is predefined and fixed during the architecture search to ensure a fair comparison among different NAS methods....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; this approach won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
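The core idea reduces to a block whose output is F(x) + x, so identity mappings are easy to learn; below is a minimal PyTorch sketch of such a block (a simplified variant, not the exact ResNet configuration):

```python
# Minimal residual block: the convolutions learn a residual F(x) and the
# block returns F(x) + x via an identity skip connection.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # skip connection: F(x) + x

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```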

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
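The update rule itself is compact; here it is written out in NumPy with the paper's default hyper-parameters (the toy usage example is ours):

```python
# The Adam update with the paper's defaults (beta1=0.9, beta2=0.999,
# eps=1e-8) and bias-corrected first and second moment estimates.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)              # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage (ours): minimize f(theta) = theta^2 starting from theta = 3.
theta, m, v = 3.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)   # close to 0
```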

111,197 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
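A shortened sketch of the design principle, stacks of small 3x3 convolutions between poolings with channel width doubling, rather than the full 16/19-weight-layer configuration:

```python
# VGG-style stages: repeated 3x3 convolutions followed by 2x2 max pooling,
# doubling channels as spatial resolution halves. A truncated variant.
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

net = nn.Sequential(
    vgg_stage(3, 64, 2),
    vgg_stage(64, 128, 2),
    vgg_stage(128, 256, 3),
)
print(net(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 256, 8, 8])
```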

55,235 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....


Journal ArticleDOI
01 Jan 1998
TL;DR: This paper reviews gradient-based learning methods for handwritten character recognition and introduces graph transformer networks (GTNs), which allow multi-module recognition systems to be trained globally using gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
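A LeNet-style convolutional network of the kind the paper applies to digit recognition can be sketched as follows (a close variant with average pooling, not the exact LeNet-5 with its original subsampling layers):

```python
# LeNet-style CNN for 32x32 grayscale digit images: two conv/pool stages
# followed by fully connected layers. A close variant, not exact LeNet-5.
import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Tanh(), nn.AvgPool2d(2),   # 32x32 -> 14x14
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),  # 14x14 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                # 10 digit classes
)
print(lenet(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```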

42,067 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
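A toy sketch of the descriptor's core computation, per-pixel gradient orientation histograms pooled over spatial cells; block-level contrast normalization and the linear SVM detector are omitted for brevity:

```python
# Per-cell orientation histograms, the heart of HOG: compute gradients,
# bin unsigned orientations, and accumulate gradient magnitude per 8x8 cell.
import numpy as np

def hog_cells(img, cell=8, bins=9):
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180          # unsigned orientation
    H, W = img.shape
    ch, cw = H // cell, W // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = (ang / (180 / bins)).astype(int) % bins   # fine orientation bins
    for i in range(ch * cell):
        for j in range(cw * cell):
            hist[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return hist

img = np.random.rand(64, 64)
print(hog_cells(img).shape)  # (8, 8, 9): one 9-bin histogram per 8x8 cell
```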

31,952 citations


"Neural Architecture Search with Rei..." refers methods in this paper

  • ...Along with this success is a paradigm shift from feature designing to architecture designing, i.e., from SIFT (Lowe, 1999), and HOG (Dalal & Triggs, 2005), to AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and ResNet (He et al., 2016a)....
