Open Access, Posted Content

FP-NAS: Fast Probabilistic Neural Architecture Search

TLDR
This work proposes a sampling method adaptive to the distribution entropy, drawing more samples to encourage exploration at the beginning and reducing samples as learning proceeds, so as to search fast in the multivariate space; the method is called Fast Probabilistic NAS (FP-NAS).
Abstract
Differential Neural Architecture Search (NAS) requires all layer choices to be held in memory simultaneously; this limits the size of both search space and final architecture. In contrast, Probabilistic NAS, such as PARSEC, learns a distribution over high-performing architectures, and uses only as much memory as needed to train a single model. Nevertheless, it needs to sample many architectures, making it computationally expensive for searching in an extensive space. To solve these problems, we propose a sampling method adaptive to the distribution entropy, drawing more samples to encourage exploration at the beginning, and reducing samples as learning proceeds. Furthermore, to search fast in the multivariate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude. We call this method Fast Probabilistic NAS (FP-NAS). Compared with PARSEC, it can sample 64% fewer architectures and search 2.1x faster. Compared with FBNetV2, FP-NAS is 1.9x-3.5x faster, and the searched models outperform FBNetV2 models on ImageNet. FP-NAS allows us to expand the giant FBNetV2 space to be wider (i.e. larger channel choices) and deeper (i.e. more blocks), while adding the Split-Attention block and enabling the search over the number of splits. When searching a model of size 0.4G FLOPS, FP-NAS is 132x faster than EfficientNet, and the searched FP-NAS-L0 model outperforms EfficientNet-B0 by 0.7% accuracy. Without using any architecture surrogate or scaling tricks, we directly search large models up to 1.0G FLOPS. Our FP-NAS-L2 model with simple distillation outperforms BigNAS-XL with advanced in-place distillation by 0.7% accuracy using similar FLOPS.
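The entropy-adaptive sampling idea in the abstract can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration, not the paper's implementation: the proportionality constant `lam`, the per-layer factorized categoricals, and the sampling floor are assumptions made here for demonstration only.

```python
# Illustrative sketch of entropy-adaptive architecture sampling (not the
# FP-NAS code). `lam` and `min_samples` are assumed values for demonstration.
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def num_samples(arch_probs, lam=2.0, min_samples=4):
    """Draw more architecture samples when the distribution is spread out
    (early, high entropy) and fewer as it concentrates (late, low entropy)."""
    total_entropy = sum(entropy(p) for p in arch_probs)
    return max(min_samples, int(round(lam * total_entropy)))

def sample_architectures(arch_probs, rng):
    """Sample one op choice per layer from independent (factorized) categoricals."""
    n = num_samples(arch_probs)
    return [
        [int(rng.choice(len(p), p=p)) for p in arch_probs]
        for _ in range(n)
    ]

rng = np.random.default_rng(0)
# Three layers, each with a categorical distribution over 4 candidate ops.
early = [np.full(4, 0.25) for _ in range(3)]             # high entropy -> many samples
late = [np.array([0.94, 0.02, 0.02, 0.02])] * 3          # low entropy  -> few samples
print(len(sample_architectures(early, rng)), len(sample_architectures(late, rng)))
```

With a near-uniform (high-entropy) distribution the sketch draws many samples to explore; once the distribution concentrates on a few choices, the sample count drops, which is the qualitative behavior the abstract describes.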


Citations
Book Chapter

Neural Architecture Search for Spiking Neural Networks

TL;DR: SNASNet as mentioned in this paper proposes a Neural Architecture Search (NAS) approach for finding better SNN architectures by selecting the architecture that can represent diverse spike activation patterns across different data samples without training.
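As a rough illustration of the training-free selection idea summarized above, the sketch below scores a candidate by how diverse its binary activation patterns are across a small batch of samples. The mean pairwise Hamming distance used here is a simple stand-in metric chosen for illustration; it is not claimed to be SNASNet's exact criterion.

```python
# Illustrative training-free scoring of activation-pattern diversity.
import numpy as np

def activation_diversity_score(binary_codes):
    """Mean pairwise Hamming distance between per-sample binary activation
    patterns; higher means the candidate separates the samples more."""
    codes = np.asarray(binary_codes)
    n = len(codes)
    dists = [np.sum(codes[i] != codes[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
diverse = rng.integers(0, 2, size=(8, 32))                     # distinct pattern per sample
collapsed = np.tile(rng.integers(0, 2, size=(1, 32)), (8, 1))  # same pattern for every sample
print(activation_diversity_score(diverse) > activation_diversity_score(collapsed))  # True
```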
Posted Content

Evolving Search Space for Neural Architecture Search

TL;DR: The proposed NSE scheme achieves 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, yielding state-of-the-art performance among previous auto-generated architectures that do not involve knowledge distillation or weight pruning.
Journal Article

Pareto-aware Neural Architecture Generation for Diverse Computational Budgets

TL;DR: A Pareto-aware Neural Architecture Generator (PNAG) that only needs to be trained once and dynamically produces the Pareto-optimal architecture for any given budget via inference, demonstrating that the PNAG is able to generate very competitive architectures while satisfying diverse latency budgets.
Proceedings Article

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

TL;DR: This work theoretically characterizes, at a fine granularity, the impact of connectivity patterns on the convergence of DNNs under gradient-descent training, and shows that by simply filtering out "unpromising" connectivity patterns one can trim down the number of models to evaluate and accelerate large-scale neural architecture search without any overhead.
Journal Article

Transformer Meets Remote Sensing Video Detection and Tracking: A Comprehensive Survey

TL;DR: In this paper, the authors comprehensively summarize the research prospects of transformers in remote sensing moving object detection and tracking, and discuss the potential significance of transformers for some thorny problems in remote sensing video (RSV).
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won first place on the ILSVRC 2015 classification task.
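The core idea, learning a residual that is added back to the block input through an identity shortcut, is easy to sketch. The PyTorch block below is a minimal illustration with illustrative layer sizes, not the exact ResNet configuration.

```python
# Minimal residual block: the block learns F(x) and outputs F(x) + x.
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)  # identity shortcut

x = torch.randn(1, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```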
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
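The update rule behind this summary can be written in a few lines. The NumPy sketch below implements the standard Adam step with the commonly used default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8); the toy quadratic example is added here purely for demonstration.

```python
# Standard Adam update: adaptive estimates of first/second moments with bias correction.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t (1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad           # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5 (gradient is 2x).
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to 0
```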
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The authors train a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art classification performance on ImageNet.
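The architecture described in this summary, five convolutional layers interleaved with max pooling and three fully connected layers ending in a 1000-way classifier, can be sketched as follows. Channel and kernel sizes follow the commonly cited configuration but should be read as illustrative, not a faithful reproduction of the original training setup.

```python
# Sketch of a five-conv / three-FC network with a 1000-way output.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way scores; softmax is applied in the loss
)

print(alexnet_like(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```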
Proceedings Article

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
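An Inception-style module runs several convolutions of different kernel sizes in parallel and concatenates their outputs along the channel dimension. The PyTorch sketch below uses branch widths borrowed from one commonly cited configuration; it is an illustration, not the full GoogLeNet.

```python
# Inception-style module: parallel 1x1, 3x3, 5x5, and pooling branches.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU())
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU())
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192, 64, 96, 128, 16, 32, 32)(x).shape)  # [1, 256, 28, 28]
```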
Proceedings Article

Densely Connected Convolutional Networks

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
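The dense connectivity pattern, where each layer consumes the concatenation of all preceding feature maps and contributes a fixed number of new channels, can be sketched as follows. The layer count and growth rate below are illustrative choices, not the published DenseNet configuration.

```python
# Dense block: layer i sees the concatenation of the input and all earlier outputs.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1, bias=False),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16)(x).shape)  # torch.Size([1, 64, 32, 32]) = 16 + 4 * 12 channels
```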