Open Access, Posted Content

FP-NAS: Fast Probabilistic Neural Architecture Search

TLDR
This work proposes a sampling method adaptive to the distribution entropy, drawing more samples to encourage exploration at the beginning and reducing samples as learning proceeds, so as to search fast in the multivariate space; the method is called Fast Probabilistic NAS (FP-NAS).
Abstract
Differential Neural Architecture Search (NAS) requires all layer choices to be held in memory simultaneously; this limits the size of both search space and final architecture. In contrast, Probabilistic NAS, such as PARSEC, learns a distribution over high-performing architectures, and uses only as much memory as needed to train a single model. Nevertheless, it needs to sample many architectures, making it computationally expensive for searching in an extensive space. To solve these problems, we propose a sampling method adaptive to the distribution entropy, drawing more samples to encourage exploration at the beginning, and reducing samples as learning proceeds. Furthermore, to search fast in the multivariate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude. We call this method Fast Probabilistic NAS (FP-NAS). Compared with PARSEC, it can sample 64% fewer architectures and search 2.1x faster. Compared with FBNetV2, FP-NAS is 1.9x-3.5x faster, and the searched models outperform FBNetV2 models on ImageNet. FP-NAS allows us to expand the giant FBNetV2 space to be wider (i.e. larger channel choices) and deeper (i.e. more blocks), while adding the Split-Attention block and enabling the search over the number of splits. When searching a model of size 0.4G FLOPS, FP-NAS is 132x faster than EfficientNet, and the searched FP-NAS-L0 model outperforms EfficientNet-B0 by 0.7% accuracy. Without using any architecture surrogate or scaling tricks, we directly search large models up to 1.0G FLOPS. Our FP-NAS-L2 model with simple distillation outperforms BigNAS-XL with advanced in-place distillation by 0.7% accuracy using similar FLOPS.
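The entropy-adaptive sampling idea in the abstract can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration, not the paper's implementation: the proportionality constant `lam`, the per-layer factorized categoricals, and the sampling floor are assumptions made here for demonstration only.

```python
# Illustrative sketch of entropy-adaptive architecture sampling (not the
# FP-NAS code). `lam` and `min_samples` are assumed values for demonstration.
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def num_samples(arch_probs, lam=2.0, min_samples=4):
    """Draw more architecture samples when the distribution is spread out
    (early, high entropy) and fewer as it concentrates (late, low entropy)."""
    total_entropy = sum(entropy(p) for p in arch_probs)
    return max(min_samples, int(round(lam * total_entropy)))

def sample_architectures(arch_probs, rng):
    """Sample one op choice per layer from independent (factorized) categoricals."""
    n = num_samples(arch_probs)
    return [
        [int(rng.choice(len(p), p=p)) for p in arch_probs]
        for _ in range(n)
    ]

rng = np.random.default_rng(0)
# Three layers, each with a categorical distribution over 4 candidate ops.
early = [np.full(4, 0.25) for _ in range(3)]             # high entropy -> many samples
late = [np.array([0.94, 0.02, 0.02, 0.02])] * 3          # low entropy  -> few samples
print(len(sample_architectures(early, rng)), len(sample_architectures(late, rng)))
```

With a near-uniform (high-entropy) distribution the sketch draws many samples to explore; once the distribution concentrates on a few choices, the sample count drops, which is the qualitative behavior the abstract describes.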


Citations
Book Chapter

Neural Architecture Search for Spiking Neural Networks

TL;DR: SNASNet as mentioned in this paper proposes a Neural Architecture Search (NAS) approach for finding better SNN architectures by selecting the architecture that can represent diverse spike activation patterns across different data samples without training.
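As a rough illustration of the training-free selection idea summarized above, the sketch below scores a candidate by how diverse its binary activation patterns are across a small batch of samples. The mean pairwise Hamming distance used here is a simple stand-in metric chosen for illustration; it is not claimed to be SNASNet's exact criterion.

```python
# Illustrative training-free scoring of activation-pattern diversity.
import numpy as np

def activation_diversity_score(binary_codes):
    """Mean pairwise Hamming distance between per-sample binary activation
    patterns; higher means the candidate separates the samples more."""
    codes = np.asarray(binary_codes)
    n = len(codes)
    dists = [np.sum(codes[i] != codes[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
diverse = rng.integers(0, 2, size=(8, 32))                     # distinct pattern per sample
collapsed = np.tile(rng.integers(0, 2, size=(1, 32)), (8, 1))  # same pattern for every sample
print(activation_diversity_score(diverse) > activation_diversity_score(collapsed))  # True
```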
Posted Content

Evolving Search Space for Neural Architecture Search

TL;DR: The proposed NSE scheme achieves 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, yielding state-of-the-art performance among previous auto-generated architectures that do not involve knowledge distillation or weight pruning.
Journal Article

Pareto-aware Neural Architecture Generation for Diverse Computational Budgets

TL;DR: A Pareto-aware Neural Architecture Generator (PNAG) that only needs to be trained once and dynamically produces the Pareto-optimal architecture for any given budget via inference, demonstrating that the PNAG is able to generate very competitive architectures while satisfying diverse latency budgets.
Proceedings Article

Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

TL;DR: This work theoretically characterizes, at a fine granularity, the impact of connectivity patterns on the convergence of DNNs under gradient-descent training, and shows that by simply filtering out "unpromising" connectivity patterns one can trim down the number of models to evaluate and accelerate large-scale neural architecture search without any overhead.
Journal Article

Transformer Meets Remote Sensing Video Detection and Tracking: A Comprehensive Survey

TL;DR: In this paper, the authors comprehensively summarize the research prospects of transformers in remote sensing moving object detection and tracking, and discuss the potential significance of transformers for some thorny problems in remote sensing video (RSV).
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won first place on the ILSVRC 2015 classification task.
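The core idea, learning a residual that is added back to the block input through an identity shortcut, is easy to sketch. The PyTorch block below is a minimal illustration with illustrative layer sizes, not the exact ResNet configuration.

```python
# Minimal residual block: the block learns F(x) and outputs F(x) + x.
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)  # identity shortcut

x = torch.randn(1, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```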
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
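The update rule behind this summary can be written in a few lines. The NumPy sketch below implements the standard Adam step with the commonly used default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8); the toy quadratic example is added here purely for demonstration.

```python
# Standard Adam update: adaptive estimates of first/second moments with bias correction.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t (1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad           # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5 (gradient is 2x).
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to 0
```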
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The authors train a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art classification performance on ImageNet.
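The architecture described in this summary, five convolutional layers interleaved with max pooling and three fully connected layers ending in a 1000-way classifier, can be sketched as follows. Channel and kernel sizes follow the commonly cited configuration but should be read as illustrative, not a faithful reproduction of the original training setup.

```python
# Sketch of a five-conv / three-FC network with a 1000-way output.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way scores; softmax is applied in the loss
)

print(alexnet_like(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```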
Proceedings Article

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
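An Inception-style module runs several convolutions of different kernel sizes in parallel and concatenates their outputs along the channel dimension. The PyTorch sketch below uses branch widths borrowed from one commonly cited configuration; it is an illustration, not the full GoogLeNet.

```python
# Inception-style module: parallel 1x1, 3x3, 5x5, and pooling branches.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU())
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU())
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192, 64, 96, 128, 16, 32, 32)(x).shape)  # [1, 256, 28, 28]
```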
Proceedings Article

Densely Connected Convolutional Networks

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
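The dense connectivity pattern, where each layer consumes the concatenation of all preceding feature maps and contributes a fixed number of new channels, can be sketched as follows. The layer count and growth rate below are illustrative choices, not the published DenseNet configuration.

```python
# Dense block: layer i sees the concatenation of the input and all earlier outputs.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1, bias=False),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16)(x).shape)  # torch.Size([1, 64, 32, 32]) = 16 + 4 * 12 channels
```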