Open Access · Posted Content
Generalization Guarantees for Neural Architecture Search with Train-Validation Split
TL;DR: It is revealed that the upper-level problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size, and generalization bounds are established for continuous search spaces, which are highly relevant for popular differentiable search schemes.
Abstract:
Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization, where one optimizes the model weights over the training data (lower-level problem) and various hyperparameters, such as the configuration of the architecture, over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a priori it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the role of the train-validation split. To this end, this work establishes the following results. (1) We show that refined properties of the validation loss, such as risk and hyper-gradients, are indicative of those of the true test loss. This reveals that the upper-level problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces, which are highly relevant for popular differentiable search schemes. (2) We establish generalization bounds for NAS problems with an emphasis on an activation search problem. When optimized with gradient descent, we show that the train-validation procedure returns the best (model, architecture) pair even if all architectures can perfectly fit the training data to achieve zero error. (3) Finally, we highlight rigorous connections between NAS, multiple kernel learning, and low-rank matrix learning. The latter leads to novel algorithmic insights, where the solution of the upper problem can be accurately learned via efficient spectral methods to achieve near-minimal risk.
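The bilevel train-validation procedure described in the abstract can be illustrated with a minimal sketch. This is a toy stand-in, not the paper's method: ridge-regularized linear models play the role of the lower-level problem (fitting weights on the training split), and a grid search over the regularization hyperparameter plays the role of the upper-level problem (selection by validation loss); all names and the data-generating setup are illustrative.

```python
# Toy sketch of bilevel optimization with a train-validation split.
# Lower level: fit model weights on the training data.
# Upper level: choose the hyperparameter that minimizes validation loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 80, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Train-validation split.
X_tr, y_tr = X[:60], y[:60]
X_val, y_val = X[60:], y[60:]

def fit_lower(X, y, alpha):
    """Lower-level problem: ridge regression on the training split."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def val_loss(w):
    """Upper-level objective: mean squared error on the validation split."""
    return np.mean((X_val @ w - y_val) ** 2)

# Upper-level search: each candidate is fit on the training data only,
# then scored on the held-out validation data.
alphas = [0.01, 0.1, 1.0, 10.0]
best_alpha = min(alphas, key=lambda a: val_loss(fit_lower(X_tr, y_tr, a)))
```

Even when every candidate fits the training data well, the candidates differ on the validation split, which is the mechanism the paper's guarantees are about.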
Citations
Proceedings Article
AutoBalance: Optimized Loss Functions for Imbalanced Data
TL;DR: AutoBalance, a bi-level optimization framework, is proposed that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives, and enables personalized treatment of classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes.
Proceedings Article
Neural Networks can Learn Representations with Gradient Descent
TL;DR: There is a large class of functions that cannot be efficiently learned by kernel methods but can be easily learned with gradient descent on a two-layer neural network outside the kernel regime, by learning representations that are relevant to the target task.
Journal Article
Green Learning: Introduction, Examples and Outlook
C.-C. Jay Kuo, Azad M. Madni +1 more
TL;DR: This paper offers an introduction to green learning (GL), its demonstrated applications, and its future outlook, and presents a few successful GL examples with performance comparable to state-of-the-art DL solutions.
Journal Article
Provable and Efficient Continual Representation Learning
TL;DR: This work establishes theoretical guarantees for continual representation learning (CRL), providing sample complexity and generalization error bounds for new tasks by formalizing the statistical benefits of previously learned representations, and proposes an inference-efficient variant of PackNet called Efficient Sparse PackNet (ESPN), which employs joint channel and weight pruning.
References
Journal Article
A tutorial on spectral clustering
TL;DR: In this article, the authors present the most common spectral clustering algorithms, derive them from scratch via several different approaches, and discuss their advantages and disadvantages.
Proceedings Article
Model-agnostic meta-learning for fast adaptation of deep networks
TL;DR: An algorithm for meta-learning is proposed that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent, and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
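The inner/outer loop structure of MAML summarized above can be sketched on a toy problem. This is a first-order approximation of MAML (the second-order term is dropped) on hypothetical 1-D linear regression tasks; the task distribution, learning rates, and variable names are illustrative assumptions, not from the paper.

```python
# First-order MAML sketch: adapt per-task with one inner gradient step,
# then update the meta-parameters from the post-adaptation gradients.
import numpy as np

rng = np.random.default_rng(1)

def sample_task():
    """Toy regression task: y = s * x with a random slope s."""
    s = rng.uniform(-2, 2)
    x = rng.normal(size=20)
    return x, s * x

def loss_grad(w, x, y):
    """Gradient of mean squared error for the scalar model y_hat = w * x."""
    return 2 * np.mean((w * x - y) * x)

w = 0.0                      # meta-parameters (a single scalar here)
inner_lr, outer_lr = 0.05, 0.01
for _ in range(200):         # outer (meta) loop
    meta_grad = 0.0
    for _ in range(5):       # batch of tasks
        x, y = sample_task()
        # Inner loop: one gradient step of task-specific adaptation.
        w_adapted = w - inner_lr * loss_grad(w, x, y)
        # First-order approximation: meta-gradient taken at adapted weights.
        meta_grad += loss_grad(w_adapted, x, y)
    w -= outer_lr * meta_grad / 5
```

Full MAML differentiates through the inner update as well; the first-order variant shown here simply reuses the post-adaptation gradient, which keeps the sketch short.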
Journal Article
Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses
TL;DR: In this article, the authors propose simple and directional likelihood-ratio tests for discriminating and choosing between two competing models, whether the models are nonnested, overlapping, or nested, and whether both, one, or neither is misspecified.
Journal Article
Sparse Principal Component Analysis
TL;DR: This work introduces a new method called sparse principal component analysis (SPCA), which uses the lasso (elastic net) to produce modified principal components with sparse loadings, and shows that PCA can be formulated as a regression-type optimization problem.
Posted Content
Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le +1 more
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Related Papers (5)
Principled Architecture Selection for Neural Networks: Application to Corporate Bond Rating Prediction
John Moody, Joachim Utans +1 more