Open Access · Posted Content

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

TLDR
It is revealed that the upper-level problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size, and generalization bounds are established for continuous search spaces, which are highly relevant for popular differentiable search schemes.
Abstract
Neural Architecture Search (NAS) is a popular method for automatically designing optimized architectures for high-performance deep learning. In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (lower-level problem) and various hyperparameters such as the configuration of the architecture over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a priori, it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the role of the train-validation split. To this aim, this work establishes the following results. (1) We show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss. This reveals that the upper-level problem helps select the most generalizable model and prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces which are highly relevant for popular differentiable search schemes. (2) We establish generalization bounds for NAS problems with an emphasis on an activation search problem. We show that, when optimized with gradient descent, the train-validation procedure returns the best (model, architecture) pair even if all architectures can perfectly fit the training data to achieve zero error. (3) Finally, we highlight rigorous connections between NAS, multiple kernel learning, and low-rank matrix learning. The latter leads to novel algorithmic insights where the solution of the upper-level problem can be accurately learned via efficient spectral methods to achieve near-minimal risk.
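
To make the bilevel train-validation procedure described above concrete, here is a minimal, hypothetical sketch (not the authors' code) specialized to a toy activation-search problem. The searchable activation phi_alpha(x) = alpha*relu(x) + (1-alpha)*tanh(x), the linear model, the synthetic data, and the first-order (DARTS-style) alternating updates are all illustrative assumptions rather than details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def phi(x, alpha):
    # Searchable activation: convex combination of ReLU and tanh.
    return alpha * np.maximum(x, 0.0) + (1.0 - alpha) * np.tanh(x)

def dphi_dalpha(x):
    # Derivative of phi with respect to the architecture parameter alpha.
    return np.maximum(x, 0.0) - np.tanh(x)

# Synthetic 1-D data generated with a "true" architecture alpha_star.
alpha_star, w_star = 0.8, 2.0
X = rng.normal(size=200)
y = w_star * phi(X, alpha_star) + 0.1 * rng.normal(size=200)
X_tr, y_tr = X[:150], y[:150]      # training split   -> fits the weight w
X_val, y_val = X[150:], y[150:]    # validation split -> fits alpha

w, alpha = 0.0, 0.5
lr_w, lr_alpha = 0.05, 0.05
for _ in range(500):
    # Lower-level step: gradient descent on the training loss w.r.t. w.
    r_tr = w * phi(X_tr, alpha) - y_tr
    w -= lr_w * np.mean(2.0 * r_tr * phi(X_tr, alpha))

    # Upper-level step: gradient descent on the validation loss w.r.t. alpha
    # (first-order hyper-gradient that treats w as a constant).
    r_val = w * phi(X_val, alpha) - y_val
    alpha -= lr_alpha * np.mean(2.0 * r_val * w * dphi_dalpha(X_val))
    alpha = float(np.clip(alpha, 0.0, 1.0))  # stay inside the search space

print(f"recovered alpha approx {alpha:.2f} (target {alpha_star}), w approx {w:.2f}")

This toy problem only illustrates the alternation pattern of the two levels; the paper's analysis concerns overparameterized lower-level problems, where training loss alone cannot distinguish architectures and the validation split does the selection.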


Citations
Proceedings Article

AutoBalance: Optimized Loss Functions for Imbalanced Data

TL;DR: AutoBalance is proposed, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives, and that enables personalized treatment for classes/groups by employing a parametric cross-entropy loss and individualized data augmentation schemes.
Proceedings Article

Neural Networks can Learn Representations with Gradient Descent

TL;DR: There is a large class of functions which cannot be efficiently learned by kernel methods but can be easily learned with gradient descent on a two-layer neural network outside the kernel regime, by learning representations that are relevant to the target task.
Journal Article

Green Learning: Introduction, Examples and Outlook

TL;DR: This paper offers an introduction to GL, its demonstrated applications, and future outlook, and presents a few successful GL examples with performance comparable to state-of-the-art DL solutions.
Journal Article

Provable and Efficient Continual Representation Learning

TL;DR: This work establishes theoretical guarantees for CRL by providing sample complexity and generalization error bounds for new tasks, formalizing the statistical benefits of previously learned representations, and proposes an inference-efficient variation of PackNet called Efficient Sparse PackNet (ESPN), which employs joint channel and weight pruning.
References
Journal Article

A tutorial on spectral clustering

TL;DR: In this tutorial, the authors present the most common spectral clustering algorithms, derive them from scratch via several different approaches, and discuss their advantages and disadvantages.
Proceedings Article

Model-agnostic meta-learning for fast adaptation of deep networks

TL;DR: An algorithm for meta-learning is proposed that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
Journal Article

Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses

Quang Vuong · 01 Mar 1989
TL;DR: In this article, the authors propose simple and directional likelihood-ratio tests for discriminating and choosing between two competing models, whether the models are non-nested, overlapping, or nested, and whether both, one, or neither is misspecified.
Journal Article

Sparse Principal Component Analysis

TL;DR: This work introduces a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings and shows that PCA can be formulated as a regression-type optimization problem.
Posted Content

Neural Architecture Search with Reinforcement Learning

Barret Zoph, +1 more · 05 Nov 2016
TL;DR: This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Trending Questions (1)
What is the best train, validation, and testing split ratio for deep learning models?

The paper does not provide a specific recommendation for the best train, validation, and testing split ratio for deep learning models.