Open Access · Posted Content

Complexity of Linear Regions in Deep Networks

TLDR
In this article, it is shown that for networks at initialization, the average number of linear regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound, and that the average distance to the nearest region boundary scales like the inverse of the number of neurons.
Abstract
It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. In the case of networks that compute piecewise linear functions, such as those with ReLU activation, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks with merely a single region, or for which the number of linear regions grows exponentially with depth; it is not clear where within this range most networks fall in practice, either before or after training. In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. In particular, we prove that for networks at initialization, the average number of regions along any one-dimensional subspace grows linearly in the total number of neurons, far below the exponential upper bound. We also find that the average distance to the nearest region boundary at initialization scales like the inverse of the number of neurons. Our theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches our empirical observations. We conclude that the practical expressivity of neural networks is likely far below that of the theoretical maximum, and that this gap can be quantified.
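As a quick empirical illustration of the quantity the abstract describes, the Python sketch below initializes a random ReLU network and counts the distinct activation patterns, i.e. linear regions, encountered along a random line through input space. This is a minimal sketch and not the authors' framework: the widths, sampling density, and helpers such as `regions_along_line` are our own illustrative choices. Per the paper's result, the count should grow roughly linearly with the total number of neurons rather than exponentially with depth.

```python
# Minimal sketch (not the authors' code): count the linear regions a randomly
# initialized ReLU network induces along a random 1-D line in input space,
# by tracking changes in the network's activation pattern.
import numpy as np

rng = np.random.default_rng(0)

def init_network(widths):
    """He-style initialization for a fully connected ReLU network."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def activation_pattern(params, x):
    """Binary on/off pattern of every ReLU for input x."""
    pattern = []
    h = x
    for W, b in params[:-1]:            # final layer is linear, no ReLU
        h = W @ h + b
        pattern.append(h > 0)
        h = np.maximum(h, 0.0)
    return np.concatenate(pattern)

def regions_along_line(params, x0, direction, t_max=10.0, n_samples=20000):
    """Count distinct linear regions crossed along x0 + t * direction.

    Dense sampling can miss regions narrower than the sampling step,
    so this is a lower bound on the true count.
    """
    ts = np.linspace(-t_max, t_max, n_samples)
    count = 1
    prev = activation_pattern(params, x0 + ts[0] * direction)
    for t in ts[1:]:
        cur = activation_pattern(params, x0 + t * direction)
        if not np.array_equal(cur, prev):
            count += 1
            prev = cur
    return count

widths = [16, 32, 32, 32, 1]            # input dim 16, three hidden layers
params = init_network(widths)
x0 = rng.normal(size=widths[0])
d = rng.normal(size=widths[0])
d /= np.linalg.norm(d)

print("hidden neurons:", sum(widths[1:-1]))
print("regions along a random line:", regions_along_line(params, x0, d))
```

Rerunning with deeper or wider networks of the same total neuron count should, under the paper's theory, yield region counts of a similar order along the line, far below the exponential-in-depth worst case.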


Citations
Posted Content

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

TL;DR: The success of GNNs in extrapolating algorithmic tasks to new data relies on encoding task-specific non-linearities in the architecture or features; the authors formulate this as a hypothesis and provide theoretical and empirical evidence for it.
Journal ArticleDOI

Model complexity of deep learning: a survey

TL;DR: In this article, the authors provide a systematic overview of recent studies on model complexity in deep learning and propose several future directions, including model generalization, model optimization, and model selection and design.
Posted Content

Liquid Time-constant Networks

TL;DR: This work introduces a new class of time-continuous recurrent neural network models that construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates, and demonstrates the approximation capability of Liquid Time-Constant Networks (LTCs) compared to modern RNNs.
Proceedings Article

Gradient Dynamics of Shallow Univariate ReLU Networks

TL;DR: A theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input solving least-squares interpolation; it shows that learning in the kernel regime yields smooth, curvature-minimizing interpolants that reduce to cubic splines for uniform initializations.
Proceedings ArticleDOI

Interpreting Deep Learning-Based Networking Systems

TL;DR: Metis, as discussed by the authors, is a framework that provides interpretability for two general categories of networking problems spanning local and global control; it introduces two interpretation methods, based on decision trees and hypergraphs, that convert DNN policies into interpretable rule-based controllers and highlight critical components through hypergraph analysis.
References
Posted Content

Understanding deep learning requires rethinking generalization

TL;DR: The authors showed that deep neural networks can fit a random labeling of the training data, and that this phenomenon is qualitatively unaffected by explicit regularization and occurs even if the true images are replaced by completely unstructured random noise.
Proceedings Article

Do Deep Nets Really Need to be Deep?

TL;DR: This paper empirically demonstrates that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models, in some cases using a total number of parameters similar to the original deep model.
Proceedings Article

A closer look at memorization in deep networks

TL;DR: The analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.