Open Access · Posted Content

Revealing the Structure of Deep Neural Networks via Convex Duality

TLDR
It is shown that a set of optimal hidden-layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set, and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.
Abstract
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, we also prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when the data is rank-one or whitened. The same analysis also applies to architectures with batch normalization, even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse, where class means collapse to the vertices of a simplex equiangular tight frame.
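
For concreteness, the training problem referred to in the abstract is, generically, a loss plus a norm penalty on every layer's weights; the notation below (loss \mathcal{L}, regularization weight \beta, Frobenius norms) is illustrative, and the exact norms and constraints used in the paper may differ:

\min_{W_1,\dots,W_L} \; \mathcal{L}\big(f_{W_1,\dots,W_L}(X),\, y\big) \;+\; \beta \sum_{l=1}^{L} \|W_l\|_F^2, \qquad \beta > 0,

where f_{W_1,\dots,W_L} denotes the L-layer network applied to the data matrix X and y are the targets.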


Citations
Posted Content

Banach Space Representer Theorems for Neural Networks and Ridge Splines

TL;DR: A variational framework is developed to understand the properties of the functions learned by neural networks fit to data, and a representer theorem is derived showing that finite-width, single-hidden-layer neural networks are solutions to inverse problems with total-variation-like regularization.
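
As a hedged illustration of the kind of variational problem meant here (not the exact theorem statement), the one-dimensional case is often phrased with a second-order total-variation penalty, with \lambda > 0 a regularization weight:

\min_{f} \; \sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2 \;+\; \lambda \, \mathrm{TV}(f'),

whose solutions include continuous piecewise-linear functions (linear splines) with knots at the data points, matching the spline-interpolation corollary mentioned in the abstract above.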
Proceedings ArticleDOI

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

TL;DR: In this paper, the authors provide the first global landscape analysis for the vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.
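
The simplex equiangular tight frame (ETF) that neural collapse refers to can be written down explicitly. A minimal numpy sketch (the class count K is an arbitrary illustrative choice) that constructs one and checks its defining properties:

import numpy as np

K = 4                                         # illustrative number of classes
# K equal-norm vectors (the columns of M) with pairwise cosine -1/(K-1)
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

norms = np.linalg.norm(M, axis=0)             # all equal (here: 1)
cosines = (M.T @ M) / np.outer(norms, norms)  # pairwise cosine similarities
print(norms)
print(np.round(cosines, 4))                   # -1/(K-1) off the diagonal

The printout shows equal-norm columns whose pairwise cosine similarity is exactly -1/(K-1), the configuration that the class means are said to collapse to.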
Posted Content

Convex Geometry and Duality of Over-parameterized Neural Networks

TL;DR: A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Proceedings Article

Extended Unconstrained Features Model for Exploring Deep Neural Collapse

Tom Tirer et al.
TL;DR: This paper studies the UFM for the regularized MSE loss and shows that the minimizers' features can have a more delicate structure than in the cross-entropy case; it also extends the model by adding another layer of weights as well as a ReLU nonlinearity, generalizing previous results.
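
For reference, the unconstrained features model (UFM) with a regularized MSE loss is an objective of roughly the following form, where the penultimate features H are treated as free optimization variables; bias terms and the exact normalization used in the paper are omitted, and \lambda_W, \lambda_H are regularization weights in the notation assumed here:

\min_{W, H} \; \tfrac{1}{2N} \| W H - Y \|_F^2 \;+\; \tfrac{\lambda_W}{2}\|W\|_F^2 \;+\; \tfrac{\lambda_H}{2}\|H\|_F^2,

with Y the matrix of one-hot labels for the N training samples and W the last-layer weights.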
Journal Article

Limitations of Neural Collapse for Understanding Generalization in Deep Learning

TL;DR: This paper investigates the role of neural collapse in feature learning and finds that neural collapse is primarily an optimization phenomenon with as-yet-unclear connections to generalization; it also shows that collapse of deep-learning features occurs not only in the last layer but in earlier layers as well.
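
One common way to quantify the within-class variability collapse discussed here is to compare the within-class scatter of the learned features to their between-class scatter. The numpy sketch below is a simple proxy of that kind, not the exact statistic used in the paper:

import numpy as np

def collapse_ratio(features, labels):
    # Ratio of within-class to between-class scatter of the features.
    # Values near zero indicate strong within-class variability collapse.
    global_mean = features.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        Fc = features[labels == c]
        mu_c = Fc.mean(axis=0)
        within += ((Fc - mu_c) ** 2).sum()
        between += len(Fc) * ((mu_c - global_mean) ** 2).sum()
    return within / between

# toy usage with random (non-collapsed) features, for illustration only
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labs = rng.integers(0, 4, size=200)
print(collapse_ratio(feats, labs))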
References
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
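
For reference, the core batch-normalization transform described in the two entries above standardizes each feature over the mini-batch and then applies a learnable scale and shift. A minimal training-mode numpy sketch (the running statistics used at inference time are omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Training-mode batch normalization for x of shape (batch, features):
    # per-feature standardization over the batch, then learnable scale/shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(size=(32, 8))       # toy mini-batch
out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))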
Posted Content

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

TL;DR: In this paper, the authors show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
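
To make the setting concrete, here is a toy numpy sketch of the kind of model studied: a three-layer deep linear network trained by plain gradient descent on synthetic data. The dimensions, initialization scale, and step size are arbitrary choices, and the shape of the printed loss curve (including any plateaus and transitions) depends on them:

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200
X = rng.normal(size=(d, n))                    # synthetic inputs
A, _ = np.linalg.qr(rng.normal(size=(d, d)))   # target linear map (teacher)
Y = A @ X

W1, W2, W3 = (0.5 * rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
lr = 0.02
for step in range(4001):
    E = W3 @ W2 @ W1 @ X - Y                   # residual of the end-to-end map
    g3 = E @ (W2 @ W1 @ X).T / n               # gradients of 0.5*||E||^2 / n
    g2 = W3.T @ E @ (W1 @ X).T / n
    g1 = (W3 @ W2).T @ E @ X.T / n
    W1, W2, W3 = W1 - lr * g1, W2 - lr * g2, W3 - lr * g3
    if step % 800 == 0:
        print(step, 0.5 * (E ** 2).sum() / n)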
Proceedings Article

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

TL;DR: The authors showed that gradient descent converges at a global linear rate to the global optimum for two-layer fully connected ReLU-activated neural networks, where over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations.
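
A hedged sketch of the over-parameterized regime described above: a wide two-layer ReLU network with fixed random output weights in which only the hidden-layer weights are trained by gradient descent. All hyperparameters below are illustrative choices, not the ones analyzed in the paper:

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                          # few samples, many hidden units
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = rng.normal(size=n)
W = rng.normal(size=(m, d))                    # trained hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)            # fixed output weights

lr = 2.0
for step in range(1001):
    Z = X @ W.T                                # (n, m) pre-activations
    err = np.maximum(Z, 0.0) @ a / np.sqrt(m) - y   # predictions minus targets
    # gradient of the mean squared error w.r.t. W (chain rule through the ReLU)
    G = (err[:, None] * (Z > 0) * a[None, :] / np.sqrt(m)).T @ X / n
    W -= lr * G
    if step % 200 == 0:
        print(step, 0.5 * (err ** 2).mean())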
Proceedings Article

A Closer Look at Few-shot Classification

TL;DR: In this paper, a consistent comparative analysis of several representative few-shot classification algorithms is presented, with results showing that deeper backbones significantly reduce the gap across methods when domain differences are limited.
Trending Questions (1)
How to decide the neural network structures for a deep learning model?

The paper characterizes the structure of norm-regularized deep networks via convex duality: optimal hidden-layer weights can be found as the extreme points of a convex set and align with the previous layers, which constrains properties of each layer such as the rank of its weight matrix.