Open Access · Posted Content

Revealing the Structure of Deep Neural Networks via Convex Duality

TLDR
It is shown that a set of optimal hidden-layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set, and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.
Abstract
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with the previous layers via duality. More importantly, we apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds. As a corollary, we also prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, which was previously known only for two-layer networks. Furthermore, we provide closed-form solutions for the optimal layer weights when the data is rank-one or whitened. The same analysis also applies to architectures with batch normalization, even for arbitrary data. Therefore, we obtain a complete explanation for a recent empirical observation termed Neural Collapse, where class means collapse to the vertices of a simplex equiangular tight frame.
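
For concreteness, the training problem referred to in the abstract is, generically, a loss plus a norm penalty on every layer's weights; the notation below (loss \mathcal{L}, regularization weight \beta, Frobenius norms) is illustrative, and the exact norms and constraints used in the paper may differ:

\min_{W_1,\dots,W_L} \; \mathcal{L}\big(f_{W_1,\dots,W_L}(X),\, y\big) \;+\; \beta \sum_{l=1}^{L} \|W_l\|_F^2, \qquad \beta > 0,

where f_{W_1,\dots,W_L} denotes the L-layer network applied to the data matrix X and y are the targets.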


Citations
Posted Content

Banach Space Representer Theorems for Neural Networks and Ridge Splines

TL;DR: A variational framework is developed to understand the properties of the functions learned by neural networks fit to data, and a representer theorem is derived showing that finite-width, single-hidden-layer neural networks are solutions to inverse problems with total-variation-like regularization.
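
As a hedged illustration of the kind of variational problem meant here (not the exact theorem statement), the one-dimensional case is often phrased with a second-order total-variation penalty, with \lambda > 0 a regularization weight:

\min_{f} \; \sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2 \;+\; \lambda \, \mathrm{TV}(f'),

whose solutions include continuous piecewise-linear functions (linear splines) with knots at the data points, matching the spline-interpolation corollary mentioned in the abstract above.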
Proceedings ArticleDOI

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

TL;DR: In this paper, the authors provide the first global landscape analysis for the vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.
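
The simplex equiangular tight frame (ETF) that neural collapse refers to can be written down explicitly. A minimal numpy sketch (the class count K is an arbitrary illustrative choice) that constructs one and checks its defining properties:

import numpy as np

K = 4                                         # illustrative number of classes
# K equal-norm vectors (the columns of M) with pairwise cosine -1/(K-1)
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

norms = np.linalg.norm(M, axis=0)             # all equal (here: 1)
cosines = (M.T @ M) / np.outer(norms, norms)  # pairwise cosine similarities
print(norms)
print(np.round(cosines, 4))                   # -1/(K-1) off the diagonal

The printout shows equal-norm columns whose pairwise cosine similarity is exactly -1/(K-1), the configuration that the class means are said to collapse to.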
Posted Content

Convex Geometry and Duality of Over-parameterized Neural Networks

TL;DR: A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Proceedings Article

Extended Unconstrained Features Model for Exploring Deep Neural Collapse

Tom Tirer et al.
TL;DR: This paper studies the UFM for the regularized MSE loss and shows that the minimizers' features can have a more delicate structure than in the cross-entropy case; it also extends the model by adding another layer of weights as well as a ReLU nonlinearity, generalizing previous results.
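
For reference, the unconstrained features model (UFM) with a regularized MSE loss is an objective of roughly the following form, where the penultimate features H are treated as free optimization variables; bias terms and the exact normalization used in the paper are omitted, and \lambda_W, \lambda_H are regularization weights in the notation assumed here:

\min_{W, H} \; \tfrac{1}{2N} \| W H - Y \|_F^2 \;+\; \tfrac{\lambda_W}{2}\|W\|_F^2 \;+\; \tfrac{\lambda_H}{2}\|H\|_F^2,

with Y the matrix of one-hot labels for the N training samples and W the last-layer weights.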
Journal Article

Limitations of Neural Collapse for Understanding Generalization in Deep Learning

TL;DR: This paper investigates the role of neural collapse in feature learning and finds that neural collapse is primarily an optimization phenomenon with as-yet-unclear connections to generalization; it also shows that collapse of deep-learning features occurs not only in the last layer but in earlier layers as well.
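
One common way to quantify the within-class variability collapse discussed here is to compare the within-class scatter of the learned features to their between-class scatter. The numpy sketch below is a simple proxy of that kind, not the exact statistic used in the paper:

import numpy as np

def collapse_ratio(features, labels):
    # Ratio of within-class to between-class scatter of the features.
    # Values near zero indicate strong within-class variability collapse.
    global_mean = features.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        Fc = features[labels == c]
        mu_c = Fc.mean(axis=0)
        within += ((Fc - mu_c) ** 2).sum()
        between += len(Fc) * ((mu_c - global_mean) ** 2).sum()
    return within / between

# toy usage with random (non-collapsed) features, for illustration only
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labs = rng.integers(0, 4, size=200)
print(collapse_ratio(feats, labs))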
References
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
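
For reference, the core batch-normalization transform described in the two entries above standardizes each feature over the mini-batch and then applies a learnable scale and shift. A minimal training-mode numpy sketch (the running statistics used at inference time are omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Training-mode batch normalization for x of shape (batch, features):
    # per-feature standardization over the batch, then learnable scale/shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(size=(32, 8))       # toy mini-batch
out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))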
Posted Content

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

TL;DR: In this paper, the authors show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
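
To make the setting concrete, here is a toy numpy sketch of the kind of model studied: a three-layer deep linear network trained by plain gradient descent on synthetic data. The dimensions, initialization scale, and step size are arbitrary choices, and the shape of the printed loss curve (including any plateaus and transitions) depends on them:

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200
X = rng.normal(size=(d, n))                    # synthetic inputs
A, _ = np.linalg.qr(rng.normal(size=(d, d)))   # target linear map (teacher)
Y = A @ X

W1, W2, W3 = (0.5 * rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
lr = 0.02
for step in range(4001):
    E = W3 @ W2 @ W1 @ X - Y                   # residual of the end-to-end map
    g3 = E @ (W2 @ W1 @ X).T / n               # gradients of 0.5*||E||^2 / n
    g2 = W3.T @ E @ (W1 @ X).T / n
    g1 = (W3 @ W2).T @ E @ X.T / n
    W1, W2, W3 = W1 - lr * g1, W2 - lr * g2, W3 - lr * g3
    if step % 800 == 0:
        print(step, 0.5 * (E ** 2).sum() / n)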
Proceedings Article

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

TL;DR: The authors showed that gradient descent converges at a global linear rate to the global optimum for two-layer fully connected ReLU-activated neural networks, where over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations.
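
A hedged sketch of the over-parameterized regime described above: a wide two-layer ReLU network with fixed random output weights in which only the hidden-layer weights are trained by gradient descent. All hyperparameters below are illustrative choices, not the ones analyzed in the paper:

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                          # few samples, many hidden units
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = rng.normal(size=n)
W = rng.normal(size=(m, d))                    # trained hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)            # fixed output weights

lr = 2.0
for step in range(1001):
    Z = X @ W.T                                # (n, m) pre-activations
    err = np.maximum(Z, 0.0) @ a / np.sqrt(m) - y   # predictions minus targets
    # gradient of the mean squared error w.r.t. W (chain rule through the ReLU)
    G = (err[:, None] * (Z > 0) * a[None, :] / np.sqrt(m)).T @ X / n
    W -= lr * G
    if step % 200 == 0:
        print(step, 0.5 * (err ** 2).mean())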
Proceedings Article

A Closer Look at Few-shot Classification

TL;DR: In this paper, a consistent comparative analysis of several representative few-shot classification algorithms is presented, with results showing that deeper backbones significantly reduce the gap across methods when domain differences are limited.
Trending Questions (1)
How to decide the neural network structures for a deep learning model?

The paper characterizes the structure of norm-regularized deep networks via convex duality: optimal hidden-layer weights can be found as the extreme points of a convex set and align with the previous layers, which constrains properties of each layer such as the rank of its weight matrix.