Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization.

Open Access, Posted Content

- 02 Mar 2021 -

TLDR

This article introduces an analytic framework based on convex duality that yields exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time.

Abstract:

Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent provides an algorithmic bias effect on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.
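For orientation, the kind of non-convex problem the abstract refers to can be sketched as a two-layer ReLU network with BN and a weight-decay penalty. This is a minimal illustration only, not the paper's formulation. The placement of BN before the ReLU, the squared loss, the choice to penalize every parameter, and all names (`batch_norm`, `forward`, `objective`, `lam`) are assumptions made here.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize each column of z over the batch axis, then scale and shift."""
    mean = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

def forward(X, W1, gamma, beta, w2):
    """Two-layer network: linear map, BN, ReLU, linear output (assumed layout)."""
    hidden = np.maximum(batch_norm(X @ W1, gamma, beta), 0.0)
    return hidden @ w2

def objective(X, y, W1, gamma, beta, w2, lam):
    """Squared loss plus weight decay on all parameters (an assumed choice)."""
    residual = forward(X, W1, gamma, beta, w2) - y
    loss = 0.5 * np.sum(residual ** 2)
    penalty = 0.5 * lam * sum(np.sum(p ** 2) for p in (W1, gamma, beta, w2))
    return loss + penalty

# Small random problem, for illustration only.
rng = np.random.default_rng(0)
n, d, m = 8, 3, 4                      # samples, input dim, hidden units
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W1 = rng.standard_normal((d, m))
gamma, beta = np.ones(m), np.zeros(m)  # BN scale and shift
w2 = rng.standard_normal(m)

value = objective(X, y, W1, gamma, beta, w2, lam=1e-3)
```

One property worth noting in this sketch: with `eps=0.0`, `batch_norm(X @ W1, ...)` is unchanged when `W1` is multiplied by a positive constant, so only the penalty term distinguishes such rescalings.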

Citations

Posted Content

Revealing the Structure of Deep Neural Networks via Convex Duality

Tolga Ergen, +1 more

- 22 Feb 2020 -

arXiv: Learning

TL;DR: It is shown that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.

Posted Content

Convex Geometry and Duality of Over-parameterized Neural Networks

Tolga Ergen, +1 more

- 25 Feb 2020 -

arXiv: Learning

TL;DR: A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.

Posted Content

Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time

Tolga Ergen, +1 more

- 26 Jun 2020 -

arXiv: Learning

TL;DR: A convex analytic framework utilizing semi-infinite duality is developed to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures, and it is proved that two-layer CNNs can be globally optimized via an $\ell_2$ norm regularized convex program.

Posted Content

Scaled ReLU Matters for Training Vision Transformers

Pichao Wang, +7 more

- 08 Sep 2021 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this article, a scaled ReLU operation in the convolutional stem of a vision transformer was shown to not only improve training stabilization, but also increase the diversity of patch tokens, thus boosting peak performance.

Journal ArticleDOI

Scaled ReLU Matters for Training Vision Transformers

- 28 Jun 2022 -

Proceedings of the ... AAAI Conference o...

TL;DR: In this paper, a scaled ReLU operation in the convolutional stem (conv-stem) was shown to not only improve training stabilization, but also increase the diversity of patch tokens.

References

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Proceedings ArticleDOI

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.

Related Papers (5)

Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks

Mert Pilanci, +1 more

Breaking the curse of dimensionality with convex neural networks

Francis Bach

- 01 Jan 2017 -

Journal of Machine Learning Research

arXiv: Learning

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

$\ell_1$ regularization in infinite dimensional feature spaces

Convex Geometry and Duality of Over-parameterized Neural Networks