Open Access Posted Content
Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization.
TLDR
In this article, an analytic framework based on convex duality is introduced to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time.
Abstract
Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent induces an algorithmic bias on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.
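To make the setting concrete: the non-convex model analyzed here is a ReLU network whose hidden-layer pre-activations are batch-normalized, trained with weight decay. The NumPy sketch below illustrates that forward pass and objective for a two-layer network; the function names (`batch_norm`, `two_layer_bn_relu`, `weight_decay_objective`), the toy dimensions, and the choice to leave the BN scale/shift unregularized are illustrative assumptions, not the paper's convex reformulation.

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize each hidden unit's pre-activations over the batch,
    then apply a learnable scale (gamma) and shift (beta)."""
    mu = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return gamma * (z - mu) / np.sqrt(var + eps) + beta

def two_layer_bn_relu(X, W1, gamma, beta, w2):
    """Two-layer ReLU network with BN applied to the hidden pre-activations."""
    z = X @ W1                                          # (n, m) pre-activations
    h = np.maximum(batch_norm(z, gamma, beta), 0.0)     # BN followed by ReLU
    return h @ w2                                       # (n,) predictions

def weight_decay_objective(X, y, W1, gamma, beta, w2, lam=1e-3):
    """Squared loss plus l2 (weight-decay) regularization on the layer weights."""
    residual = two_layer_bn_relu(X, W1, gamma, beta, w2) - y
    reg = lam * (np.sum(W1**2) + np.sum(w2**2))  # BN scale/shift left unregularized here
    return 0.5 * np.sum(residual**2) + reg

# Tiny random instance: n samples, d features, m hidden neurons.
rng = np.random.default_rng(0)
n, d, m = 32, 5, 8
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)
W1, w2 = rng.standard_normal((d, m)), rng.standard_normal(m)
gamma, beta = np.ones((1, m)), np.zeros((1, m))
print(weight_decay_objective(X, y, W1, gamma, beta, w2))
```

The paper's contribution is an equivalent convex program for this kind of objective; the snippet only fixes what "weight-decay regularized ReLU network with BN" refers to.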
Citations
Posted Content
Revealing the Structure of Deep Neural Networks via Convex Duality
Tolga Ergen, Mert Pilanci
TL;DR: It is shown that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.
Posted Content
Convex Geometry and Duality of Over-parameterized Neural Networks
Tolga Ergen, Mert Pilanci
TL;DR: A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Posted Content
Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
Tolga Ergen, Mert Pilanci
TL;DR: A convex analytic framework utilizing semi-infinite duality is developed to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures, and it is proved that two-layer CNNs can be globally optimized via an $\ell_2$ norm regularized convex program.
Posted Content
Scaled ReLU Matters for Training Vision Transformers
TL;DR: In this article, a scaled ReLU operation in the convolutional stem of a vision transformer was shown to not only improve training stabilization, but also increase the diversity of patch tokens, thus boosting peak performance.
Journal ArticleDOI
Scaled ReLU Matters for Training Vision Transformers
TL;DR: In this paper, a scaled ReLU operation in the convolutional stem (conv-stem) was shown to not only improve training stabilization, but also increase the diversity of patch tokens.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
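For readers who want the mechanism behind that summary: Adam keeps exponential moving averages of the gradient and of its elementwise square, corrects their initialization bias, and divides the step by the square root of the second-moment estimate. The NumPy snippet below is a minimal sketch of a single update applied to a toy quadratic; the function name `adam_step` is an assumption, while the default hyperparameters mirror the commonly cited ones.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: returns new parameters and updated moment estimates."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1**t)               # bias correction (t starts at 1)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize a quadratic centered at [1.0, -2.0, 0.5].
theta = np.zeros(3)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 101):
    grad = 2 * (theta - np.array([1.0, -2.0, 0.5]))
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # moves toward the minimizer [1.0, -2.0, 0.5]
```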
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieved state-of-the-art ImageNet classification performance with a deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Proceedings ArticleDOI
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk; the resulting models achieve a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
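The activation referred to in that TL;DR replaces ReLU's fixed zero slope on negative inputs with a learnable slope. A minimal NumPy sketch of the forward rule, with an assumed function name `prelu` and an illustrative slope value, is:

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, learned slope `a` for negative inputs."""
    return np.maximum(x, 0.0) + a * np.minimum(x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
a = 0.25                    # the paper initializes the learnable slope near this value
print(prelu(x, a))          # [-0.5, -0.125, 0.0, 1.5]
```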
Related Papers (5)
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks
Mert Pilanci, Tolga Ergen