Open Access Proceedings Article
Towards Understanding Regularization in Batch Normalization
TL;DR: In this paper, a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function, is used to understand the impacts of batch normalization in training neural networks.

Abstract: Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work explains these phenomena theoretically. We analyze BN using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, viewing BN as an implicit regularizer, we decompose it into population normalization (PN) and gamma decay, an explicit regularizer. Second, the learning dynamics of BN and this regularization show that training converges with larger maximum and effective learning rates. Third, the generalization of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same regularization traits as the above analyses predict.
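The basic block analyzed in the abstract (kernel layer, then BN, then a nonlinearity) can be sketched as a single forward pass. This is a minimal illustration, not the paper's code; the function and variable names are chosen here for clarity, and ReLU stands in for the unspecified nonlinear activation.

```python
import numpy as np

def basic_block(x, W, gamma, beta, eps=1e-5):
    """Illustrative forward pass of the basic block from the paper:
    kernel (linear) layer -> batch normalization -> ReLU activation.
    Shapes: x (batch, d_in), W (d_in, d_out), gamma/beta (d_out,)."""
    h = x @ W                                      # kernel layer
    mu = h.mean(axis=0)                            # per-feature batch mean
    var = h.var(axis=0)                            # per-feature batch variance
    h_hat = (h - mu) / np.sqrt(var + eps)          # normalize with batch statistics
    return np.maximum(0.0, gamma * h_hat + beta)   # scale, shift, then ReLU
```

In the paper's decomposition, replacing the batch statistics (`mu`, `var`) with their population counterparts yields population normalization, and the remaining stochastic effect of minibatch estimation acts as gamma decay, an explicit regularizer on the scale parameter.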
Citations
Posted Content
Quantifying Generalization in Reinforcement Learning
TL;DR: This paper investigated the problem of overfitting in deep reinforcement learning by using procedurally generated environments to construct distinct training and test sets, and found that agents overfit to surprisingly large training sets.
Proceedings Article
Differentiable Learning-to-Normalize via Switchable Normalization
TL;DR: Switchable Normalization (SN) is proposed, which learns to select different normalizers for different normalization layers of a deep neural network; it is expected to ease the use and understanding of normalization techniques in deep learning.
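The idea behind Switchable Normalization can be sketched as mixing the statistics of instance, layer, and batch normalization with learned softmax weights. This is a simplified illustration under assumed shapes and names (`w_mean`, `w_var` as length-3 logits); the published method learns these weights end-to-end and differs in detail.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def switchable_norm(x, w_mean, w_var, gamma, beta, eps=1e-5):
    """Illustrative SN for a feature map x of shape (N, C, H, W):
    mix IN, LN, and BN statistics with softmax-normalized weights."""
    mu_in = x.mean(axis=(2, 3), keepdims=True)     # instance norm: per sample, per channel
    var_in = x.var(axis=(2, 3), keepdims=True)
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)  # layer norm: per sample
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)  # batch norm: per channel, over batch
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)
    pm, pv = softmax(w_mean), softmax(w_var)       # learned mixing weights
    mu = pm[0] * mu_in + pm[1] * mu_ln + pm[2] * mu_bn
    var = pv[0] * var_in + pv[1] * var_ln + pv[2] * var_bn
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```

With all logits at zero the three normalizers are weighted equally; training then shifts the weights per layer toward whichever statistics suit that layer best.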
Journal ArticleDOI
A systematic review on overfitting control in shallow and deep neural networks
TL;DR: A systematic review of overfitting control methods is presented, categorizing them into passive, active, and semi-active subsets; it covers the theoretical and experimental backgrounds of these methods, their strengths and weaknesses, and emerging techniques for overfitting detection.
Posted Content
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
TL;DR: The results highlight the under-appreciated role of the affine parameters in BatchNorm, but - in a broader sense - they characterize the expressive power of neural networks constructed simply by shifting and rescaling random features.
Posted Content
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith, et al.
TL;DR: This work develops a simple initialization scheme that can train deep residual networks without normalization, and provides a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.