Open Access Proceedings Article

Towards Understanding Regularization in Batch Normalization

TLDR
In this paper, a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function, is used to understand the impacts of batch normalization in training neural networks.
Abstract
Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work studies these phenomena theoretically. We analyze BN using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impact of BN in three aspects. First, BN can be viewed as an implicit regularizer and decomposed into population normalization (PN) plus gamma decay, an explicit regularization. Second, an analysis of the learning dynamics of BN and this regularization shows that training converges with a large maximum and effective learning rate. Third, the generalization behavior of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same regularization traits predicted by the above analyses.
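For concreteness, below is a minimal NumPy sketch of the basic block described in the abstract: a kernel (linear) layer, a BN layer, and a nonlinear activation. The ReLU choice, parameter names, and scalar gamma/beta are illustrative assumptions; the sketch shows only the standard mini-batch normalization forward pass, not the population-normalization-plus-gamma-decay decomposition derived in the paper.

```python
import numpy as np

def basic_block_forward(X, w, gamma, beta, eps=1e-5):
    """Forward pass of a basic block: kernel layer -> BN -> ReLU.

    X: mini-batch of inputs, shape (N, d)
    w: kernel weights, shape (d,)
    gamma, beta: BN scale and shift parameters (scalars here)
    """
    h = X @ w                                # kernel layer: pre-activations, shape (N,)
    mu = h.mean()                            # mini-batch mean
    var = h.var()                            # mini-batch variance
    h_hat = (h - mu) / np.sqrt(var + eps)    # standardize over the mini-batch
    y = gamma * h_hat + beta                 # BN scale and shift
    return np.maximum(y, 0.0)                # nonlinear activation (ReLU)

# Example usage with random data (shapes are illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                # mini-batch of 64 samples, 10 features
w = rng.normal(size=10)
out = basic_block_forward(X, w, gamma=1.0, beta=0.0)
print(out.shape)                             # (64,)
```

Because mu and var are computed from the mini-batch rather than the population, the normalization itself injects batch-dependent noise; this is the effect the paper formalizes as an implicit regularizer.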



Citations
Posted Content

Quantifying Generalization in Reinforcement Learning

TL;DR: This paper investigates overfitting in deep reinforcement learning by using procedurally generated environments to construct distinct training and test sets, and finds that agents overfit to surprisingly large training sets.
Proceedings Article

Differentiable Learning-to-Normalize via Switchable Normalization

TL;DR: Switchable Normalization (SN) is proposed, which learns to select different normalizers for different normalization layers of a deep neural network, easing the use and understanding of normalization techniques in deep learning.
Journal ArticleDOI

A systematic review on overfitting control in shallow and deep neural networks

TL;DR: A systematic review of overfitting control methods categorizes them into passive, active, and semi-active subsets, covering their theoretical and experimental backgrounds, their strengths and weaknesses, and emerging techniques for overfitting detection.
Posted Content

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

TL;DR: The results highlight the under-appreciated role of the affine parameters in BatchNorm, but - in a broader sense - they characterize the expressive power of neural networks constructed simply by shifting and rescaling random features.
Posted Content

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Soham De, +1 more
24 Feb 2020
TL;DR: This work develops a simple initialization scheme that can train deep residual networks without normalization, and provides a detailed empirical study of residual networks clarifying that, although batch-normalized networks can be trained with larger learning rates, this effect is beneficial only in specific compute regimes and offers minimal benefit when the batch size is small.