Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Open AccessProceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

- Vol. 1, pp 448-456

Chats0

TLDR

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Abstract:

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

Citations

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

...read moreread less

Book

Deep Learning

Ian Goodfellow, +2 more

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

...read moreread less

Proceedings ArticleDOI

Densely Connected Convolutional Networks

Gao Huang, +3 more

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

...read moreread less

Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, +4 more

TL;DR: In this article, the authors explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Posted Content

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 06 Feb 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.

...read moreread less

Proceedings ArticleDOI

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.

...read moreread less

Proceedings Article

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot, +1 more

TL;DR: The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

...read moreread less

Journal ArticleDOI

Independent component analysis: algorithms and applications

Aapo Hyvärinen, +1 more

- 01 May 2000 -

Neural Networks

TL;DR: The basic theory and applications of ICA are presented, and the goal is to find a linear representation of non-Gaussian data so that the components are statistically independent, or as independent as possible.

...read moreread less

Proceedings Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

John C. Duchi, +2 more

TL;DR: Adaptive subgradient methods as discussed by the authors dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, which allows us to find needles in haystacks in the form of very predictive but rarely seen features.

...read moreread less

Collapse

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Citations

Deep Residual Learning for Image Recognition

Deep Residual Learning for Image Recognition

Deep Learning

Densely Connected Convolutional Networks

Rethinking the Inception Architecture for Computer Vision

References

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Understanding the difficulty of training deep feedforward neural networks

Independent component analysis: algorithms and applications

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.

Related Papers (5)

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization

ImageNet Classification with Deep Convolutional Neural Networks

Going deeper with convolutions

Very Deep Convolutional Networks for Large-Scale Image Recognition