Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Open Access · Proceedings Article

- Vol. 1, pp 448-456

TLDR

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Abstract:

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
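The per-mini-batch normalization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the details here are assumptions for illustration: each feature is standardized using the mini-batch mean and (biased) variance, then scaled and shifted by per-feature parameters named `gamma` and `beta`. The function name `batch_norm`, the `eps` default, and the toy data are hypothetical, and the sketch covers only the training-time forward pass.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of examples, each a list of feature values.
    gamma, beta: per-feature scale and shift (assumed names; these would be
    learned parameters in a real network, fixed here for illustration).
    eps: small constant added to the variance for numerical stability.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased (divide-by-n) variance of this feature over the mini-batch.
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) * inv_std + beta[j]
    return out


# Toy mini-batch: three examples, two features on very different scales.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` set to 0, both features end up with approximately zero mean and unit variance over the mini-batch, regardless of their original scale. Inference, where mini-batch statistics are typically replaced by fixed estimates, is outside the scope of this sketch.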

Citations

Posted Content

Learning Structured Sparsity in Deep Neural Networks

Wei Wen, +4 more

- 12 Aug 2016 -

arXiv: Neural and Evolutionary Computing

TL;DR: The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers.

Posted Content

Fully-Convolutional Siamese Networks for Object Tracking

Luca Bertinetto, +4 more

- 30 Jun 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this paper, a fully-convolutional Siamese network is trained end-to-end on the ILSVRC15 dataset for object detection in video, which achieves state-of-the-art performance.

Posted Content

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, +2 more

- 12 Dec 2018 -

arXiv: Neural and Evolutionary Computing

TL;DR: This article proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.

Posted Content

Convolutional Two-Stream Network Fusion for Video Action Recognition

Christoph Feichtenhofer, +2 more

- 22 Apr 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper shows that a spatial and a temporal network can be fused at the last convolution layer without loss of performance but with a substantial saving in parameters, and that pooling abstract convolutional features over spatiotemporal neighbourhoods further boosts performance.

Posted Content

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Mehdi Noroozi, +1 more

- 30 Mar 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A novel unsupervised learning approach is proposed to build features suitable for object detection and classification. To facilitate the transfer of features to other tasks, the context-free network (CFN), a siamese-ennead convolutional neural network, is introduced.

References

Journal ArticleDOI

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: This article shows that multilayer neural networks trained with gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters, and proposes graph transformer networks (GTNs) for globally training multi-module document recognition systems.

Proceedings ArticleDOI

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair, +1 more

TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units; replacing these with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

Related Papers (5)

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization

ImageNet Classification with Deep Convolutional Neural Networks

Going deeper with convolutions

Very Deep Convolutional Networks for Large-Scale Image Recognition