Open Access Posted Content

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TLDR
Batch Normalization normalizes layer inputs for each training mini-batch to reduce internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Abstract
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
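As a rough illustration of the per-mini-batch normalization the abstract describes, here is a minimal NumPy sketch; the function name, tensor shapes, and eps value are illustrative choices rather than details from the paper, and the running-statistics bookkeeping used at inference time is omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (N, D) per feature, then apply
    the learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta              # learned affine transform

# Example: normalize a batch of 4 samples with 3 features.
x = np.random.randn(4, 3) * 5.0 + 2.0
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
```

The learned gamma and beta let a layer recover the identity transform if plain normalization would reduce its representation power, which is part of why the method folds normalization into the model architecture rather than treating it as preprocessing.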


Citations
Posted Content

Gauge Equivariant Convolutional Networks and the Icosahedral CNN

TL;DR: This work implements gauge-equivariant convolution with a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs, and demonstrates substantial improvements over previous methods on the tasks of segmenting omnidirectional images and global climate patterns.
Posted Content

Residual Gated Graph ConvNets

Xavier Bresson et al., 20 Nov 2017
TL;DR: This work reviews existing graph RNN and ConvNet architectures, proposes natural extensions of LSTMs and ConvNets to graphs of arbitrary size, and designs a set of analytically controlled experiments on two basic graph problems to test the different architectures.
Posted Content

Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

TL;DR: In this paper, triplet networks are used for local image descriptor learning and a global loss is proposed to minimize the overall classification error in the training set, which can improve the generalization capability of the model.
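For context on the descriptor-learning setup mentioned above, the following is a hedged sketch of the standard per-triplet hinge loss on embeddings; note that the cited work argues for a global loss over the training set, which this illustrative snippet does not implement.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss: pull the positive descriptor closer to
    the anchor than the negative descriptor by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)   # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)   # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```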
Proceedings ArticleDOI

Learning to Super-Resolve Blurry Face and Text Images

TL;DR: This work presents an algorithm to directly restore a clear high-resolution image from a blurry low-resolution input and introduces novel training losses that help recover fine details.
Posted Content

FireCaffe: near-linear acceleration of deep neural network training on compute clusters

TL;DR: In this paper, the authors present FireCaffe, which scales deep neural network training across a cluster of GPUs by selecting network hardware that achieves high bandwidth between GPU servers and using reduction trees to reduce communication overhead.
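As a hedged illustration of the reduction-tree idea named in this TL;DR, the sketch below sums per-worker gradients in pairwise rounds, so the number of communication steps grows logarithmically with the number of workers; the function is illustrative and is not FireCaffe's actual implementation.

```python
import numpy as np

def tree_reduce(grads):
    """Sum a list of per-worker gradient arrays in ~log2(N) pairwise rounds."""
    grads = [g.copy() for g in grads]
    while len(grads) > 1:
        nxt = [grads[i] + grads[i + 1] for i in range(0, len(grads) - 1, 2)]
        if len(grads) % 2:          # an odd worker carries over to the next round
            nxt.append(grads[-1])
        grads = nxt
    return grads[0]

# Example: 8 workers, each holding a gradient vector of length 4.
total = tree_reduce([np.full(4, float(i)) for i in range(8)])
```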
References
Journal ArticleDOI

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition; such networks can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
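A minimal sketch of the dropout mechanism behind this result, written in the common "inverted dropout" form that rescales surviving units at training time; the function name and default rate are illustrative, not taken from the paper.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Zero each unit with probability p during training and rescale the
    survivors so the expected activation matches test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p      # keep a unit with probability 1 - p
    return x * mask / (1.0 - p)
```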
Proceedings ArticleDOI

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, achieving a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
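A short sketch of the PReLU activation this TL;DR refers to: identity for positive inputs and a learnable slope for negative inputs. The 0.25 default below is only an illustrative initialization.

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: x for x > 0, a * x otherwise, where `a` is a learned slope."""
    return np.where(x > 0, x, a * x)
```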
Journal ArticleDOI

Independent component analysis: algorithms and applications

TL;DR: The basic theory and applications of ICA are presented, and the goal is to find a linear representation of non-Gaussian data so that the components are statistically independent, or as independent as possible.
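A hedged demonstration of the stated goal, recovering roughly independent non-Gaussian sources from a linear mixture; it assumes scikit-learn is available, and the signals and mixing matrix are made up purely for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two non-Gaussian sources
A = np.array([[1.0, 0.5], [0.4, 1.2]])             # unknown mixing matrix
X = S @ A.T                                        # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # estimated components, up to scale and ordering
```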
Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
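As a minimal sketch of the adaptive per-parameter step this line of work popularized: squared gradients are accumulated and the learning rate is divided by the square root of that accumulator. Names and default values are illustrative, not the paper's notation.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    """One AdaGrad-style update with a per-parameter adaptive learning rate."""
    accum = accum + grad ** 2                      # running sum of squared gradients
    w = w - lr * grad / (np.sqrt(accum) + eps)     # smaller steps for busy coordinates
    return w, accum

# Example: a few updates on the toy objective 0.5 * ||w||^2, whose gradient is w.
w, accum = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(3):
    w, accum = adagrad_step(w, grad=w, accum=accum)
```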