Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Citations
123,388 citations
44,703 citations
Cites background or methods from "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
...We do not use dropout [14], following the practice in [16]....
[...]
...Recent evidence [41, 44] reveals that network depth is of crucial importance, and the leading results [41, 44, 13, 16] on the challenging ImageNet dataset [36] all exploit “very deep” [41] models, with a depth of sixteen [41] to thirty [16]....
[...]
...These plain networks are trained with BN [16], which ensures forward propagated signals to have non-zero variances....
[...]
...We adopt batch normalization (BN) [16] right after each convolution and before activation, following [16]....
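A minimal sketch of the ordering described above, assuming PyTorch (a stand-in framework, not the one used in the cited work): the batch-normalization layer sits between each convolution and its ReLU nonlinearity.

import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    # Convolution -> BatchNorm -> ReLU: normalization is applied to the
    # pre-activation responses, right after the convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )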
[...]
...9, and adopt the weight initialization in [13] and BN [16] but with no dropout....
[...]
38,208 citations
27,821 citations
16,962 citations
References
2,586 citations
2,567 citations
"Batch Normalization: Accelerating D..." refers background or methods in this paper
...The details of ensemble and multicrop inference are similar to (Szegedy et al., 2014)....
[...]
...We refer to the change in the distributions of internal nodes of a deep network, in the course of training, as Internal Covariate Shift....
[...]
...We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training....
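Concretely, the batch-normalizing transform the paper introduces to counter this shift normalizes each activation x_i over its mini-batch and then restores representational power with learned parameters:

\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta

where \mu_{\mathcal{B}} and \sigma_{\mathcal{B}}^2 are the mini-batch mean and variance, \epsilon is a small constant for numerical stability, and \gamma, \beta are learned per-activation scale and shift parameters.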
[...]
...The main difference to the network described in (Szegedy et al., 2014) is that the 5 × 5 convolutional layers are replaced by two consecutive layers of 3 × 3 convolutions with up to 128 filters....
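A quick worked check of what this substitution saves, assuming (for illustration) 128 input and 128 output channels and no biases: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution, with fewer weights.

C = 128                               # illustrative channel count
params_5x5 = 5 * 5 * C * C            # one 5x5 conv: 409,600 weights
params_3x3_pair = 2 * 3 * 3 * C * C   # two stacked 3x3 convs: 294,912 weights
print(params_5x5, params_3x3_pair)    # same 5x5 receptive field, ~28% fewer weights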
[...]
...We applied Batch Normalization to a new variant of the Inception network (Szegedy et al., 2014), trained on the ImageNet classification task (Russakovsky et al., 2014)....
[...]
1,767 citations
"Batch Normalization: Accelerating D..." refers methods in this paper
...Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters....
[...]
519 citations
"Batch Normalization: Accelerating D..." refers methods in this paper
...gresses, which aids the training. 4.2 ImageNet classification We applied Batch Normalization to a new variant of the Inception network (Szegedy et al., 2014), trained on the ImageNet classification task (Russakovsky et al., 2014). The network has a large number of convolutional and pooling layers, with a softmax layer to predict the image class, out of 1000 possibilities. Convolutional layers use ReLU as the nonlinearity. The...
[...]
...This improves upon the previous best result, and exceeds the estimated accuracy of human raters according to (Russakovsky et al., 2014)....
[...]
... test set), which improves upon the previous best result despite using 15X fewer parameters and lower resolution receptive field. Our system exceeds the estimated accuracy of human raters according to Russakovsky et al. (2014). For our ensemble, we used 6 networks. Each was based on BN-x30, modified via some of the following: increased initial weights in the convolutional layers; using Dropout (with the Dropout probability of 5%...
[...]