Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Citations
123,388 citations
Cites background or methods from "Batch Normalization: Accelerating D..."
...We do not use dropout [14], following the practice in [16]....
[...]
...Recent evidence [41, 44] reveals that network depth is of crucial importance, and the leading results [41, 44, 13, 16] on the challenging ImageNet dataset [36] all exploit “very deep” [41] models, with a depth of sixteen [41] to thirty [16]....
[...]
...These plain networks are trained with BN [16], which ensures forward propagated signals to have non-zero variances....
[...]
...We adopt batch normalization (BN) [16] right after each convolution and before activation, following [16]....
[...]
...9, and adopt the weight initialization in [13] and BN [16] but with no dropout....
[...]
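The excerpts above describe how batch normalization (BN) is used: applied per channel right after each convolution and before the activation, guaranteeing that forward-propagated signals keep a non-zero variance. A minimal sketch of the BN transform, written in plain Python for illustration (the `gamma`, `beta`, and `eps` values are assumptions, not the cited papers' settings):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations across a mini-batch,
    then apply a learned scale (gamma) and shift (beta)."""
    n = len(xs)
    mean = sum(xs) / n
    # Biased (population) variance, as in the BN paper's training pass.
    var = sum((x - mean) ** 2 for x in xs) / n
    # eps keeps the denominator non-zero even for constant inputs,
    # which is why the normalized signal always has a well-defined scale.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

# In the residual-network recipe quoted above, this transform sits
# between each convolution and its nonlinearity, conceptually:
#   y = relu(batch_norm(conv(x)))
out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With the default `gamma=1.0` and `beta=0.0`, the output has (approximately) zero mean and unit variance; the learned parameters let the network recover the original scale if that is optimal.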
References
Additional excerpts
...The current reported best results on the ImageNet Large Scale Visual Recognition Competition are reached by the Deep Image ensemble of traditional models (Wu et al., 2015) and the ensemble model of (He et al., 2015)....
[...]