Gradient-Based Learning Applied to Document Recognition
Citations
30,843 citations
Cites background or methods from "Gradient-Based Learning Applied to D..."
...As shown in (LeCun et al., 1998b), such normalization speeds up convergence, even when the features are not decorrelated....
[...]
...It has been long known (LeCun et al., 1998b; Wiesler & Ney, 2011) that the network training converges faster if its inputs are whitened – i.e., linearly transformed to have zero means and unit variances, and decorrelated....
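The whitening this excerpt describes can be sketched in a few lines of NumPy: subtract the mean, decorrelate by projecting onto the eigenvectors of the feature covariance, and rescale to unit variance. This is a minimal illustrative sketch, not code from either cited paper; the `whiten` helper and its `eps` stabilizer are assumptions for the example.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """PCA-whiten inputs: zero mean, unit variance, decorrelated features.

    X has shape (n_samples, n_features). `eps` guards against dividing
    by a near-zero eigenvalue; it is an illustrative choice, not from
    the cited papers.
    """
    X = X - X.mean(axis=0)                   # zero means
    cov = X.T @ X / X.shape[0]               # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
    # Project onto eigenvectors (decorrelate), then rescale each
    # decorrelated component to unit variance.
    return (X @ eigvecs) / np.sqrt(eigvals + eps)

# Correlated toy inputs: a random linear mix of independent Gaussians.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                               [0.0, 1.0, 0.3],
                                               [0.0, 0.0, 0.5]])
Xw = whiten(X)
cov_w = Xw.T @ Xw / Xw.shape[0]   # should be close to the identity
```

After the transform the empirical covariance of `Xw` is (up to `eps`) the identity matrix, which is exactly the "zero means, unit variances, decorrelated" condition the excerpt states.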
[...]
...To verify the effects of internal covariate shift on training, and the ability of Batch Normalization to combat it, we considered the problem of predicting the digit class on the MNIST dataset (LeCun et al., 1998a)....
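The Batch Normalization transform the excerpt refers to normalizes each feature over the mini-batch and then applies a learned scale and shift. A minimal forward-pass sketch (training mode only; the running statistics used at inference are omitted) under those assumptions:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization forward pass over a mini-batch.

    x has shape (batch_size, n_features); gamma and beta are the
    learned per-feature scale and shift. Training-mode sketch only.
    """
    mean = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                   # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize
    return gamma * x_hat + beta               # scale and shift

# With gamma=1, beta=0 the output has roughly zero mean and unit
# variance per feature, whatever the input statistics were.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 10))
out = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

Because `gamma` and `beta` are learned, the network can recover the original activations if that is optimal, so the normalization does not restrict what the layer can represent.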
[...]
13,081 citations
Cites background from "Gradient-Based Learning Applied to D..."
...[26] rekindled broader interest in convolutional neural networks (CNNs) [27, 28] by showing substantially lower error rates on the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [9, 10]....
[...]
...) However, until recently these results were isolated to datasets such as CIFAR [25] and MNIST [28], slowing their adoption by computer vision researchers for use on other tasks and image domains....
[...]
9,457 citations