Practical Variational Inference for Neural Networks
Citations
20,769 citations
14,635 citations
Cites background from "Practical Variational Inference for..."
...MDL-based stochastic variational methods (Graves, 2011) are also related to FMS....
[...]
...Compare Graves and Jaitly (2014), Graves and Schmidhuber (2005), Graves et al. (2009), Graves et al. (2013) and Schmidhuber, Ciresan, Meier, Masci, and Graves (2011) (Section 5.22)....
[...]
9,478 citations
Cites methods from "Practical Variational Inference for..."
...We train each model with RMSProp [see, e.g., Hinton, 2012] and use weight noise with standard deviation fixed to 0.075 [Graves, 2011]....
[...]
...Table 2: The average negative log-probabilities of the training and test sets. We train each model with RMSProp [see, e.g., Hinton, 2012] and use weight noise with standard deviation fixed to 0.075 [Graves, 2011]. At every update, we rescale the norm of the gradient to 1, if it is larger than 1 [Pascanu et al., 2013] to prevent exploding gradients. We select a learning rate (scalar multiplier in RMSProp) to ...
[...]
7,316 citations
Cites background from "Practical Variational Inference for..."
...tends to ‘simplify’ neural networks, in the sense of reducing the amount of information required to transmit the parameters [23, 24], which improves generalisation....
[...]
5,310 citations
References
72,897 citations
"Practical Variational Inference for..." refers to background in this paper
...Hierarchical multidimensional recurrent neural networks containing Long Short-Term Memory [11] hidden layers and a CTC output layer [8] have proven effective for offline handwriting recognition [9]....
[...]
65,425 citations
23,814 citations
"Practical Variational Inference for..." refers to methods in this paper
...We assume that the partial derivatives of L (w,D) with respect to the network weights can be efficiently calculated (using, for example, backpropagation or backpropagation through time [22])....
[...]
6,254 citations
"Practical Variational Inference for..." refers to methods in this paper
...Variational inference can be reformulated as the optimisation of a Minimum Description Length (MDL; [21]) loss function; indeed it was in this form that variational inference was first considered for neural networks....
[...]
5,188 citations
"Practical Variational Inference for..." refers to background or methods in this paper
...Prefix search CTC decoding [8] was used to transcribe the test set, with probability threshold 0....
[...]
...Hierarchical multidimensional recurrent neural networks containing Long Short-Term Memory [11] hidden layers and a CTC output layer [8] have proven effective for offline handwriting recognition [9]....
[...]