Dropout as a Bayesian approximation: representing model uncertainty in deep learning
Citations
Cites background or methods from "Dropout as a Bayesian approximation..."
...Bayesian approaches include Laplace approximation (MacKay, 1992), variational inference (Graves, 2011; Blundell et al., 2015), dropout-based variational inference (Gal and Ghahramani, 2016; Kingma et al., 2015), expectation propagation, and stochastic gradient MCMC (Welling and Teh, 2011)....
[...]
...
• (Vanilla) Maximum softmax probability (Hendrycks and Gimpel, 2017)
• (Temp Scaling) Post-hoc calibration by temperature scaling using a validation set (Guo et al., 2017)
• (Dropout) Monte-Carlo Dropout (Gal and Ghahramani, 2016; Srivastava et al., 2015) with rate p
• (Ensembles) Ensembles of M networks trained independently on the entire dataset using random initialization (Lakshminarayanan et al., 2017) (we set M = 10 in experiments below)
• (SVI) Stochastic Variational Bayesian Inference for deep learning (Blundell et al., 2015; Graves, 2011; Louizos and Welling, 2017; 2016; Wen et al., 2018)
...
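Of these baselines, (Dropout) is the method introduced by the paper this page indexes: dropout is left active at test time and predictions are averaged over several stochastic forward passes. A minimal sketch of that recipe, assuming a PyTorch classifier whose stochastic layers are nn.Dropout; the helper names and the sample count are illustrative, not taken from the excerpt:

```python
import torch
import torch.nn as nn

def enable_mc_dropout(model: nn.Module) -> None:
    # Put only the dropout layers back into training mode so they stay
    # stochastic at test time; batch-norm etc. remain in eval mode.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 50):
    model.eval()
    enable_mc_dropout(model)
    # Average class probabilities over n_samples stochastic forward passes;
    # the spread across passes is a simple uncertainty signal.
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)
```

The rate p named in the baseline above is whatever the model's nn.Dropout layers were trained with; only the number of forward passes is chosen at prediction time.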
References
"Dropout as a Bayesian approximation..." refers methods in this paper
...Finally, we used mini-batches of size 32 and the Adam optimiser (Kingma & Ba, 2014)....
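A minimal sketch of the training setup this sentence describes, assuming PyTorch; the toy data and model are placeholders, and Adam's default hyper-parameters are assumed since the excerpt fixes only the optimiser and the batch size:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model; the excerpt fixes only batch_size=32 and Adam.
train_set = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 2))

loader = DataLoader(train_set, batch_size=32, shuffle=True)
optimiser = torch.optim.Adam(model.parameters())  # defaults from Kingma & Ba assumed

for x, y in loader:  # one epoch over the toy data
    optimiser.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimiser.step()
```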
"Dropout as a Bayesian approximation..." refers background or methods in this paper
...Dropout is used in many models in deep learning as a way to avoid over-fitting (Srivastava et al., 2014), and our interpretation suggests that dropout approximately integrates over the models’ weights....
[...]
...Furthermore, our results carry to other variants of dropout as well (such as drop-connect (Wan et al., 2013), multiplicative Gaussian noise (Srivastava et al., 2014), etc.)....
[...]
...In this paper we give a complete theoretical treatment of the link between Gaussian processes and dropout, and develop the tools necessary to represent uncertainty in deep learning....
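The claim that dropout "approximately integrates over the models' weights" is made precise in the paper by reading dropout as a variational distribution q(ω) over the weights: drawing one dropout mask per forward pass is a draw ω̂_t ~ q(ω), and the predictive distribution is approximated by Monte Carlo averaging over T such passes. Restated in that notation:

```latex
p(y^* \mid x^*, X, Y)
  \approx \int p(y^* \mid x^*, \omega)\, q(\omega)\, \mathrm{d}\omega
  \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, \widehat{\omega}_t),
  \qquad \widehat{\omega}_t \sim q(\omega)
```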
"Dropout as a Bayesian approximation..." refers background or methods in this paper
...Recent advances in variational inference introduced new techniques into the field such as sampling-based variational inference and stochastic variational inference (Blei et al., 2012; Kingma & Welling, 2013; Rezende et al., 2014; Titsias & Lázaro-Gredilla, 2014; Hoffman et al., 2013)....
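Both families named in this excerpt rest on the same move: write the objective as an expectation under q so that its gradient can be estimated from samples. A minimal sketch of the reparameterisation trick of Kingma & Welling (2013) for a factorised Gaussian q(w); the log-joint is a toy placeholder, not a model from any of the cited papers:

```python
import math
import torch

# Variational parameters of a factorised Gaussian q(w) = N(mu, sigma^2).
mu = torch.zeros(10, requires_grad=True)
log_sigma = torch.zeros(10, requires_grad=True)
optimiser = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def log_joint(w: torch.Tensor) -> torch.Tensor:
    # Toy log p(data, w): a standard-normal prior and a Gaussian
    # likelihood centred at 1; any differentiable model slots in here.
    return -0.5 * (w ** 2).sum() - 0.5 * ((w - 1.0) ** 2).sum()

for _ in range(1000):
    eps = torch.randn_like(mu)          # parameter-free noise
    w = mu + log_sigma.exp() * eps      # reparameterised sample w ~ q
    entropy = (log_sigma + 0.5 * math.log(2 * math.pi * math.e)).sum()
    loss = -(log_joint(w) + entropy)    # negative single-sample ELBO
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

Because w is a differentiable function of mu and log_sigma, a single sampled ELBO term yields unbiased gradients for both variational parameters; this is the ingredient the excerpt's "sampling-based" and "stochastic" variational methods share.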