Deep Neural Networks as Gaussian Processes
Citations
Cites background from "Deep Neural Networks as Gaussian Pr..."
...[360] (GP), Ravi and Beatson [361] (AVI: Amortized VI), Lu et al....
[...]
References
"Deep Neural Networks as Gaussian Pr..." refers to methods in this paper
...Training used the Adam optimizer (Kingma & Ba (2014)) with learning rate and initial weight/bias variances optimized over validation error using the Vizier hyperparameter tuner (Golovin et al., 2017)....
[...]
"Deep Neural Networks as Gaussian Pr..." refers to background or methods in this paper
...They are exactly the relations derived in the mean field theory of signal propagation in fully-connected random neural networks (Poole et al. (2016); Schoenholz et al. (2017)) and also appear in the literature on compositional kernels (Cho & Saul (2009); Daniely et al....
[...]
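The layer-to-layer kernel relations mentioned in this excerpt can be made concrete. Below is a minimal NumPy sketch of the NNGP kernel recursion for a deep ReLU network, using the arc-cosine closed form for the Gaussian expectation of two rectified variables (Cho & Saul, 2009). The function name and default variances are illustrative choices, not taken from the paper.

```python
import numpy as np

def nngp_relu_kernel(x, xp, depth, sigma_b2=0.1, sigma_w2=2.0):
    """Recursively compute the NNGP kernel for a deep ReLU network.

    Uses the arc-cosine closed form for E[relu(u) relu(v)] under a
    centered bivariate Gaussian (Cho & Saul) at each layer.
    """
    d = x.size
    # Base case: covariance of the first pre-activations
    kxx = sigma_b2 + sigma_w2 * (x @ x) / d
    kpp = sigma_b2 + sigma_w2 * (xp @ xp) / d
    kxp = sigma_b2 + sigma_w2 * (x @ xp) / d
    for _ in range(depth):
        c = np.clip(kxp / np.sqrt(kxx * kpp), -1.0, 1.0)
        theta = np.arccos(c)
        # Closed-form Gaussian expectation E[relu(u) relu(v)]
        f_cross = np.sqrt(kxx * kpp) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        ) / (2 * np.pi)
        # Diagonal entries correspond to theta = 0: E[relu(u)^2] = k / 2
        kxx = sigma_b2 + sigma_w2 * kxx / 2.0
        kpp = sigma_b2 + sigma_w2 * kpp / 2.0
        kxp = sigma_b2 + sigma_w2 * f_cross
    return kxp
```

For identical inputs the recursion reduces to the diagonal case, and Cauchy–Schwarz bounds the off-diagonal entry by the geometric mean of the diagonals.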
...In the case of single hidden-layer networks, the form of the kernel of this GP is well known (Neal (1994a); Williams (1997))....
[...]
...In fact, a correspondence due to Neal (1994a) equates these two models in the limit of infinite width....
[...]
...the parameters have zero mean, we have that $\mu^1(x) = \mathbb{E}\left[ z^1_i(x) \right] = 0$ and

$$K^1(x, x') \equiv \mathbb{E}\left[ z^1_i(x)\, z^1_i(x') \right] = \sigma_b^2 + \sigma_w^2\, \mathbb{E}\left[ x^1_i(x)\, x^1_i(x') \right] \equiv \sigma_b^2 + \sigma_w^2\, C(x, x'), \qquad (2)$$

where we have introduced $C(x, x')$ as in Neal (1994a); it is obtained by integrating against the distribution of $W^0$, $b^0$....
[...]
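Equation (2) can be checked numerically: for a wide random first layer, the average over units of $x^1_i(x)\,x^1_i(x')$ estimates $C(x,x')$, and $K^1$ follows directly. A hedged sketch, where the tanh nonlinearity, variances, and layer sizes are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_b2, sigma_w2 = 0.1, 1.6   # illustrative bias/weight variances
d_in, width = 3, 100_000        # wide layer for a stable Monte Carlo average

x = rng.normal(size=d_in)
xp = rng.normal(size=d_in)

# First-layer post-activations x^1_i(.) from random W^0, b^0
W0 = rng.normal(scale=np.sqrt(1.0 / d_in), size=(width, d_in))
b0 = rng.normal(scale=np.sqrt(0.1), size=width)
x1 = np.tanh(W0 @ x + b0)
x1p = np.tanh(W0 @ xp + b0)

# Empirical C(x, x') = E[x^1_i(x) x^1_i(x')], averaging over units i
C = float(np.mean(x1 * x1p))
K1 = sigma_b2 + sigma_w2 * C    # Eq. (2)
```

With a bounded nonlinearity like tanh, $|C(x,x')| \le 1$, so $K^1$ stays in a fixed range regardless of the inputs.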
"Deep Neural Networks as Gaussian Pr..." refers to background or methods in this paper
...Future work may involve evaluating the NNGP on a cross entropy loss using the approach in (Williams & Barber, 1998; Rasmussen & Williams, 2006)....
[...]
...Formulating classification as regression often leads to good results (Rifkin & Klautau, 2004)....
[...]
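The classification-as-regression setup referenced here (regress one-hot targets with the GP posterior mean, then take the argmax per test point) can be sketched as follows. The RBF kernel and all names are illustrative stand-ins; the paper would use the NNGP kernel in place of `rbf_kernel`.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Illustrative stand-in kernel (the NNGP kernel would be used instead)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * length_scale ** 2))

def gp_classify_as_regression(K_train, K_test_train, labels, n_classes, noise=1e-2):
    """Regress one-hot targets with the GP posterior mean, then argmax per test point."""
    Y = np.eye(n_classes)[labels]                        # (n, C) one-hot targets
    alpha = np.linalg.solve(K_train + noise * np.eye(len(labels)), Y)
    return (K_test_train @ alpha).argmax(axis=1)         # predicted class indices
```

On well-separated clusters, the posterior-mean argmax recovers the correct labels, which is the sense in which regression on one-hot targets "often leads to good results" for classification.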