Open Access Proceedings Article
Deep Neural Networks as Gaussian Processes
Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
TLDR
The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Abstract:
It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network.
In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally, we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
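As a rough illustration of what "computing the covariance function" involves, the sketch below iterates the layer-wise kernel recursion for the special case of a ReLU nonlinearity, where the Gaussian expectation has a known closed form (the arc-cosine kernel). This is not the paper's numerical pipeline, which the abstract describes only as computationally efficient. The function name `nngp_relu_kernel`, the parameter names `sigma_w2` and `sigma_b2` (weight and bias variances), and the default values are assumptions made for this example.

```python
import math


def nngp_relu_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Covariance K(x1, x2) of an infinitely wide ReLU network.

    `depth` is the number of ReLU hidden layers; depth=0 gives the
    covariance of a purely linear layer. All names here are
    illustrative, not taken from the paper.
    """
    d_in = len(x1)

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # Layer 0: covariance of the first-layer pre-activations.
    k11 = sigma_b2 + sigma_w2 * dot(x1, x1) / d_in
    k22 = sigma_b2 + sigma_w2 * dot(x2, x2) / d_in
    k12 = sigma_b2 + sigma_w2 * dot(x1, x2) / d_in

    for _ in range(depth):
        norm = math.sqrt(k11 * k22)
        # Correlation of the two pre-activations, clamped for safety.
        cos_theta = k12 / norm if norm > 0 else 1.0
        cos_theta = max(-1.0, min(1.0, cos_theta))
        theta = math.acos(cos_theta)
        # Closed form of E[relu(u) relu(v)] for jointly Gaussian (u, v).
        k12 = sigma_b2 + sigma_w2 / (2 * math.pi) * norm * (
            math.sin(theta) + (math.pi - theta) * cos_theta
        )
        # Diagonal terms: E[relu(u)^2] = Var(u) / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2

    return k12


if __name__ == "__main__":
    print(nngp_relu_kernel([1.0, 0.0], [0.0, 1.0], depth=3))
```

Evaluating this function on all pairs of training and test inputs would give the kernel matrices needed for standard GP regression. With `sigma_w2=2.0` and `sigma_b2=0.0`, the diagonal of the kernel is preserved from layer to layer, which makes the example easy to check by hand.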
Citations
Posted Content
Deep kernel learning for integral measurements
TL;DR: A method is presented that makes this approach feasible for problems where the data consists of line integral measurements of the target function and the performance is illustrated on computed tomography reconstruction examples.
DissertationDOI
Structure in machine learning : graphical models and Monte Carlo methods
TL;DR: Structure in Machine Learning: Graphical Models and Monte Carlo Methods shows how models constructed in this book changed the way that models were constructed and how these models changed over time.
Posted Content
On the effect of the activation function on the distribution of hidden nodes in a deep network
Philip M. Long, Hanie Sedghi
TL;DR: In this paper, the authors analyzed the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in
Journal ArticleDOI
On Neural Network Kernels and the Storage Capacity Problem
TL;DR: In this article, the authors reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly growing body of literature on kernel limits of wide neural networks.
Journal ArticleDOI
Automation of some macromolecular properties using a machine learning approach
Merjem Hoxha, Hiqmet Kamberaj
TL;DR: A newly developed method to predict macromolecular properties using a swarm artificial neural network (ANN) method as a machine learning approach and introduces an error model controlling the reliability of the prediction confidence interval using a bootstrapping swarm approach.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Book
Bayesian learning for neural networks
TL;DR: Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Journal ArticleDOI
A Unifying View of Sparse Approximate Gaussian Process Regression
TL;DR: A new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression, relies on expressing the effective prior which the methods are using, and highlights the relationship between existing methods.
Journal Article
In Defense of One-Vs-All Classification
Ryan Rifkin, Aldebaro Klautau
TL;DR: It is argued that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines.
Proceedings Article
Gaussian Processes for Big Data
TL;DR: In this article, the authors introduce stochastic variational inference for Gaussian process models, which enables the application of Gaussian Process (GP) models to data sets containing millions of data points.