Open Access · Proceedings Article

Deep Neural Networks as Gaussian Processes

TLDR
The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Abstract
It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
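For a fully-connected ReLU network the covariance function described in the abstract has a closed layer-wise form (the arccosine expectation), so the deep-GP kernel can be built by a simple recursion over depth. Below is a minimal NumPy sketch of that recursion and of exact GP prediction with it; the depth, weight/bias prior variances, and noise level are illustrative placeholders rather than values from the paper, which also handles general nonlinearities via a numerical lookup pipeline.

```python
import numpy as np

def nngp_kernel(X1, X2, depth=3, sigma_w2=1.6, sigma_b2=0.1):
    """NNGP covariance for a deep fully-connected ReLU network (illustrative sketch).

    Returns the cross-covariance K(X1, X2) plus the per-point variances of X1 and X2,
    propagated layer by layer with the closed-form ReLU (arccosine) expectation.
    sigma_w2 / sigma_b2 are weight / bias prior variances (placeholder values).
    """
    d_in = X1.shape[1]
    K12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d_in           # input-layer covariances
    K11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d_in
    K22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d_in
    for _ in range(depth):
        norm = np.sqrt(np.outer(K11, K22))
        theta = np.arccos(np.clip(K12 / norm, -1.0, 1.0))
        # E[relu(u) relu(v)] for (u, v) Gaussian with the previous layer's covariance.
        K12 = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
        K11 = sigma_b2 + sigma_w2 * K11 / 2                   # E[relu(u)^2] = var(u) / 2
        K22 = sigma_b2 + sigma_w2 * K22 / 2
    return K12, K11, K22

def nngp_gp_predict(X_train, y_train, X_test, noise=1e-2, **kw):
    """Exact GP regression (posterior mean and variance) under the NNGP prior."""
    K_xx, _, _ = nngp_kernel(X_train, X_train, **kw)
    K_sx, K_ss_diag, _ = nngp_kernel(X_test, X_train, **kw)
    L = np.linalg.cholesky(K_xx + noise * np.eye(len(X_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_sx.T)
    return K_sx @ alpha, K_ss_diag - np.sum(v ** 2, axis=0)
```

Each pass through the loop propagates the covariance one layer deeper; the returned posterior variance is the GP uncertainty that the abstract reports as strongly correlated with trained-network error.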



Citations
Proceedings Article · DOI

Regressor Relearning Architecture Adapting to Traffic Trend Changes in NFV Platforms

TL;DR: This paper proposes a traffic prediction framework based on ensemble learning, comprising weak regressors trained with ML models such as recurrent neural networks, random forests, and elastic net, together with an adjustment mechanism for the regressors, based on forgetting and dynamic ensembling, that reduces prediction error.
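As a rough illustration of the ingredients named above (not the paper's architecture), a weighted ensemble whose members are re-weighted through exponentially forgotten errors might look like the sketch below; the regressor objects and the forgetting factor are assumptions for the example.

```python
import numpy as np

class ForgettingEnsemble:
    """Generic weighted ensemble of pre-trained weak regressors (illustrative only).

    Each regressor keeps a running squared error that is exponentially "forgotten",
    so members that track the current traffic trend regain influence quickly.
    """
    def __init__(self, regressors, forgetting=0.9):
        self.regressors = regressors                 # objects exposing .predict(x)
        self.forgetting = forgetting
        self.errors = np.zeros(len(regressors))      # exponentially forgotten errors

    def predict(self, x):
        preds = np.array([r.predict(x) for r in self.regressors])
        weights = np.exp(-(self.errors - self.errors.min()))  # lower error -> larger weight
        weights /= weights.sum()
        return float(weights @ preds), preds

    def update(self, preds, y_true):
        # Decay the old error estimate, then add the new squared errors.
        self.errors = self.forgetting * self.errors + (preds - y_true) ** 2
```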
Posted Content

Artificial Neural Network Modeling for Airline Disruption Management.

TL;DR: In this article, the authors used historical data on airline scheduling and operations recovery to develop a system of artificial neural networks (ANNs) that describes a predictive transfer function model (PTFM) for estimating the recovery impact of disruption resolutions at separate phases of flight schedule execution during airline disruption management (ADM).
Posted Content

Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks

TL;DR: In this paper, it is shown that the spatial correlations arising from weight sharing vanish in the infinite-width limit of deep convolutional neural networks, and that this loss is not a consequence of the infinite limit itself but of choosing an independent prior over the weights.
Journal Article · DOI

Fitting Spatial-Temporal Data via a Physics Regularized Multi-Output Grid Gaussian Process: Case Studies of a Bike-Sharing System

TL;DR: In this article, a physics regularized multi-output grid Gaussian process model (PRMGGP) is proposed for fast, multi-output fitting of large-scale spatial-temporal processes in transportation systems.
Journal Article · DOI

On the Explainability of Graph Convolutional Network With GCN Tangent Kernel

Xianchen Zhou et al. · 14 Oct 2022
TL;DR: For a GCN with a wide hidden feature dimension, the output for the semi-supervised problem can be described by a simple differential equation, and the solution of node classification can be explained directly by that equation.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
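The update rule being summarized is compact enough to state directly. The sketch below uses the defaults reported in the cited paper (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) and a generic NumPy parameter vector.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (first moment)
    and squared gradient (second moment), with bias correction."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```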
Book

Bayesian learning for neural networks

TL;DR: Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Journal ArticleDOI

A Unifying View of Sparse Approximate Gaussian Process Regression

TL;DR: A new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression, relies on expressing the effective prior which the methods are using, and highlights the relationship between existing methods.
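One concrete member of that unifying family is the Subset-of-Regressors (Nystrom-type) approximation, whose effective prior is low rank. A minimal sketch follows, with an RBF kernel and inducing inputs Z chosen only to make the example self-contained.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (used here only to make the sketch runnable)."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sor_predict(X, y, Z, X_test, noise=0.1):
    """Subset-of-Regressors sparse GP prediction through m inducing inputs Z.

    The effective prior is a low-rank Nystrom approximation of the full kernel,
    one member of the family of sparse approximations discussed in the cited paper.
    """
    Kuf = rbf(Z, X)
    Kuu = rbf(Z, Z)
    Ksu = rbf(X_test, Z)
    Sigma = np.linalg.inv(Kuu + Kuf @ Kuf.T / noise ** 2)
    mean = Ksu @ Sigma @ Kuf @ y / noise ** 2
    var = np.diag(Ksu @ Sigma @ Ksu.T)
    return mean, var
```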
Journal Article

In Defense of One-Vs-All Classification

TL;DR: It is argued that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines.
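The scheme itself is a few lines once a well-tuned binary learner is available; `fit_binary` below is a stand-in for any regularized classifier such as an SVM, not a specific library API.

```python
import numpy as np

def one_vs_all_fit(X, y, n_classes, fit_binary):
    """Train one binary classifier per class: class k vs. the rest.

    fit_binary(X, targets) should return a scoring function for new inputs,
    e.g. a well-tuned regularized classifier such as a linear SVM.
    """
    return [fit_binary(X, (y == k).astype(float)) for k in range(n_classes)]

def one_vs_all_predict(classifiers, X):
    # Pick the class whose binary classifier assigns the highest score.
    scores = np.column_stack([clf(X) for clf in classifiers])
    return np.argmax(scores, axis=1)
```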
Proceedings Article

Gaussian processes for Big data

TL;DR: In this article, the authors introduce stochastic variational inference for Gaussian process models, which enables the application of Gaussian process (GP) models to data sets containing millions of data points.
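The key idea is an evidence lower bound that decomposes over data points, so it can be estimated on minibatches and maximized with stochastic gradients. The NumPy sketch below of that minibatch estimate assumes a free-form Gaussian q(u) = N(m, S) over function values at M inducing inputs Z and a Gaussian likelihood; the kernel argument can be any covariance function (e.g. the rbf defined earlier).

```python
import numpy as np

def svgp_minibatch_elbo(Xb, yb, N, Z, m, S, kernel, noise=0.1):
    """Stochastic (minibatch) estimate of the sparse variational GP lower bound.

    (Xb, yb) is a minibatch of size B drawn from N training points; scaling the
    expected log-likelihood by N / B gives an unbiased estimate of the full bound.
    """
    B = len(Xb)
    Kuu = kernel(Z, Z) + 1e-6 * np.eye(len(Z))
    Kbu = kernel(Xb, Z)
    Kbb_diag = np.diag(kernel(Xb, Xb))
    A = Kbu @ np.linalg.inv(Kuu)                               # K_bu K_uu^{-1}
    mu = A @ m                                                 # q(f_i) means
    var = Kbb_diag - np.sum(A * ((Kuu - S) @ A.T).T, axis=1)   # q(f_i) variances
    # Expected Gaussian log-likelihood, point by point.
    ell = (-0.5 * np.log(2 * np.pi * noise ** 2)
           - 0.5 * ((yb - mu) ** 2 + var) / noise ** 2).sum()
    # KL( N(m, S) || N(0, Kuu) ) between variational and prior inducing distributions.
    Kinv = np.linalg.inv(Kuu)
    kl = 0.5 * (np.trace(Kinv @ S) + m @ Kinv @ m - len(Z)
                + np.linalg.slogdet(Kuu)[1] - np.linalg.slogdet(S)[1])
    return (N / B) * ell - kl
```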