Open Access · Posted Content

Gaussian Process Behaviour in Wide Deep Neural Networks

TL;DR
In this paper, the authors study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition and show that, under broad conditions, as they make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process.
Abstract
Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.
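The recursive kernel construction can be made concrete. Below is a minimal sketch for a fully connected ReLU network, using the closed-form arc-cosine expression of Cho and Saul (2009) for the layerwise Gaussian expectation; the ReLU choice and the hyperparameter names (`sigma_w2`, `sigma_b2` for the weight and bias prior variances) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def nngp_relu_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Recursive NNGP kernel for a fully connected ReLU network.

    A sketch of the recursive kernel construction, using the Cho & Saul
    (2009) arc-cosine closed form for E[relu(u) relu(v)] under a bivariate
    Gaussian; sigma_w2 / sigma_b2 are assumed weight / bias prior variances.
    """
    d = x1.shape[0]
    # Layer-0 (input) covariances.
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2) / d
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2) / d
    for _ in range(depth):
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        # E[relu(u) relu(v)] = sqrt(k11 k22) (sin t + (pi - t) cos t) / (2 pi).
        ev = np.sqrt(k11 * k22) * (np.sin(theta)
                                   + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k12 = sigma_b2 + sigma_w2 * ev
        k11 = sigma_b2 + sigma_w2 * k11 / 2.0  # E[relu(u)^2] = Var(u) / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2.0
    return k12
```

Calling `nngp_relu_kernel(x, x, depth=3)`, for instance, gives the prior variance that the limiting GP would assign to a depth-3 network's output at `x`.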


Citations
Proceedings Article

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

TL;DR: Introduces the Neural Tangent Kernel (NTK) formalism and presents a number of results showing how the NTK gives insight into the dynamics of neural networks during training and into their generalization properties.
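For context, the empirical NTK of a scalar-output network f(theta, x) is the inner product of its parameter gradients, Theta(x, x') = <df/dtheta (x), df/dtheta (x')>. A rough sketch is below; `f` is a hypothetical network callable, and the finite-difference gradients merely stand in for the autodiff a real implementation would use.

```python
import numpy as np

def empirical_ntk(f, theta, x1, x2, eps=1e-5):
    """Finite-difference estimate of the empirical NTK
    Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.

    `f(theta, x)` is a hypothetical scalar-output network; a real
    implementation would use autodiff instead of finite differences.
    """
    def grad(x):
        g = np.zeros_like(theta, dtype=float)
        for i in range(theta.size):
            d = np.zeros_like(theta, dtype=float)
            d.flat[i] = eps
            # Central difference in the i-th parameter coordinate.
            g.flat[i] = (f(theta + d, x) - f(theta - d, x)) / (2 * eps)
        return g
    return float(np.vdot(grad(x1), grad(x2)))
```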
Proceedings Article

Deep Neural Networks as Gaussian Processes

TL;DR: The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Journal Article

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

TL;DR: In this article, the authors show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
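Concretely, the linear model in question is the first-order Taylor expansion of the network in its parameters. A minimal sketch, where `f` and `jacobian` are hypothetical callables for the network output and its parameter Jacobian:

```python
def linearize(f, jacobian, theta0):
    """First-order Taylor expansion of f around initial parameters theta0:
    f_lin(theta, x) = f(theta0, x) + J(theta0, x) @ (theta - theta0).
    `f` and `jacobian` are placeholders, not a specific library API.
    """
    def f_lin(theta, x):
        return f(theta0, x) + jacobian(theta0, x) @ (theta - theta0)
    return f_lin
```

Training the linearized model by gradient descent is what reduces, in the infinite-width limit, to kernel dynamics under the NTK.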
Proceedings Article

Gradient descent finds global minima of deep neural networks

TL;DR: Shows that gradient descent achieves zero training loss in polynomial time for deep over-parameterized neural networks with residual connections (ResNets), and extends the analysis to deep residual convolutional networks, obtaining a similar convergence result.
Journal Article

A mobile robotic chemist

TL;DR: A mobile robot autonomously operates analytical instruments in a wet chemistry laboratory, performing a photocatalyst optimization task much faster than a human would be able to.
References
Book

Bayesian learning for neural networks

TL;DR: Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Journal Article

A kernel two-sample test

TL;DR: This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests to determine if two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
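Since the paper above uses MMD to measure convergence to the GP limit, a sketch of the unbiased squared-MMD estimator may be useful; the RBF kernel and its bandwidth are arbitrary choices here, not those of either paper.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X (m x d)
    and Y (n x d) under an RBF kernel; the bandwidth is arbitrary."""
    def k(A, B):
        # Pairwise squared distances, then the RBF kernel matrix.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    m, n = len(X), len(Y)
    # Off-diagonal means for the within-sample terms give the unbiased form.
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())
```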
Book Chapter

Convergence of probability measures

TL;DR: Weak convergence methods in metric spaces were studied in this book, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Book

MCMC using Hamiltonian dynamics

Radford M. Neal · 09 Jun 2012
TL;DR: In this paper, the authors discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
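For reference, the basic HMC transition that these variations build on looks roughly as follows; the step size, trajectory length, and identity mass matrix are illustrative assumptions.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One Hamiltonian Monte Carlo transition with leapfrog integration.

    A sketch only: q is a 1-D parameter vector, the mass matrix is the
    identity, and step_size / n_leapfrog are illustrative defaults.
    """
    p = rng.standard_normal(q.shape)                 # resample momentum
    q_new, p_new = q.astype(float).copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # initial half step
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                   # full position step
        p_new += step_size * grad_log_prob(q_new)    # full momentum step
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # final half step
    # Metropolis correction on the joint energy H(q, p) = -log_prob(q) + |p|^2 / 2.
    log_alpha = (log_prob(q_new) - 0.5 * p_new @ p_new
                 - log_prob(q) + 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < log_alpha else q
```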
Journal Article

Probability and Measure.
