Open Access · Posted Content

Gaussian Process Behaviour in Wide Deep Neural Networks

TL;DR
In this paper, the authors study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition and show that, under broad conditions, as they make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process.
Abstract
Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.
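The recursive kernel construction can be made concrete. Below is a minimal sketch for a fully connected ReLU network, using the closed-form arc-cosine expression of Cho and Saul (2009) for the layerwise Gaussian expectation; the ReLU choice and the hyperparameter names (`sigma_w2`, `sigma_b2` for the weight and bias prior variances) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def nngp_relu_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Recursive NNGP kernel for a fully connected ReLU network.

    A sketch of the recursive kernel construction, using the Cho & Saul
    (2009) arc-cosine closed form for E[relu(u) relu(v)] under a bivariate
    Gaussian; sigma_w2 / sigma_b2 are assumed weight / bias prior variances.
    """
    d = x1.shape[0]
    # Layer-0 (input) covariances.
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2) / d
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2) / d
    for _ in range(depth):
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        # E[relu(u) relu(v)] = sqrt(k11 k22) (sin t + (pi - t) cos t) / (2 pi).
        ev = np.sqrt(k11 * k22) * (np.sin(theta)
                                   + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k12 = sigma_b2 + sigma_w2 * ev
        k11 = sigma_b2 + sigma_w2 * k11 / 2.0  # E[relu(u)^2] = Var(u) / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2.0
    return k12
```

Calling `nngp_relu_kernel(x, x, depth=3)`, for instance, gives the prior variance that the limiting GP would assign to a depth-3 network's output at `x`.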


Citations
Proceedings Article

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

TL;DR: Introduces the Neural Tangent Kernel (NTK) formalism and presents a number of results showing how the NTK gives insight into the dynamics of neural networks during training and into their generalization properties.
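For context, the empirical NTK of a scalar-output network f(theta, x) is the inner product of its parameter gradients, Theta(x, x') = <df/dtheta (x), df/dtheta (x')>. A rough sketch is below; `f` is a hypothetical network callable, and the finite-difference gradients merely stand in for the autodiff a real implementation would use.

```python
import numpy as np

def empirical_ntk(f, theta, x1, x2, eps=1e-5):
    """Finite-difference estimate of the empirical NTK
    Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.

    `f(theta, x)` is a hypothetical scalar-output network; a real
    implementation would use autodiff instead of finite differences.
    """
    def grad(x):
        g = np.zeros_like(theta, dtype=float)
        for i in range(theta.size):
            d = np.zeros_like(theta, dtype=float)
            d.flat[i] = eps
            # Central difference in the i-th parameter coordinate.
            g.flat[i] = (f(theta + d, x) - f(theta - d, x)) / (2 * eps)
        return g
    return float(np.vdot(grad(x1), grad(x2)))
```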
Proceedings Article

Deep Neural Networks as Gaussian Processes

TL;DR: The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Journal Article

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

TL;DR: In this article, the authors show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
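Concretely, the linear model in question is the first-order Taylor expansion of the network in its parameters. A minimal sketch, where `f` and `jacobian` are hypothetical callables for the network output and its parameter Jacobian:

```python
def linearize(f, jacobian, theta0):
    """First-order Taylor expansion of f around initial parameters theta0:
    f_lin(theta, x) = f(theta0, x) + J(theta0, x) @ (theta - theta0).
    `f` and `jacobian` are placeholders, not a specific library API.
    """
    def f_lin(theta, x):
        return f(theta0, x) + jacobian(theta0, x) @ (theta - theta0)
    return f_lin
```

Training the linearized model by gradient descent is what reduces, in the infinite-width limit, to kernel dynamics under the NTK.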
Proceedings Article

Gradient descent finds global minima of deep neural networks

TL;DR: Shows that gradient descent achieves zero training loss in polynomial time for deep over-parameterized neural networks with residual connections (ResNets), and extends the analysis to deep residual convolutional networks, obtaining a similar convergence result.
Journal Article

A mobile robotic chemist

TL;DR: A mobile robot autonomously operates analytical instruments in a wet chemistry laboratory, performing a photocatalyst optimization task much faster than a human would be able to.
References
Book

Bayesian learning for neural networks

TL;DR: Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
Journal Article

A kernel two-sample test

TL;DR: This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests to determine if two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
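Since the paper above uses MMD to measure convergence to the GP limit, a sketch of the unbiased squared-MMD estimator may be useful; the RBF kernel and its bandwidth are arbitrary choices here, not those of either paper.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X (m x d)
    and Y (n x d) under an RBF kernel; the bandwidth is arbitrary."""
    def k(A, B):
        # Pairwise squared distances, then the RBF kernel matrix.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    m, n = len(X), len(Y)
    # Off-diagonal means for the within-sample terms give the unbiased form.
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())
```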
Book Chapter

Convergence of probability measures

TL;DR: Weak convergence methods in metric spaces were studied in this book, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Book

MCMC using Hamiltonian dynamics

Radford M. Neal · 09 Jun 2012
TL;DR: In this paper, the authors discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
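For reference, the basic HMC transition that these variations build on looks roughly as follows; the step size, trajectory length, and identity mass matrix are illustrative assumptions.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One Hamiltonian Monte Carlo transition with leapfrog integration.

    A sketch only: q is a 1-D parameter vector, the mass matrix is the
    identity, and step_size / n_leapfrog are illustrative defaults.
    """
    p = rng.standard_normal(q.shape)                 # resample momentum
    q_new, p_new = q.astype(float).copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # initial half step
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                   # full position step
        p_new += step_size * grad_log_prob(q_new)    # full momentum step
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)  # final half step
    # Metropolis correction on the joint energy H(q, p) = -log_prob(q) + |p|^2 / 2.
    log_alpha = (log_prob(q_new) - 0.5 * p_new @ p_new
                 - log_prob(q) + 0.5 * p @ p)
    return q_new if np.log(rng.uniform()) < log_alpha else q
```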
Journal Article

Probability and Measure.
