scispace - formally typeset
Search or ask a question
Topic

Gaussian process

About: Gaussian process is a research topic. Over the lifetime, 18944 publications have been published within this topic receiving 486645 citations. The topic is also known as: Gaussian stochastic process.


Papers
More filters
Journal ArticleDOI
TL;DR: A new iterative methodology is developed that estimates a non-Gaussian PSDF that is compatible with the prescribed non- Gaussian PDF, and closely approximates the prescribed incompatible non- Gaia PSDF.

125 citations

Posted Content
TL;DR: Functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund defined directly on stochastic processes, are introduced and it is proved that the KL divergence between stoChastic processes equals the supremum of marginal KL divergences over all finite sets of inputs.
Abstract: Variational Bayesian neural networks (BNNs) perform variational inference over weights, but it is difficult to specify meaningful priors and approximate posteriors in a high-dimensional weight space. We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e. distributions over functions. We prove that the KL divergence between stochastic processes equals the supremum of marginal KL divergences over all finite sets of inputs. Based on this, we introduce a practical training objective which approximates the functional ELBO using finite measurement sets and the spectral Stein gradient estimator. With fBNNs, we can specify priors entailing rich structures, including Gaussian processes and implicit stochastic processes. Empirically, we find fBNNs extrapolate well using various structured priors, provide reliable uncertainty estimates, and scale to large datasets.

125 citations

Posted Content
TL;DR: In this article, the authors derive the exact equivalence between infinitely wide deep networks and Gaussian Processes (GP) and develop a computationally efficient pipeline to compute the covariance function for these GPs.
Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.

125 citations

Posted Content
TL;DR: The Variational Gaussian Process (VGP) as discussed by the authors generates approximate posterior samples by generating latent inputs and warping them through random nonlinear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Abstract: Variational inference is a powerful tool for approximate inference, and it has been recently applied for representation learning with deep generative models. We develop the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity. We prove a universal approximation theorem for the VGP, demonstrating its representative power for learning any model. For inference we present a variational objective inspired by auto-encoders and perform black box inference over a wide class of models. The VGP achieves new state-of-the-art results for unsupervised learning, inferring models such as the deep latent Gaussian model and the recently proposed DRAW.

125 citations

Proceedings Article
31 Jul 2020
TL;DR: Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.
Abstract: We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.

125 citations


Network Information
Related Topics (5)
Estimator
97.3K papers, 2.6M citations
87% related
Optimization problem
96.4K papers, 2.1M citations
85% related
Artificial neural network
207K papers, 4.5M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
82% related
Deep learning
79.8K papers, 2.1M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023502
20221,181
20211,132
20201,220
20191,119
2018978