
Showing papers on "Gaussian process published in 2020"


Journal ArticleDOI
TL;DR: This work describes a principled way of formulating the chance-constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control, and presents a model predictive control approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP.
Abstract: Gaussian process (GP) regression has been widely used in supervised machine learning due to its flexibility and inherent ability to describe uncertainty in function estimation. In the context of control, it is seeing increasing use for modeling of nonlinear dynamical systems from data, as it allows the direct assessment of residual model uncertainty. We present a model predictive control (MPC) approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP. We describe a principled way of formulating the chance-constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control. Using additional approximations for efficient computation, we finally demonstrate the approach in a simulation example, as well as in a hardware implementation for autonomous racing of remote-controlled race cars with fast sampling times of 20 ms, highlighting improvements with regard to both performance and safety over a nominal controller.

383 citations
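
A minimal sketch of the cautious-control idea summarized above, not the paper's implementation: a GP models the residual dynamics, and its predictive standard deviation tightens a state constraint so that the chance constraint holds approximately. The kernel, toy dynamics, input search, and tightening factor are all illustrative assumptions (a real MPC would solve an optimization over a horizon).

```python
# Sketch: chance-constraint tightening with a GP residual model (illustrative, not the paper's code).
import numpy as np

def rbf(A, B, ls=0.5, var=1.0):
    d = A[:, None] - B[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

# Nominal model x+ = a*x + b*u plus unknown residual g(x); a GP learns g from data.
a, b, sn = 0.9, 0.5, 0.05
X = np.random.uniform(-2, 2, 30)                # past states
g = 0.2 * np.sin(2 * X)                         # true residual (unknown to the controller)
y = g + sn * np.random.randn(30)                # residual observations

K = rbf(X, X) + sn**2 * np.eye(len(X))
Kinv_y = np.linalg.solve(K, y)

def gp_residual(xq):
    k = rbf(np.atleast_1d(xq), X)
    mean = k @ Kinv_y
    var = rbf(np.atleast_1d(xq), np.atleast_1d(xq)).diagonal() \
          - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Chance constraint x+ <= x_max, enforced cautiously by tightening with beta * sigma.
x, x_max, beta = 1.0, 1.5, 2.0
mu_g, sd_g = gp_residual(x)
for u in np.linspace(1.0, -1.0, 41):            # naive input search (an MPC would optimize this)
    x_next_mean = a * x + b * u + mu_g[0]
    if x_next_mean + beta * sd_g[0] <= x_max:   # tightened (cautious) constraint
        print(f"largest admissible u = {u:.2f}, predicted x+ = {x_next_mean:.2f} +/- {sd_g[0]:.2f}")
        break
```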


Posted Content
TL;DR: In this article, the authors show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.

328 citations
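
A minimal sketch of the Bayesian-model-averaging view of deep ensembles described above: predictions from independently trained members (here, hypothetical softmax outputs rather than real trained networks) are averaged in probability space instead of committing to a single setting of weights.

```python
# Sketch: deep ensembles as approximate Bayesian model averaging (illustrative softmax outputs).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Pretend logits from M ensemble members (e.g., independent trainings / basins of attraction).
rng = np.random.default_rng(0)
M, n_classes = 5, 3
logits = rng.normal(size=(M, n_classes)) + np.array([2.0, 0.0, -1.0])  # members roughly agree on class 0

member_probs = softmax(logits)            # p(y | x, w_m) for each member m
bma_probs = member_probs.mean(axis=0)     # p(y | x) ~ (1/M) sum_m p(y | x, w_m)

print("single member:", member_probs[0].round(3))
print("ensemble (BMA):", bma_probs.round(3))  # typically better calibrated than any single member
```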



Proceedings Article
01 Jan 2020
TL;DR: This paper proposes Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer with a Gaussian process; SNGP outperforms the other single-model approaches.
Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

175 citations
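
A rough sketch of the two ingredients SNGP adds, under simplifying assumptions: spectral normalization of hidden weights (via power iteration) and a Gaussian-process output layer approximated with random features. The toy one-layer feature extractor and the norm bound are illustrative stand-ins for the paper's Wide-ResNet/BERT backbones.

```python
# Sketch of the two SNGP ingredients: spectral normalization of hidden weights
# and a random-feature Gaussian-process output layer (toy network, not the paper's code).
import numpy as np

rng = np.random.default_rng(1)

def spectral_normalize(W, n_iter=20, norm_bound=0.95):
    """Scale W so its largest singular value is at most norm_bound (power iteration)."""
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u; v /= np.linalg.norm(v)
        u = W @ v;  u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W * min(1.0, norm_bound / sigma)

# Toy distance-preserving feature extractor: one spectrally normalized hidden layer.
X = rng.normal(size=(200, 4)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
W1 = spectral_normalize(rng.normal(size=(16, 4)))
H = np.maximum(W1 @ X.T, 0).T                      # hidden features, shape (n, 16)

# GP output layer via random Fourier features of an RBF kernel on H.
D = 128
Omega = rng.normal(size=(H.shape[1], D)); phase = rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(H @ Omega + phase)
lam = 1e-2
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)   # posterior mean weights
Sigma = np.linalg.inv(Phi.T @ Phi / lam + np.eye(D))            # posterior covariance of the weights

phi_test = np.sqrt(2.0 / D) * np.cos((np.maximum(W1 @ X[:5].T, 0).T) @ Omega + phase)
mean = phi_test @ w
var = np.einsum('ij,jk,ik->i', phi_test, Sigma, phi_test)       # grows with distance from training data
print(mean.round(2), var.round(3))
```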


Journal ArticleDOI
TL;DR: In this article, a reduced-rank Gaussian process regression scheme is proposed, based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$.
Abstract: This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$. On this approximate eigenbasis, the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as $\mathcal{O}(nm^2)$ (initial) and $\mathcal{O}(m^3)$ (hyperparameter learning) with m basis functions and n data points. Furthermore, the basis functions are independent of the parameters of the covariance function, which allows for very fast hyperparameter learning. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. We also show that the convergence rate of the truncation error is independent of the input dimensionality provided that the differentiability order of the covariance function increases appropriately, and for the squared exponential covariance function it is always bounded by $\sim 1/m$ regardless of the input dimensionality. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.

138 citations
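
A minimal 1D sketch of the reduced-rank construction described above: Laplacian eigenfunctions on an interval [-L, L] combined with the spectral density of a squared-exponential kernel. The domain size, lengthscale, and number of basis functions are illustrative choices, not values from the paper.

```python
# Sketch of the reduced-rank (Hilbert-space) GP approximation in 1D: Laplacian
# eigenfunctions on [-L, L] combined with the spectral density of an RBF kernel.
import numpy as np

rng = np.random.default_rng(0)
n, m, L = 200, 32, 4.0                    # data points, basis functions, domain half-width
ell, sf, sn = 0.6, 1.0, 0.1               # kernel lengthscale, signal std, noise std

X = rng.uniform(-3, 3, n)
y = np.sin(2 * X) + sn * rng.normal(size=n)

j = np.arange(1, m + 1)
sqrt_lam = np.pi * j / (2 * L)            # square roots of the Laplacian eigenvalues on [-L, L]

def phi(x):                               # eigenfunctions of the Laplacian with Dirichlet boundaries
    return np.sqrt(1.0 / L) * np.sin(sqrt_lam * (np.asarray(x)[:, None] + L))

def spectral_density(w):                  # spectral density of the squared-exponential kernel (1D)
    return sf**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * w) ** 2)

Phi = phi(X)                              # (n, m), independent of the kernel hyperparameters
Lam = spectral_density(sqrt_lam)          # approximate eigenvalues of the covariance operator

# GP regression in the reduced basis: cost O(n m^2) instead of O(n^3).
A = Phi.T @ Phi + sn**2 * np.diag(1.0 / Lam)
alpha = np.linalg.solve(A, Phi.T @ y)

Xs = np.linspace(-3, 3, 5)
mean = phi(Xs) @ alpha
var = sn**2 * np.einsum('ij,jk,ik->i', phi(Xs), np.linalg.inv(A), phi(Xs))
print(np.c_[Xs, mean.round(2), np.sqrt(var).round(3)])
```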


Journal ArticleDOI
TL;DR: In this article, a probabilistic model of the objective is used to compute an acquisition function that estimates the expected utility (for solving the optimization problem) of evaluating the objective at each potential new point.

134 citations
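
A minimal sketch of the acquisition-function idea summarized above, using expected improvement as one common choice of "expected utility"; the GP surrogate, kernel, and toy objective are illustrative assumptions.

```python
# Sketch: a GP surrogate plus the expected-improvement acquisition function
# (one common choice of "expected utility"; kernel and data are illustrative).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.2 * x          # expensive objective (to be minimized)
X = rng.uniform(-2, 2, 6).reshape(-1, 1)
y = f(X).ravel()

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-4), normalize_y=True).fit(X, y)

def expected_improvement(Xc, best, xi=0.01):
    mu, sd = gp.predict(Xc, return_std=True)
    z = (best - mu - xi) / np.maximum(sd, 1e-12)
    return (best - mu - xi) * norm.cdf(z) + sd * norm.pdf(z)

Xc = np.linspace(-2, 2, 401).reshape(-1, 1)     # candidate points
ei = expected_improvement(Xc, y.min())
x_next = Xc[np.argmax(ei)]                      # next point to evaluate
print("next evaluation at x =", round(float(x_next[0]), 3))
```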


Proceedings Article
31 Jul 2020
TL;DR: Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.
Abstract: We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.

125 citations


Proceedings Article
30 Apr 2020
TL;DR: This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function.
Abstract: We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. Then, the training algorithm sequentially encounters tasks and constructs posterior beliefs over the task-specific functions by using inducing point sparse Gaussian process methods. At each step a new task is first learnt and then a summary is constructed consisting of (i) inducing inputs – a fixed-size subset of the task inputs selected such that it optimally represents the task – and (ii) a posterior distribution over the function values at these inputs. This summary then regularises learning of future tasks, through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on accepted benchmarks.

115 citations
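
A minimal sketch of the functional regularisation term described above: a Kullback-Leibler penalty between Gaussian beliefs over function values at a task's inducing inputs. The neural-network feature learning, sparse GP fitting, and task loop are omitted; the stored "summary" below is synthetic.

```python
# Sketch: the Kullback-Leibler regulariser between Gaussian beliefs over function
# values at inducing inputs, used to constrain learning of future tasks
# (the network training and the sequential task loop are omitted).
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Stored summary of an old task: posterior over f at its inducing inputs.
rng = np.random.default_rng(0)
m = 10
mu_old = rng.normal(size=m)
A = rng.normal(size=(m, m)); S_old = A @ A.T / m + 1e-3 * np.eye(m)

# Belief over the same function values implied by the current (drifting) model.
mu_new = mu_old + 0.1 * rng.normal(size=m)
S_new = S_old + 0.05 * np.eye(m)

reg = kl_gaussian(mu_new, S_new, mu_old, S_old)   # penalises forgetting the old task
print("functional-regularisation term:", round(float(reg), 4))
```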


Journal ArticleDOI
TL;DR: The deep architecture of the proposed algorithm enables capacity estimation from partial charge-discharge time-series data, in the form of voltage, temperature and current, eliminating the need for input feature extraction.

108 citations


Journal ArticleDOI
Haoshu Cai, Xiaodong Jia, Jianshe Feng, Wenzhe Li, Yuan-Ming Hsu, Jay Lee
TL;DR: The short-term prediction accuracy of the enhanced Multi-Task Gaussian Process regression model is found to be comparable to, or even better than, that of cutting-edge statistical models for short-term extrapolation.

98 citations


Journal ArticleDOI
TL;DR: This article proposes a learning feedback linearizing control law using online closed-loop identification; its event-triggered updates ensure high data efficiency and thereby reduce computational complexity, which is a major barrier to using Gaussian processes under real-time constraints.
Abstract: Combining control engineering with nonparametric modeling techniques from machine learning allows for the control of systems without analytic description using data-driven models. Most of the existing approaches separate learning , i.e., the system identification based on a fixed dataset, and control , i.e., the execution of the model-based control law. This separation makes the performance highly sensitive to the initial selection of training data and possibly requires very large datasets. This article proposes a learning feedback linearizing control law using online closed-loop identification. The employed Gaussian process model updates its training data only if the model uncertainty becomes too large. This event-triggered online learning ensures high data efficiency and thereby reduces computational complexity, which is a major barrier for using Gaussian processes under real-time constraints. We propose safe forgetting strategies of data points to adhere to budget constraints and to further increase data efficiency. We show asymptotic stability for the tracking error under the proposed event-triggering law and illustrate the effective identification and control in simulation.
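
A minimal sketch of the event-triggered data selection described above: a point is added to the GP training set only when the model's predictive uncertainty at the current state exceeds a threshold, with a simple "forget the oldest point" budget rule. The dynamics, trajectory, threshold, and budget are illustrative; the feedback-linearising control law itself is not shown.

```python
# Sketch: event-triggered online GP learning -- update the training set only when the
# predictive uncertainty at the current state is too large (illustrative, not the paper's code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
g = lambda x: 0.5 * np.sin(3 * x)                 # unknown part of the dynamics
sigma_max, budget = 0.15, 25                      # trigger threshold and data budget

X_train, y_train = [0.0], [g(0.0)]
gp = GaussianProcessRegressor(RBF(0.5) + WhiteKernel(1e-4)).fit(
    np.array(X_train).reshape(-1, 1), np.array(y_train))

for t in range(200):
    x = float(np.sin(0.05 * t) * 2.0)             # state along some closed-loop trajectory
    _, sd = gp.predict([[x]], return_std=True)
    if sd[0] > sigma_max:                         # event trigger: model too uncertain here
        X_train.append(x)
        y_train.append(g(x) + 0.01 * rng.normal())
        if len(X_train) > budget:                 # simple forgetting strategy: drop the oldest point
            X_train.pop(0); y_train.pop(0)
        gp.fit(np.array(X_train).reshape(-1, 1), np.array(y_train))

print("training points kept:", len(X_train))
```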

Posted Content
TL;DR: In this article, the authors derived analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.
Abstract: We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and MNIST dataset.
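
A small numerical illustration of the spectral principle stated above, under illustrative assumptions (1D inputs, an RBF kernel, and a two-mode target): as the training set grows, kernel regression first fits the low-frequency component of the target and only later the high-frequency one.

```python
# Sketch of the spectral principle: with more training data, kernel regression fits
# successively higher-frequency components of the target (illustrative 1D example).
import numpy as np

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

def fit_and_mode_errors(n, rng):
    X = rng.uniform(-1, 1, n)
    target = lambda x: np.sin(np.pi * x) + np.sin(6 * np.pi * x)   # low + high frequency modes
    y = target(X)
    alpha = np.linalg.solve(rbf(X, X) + 1e-6 * np.eye(n), y)
    Xt = np.linspace(-1, 1, 2000)
    resid = rbf(Xt, X) @ alpha - target(Xt)
    # project the residual onto each mode to see which one is still unlearned
    low = abs(np.mean(resid * np.sin(np.pi * Xt)))
    high = abs(np.mean(resid * np.sin(6 * np.pi * Xt)))
    return low, high

rng = np.random.default_rng(0)
for n in (5, 20, 80):
    low, high = fit_and_mode_errors(n, rng)
    print(f"n={n:3d}  residual in low mode: {low:.3f}   high mode: {high:.3f}")
```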

Journal ArticleDOI
30 Mar 2020-Sensors
TL;DR: It is observed that the proposed combined path loss and shadowing model is more accurate and flexible than the conventional linear path loss plus log-normal shadowing model.
Abstract: Although various linear log-distance path loss models have been developed for wireless sensor networks, advanced models are required to more accurately and flexibly represent the path loss for complex environments. This paper proposes a machine learning framework for modeling path loss using a combination of three key techniques: artificial neural network (ANN)-based multi-dimensional regression, Gaussian process-based variance analysis, and principal component analysis (PCA)-aided feature selection. In general, the measured path loss dataset comprises multiple features such as distance, antenna height, etc. First, PCA is adopted to reduce the number of features of the dataset and simplify the learning model accordingly. ANN then learns the path loss structure from the dataset with reduced dimension, and Gaussian process learns the shadowing effect. Path loss data measured in a suburban area in Korea are employed. We observe that the proposed combined path loss and shadowing model is more accurate and flexible than the conventional linear path loss plus log-normal shadowing model.
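
A minimal sketch of the three-stage pipeline described above, under illustrative assumptions: PCA for feature reduction, an ANN for the mean path loss, and a GP on the residuals for the shadowing term. The synthetic "measurements" are placeholders for the Korean suburban dataset.

```python
# Sketch of the pipeline: PCA feature reduction -> ANN mean path loss -> GP on residuals (shadowing).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n = 400
features = np.c_[rng.uniform(10, 1000, n),      # distance [m]
                 rng.uniform(1, 30, n),         # antenna height [m]
                 rng.uniform(1, 6, n)]          # e.g. carrier frequency [GHz]
path_loss = 40 + 30 * np.log10(features[:, 0]) + 4 * rng.normal(size=n)   # toy data [dB]

Z = PCA(n_components=2).fit_transform(features)                                    # 1) reduce features
ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(Z, path_loss)   # 2) mean path loss
residual = path_loss - ann.predict(Z)
gp = GaussianProcessRegressor(RBF(np.std(Z, axis=0)) + WhiteKernel(1.0)).fit(Z, residual)  # 3) shadowing

mu_shadow, sd_shadow = gp.predict(Z[:3], return_std=True)
print("predicted loss [dB]:", (ann.predict(Z[:3]) + mu_shadow).round(1),
      " shadowing std:", sd_shadow.round(2))
```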

Journal ArticleDOI
TL;DR: The GP-DRT model is shown to be able to manage considerable noise, overlapping timescales, truncated data, and inductive features, and is tested using synthetic experiments for analyzing the consistency of the method and “real” experiments to gauge its performance for real data.

Journal ArticleDOI
TL;DR: In this article, a Gaussian Process (GP) method for handling both qualitative and numerical inputs is proposed; existing methods mainly assume a different response surface for each combination of inputs.
Abstract: Computer simulations often involve both qualitative and numerical inputs. Existing Gaussian process (GP) methods for handling this mainly assume a different response surface for each combination of...
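
For context, one generic construction for GPs with mixed inputs (not necessarily the method proposed above): the covariance is a product of a continuous kernel on the numerical part and a simple categorical kernel on the qualitative part, with a shared cross-level correlation as an illustrative assumption.

```python
# Sketch: a GP covariance for mixed inputs, as the product of an RBF kernel on the numerical
# part and a simple categorical kernel on the qualitative part (generic construction, for illustration).
import numpy as np

def mixed_kernel(x1, c1, x2, c2, ls=1.0, rho=0.3):
    """x: numerical inputs, c: integer-coded qualitative levels."""
    k_num = np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / ls) ** 2)
    k_cat = np.where(c1[:, None] == c2[None, :], 1.0, rho)   # correlation rho across different levels
    return k_num * k_cat

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 8)
c = rng.integers(0, 3, 8)            # three qualitative levels
K = mixed_kernel(x, c, x, c)
print("covariance is higher between same-level points:\n", K.round(2))
```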

Journal ArticleDOI
TL;DR: A brief introduction to the phenomenon of non-Gaussianity and the stochastic modelling in terms of superstatistical and diffusing-diffusivity approaches is provided.
Abstract: Brownian motion and viscoelastic anomalous diffusion in homogeneous environments are intrinsically Gaussian processes. In a growing number of systems, however, non-Gaussian displacement distributions of these processes are being reported. The physical cause of the non-Gaussianity is typically seen in different forms of disorder. These include, for instance, imperfect “ensembles” of tracer particles, the presence of local variations of the tracer mobility in heterogeneous environments, or cases in which the speed or persistence of moving nematodes or cells are distributed. From a theoretical point of view, stochastic descriptions based on distributed (“superstatistical”) transport coefficients as well as time-dependent generalisations based on stochastic transport parameters with built-in finite correlation time are invoked. After a brief review of the history of Brownian motion and the famed Gaussian displacement distribution, we here provide a brief introduction to the phenomenon of non-Gaussianity and the stochastic modelling in terms of superstatistical and diffusing-diffusivity approaches.
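
A minimal numerical illustration of the superstatistical mechanism mentioned above: if each tracer's diffusivity is itself random (here exponentially distributed, an illustrative choice), the ensemble displacement distribution develops heavy, Laplace-like tails even though each individual trajectory is Gaussian.

```python
# Numerical illustration of superstatistics: Gaussian displacements with a randomly
# distributed diffusivity give a non-Gaussian (heavy-tailed) ensemble distribution.
import numpy as np

rng = np.random.default_rng(0)
n_tracers, t = 200_000, 1.0

# Each tracer has its own diffusivity D drawn from an exponential distribution.
D = rng.exponential(scale=1.0, size=n_tracers)
x = rng.normal(loc=0.0, scale=np.sqrt(2 * D * t))     # Gaussian given D, non-Gaussian overall

# Excess kurtosis: 0 for a Gaussian, 3 for a Laplace distribution.
kurtosis = np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
print(f"excess kurtosis of the ensemble displacements: {kurtosis:.2f}  (a Gaussian would give 0)")
```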

Journal ArticleDOI
TL;DR: In this paper, the authors used variational inference and Gaussian processes to model the dust extinction density, exploiting its intrinsic correlations, and reconstructed a highly resolved dust map, showing the nearest dust clouds at a distance of up to 400 pc with a resolution of 1 pc.
Abstract: Aims. Mapping the interstellar medium in 3D provides a wealth of insights into its inner workings. The Milky Way is the only galaxy for which detailed 3D mapping can be achieved in principle. In this paper, we reconstruct the dust density in and around the local super-bubble. Methods. The combined data from surveys such as Gaia, 2MASS, PANSTARRS, and ALLWISE provide the necessary information to make detailed maps of the interstellar medium in our surroundings. To this end, we used variational inference and Gaussian processes to model the dust extinction density, exploiting its intrinsic correlations. Results. We reconstructed a highly resolved dust map, showing the nearest dust clouds at a distance of up to 400 pc with a resolution of 1 pc. Conclusions. Our reconstruction provides insights into the structure of the interstellar medium. We compute summary statistics of the spectral index and the 1-point function of the logarithmic dust extinction density, which may constrain simulations of the interstellar medium that achieve a similar resolution.

Posted Content
TL;DR: This work identifies a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data, and proposes an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time.
Abstract: Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model's success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes' statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
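
A minimal sketch of the decoupled (pathwise) sampling idea described above: draw a prior function sample, here via random Fourier features, and correct it with the data term using Matheron's rule, f_post(x) = f_prior(x) + k(x, X)(K + s^2 I)^{-1}(y - f_prior(X) - eps). The kernel, data, and feature count are illustrative.

```python
# Sketch of decoupled posterior sampling: a prior path from random Fourier features
# plus the pathwise (Matheron) data correction.
import numpy as np

rng = np.random.default_rng(0)
ls, sf, sn = 0.5, 1.0, 0.1

def rbf(A, B):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

X = rng.uniform(-3, 3, 40)
y = np.sin(2 * X) + sn * rng.normal(size=40)
K = rbf(X, X) + sn**2 * np.eye(40)

# Prior function sample via random Fourier features of the RBF kernel.
D = 500
omega = rng.normal(scale=1.0 / ls, size=D)
phase = rng.uniform(0, 2 * np.pi, D)
w = rng.normal(size=D)
prior = lambda x: sf * np.sqrt(2.0 / D) * np.cos(np.outer(x, omega) + phase) @ w

# Pathwise update: one posterior sample, evaluable at any test location in O(n) per point.
eps = sn * rng.normal(size=40)
v = np.linalg.solve(K, y - prior(X) - eps)
posterior_sample = lambda x: prior(x) + rbf(x, X) @ v

Xs = np.linspace(-3, 3, 7)
print(np.c_[Xs, posterior_sample(Xs).round(2)])
```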

Journal ArticleDOI
TL;DR: This work proposes the use of a heteroscedastic Gaussian Process model, which exists within a Bayesian framework that exhibits built-in protection against over-fitting and robustness to noisy measurements, and is shown to be effective on data collected from an operational wind turbine.

Proceedings Article
01 Jan 2020
TL;DR: Federated Thompson sampling (FTS) is presented, which overcomes a number of key challenges of FBO and FL in a principled way and provides a theoretical convergence guarantee that is robust against heterogeneous agents, a major challenge in FL and FBO.
Abstract: Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) via first-order optimization techniques. However, some common machine learning tasks such as hyperparameter tuning of DNNs lack access to gradients and thus require zeroth-order/black-box optimization. This hints at the possibility of extending BO to the FL setting (FBO) for agents to collaborate in these black-box optimization tasks. This paper presents federated Thompson sampling (FTS) which overcomes a number of key challenges of FBO and FL in a principled way: We (a) use random Fourier features to approximate the Gaussian process surrogate model used in BO, which naturally produces the parameters to be exchanged between agents, (b) design FTS based on Thompson sampling, which significantly reduces the number of parameters to be exchanged, and (c) provide a theoretical convergence guarantee that is robust against heterogeneous agents, which is a major challenge in FL and FBO. We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
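
A single-agent sketch of the two ingredients highlighted above: a random-Fourier-feature approximation of the GP surrogate, whose finite weight vector is the kind of quantity an agent would exchange, and a Thompson-sampling step over those weights. The objective, feature count, and candidate grid are illustrative; the federated aggregation across agents is omitted.

```python
# Sketch: Thompson sampling on a random-Fourier-feature GP surrogate (single agent;
# the federated exchange of the weight parameters is omitted).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -(x - 0.3) ** 2                    # black-box objective to maximize
ls, sn, D = 0.2, 0.05, 100

omega = rng.normal(scale=1.0 / ls, size=D)
phase = rng.uniform(0, 2 * np.pi, D)
feat = lambda x: np.sqrt(2.0 / D) * np.cos(np.outer(np.atleast_1d(x), omega) + phase)

X = list(rng.uniform(-1, 1, 3))
y = [f(x) + sn * rng.normal() for x in X]

for it in range(20):
    Phi = feat(np.array(X))
    A = Phi.T @ Phi / sn**2 + np.eye(D)          # posterior precision of the RFF weights
    mu = np.linalg.solve(A, Phi.T @ np.array(y)) / sn**2
    w = mu + np.linalg.cholesky(np.linalg.inv(A)) @ rng.normal(size=D)   # Thompson sample
    cand = np.linspace(-1, 1, 401)
    x_next = cand[np.argmax(feat(cand) @ w)]     # maximize the sampled surrogate
    X.append(float(x_next)); y.append(f(x_next) + sn * rng.normal())

print("best query found:", round(X[int(np.argmax(y))], 3))
```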

Journal ArticleDOI
TL;DR: In this paper, the authors proposed to recover spectral details from RGB images of known spectral quantization by modeling natural spectra under Gaussian Processes and combining them with the RGB images.
Abstract: We propose to recover spectral details from RGB images of known spectral quantization by modeling natural spectra under Gaussian Processes and combining them with the RGB images. Our technique exploits Process Kernels to model the relative smoothness of reflectance spectra, and encourages non-negativity in the resulting signals for better estimation of the reflectance values. The Gaussian Processes are inferred in sets using clusters of spatio-spectrally correlated hyperspectral training patches. Each set is transformed to match the spectral quantization of the test RGB image. We extract overlapping patches from the RGB image and match them to the hyperspectral training patches by spectrally transforming the latter. The RGB patches are encoded over the transformed Gaussian Processes related to those hyperspectral patches and the resulting image is constructed by combining the codes with the original processes. Our approach infers the desired Gaussian Processes under a fully Bayesian model inspired by Beta-Bernoulli Process, for which we also present the inference procedure. A thorough evaluation using three hyperspectral datasets demonstrates the effective extraction of spectral details from RGB images by the proposed technique.
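
The paper's patch-based, Beta-Bernoulli model is considerably more involved; the sketch below only illustrates the underlying idea of combining a GP smoothness prior over a reflectance spectrum with a known RGB projection, which yields a linear-Gaussian posterior. The camera response matrix here is random and stands in for a real, known spectral quantization.

```python
# Sketch of the core idea only: a GP smoothness prior over a spectrum plus a known RGB
# projection gives a linear-Gaussian posterior over the spectrum (not the paper's model).
import numpy as np

rng = np.random.default_rng(0)
n_bands = 31                                   # e.g. 400-700 nm in 10 nm steps
lam = np.linspace(0, 1, n_bands)

# GP prior on the spectrum: smooth, via an RBF covariance over wavelength.
Kprior = np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / 0.15) ** 2)

true_spec = np.clip(np.linalg.cholesky(Kprior + 1e-8 * np.eye(n_bands))
                    @ rng.normal(size=n_bands) + 1.0, 0, None)
C = np.abs(rng.normal(size=(3, n_bands)))      # stand-in RGB response curves
C /= C.sum(axis=1, keepdims=True)
rgb = C @ true_spec + 0.01 * rng.normal(size=3)

# Posterior mean of the spectrum given the RGB observation (linear-Gaussian update, prior mean 1).
S = C @ Kprior @ C.T + 0.01**2 * np.eye(3)
spec_hat = 1.0 + Kprior @ C.T @ np.linalg.solve(S, rgb - C @ np.ones(n_bands))

rel_err = np.linalg.norm(spec_hat - true_spec) / np.linalg.norm(true_spec)
print("relative reconstruction error:", round(float(rel_err), 3))
```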

Journal ArticleDOI
TL;DR: In this paper, it is shown that fluctuations in the number of steps of a spreading random walker, performed at finite time, lead to exponential decay (with logarithmic corrections) of the walkers' positional probability density.
Abstract: Brownian motion is a Gaussian process described by the central limit theorem. However, exponential decays of the positional probability density function $P(X,t)$ of packets of spreading random walkers, were observed in numerous situations that include glasses, live cells, and bacteria suspensions. We show that such exponential behavior is generally valid in a large class of problems of transport in random media. By extending the large deviations approach for a continuous time random walk, we uncover a general universal behavior for the decay of the density. It is found that fluctuations in the number of steps of the random walker, performed at finite time, lead to exponential decay (with logarithmic corrections) of $P(X,t)$. This universal behavior also holds for short times, a fact that makes experimental observations readily achievable.

Posted Content
TL;DR: In this article, the authors propose Spectral Normalized Neural Gaussian Process (SNGP) to improve the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process.
Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

Journal ArticleDOI
TL;DR: A general Vecchia framework for GP predictions is considered, which contains some novel and some existing special cases, and it is shown that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings.
Abstract: Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobserved locations, including obtaining joint predictive distributions at large sets of locations. We consider a general Vecchia framework for GP predictions, which contains some novel and some existing special cases. We study the accuracy and computational properties of these approaches theoretically and numerically, proving that our new methods exhibit linear computational complexity in the total number of spatial locations. We show that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings. We also apply our methods to a satellite dataset of chlorophyll fluorescence, showing that the new methods are faster or more accurate than existing methods and reduce unrealistic artifacts in prediction maps. Supplementary materials accompanying this paper appear on-line.
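
A minimal 1D sketch of a Vecchia likelihood approximation, under illustrative assumptions (ordered 1D locations, an RBF kernel, and conditioning on the m previous points): the joint Gaussian likelihood is replaced by a product of small conditional Gaussians.

```python
# Sketch of a Vecchia approximation in 1D: the joint Gaussian likelihood is replaced by a
# product of conditionals, each conditioning only on (at most) m previously ordered neighbours.
import numpy as np

rng = np.random.default_rng(0)
ls, sf, sn, m = 0.5, 1.0, 0.1, 5

def kern(A, B):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

s = np.sort(rng.uniform(0, 10, 300))                 # ordered spatial locations
y = np.sin(s) + sn * rng.normal(size=300)

def vecchia_loglik(s, y):
    ll = 0.0
    for i in range(len(s)):
        c = np.arange(max(0, i - m), i)              # conditioning set: m nearest previous points
        k_ii = kern(s[i:i+1], s[i:i+1])[0, 0] + sn**2
        if len(c) == 0:
            mu, var = 0.0, k_ii
        else:
            K_cc = kern(s[c], s[c]) + sn**2 * np.eye(len(c))
            k_ic = kern(s[i:i+1], s[c])[0]
            w = np.linalg.solve(K_cc, k_ic)
            mu, var = w @ y[c], k_ii - w @ k_ic
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

print("Vecchia log-likelihood:", round(vecchia_loglik(s, y), 2))
```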

Journal ArticleDOI
TL;DR: It is demonstrated that GP time-series modelling succeeds in evaluating and isolating the influence of different EOPs on the features of the vibration response of the wind turbine blade and, at the same time, in normalizing their effects to enhance the detectability of damage.

Journal ArticleDOI
21 Dec 2020
TL;DR: This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
Abstract: A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
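
A minimal sketch of the linearisation described above, using a tiny toy network and a numerical Jacobian (both illustrative assumptions): the network is replaced by its first-order Taylor expansion around the initial parameters, f_lin(x) = f(x, w0) + J(x, w0)(w - w0), which stays close to the true network when the parameters move only slightly.

```python
# Sketch: first-order Taylor (linearised) model of a tiny network around its initialisation.
import numpy as np

rng = np.random.default_rng(0)

def net(x, w):                                   # toy 1-hidden-layer network, width 50
    W1, b1, W2 = w[:50].reshape(50, 1), w[50:100], w[100:]
    return np.tanh(x @ W1.T + b1) @ W2

def jacobian(x, w, eps=1e-5):                    # numerical Jacobian d f / d w  (illustrative only)
    J = np.zeros((len(x), len(w)))
    for j in range(len(w)):
        dw = np.zeros_like(w); dw[j] = eps
        J[:, j] = (net(x, w + dw) - net(x, w - dw)) / (2 * eps)
    return J

x = np.linspace(-2, 2, 20).reshape(-1, 1)
w0 = rng.normal(size=150) / np.sqrt(50)
w = w0 + 0.05 * rng.normal(size=150)             # parameters after a bit of "training"

f_lin = net(x, w0) + jacobian(x, w0) @ (w - w0)
print("max |f(w) - f_lin(w)| =", round(float(np.abs(net(x, w) - f_lin).max()), 4))
```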

Posted Content
Jie Wang1
TL;DR: The paper starts by explaining the mathematical basics that Gaussian processes are built on, including the multivariate normal distribution, kernels, non-parametric models, and joint and conditional probability; it then describes Gaussian process regression in an accessible way, balancing between showing unnecessary mathematical derivation steps and omitting key conclusive results.
Abstract: This tutorial aims to provide an intuitive understanding of Gaussian process regression. Gaussian process regression (GPR) models have been widely used in machine learning applications because of their representation flexibility and inherent uncertainty measures over predictions. The basic concepts that a Gaussian process is built on, including multivariate normal distribution, kernels, non-parametric models, and joint and conditional probability, were explained first. Next, the GPR was described concisely together with an implementation of a standard GPR algorithm. Beyond the standard GPR, packages to implement state-of-the-art Gaussian processes algorithms were reviewed. This tutorial was written in an accessible way to make sure readers without a machine learning background can obtain a good understanding of the GPR basics.
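
A minimal sketch of the standard GPR prediction equations (in Cholesky form), of the kind such a tutorial covers: given training data (X, y), a kernel k, and noise variance sn^2, compute the posterior mean and variance at test inputs. The kernel choice and data below are illustrative.

```python
# Sketch of standard GP regression predictions (Cholesky form); kernel and data are illustrative.
import numpy as np

def rbf(A, B, ls=1.0, sf=1.0):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

rng = np.random.default_rng(0)
sn = 0.1
X = rng.uniform(-3, 3, 25)
y = np.sin(X) + sn * rng.normal(size=25)
Xs = np.linspace(-3, 3, 5)

L = np.linalg.cholesky(rbf(X, X) + sn**2 * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # (K + sn^2 I)^{-1} y

Ks = rbf(Xs, X)                                            # cross-covariance, shape (n*, n)
mean = Ks @ alpha
V = np.linalg.solve(L, Ks.T)                               # for the predictive variance
var = rbf(Xs, Xs).diagonal() - np.einsum('ij,ij->j', V, V)

print(np.c_[Xs, mean.round(3), np.sqrt(np.maximum(var, 0)).round(3)])
```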

Posted Content
TL;DR: It is argued that in Bayesian deep learning, the frequently utilized generalized Gauss-Newton (GGN) approximation should be understood as a modification of the underlying probabilistic model and should be considered separately from further approximate inference techniques.
Abstract: The generalized Gauss-Newton (GGN) approximation is often used to make practical Bayesian deep learning approaches scalable by replacing a second order derivative with a product of first order derivatives. In this paper we argue that the GGN approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN), which turns the BNN into a generalized linear model (GLM). Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one. We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation. It extends previous results in this vein to general likelihoods and has an equivalent Gaussian process formulation, which enables alternative inference schemes for BNNs in function space. We demonstrate the effectiveness of our approach on several standard classification datasets as well as on out-of-distribution detection. We provide an implementation at this https URL
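
A minimal sketch of the "GLM predictive" idea described above, under simplifying assumptions: predict with the linearised model f_lin(x) = f(x, w*) + J(x)(w - w*) under a Gaussian posterior over the weights, then push the resulting Gaussian logit through the likelihood. The toy model, the placeholder posterior covariance, and the probit approximation (one standard way of integrating a Gaussian logit through a sigmoid) are illustrative, not the paper's code.

```python
# Sketch of the GLM predictive: linearise the model at the MAP, propagate a Gaussian
# posterior over the weights, and squash the resulting Gaussian logit through the sigmoid.
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):                                       # toy scalar "network" output (a logit)
    return np.tanh(x * w[0]) * w[1] + w[2]

def grad_w(x, w, eps=1e-5):                        # numerical gradient of f w.r.t. w
    return np.array([(f(x, w + e) - f(x, w - e)) / (2 * eps)
                     for e in eps * np.eye(len(w))])

w_map = np.array([1.2, 0.8, -0.1])                 # MAP parameters (assumed already found)
Sigma = 0.05 * np.eye(3)                           # placeholder GGN/Laplace posterior covariance

x = 0.7
J = grad_w(x, w_map)
mu_f = f(x, w_map)                                 # mean of the linearised output
var_f = J @ Sigma @ J                              # variance of the linearised output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
p_map = sigmoid(mu_f)                              # plain MAP prediction (ignores uncertainty)
p_glm = sigmoid(mu_f / np.sqrt(1.0 + np.pi * var_f / 8.0))   # probit-approximated GLM predictive
print(f"MAP prediction: {p_map:.3f}   GLM predictive: {p_glm:.3f}")
```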

Journal ArticleDOI
TL;DR: This work has shown that the main challenge in developing an accurate dam behavior prediction model lies in the uncertainty in how changes in dam behavior over time will affect dam performance.
Abstract: Structural health monitoring models provide important information for safety control of large dams. The main challenge in developing an accurate dam behavior prediction model lies in the mo...

Journal ArticleDOI
TL;DR: An active learning methodology is introduced for adaptively constructing surrogate models, i.e. emulators, of costly computer codes in a multi-output setting, based on the optimization of a suitable acquisition function that targets accurate approximations, model tractability, and compact yet expressive simulated datasets.
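
A minimal sketch of an active-learning loop for building an emulator: repeatedly evaluate the simulator where the GP is most uncertain. The variance-based acquisition, toy simulator, and single-output setting here are simplifying assumptions; the paper's acquisition function and multi-output treatment are more elaborate.

```python
# Sketch: variance-based active learning for a GP emulator of a costly simulator (illustrative).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
simulator = lambda x: np.sin(5 * x) * np.exp(-x**2)       # stand-in for a costly computer code

X = list(rng.uniform(-2, 2, 3))
y = [float(simulator(x)) for x in X]
candidates = np.linspace(-2, 2, 301)

for it in range(12):
    gp = GaussianProcessRegressor(RBF(0.5) + WhiteKernel(1e-6)).fit(
        np.array(X).reshape(-1, 1), np.array(y))
    _, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
    x_new = candidates[np.argmax(sd)]                     # acquisition: maximum predictive uncertainty
    X.append(float(x_new)); y.append(float(simulator(x_new)))

print("emulator built from", len(X), "simulator runs")
```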