
Showing papers on "Gaussian process published in 2020"


Journal ArticleDOI
TL;DR: This work describes a principled way of formulating the chance-constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control, and presents a model predictive control approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP.
Abstract: Gaussian process (GP) regression has been widely used in supervised machine learning due to its flexibility and inherent ability to describe uncertainty in function estimation. In the context of control, it is seeing increasing use for modeling of nonlinear dynamical systems from data, as it allows the direct assessment of residual model uncertainty. We present a model predictive control (MPC) approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP. We describe a principled way of formulating the chance-constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control. Using additional approximations for efficient computation, we finally demonstrate the approach in a simulation example, as well as in a hardware implementation for autonomous racing of remote-controlled race cars with fast sampling times of 20 ms, highlighting improvements with regard to both performance and safety over a nominal controller.

383 citations
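
A minimal sketch of the cautious-control idea summarized above, not the paper's implementation: a GP models the residual dynamics, and its predictive standard deviation tightens a state constraint so that the chance constraint holds approximately. The kernel, toy dynamics, input search, and tightening factor are all illustrative assumptions (a real MPC would solve an optimization over a horizon).

```python
# Sketch: chance-constraint tightening with a GP residual model (illustrative, not the paper's code).
import numpy as np

def rbf(A, B, ls=0.5, var=1.0):
    d = A[:, None] - B[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

# Nominal model x+ = a*x + b*u plus unknown residual g(x); a GP learns g from data.
a, b, sn = 0.9, 0.5, 0.05
X = np.random.uniform(-2, 2, 30)                # past states
g = 0.2 * np.sin(2 * X)                         # true residual (unknown to the controller)
y = g + sn * np.random.randn(30)                # residual observations

K = rbf(X, X) + sn**2 * np.eye(len(X))
Kinv_y = np.linalg.solve(K, y)

def gp_residual(xq):
    k = rbf(np.atleast_1d(xq), X)
    mean = k @ Kinv_y
    var = rbf(np.atleast_1d(xq), np.atleast_1d(xq)).diagonal() \
          - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Chance constraint x+ <= x_max, enforced cautiously by tightening with beta * sigma.
x, x_max, beta = 1.0, 1.5, 2.0
mu_g, sd_g = gp_residual(x)
for u in np.linspace(1.0, -1.0, 41):            # naive input search (an MPC would optimize this)
    x_next_mean = a * x + b * u + mu_g[0]
    if x_next_mean + beta * sd_g[0] <= x_max:   # tightened (cautious) constraint
        print(f"largest admissible u = {u:.2f}, predicted x+ = {x_next_mean:.2f} +/- {sd_g[0]:.2f}")
        break
```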


Posted Content
TL;DR: In this article, the authors show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.

328 citations
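
A minimal sketch of the Bayesian-model-averaging view of deep ensembles described above: predictions from independently trained members (here, hypothetical softmax outputs rather than real trained networks) are averaged in probability space instead of committing to a single setting of weights.

```python
# Sketch: deep ensembles as approximate Bayesian model averaging (illustrative softmax outputs).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Pretend logits from M ensemble members (e.g., independent trainings / basins of attraction).
rng = np.random.default_rng(0)
M, n_classes = 5, 3
logits = rng.normal(size=(M, n_classes)) + np.array([2.0, 0.0, -1.0])  # members roughly agree on class 0

member_probs = softmax(logits)            # p(y | x, w_m) for each member m
bma_probs = member_probs.mean(axis=0)     # p(y | x) ~ (1/M) sum_m p(y | x, w_m)

print("single member:", member_probs[0].round(3))
print("ensemble (BMA):", bma_probs.round(3))  # typically better calibrated than any single member
```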



Proceedings Article
01 Jan 2020
TL;DR: This paper proposes Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer with a Gaussian process; SNGP outperforms the other single-model approaches.
Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

175 citations
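
A rough sketch of the two ingredients SNGP adds, under simplifying assumptions: spectral normalization of hidden weights (via power iteration) and a Gaussian-process output layer approximated with random features. The toy one-layer feature extractor and the norm bound are illustrative stand-ins for the paper's Wide-ResNet/BERT backbones.

```python
# Sketch of the two SNGP ingredients: spectral normalization of hidden weights
# and a random-feature Gaussian-process output layer (toy network, not the paper's code).
import numpy as np

rng = np.random.default_rng(1)

def spectral_normalize(W, n_iter=20, norm_bound=0.95):
    """Scale W so its largest singular value is at most norm_bound (power iteration)."""
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u; v /= np.linalg.norm(v)
        u = W @ v;  u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W * min(1.0, norm_bound / sigma)

# Toy distance-preserving feature extractor: one spectrally normalized hidden layer.
X = rng.normal(size=(200, 4)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
W1 = spectral_normalize(rng.normal(size=(16, 4)))
H = np.maximum(W1 @ X.T, 0).T                      # hidden features, shape (n, 16)

# GP output layer via random Fourier features of an RBF kernel on H.
D = 128
Omega = rng.normal(size=(H.shape[1], D)); phase = rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(H @ Omega + phase)
lam = 1e-2
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)   # posterior mean weights
Sigma = np.linalg.inv(Phi.T @ Phi / lam + np.eye(D))            # posterior covariance of the weights

phi_test = np.sqrt(2.0 / D) * np.cos((np.maximum(W1 @ X[:5].T, 0).T) @ Omega + phase)
mean = phi_test @ w
var = np.einsum('ij,jk,ik->i', phi_test, Sigma, phi_test)       # grows with distance from training data
print(mean.round(2), var.round(3))
```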


Journal ArticleDOI
TL;DR: In this article, a reduced-rank Gaussian process regression scheme is proposed, based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$.
Abstract: This paper proposes a novel scheme for reduced-rank Gaussian process regression. The method is based on an approximate series expansion of the covariance function in terms of an eigenfunction expansion of the Laplace operator in a compact subset of $\mathbb{R}^d$. On this approximate eigenbasis, the eigenvalues of the covariance function can be expressed as simple functions of the spectral density of the Gaussian process, which allows the GP inference to be solved under a computational cost scaling as $\mathcal{O}(nm^2)$ (initial) and $\mathcal{O}(m^3)$ (hyperparameter learning) with m basis functions and n data points. Furthermore, the basis functions are independent of the parameters of the covariance function, which allows for very fast hyperparameter learning. The approach also allows for rigorous error analysis with Hilbert space theory, and we show that the approximation becomes exact when the size of the compact subset and the number of eigenfunctions go to infinity. We also show that the convergence rate of the truncation error is independent of the input dimensionality provided that the differentiability order of the covariance function increases appropriately, and for the squared exponential covariance function it is always bounded by $\sim 1/m$ regardless of the input dimensionality. The expansion generalizes to Hilbert spaces with an inner product which is defined as an integral over a specified input density. The method is compared to previously proposed methods theoretically and through empirical tests with simulated and real data.

138 citations
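
A minimal 1D sketch of the reduced-rank construction described above: Laplacian eigenfunctions on an interval [-L, L] combined with the spectral density of a squared-exponential kernel. The domain size, lengthscale, and number of basis functions are illustrative choices, not values from the paper.

```python
# Sketch of the reduced-rank (Hilbert-space) GP approximation in 1D: Laplacian
# eigenfunctions on [-L, L] combined with the spectral density of an RBF kernel.
import numpy as np

rng = np.random.default_rng(0)
n, m, L = 200, 32, 4.0                    # data points, basis functions, domain half-width
ell, sf, sn = 0.6, 1.0, 0.1               # kernel lengthscale, signal std, noise std

X = rng.uniform(-3, 3, n)
y = np.sin(2 * X) + sn * rng.normal(size=n)

j = np.arange(1, m + 1)
sqrt_lam = np.pi * j / (2 * L)            # square roots of the Laplacian eigenvalues on [-L, L]

def phi(x):                               # eigenfunctions of the Laplacian with Dirichlet boundaries
    return np.sqrt(1.0 / L) * np.sin(sqrt_lam * (np.asarray(x)[:, None] + L))

def spectral_density(w):                  # spectral density of the squared-exponential kernel (1D)
    return sf**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * w) ** 2)

Phi = phi(X)                              # (n, m), independent of the kernel hyperparameters
Lam = spectral_density(sqrt_lam)          # approximate eigenvalues of the covariance operator

# GP regression in the reduced basis: cost O(n m^2) instead of O(n^3).
A = Phi.T @ Phi + sn**2 * np.diag(1.0 / Lam)
alpha = np.linalg.solve(A, Phi.T @ y)

Xs = np.linspace(-3, 3, 5)
mean = phi(Xs) @ alpha
var = sn**2 * np.einsum('ij,jk,ik->i', phi(Xs), np.linalg.inv(A), phi(Xs))
print(np.c_[Xs, mean.round(2), np.sqrt(var).round(3)])
```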


Journal ArticleDOI
TL;DR: In this article, a probabilistic model of the objective is used to compute an acquisition function that estimates the expected utility (for solving the optimization problem) of evaluating the objective at each potential new point.

134 citations
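
A minimal sketch of the acquisition-function idea summarized above, using expected improvement as one common choice of "expected utility"; the GP surrogate, kernel, and toy objective are illustrative assumptions.

```python
# Sketch: a GP surrogate plus the expected-improvement acquisition function
# (one common choice of "expected utility"; kernel and data are illustrative).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.2 * x          # expensive objective (to be minimized)
X = rng.uniform(-2, 2, 6).reshape(-1, 1)
y = f(X).ravel()

gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-4), normalize_y=True).fit(X, y)

def expected_improvement(Xc, best, xi=0.01):
    mu, sd = gp.predict(Xc, return_std=True)
    z = (best - mu - xi) / np.maximum(sd, 1e-12)
    return (best - mu - xi) * norm.cdf(z) + sd * norm.pdf(z)

Xc = np.linspace(-2, 2, 401).reshape(-1, 1)     # candidate points
ei = expected_improvement(Xc, y.min())
x_next = Xc[np.argmax(ei)]                      # next point to evaluate
print("next evaluation at x =", round(float(x_next[0]), 3))
```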


Proceedings Article
31 Jul 2020
TL;DR: Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.
Abstract: We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.

125 citations


Proceedings Article
30 Apr 2020
TL;DR: This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function.
Abstract: We introduce a framework for Continual Learning (CL) based on Bayesian inference over the function space rather than the parameters of a deep neural network. This method, referred to as functional regularisation for Continual Learning, avoids forgetting a previous task by constructing and memorising an approximate posterior belief over the underlying task-specific function. To achieve this we rely on a Gaussian process obtained by treating the weights of the last layer of a neural network as random and Gaussian distributed. Then, the training algorithm sequentially encounters tasks and constructs posterior beliefs over the task-specific functions by using inducing point sparse Gaussian process methods. At each step a new task is first learnt and then a summary is constructed consisting of (i) inducing inputs – a fixed-size subset of the task inputs selected such that it optimally represents the task – and (ii) a posterior distribution over the function values at these inputs. This summary then regularises learning of future tasks, through Kullback-Leibler regularisation terms. Our method thus unites approaches focused on (pseudo-)rehearsal with those derived from a sequential Bayesian inference perspective in a principled way, leading to strong results on accepted benchmarks.

115 citations
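
A minimal sketch of the functional regularisation term described above: a Kullback-Leibler penalty between Gaussian beliefs over function values at a task's inducing inputs. The neural-network feature learning, sparse GP fitting, and task loop are omitted; the stored "summary" below is synthetic.

```python
# Sketch: the Kullback-Leibler regulariser between Gaussian beliefs over function
# values at inducing inputs, used to constrain learning of future tasks
# (the network training and the sequential task loop are omitted).
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Stored summary of an old task: posterior over f at its inducing inputs.
rng = np.random.default_rng(0)
m = 10
mu_old = rng.normal(size=m)
A = rng.normal(size=(m, m)); S_old = A @ A.T / m + 1e-3 * np.eye(m)

# Belief over the same function values implied by the current (drifting) model.
mu_new = mu_old + 0.1 * rng.normal(size=m)
S_new = S_old + 0.05 * np.eye(m)

reg = kl_gaussian(mu_new, S_new, mu_old, S_old)   # penalises forgetting the old task
print("functional-regularisation term:", round(float(reg), 4))
```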


Journal ArticleDOI
TL;DR: The deep architecture of the proposed algorithm enables capacity estimation from partial charge-discharge time-series data, in the form of voltage, temperature and current, eliminating the need for input feature extraction.

108 citations


Journal ArticleDOI
Haoshu Cai, Xiaodong Jia, Jianshe Feng, Wenzhe Li, Yuan-Ming Hsu, Jay Lee
TL;DR: The short-term prediction accuracy of the enhanced Multi-Task Gaussian Process regression model is found to be comparable to, or even better than, that of cutting-edge statistical models for short-term extrapolation.

98 citations


Journal ArticleDOI
TL;DR: This article proposes a learning feedback linearizing control law using online closed-loop identification; its event-triggered updates ensure high data efficiency and thereby reduce computational complexity, which is a major barrier to using Gaussian processes under real-time constraints.
Abstract: Combining control engineering with nonparametric modeling techniques from machine learning allows for the control of systems without analytic description using data-driven models. Most of the existing approaches separate learning , i.e., the system identification based on a fixed dataset, and control , i.e., the execution of the model-based control law. This separation makes the performance highly sensitive to the initial selection of training data and possibly requires very large datasets. This article proposes a learning feedback linearizing control law using online closed-loop identification. The employed Gaussian process model updates its training data only if the model uncertainty becomes too large. This event-triggered online learning ensures high data efficiency and thereby reduces computational complexity, which is a major barrier for using Gaussian processes under real-time constraints. We propose safe forgetting strategies of data points to adhere to budget constraints and to further increase data efficiency. We show asymptotic stability for the tracking error under the proposed event-triggering law and illustrate the effective identification and control in simulation.
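
A minimal sketch of the event-triggered data selection described above: a point is added to the GP training set only when the model's predictive uncertainty at the current state exceeds a threshold, with a simple "forget the oldest point" budget rule. The dynamics, trajectory, threshold, and budget are illustrative; the feedback-linearising control law itself is not shown.

```python
# Sketch: event-triggered online GP learning -- update the training set only when the
# predictive uncertainty at the current state is too large (illustrative, not the paper's code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
g = lambda x: 0.5 * np.sin(3 * x)                 # unknown part of the dynamics
sigma_max, budget = 0.15, 25                      # trigger threshold and data budget

X_train, y_train = [0.0], [g(0.0)]
gp = GaussianProcessRegressor(RBF(0.5) + WhiteKernel(1e-4)).fit(
    np.array(X_train).reshape(-1, 1), np.array(y_train))

for t in range(200):
    x = float(np.sin(0.05 * t) * 2.0)             # state along some closed-loop trajectory
    _, sd = gp.predict([[x]], return_std=True)
    if sd[0] > sigma_max:                         # event trigger: model too uncertain here
        X_train.append(x)
        y_train.append(g(x) + 0.01 * rng.normal())
        if len(X_train) > budget:                 # simple forgetting strategy: drop the oldest point
            X_train.pop(0); y_train.pop(0)
        gp.fit(np.array(X_train).reshape(-1, 1), np.array(y_train))

print("training points kept:", len(X_train))
```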

Posted Content
TL;DR: In this article, the authors derived analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.
Abstract: We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and MNIST dataset.
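
A small numerical illustration of the spectral principle stated above, under illustrative assumptions (1D inputs, an RBF kernel, and a two-mode target): as the training set grows, kernel regression first fits the low-frequency component of the target and only later the high-frequency one.

```python
# Sketch of the spectral principle: with more training data, kernel regression fits
# successively higher-frequency components of the target (illustrative 1D example).
import numpy as np

def rbf(A, B, ls=0.3):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

def fit_and_mode_errors(n, rng):
    X = rng.uniform(-1, 1, n)
    target = lambda x: np.sin(np.pi * x) + np.sin(6 * np.pi * x)   # low + high frequency modes
    y = target(X)
    alpha = np.linalg.solve(rbf(X, X) + 1e-6 * np.eye(n), y)
    Xt = np.linspace(-1, 1, 2000)
    resid = rbf(Xt, X) @ alpha - target(Xt)
    # project the residual onto each mode to see which one is still unlearned
    low = abs(np.mean(resid * np.sin(np.pi * Xt)))
    high = abs(np.mean(resid * np.sin(6 * np.pi * Xt)))
    return low, high

rng = np.random.default_rng(0)
for n in (5, 20, 80):
    low, high = fit_and_mode_errors(n, rng)
    print(f"n={n:3d}  residual in low mode: {low:.3f}   high mode: {high:.3f}")
```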

Journal ArticleDOI
30 Mar 2020-Sensors
TL;DR: It is observed that the proposed combined path loss and shadowing model is more accurate and flexible than the conventional linear path loss plus log-normal shadowing model.
Abstract: Although various linear log-distance path loss models have been developed for wireless sensor networks, advanced models are required to more accurately and flexibly represent the path loss for complex environments. This paper proposes a machine learning framework for modeling path loss using a combination of three key techniques: artificial neural network (ANN)-based multi-dimensional regression, Gaussian process-based variance analysis, and principal component analysis (PCA)-aided feature selection. In general, the measured path loss dataset comprises multiple features such as distance, antenna height, etc. First, PCA is adopted to reduce the number of features of the dataset and simplify the learning model accordingly. ANN then learns the path loss structure from the dataset with reduced dimension, and Gaussian process learns the shadowing effect. Path loss data measured in a suburban area in Korea are employed. We observe that the proposed combined path loss and shadowing model is more accurate and flexible than the conventional linear path loss plus log-normal shadowing model.
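
A minimal sketch of the three-stage pipeline described above, under illustrative assumptions: PCA for feature reduction, an ANN for the mean path loss, and a GP on the residuals for the shadowing term. The synthetic "measurements" are placeholders for the Korean suburban dataset.

```python
# Sketch of the pipeline: PCA feature reduction -> ANN mean path loss -> GP on residuals (shadowing).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n = 400
features = np.c_[rng.uniform(10, 1000, n),      # distance [m]
                 rng.uniform(1, 30, n),         # antenna height [m]
                 rng.uniform(1, 6, n)]          # e.g. carrier frequency [GHz]
path_loss = 40 + 30 * np.log10(features[:, 0]) + 4 * rng.normal(size=n)   # toy data [dB]

Z = PCA(n_components=2).fit_transform(features)                                    # 1) reduce features
ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(Z, path_loss)   # 2) mean path loss
residual = path_loss - ann.predict(Z)
gp = GaussianProcessRegressor(RBF(np.std(Z, axis=0)) + WhiteKernel(1.0)).fit(Z, residual)  # 3) shadowing

mu_shadow, sd_shadow = gp.predict(Z[:3], return_std=True)
print("predicted loss [dB]:", (ann.predict(Z[:3]) + mu_shadow).round(1),
      " shadowing std:", sd_shadow.round(2))
```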

Journal ArticleDOI
TL;DR: The GP-DRT model is shown to be able to manage considerable noise, overlapping timescales, truncated data, and inductive features, and is tested using synthetic experiments for analyzing the consistency of the method and “real” experiments to gauge its performance for real data.

Journal ArticleDOI
TL;DR: In this article, a Gaussian Process (GP) method for handling both qualitative and numerical inputs is proposed; existing methods mainly assume a different response surface for each combination of inputs.
Abstract: Computer simulations often involve both qualitative and numerical inputs. Existing Gaussian process (GP) methods for handling this mainly assume a different response surface for each combination of...
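
For context, one generic construction for GPs with mixed inputs (not necessarily the method proposed above): the covariance is a product of a continuous kernel on the numerical part and a simple categorical kernel on the qualitative part, with a shared cross-level correlation as an illustrative assumption.

```python
# Sketch: a GP covariance for mixed inputs, as the product of an RBF kernel on the numerical
# part and a simple categorical kernel on the qualitative part (generic construction, for illustration).
import numpy as np

def mixed_kernel(x1, c1, x2, c2, ls=1.0, rho=0.3):
    """x: numerical inputs, c: integer-coded qualitative levels."""
    k_num = np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / ls) ** 2)
    k_cat = np.where(c1[:, None] == c2[None, :], 1.0, rho)   # correlation rho across different levels
    return k_num * k_cat

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 8)
c = rng.integers(0, 3, 8)            # three qualitative levels
K = mixed_kernel(x, c, x, c)
print("covariance is higher between same-level points:\n", K.round(2))
```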

Journal ArticleDOI
TL;DR: A brief introduction to the phenomenon of non-Gaussianity and the stochastic modelling in terms of superstatistical and diffusing-diffusivity approaches is provided.
Abstract: Brownian motion and viscoelastic anomalous diffusion in homogeneous environments are intrinsically Gaussian processes. In a growing number of systems, however, non-Gaussian displacement distributions of these processes are being reported. The physical cause of the non-Gaussianity is typically seen in different forms of disorder. These include, for instance, imperfect “ensembles” of tracer particles, the presence of local variations of the tracer mobility in heterogeneous environments, or cases in which the speed or persistence of moving nematodes or cells are distributed. From a theoretical point of view, stochastic descriptions based on distributed (“superstatistical”) transport coefficients as well as time-dependent generalisations based on stochastic transport parameters with built-in finite correlation time are invoked. After a brief review of the history of Brownian motion and the famed Gaussian displacement distribution, we here provide a brief introduction to the phenomenon of non-Gaussianity and the stochastic modelling in terms of superstatistical and diffusing-diffusivity approaches.
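
A minimal numerical illustration of the superstatistical mechanism mentioned above: if each tracer's diffusivity is itself random (here exponentially distributed, an illustrative choice), the ensemble displacement distribution develops heavy, Laplace-like tails even though each individual trajectory is Gaussian.

```python
# Numerical illustration of superstatistics: Gaussian displacements with a randomly
# distributed diffusivity give a non-Gaussian (heavy-tailed) ensemble distribution.
import numpy as np

rng = np.random.default_rng(0)
n_tracers, t = 200_000, 1.0

# Each tracer has its own diffusivity D drawn from an exponential distribution.
D = rng.exponential(scale=1.0, size=n_tracers)
x = rng.normal(loc=0.0, scale=np.sqrt(2 * D * t))     # Gaussian given D, non-Gaussian overall

# Excess kurtosis: 0 for a Gaussian, 3 for a Laplace distribution.
kurtosis = np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
print(f"excess kurtosis of the ensemble displacements: {kurtosis:.2f}  (a Gaussian would give 0)")
```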

Journal ArticleDOI
TL;DR: In this paper, the authors used variational inference and Gaussian processes to model the dust extinction density, exploiting its intrinsic correlations, and reconstructed a highly resolved dust map, showing the nearest dust clouds at a distance of up to 400 pc with a resolution of 1 pc.
Abstract: Aims. Mapping the interstellar medium in 3D provides a wealth of insights into its inner workings. The Milky Way is the only galaxy for which detailed 3D mapping can be achieved in principle. In this paper, we reconstruct the dust density in and around the local super-bubble. Methods. The combined data from surveys such as Gaia, 2MASS, PANSTARRS, and ALLWISE provide the necessary information to make detailed maps of the interstellar medium in our surroundings. To this end, we used variational inference and Gaussian processes to model the dust extinction density, exploiting its intrinsic correlations. Results. We reconstructed a highly resolved dust map, showing the nearest dust clouds at a distance of up to 400 pc with a resolution of 1 pc. Conclusions. Our reconstruction provides insights into the structure of the interstellar medium. We compute summary statistics of the spectral index and the 1-point function of the logarithmic dust extinction density, which may constrain simulations of the interstellar medium that achieve a similar resolution.

Posted Content
TL;DR: This work identifies a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data, and proposes an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time.
Abstract: Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model's success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes' statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
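
A minimal sketch of the decoupled (pathwise) sampling idea described above: draw a prior function sample, here via random Fourier features, and correct it with the data term using Matheron's rule, f_post(x) = f_prior(x) + k(x, X)(K + s^2 I)^{-1}(y - f_prior(X) - eps). The kernel, data, and feature count are illustrative.

```python
# Sketch of decoupled posterior sampling: a prior path from random Fourier features
# plus the pathwise (Matheron) data correction.
import numpy as np

rng = np.random.default_rng(0)
ls, sf, sn = 0.5, 1.0, 0.1

def rbf(A, B):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

X = rng.uniform(-3, 3, 40)
y = np.sin(2 * X) + sn * rng.normal(size=40)
K = rbf(X, X) + sn**2 * np.eye(40)

# Prior function sample via random Fourier features of the RBF kernel.
D = 500
omega = rng.normal(scale=1.0 / ls, size=D)
phase = rng.uniform(0, 2 * np.pi, D)
w = rng.normal(size=D)
prior = lambda x: sf * np.sqrt(2.0 / D) * np.cos(np.outer(x, omega) + phase) @ w

# Pathwise update: one posterior sample, evaluable at any test location in O(n) per point.
eps = sn * rng.normal(size=40)
v = np.linalg.solve(K, y - prior(X) - eps)
posterior_sample = lambda x: prior(x) + rbf(x, X) @ v

Xs = np.linspace(-3, 3, 7)
print(np.c_[Xs, posterior_sample(Xs).round(2)])
```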

Journal ArticleDOI
TL;DR: This work proposes the use of a heteroscedastic Gaussian Process model, which exists within a Bayesian framework that exhibits built-in protection against over-fitting and robustness to noisy measurements, and is shown to be effective on data collected from an operational wind turbine.

Proceedings Article
01 Jan 2020
TL;DR: Federated Thompson sampling (FTS) is presented, which overcomes a number of key challenges of FBO and FL in a principled way and provides a theoretical convergence guarantee that is robust against heterogeneous agents, a major challenge in FL and FBO.
Abstract: Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) via first-order optimization techniques. However, some common machine learning tasks such as hyperparameter tuning of DNNs lack access to gradients and thus require zeroth-order/black-box optimization. This hints at the possibility of extending BO to the FL setting (FBO) for agents to collaborate in these black-box optimization tasks. This paper presents federated Thompson sampling (FTS) which overcomes a number of key challenges of FBO and FL in a principled way: We (a) use random Fourier features to approximate the Gaussian process surrogate model used in BO, which naturally produces the parameters to be exchanged between agents, (b) design FTS based on Thompson sampling, which significantly reduces the number of parameters to be exchanged, and (c) provide a theoretical convergence guarantee that is robust against heterogeneous agents, which is a major challenge in FL and FBO. We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
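
A single-agent sketch of the two ingredients highlighted above: a random-Fourier-feature approximation of the GP surrogate, whose finite weight vector is the kind of quantity an agent would exchange, and a Thompson-sampling step over those weights. The objective, feature count, and candidate grid are illustrative; the federated aggregation across agents is omitted.

```python
# Sketch: Thompson sampling on a random-Fourier-feature GP surrogate (single agent;
# the federated exchange of the weight parameters is omitted).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -(x - 0.3) ** 2                    # black-box objective to maximize
ls, sn, D = 0.2, 0.05, 100

omega = rng.normal(scale=1.0 / ls, size=D)
phase = rng.uniform(0, 2 * np.pi, D)
feat = lambda x: np.sqrt(2.0 / D) * np.cos(np.outer(np.atleast_1d(x), omega) + phase)

X = list(rng.uniform(-1, 1, 3))
y = [f(x) + sn * rng.normal() for x in X]

for it in range(20):
    Phi = feat(np.array(X))
    A = Phi.T @ Phi / sn**2 + np.eye(D)          # posterior precision of the RFF weights
    mu = np.linalg.solve(A, Phi.T @ np.array(y)) / sn**2
    w = mu + np.linalg.cholesky(np.linalg.inv(A)) @ rng.normal(size=D)   # Thompson sample
    cand = np.linspace(-1, 1, 401)
    x_next = cand[np.argmax(feat(cand) @ w)]     # maximize the sampled surrogate
    X.append(float(x_next)); y.append(f(x_next) + sn * rng.normal())

print("best query found:", round(X[int(np.argmax(y))], 3))
```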

Journal ArticleDOI
TL;DR: In this paper, the authors proposed to recover spectral details from RGB images of known spectral quantization by modeling natural spectra under Gaussian Processes and combining them with the RGB images.
Abstract: We propose to recover spectral details from RGB images of known spectral quantization by modeling natural spectra under Gaussian Processes and combining them with the RGB images. Our technique exploits Process Kernels to model the relative smoothness of reflectance spectra, and encourages non-negativity in the resulting signals for better estimation of the reflectance values. The Gaussian Processes are inferred in sets using clusters of spatio-spectrally correlated hyperspectral training patches. Each set is transformed to match the spectral quantization of the test RGB image. We extract overlapping patches from the RGB image and match them to the hyperspectral training patches by spectrally transforming the latter. The RGB patches are encoded over the transformed Gaussian Processes related to those hyperspectral patches and the resulting image is constructed by combining the codes with the original processes. Our approach infers the desired Gaussian Processes under a fully Bayesian model inspired by Beta-Bernoulli Process, for which we also present the inference procedure. A thorough evaluation using three hyperspectral datasets demonstrates the effective extraction of spectral details from RGB images by the proposed technique.
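
The paper's patch-based, Beta-Bernoulli model is considerably more involved; the sketch below only illustrates the underlying idea of combining a GP smoothness prior over a reflectance spectrum with a known RGB projection, which yields a linear-Gaussian posterior. The camera response matrix here is random and stands in for a real, known spectral quantization.

```python
# Sketch of the core idea only: a GP smoothness prior over a spectrum plus a known RGB
# projection gives a linear-Gaussian posterior over the spectrum (not the paper's model).
import numpy as np

rng = np.random.default_rng(0)
n_bands = 31                                   # e.g. 400-700 nm in 10 nm steps
lam = np.linspace(0, 1, n_bands)

# GP prior on the spectrum: smooth, via an RBF covariance over wavelength.
Kprior = np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / 0.15) ** 2)

true_spec = np.clip(np.linalg.cholesky(Kprior + 1e-8 * np.eye(n_bands))
                    @ rng.normal(size=n_bands) + 1.0, 0, None)
C = np.abs(rng.normal(size=(3, n_bands)))      # stand-in RGB response curves
C /= C.sum(axis=1, keepdims=True)
rgb = C @ true_spec + 0.01 * rng.normal(size=3)

# Posterior mean of the spectrum given the RGB observation (linear-Gaussian update, prior mean 1).
S = C @ Kprior @ C.T + 0.01**2 * np.eye(3)
spec_hat = 1.0 + Kprior @ C.T @ np.linalg.solve(S, rgb - C @ np.ones(n_bands))

rel_err = np.linalg.norm(spec_hat - true_spec) / np.linalg.norm(true_spec)
print("relative reconstruction error:", round(float(rel_err), 3))
```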

Journal ArticleDOI
TL;DR: In this paper, it is shown that fluctuations in the number of steps of a spreading random walker, performed at finite time, lead to exponential decay (with logarithmic corrections) of the walkers' positional probability density.
Abstract: Brownian motion is a Gaussian process described by the central limit theorem. However, exponential decays of the positional probability density function $P(X,t)$ of packets of spreading random walkers, were observed in numerous situations that include glasses, live cells, and bacteria suspensions. We show that such exponential behavior is generally valid in a large class of problems of transport in random media. By extending the large deviations approach for a continuous time random walk, we uncover a general universal behavior for the decay of the density. It is found that fluctuations in the number of steps of the random walker, performed at finite time, lead to exponential decay (with logarithmic corrections) of $P(X,t)$. This universal behavior also holds for short times, a fact that makes experimental observations readily achievable.

Posted Content
TL;DR: In this article, the authors propose Spectral Normalized Neural Gaussian Process (SNGP) to improve the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process.
Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

Journal ArticleDOI
TL;DR: A general Vecchia framework for GP predictions is considered, which contains some novel and some existing special cases, and it is shown that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings.
Abstract: Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobserved locations, including obtaining joint predictive distributions at large sets of locations. We consider a general Vecchia framework for GP predictions, which contains some novel and some existing special cases. We study the accuracy and computational properties of these approaches theoretically and numerically, proving that our new methods exhibit linear computational complexity in the total number of spatial locations. We show that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings. We also apply our methods to a satellite dataset of chlorophyll fluorescence, showing that the new methods are faster or more accurate than existing methods and reduce unrealistic artifacts in prediction maps. Supplementary materials accompanying this paper appear on-line.
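
A minimal 1D sketch of a Vecchia likelihood approximation, under illustrative assumptions (ordered 1D locations, an RBF kernel, and conditioning on the m previous points): the joint Gaussian likelihood is replaced by a product of small conditional Gaussians.

```python
# Sketch of a Vecchia approximation in 1D: the joint Gaussian likelihood is replaced by a
# product of conditionals, each conditioning only on (at most) m previously ordered neighbours.
import numpy as np

rng = np.random.default_rng(0)
ls, sf, sn, m = 0.5, 1.0, 0.1, 5

def kern(A, B):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

s = np.sort(rng.uniform(0, 10, 300))                 # ordered spatial locations
y = np.sin(s) + sn * rng.normal(size=300)

def vecchia_loglik(s, y):
    ll = 0.0
    for i in range(len(s)):
        c = np.arange(max(0, i - m), i)              # conditioning set: m nearest previous points
        k_ii = kern(s[i:i+1], s[i:i+1])[0, 0] + sn**2
        if len(c) == 0:
            mu, var = 0.0, k_ii
        else:
            K_cc = kern(s[c], s[c]) + sn**2 * np.eye(len(c))
            k_ic = kern(s[i:i+1], s[c])[0]
            w = np.linalg.solve(K_cc, k_ic)
            mu, var = w @ y[c], k_ii - w @ k_ic
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

print("Vecchia log-likelihood:", round(vecchia_loglik(s, y), 2))
```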

Journal ArticleDOI
TL;DR: It is demonstrated that GP time-series modelling succeeds in evaluating and isolating the influence of different EOPs on the features of the vibration response of the wind turbine blade and, at the same time, in normalizing their effects to enhance the detectability of damage.

Journal ArticleDOI
21 Dec 2020
TL;DR: This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
Abstract: A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
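
A minimal sketch of the linearisation described above, using a tiny toy network and a numerical Jacobian (both illustrative assumptions): the network is replaced by its first-order Taylor expansion around the initial parameters, f_lin(x) = f(x, w0) + J(x, w0)(w - w0), which stays close to the true network when the parameters move only slightly.

```python
# Sketch: first-order Taylor (linearised) model of a tiny network around its initialisation.
import numpy as np

rng = np.random.default_rng(0)

def net(x, w):                                   # toy 1-hidden-layer network, width 50
    W1, b1, W2 = w[:50].reshape(50, 1), w[50:100], w[100:]
    return np.tanh(x @ W1.T + b1) @ W2

def jacobian(x, w, eps=1e-5):                    # numerical Jacobian d f / d w  (illustrative only)
    J = np.zeros((len(x), len(w)))
    for j in range(len(w)):
        dw = np.zeros_like(w); dw[j] = eps
        J[:, j] = (net(x, w + dw) - net(x, w - dw)) / (2 * eps)
    return J

x = np.linspace(-2, 2, 20).reshape(-1, 1)
w0 = rng.normal(size=150) / np.sqrt(50)
w = w0 + 0.05 * rng.normal(size=150)             # parameters after a bit of "training"

f_lin = net(x, w0) + jacobian(x, w0) @ (w - w0)
print("max |f(w) - f_lin(w)| =", round(float(np.abs(net(x, w) - f_lin).max()), 4))
```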

Posted Content
Jie Wang1
TL;DR: The paper starts by explaining the mathematical basics that Gaussian processes are built on, including the multivariate normal distribution, kernels, non-parametric models, and joint and conditional probability; it then describes Gaussian process regression in an accessible way, balancing between showing unnecessary mathematical derivation steps and omitting key conclusive results.
Abstract: This tutorial aims to provide an intuitive understanding of Gaussian process regression. Gaussian process regression (GPR) models have been widely used in machine learning applications because of their representation flexibility and inherent uncertainty measures over predictions. The basic concepts that a Gaussian process is built on, including multivariate normal distribution, kernels, non-parametric models, and joint and conditional probability, were explained first. Next, the GPR was described concisely together with an implementation of a standard GPR algorithm. Beyond the standard GPR, packages to implement state-of-the-art Gaussian processes algorithms were reviewed. This tutorial was written in an accessible way to make sure readers without a machine learning background can obtain a good understanding of the GPR basics.
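
A minimal sketch of the standard GPR prediction equations (in Cholesky form), of the kind such a tutorial covers: given training data (X, y), a kernel k, and noise variance sn^2, compute the posterior mean and variance at test inputs. The kernel choice and data below are illustrative.

```python
# Sketch of standard GP regression predictions (Cholesky form); kernel and data are illustrative.
import numpy as np

def rbf(A, B, ls=1.0, sf=1.0):
    return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

rng = np.random.default_rng(0)
sn = 0.1
X = rng.uniform(-3, 3, 25)
y = np.sin(X) + sn * rng.normal(size=25)
Xs = np.linspace(-3, 3, 5)

L = np.linalg.cholesky(rbf(X, X) + sn**2 * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # (K + sn^2 I)^{-1} y

Ks = rbf(Xs, X)                                            # cross-covariance, shape (n*, n)
mean = Ks @ alpha
V = np.linalg.solve(L, Ks.T)                               # for the predictive variance
var = rbf(Xs, Xs).diagonal() - np.einsum('ij,ij->j', V, V)

print(np.c_[Xs, mean.round(3), np.sqrt(np.maximum(var, 0)).round(3)])
```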

Posted Content
TL;DR: It is argued that in Bayesian deep learning, the frequently utilized generalized Gauss-Newton (GGN) approximation should be understood as a modification of the underlying probabilistic model and should be considered separately from further approximate inference techniques.
Abstract: The generalized Gauss-Newton (GGN) approximation is often used to make practical Bayesian deep learning approaches scalable by replacing a second order derivative with a product of first order derivatives. In this paper we argue that the GGN approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN), which turns the BNN into a generalized linear model (GLM). Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one. We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation. It extends previous results in this vein to general likelihoods and has an equivalent Gaussian process formulation, which enables alternative inference schemes for BNNs in function space. We demonstrate the effectiveness of our approach on several standard classification datasets as well as on out-of-distribution detection. We provide an implementation at this https URL
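
A minimal sketch of the "GLM predictive" idea described above, under simplifying assumptions: predict with the linearised model f_lin(x) = f(x, w*) + J(x)(w - w*) under a Gaussian posterior over the weights, then push the resulting Gaussian logit through the likelihood. The toy model, the placeholder posterior covariance, and the probit approximation (one standard way of integrating a Gaussian logit through a sigmoid) are illustrative, not the paper's code.

```python
# Sketch of the GLM predictive: linearise the model at the MAP, propagate a Gaussian
# posterior over the weights, and squash the resulting Gaussian logit through the sigmoid.
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):                                       # toy scalar "network" output (a logit)
    return np.tanh(x * w[0]) * w[1] + w[2]

def grad_w(x, w, eps=1e-5):                        # numerical gradient of f w.r.t. w
    return np.array([(f(x, w + e) - f(x, w - e)) / (2 * eps)
                     for e in eps * np.eye(len(w))])

w_map = np.array([1.2, 0.8, -0.1])                 # MAP parameters (assumed already found)
Sigma = 0.05 * np.eye(3)                           # placeholder GGN/Laplace posterior covariance

x = 0.7
J = grad_w(x, w_map)
mu_f = f(x, w_map)                                 # mean of the linearised output
var_f = J @ Sigma @ J                              # variance of the linearised output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
p_map = sigmoid(mu_f)                              # plain MAP prediction (ignores uncertainty)
p_glm = sigmoid(mu_f / np.sqrt(1.0 + np.pi * var_f / 8.0))   # probit-approximated GLM predictive
print(f"MAP prediction: {p_map:.3f}   GLM predictive: {p_glm:.3f}")
```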

Journal ArticleDOI
TL;DR: This work has shown that the main challenge in developing an accurate dam behavior prediction model lies in the uncertainty in how changes in dam behavior over time will affect dam performance.
Abstract: Structural health monitoring models provide important information for safety control of large dams. The main challenge in developing an accurate dam behavior prediction model lies in the mo...

Journal ArticleDOI
TL;DR: An active learning methodology is introduced for adaptively constructing surrogate models, i.e. emulators, of costly computer codes in a multi-output setting, based on the optimization of a suitable acquisition function that targets accurate approximations, model tractability, and compact yet expressive simulated datasets.
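
A minimal sketch of an active-learning loop for building an emulator: repeatedly evaluate the simulator where the GP is most uncertain. The variance-based acquisition, toy simulator, and single-output setting here are simplifying assumptions; the paper's acquisition function and multi-output treatment are more elaborate.

```python
# Sketch: variance-based active learning for a GP emulator of a costly simulator (illustrative).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
simulator = lambda x: np.sin(5 * x) * np.exp(-x**2)       # stand-in for a costly computer code

X = list(rng.uniform(-2, 2, 3))
y = [float(simulator(x)) for x in X]
candidates = np.linspace(-2, 2, 301)

for it in range(12):
    gp = GaussianProcessRegressor(RBF(0.5) + WhiteKernel(1e-6)).fit(
        np.array(X).reshape(-1, 1), np.array(y))
    _, sd = gp.predict(candidates.reshape(-1, 1), return_std=True)
    x_new = candidates[np.argmax(sd)]                     # acquisition: maximum predictive uncertainty
    X.append(float(x_new)); y.append(float(simulator(x_new)))

print("emulator built from", len(X), "simulator runs")
```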